Image captioning is a challenging research area of Artificial Intelligence (AI) that requires a functional and robust
model capable of generating a caption for any image. It is a fundamental task that requires not only a semantic
understanding of images but also an understanding of the interactions between the objects present in them. A further
challenge is to capture the visual-language dynamics and to translate these relations into sensible captions. In this paper, we propose an
architecture that employs a multilayer Convolutional Neural Network (CNN) to process the image and
extract features from it.
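The abstract does not specify the layer configuration of the CNN. The following is therefore only a minimal illustrative sketch of how a multilayer CNN turns an image into a feature vector, not the proposed architecture. It assumes a single-channel input, one hand-chosen kernel per layer, ReLU activations and non-overlapping 2x2 max pooling. All of these, and the helper names `conv2d`, `relu`, `max_pool` and `extract_features`, are hypothetical. Each layer applies convolution, then ReLU, then pooling, and the final feature map is flattened.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' (no padding) 2-D cross-correlation, the operation used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(x: Matrix) -> Matrix:
    """Element-wise ReLU: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [
        [
            max(x[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(x[0]) - size + 1, size)
        ]
        for i in range(0, len(x) - size + 1, size)
    ]


def extract_features(image: Matrix, kernels: List[Matrix]) -> List[float]:
    """Hypothetical multilayer extractor: (conv -> ReLU -> max-pool) per kernel, then flatten."""
    x = image
    for k in kernels:
        x = max_pool(relu(conv2d(x, k)))
    return [v for row in x for v in row]
```

For example, a 10x10 image passed through two 3x3 layers shrinks to 8x8, then 4x4 after pooling, then 2x2, then 1x1, giving a one-element feature vector. In a real captioning model, the kernels would be learned and each layer would have many channels. The flattened vector would then be passed to the caption-generating component.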