In simple terms, image captioning is the task of describing an image. A description covers the objects
within the image and the spatial relationships between them. Humans perform this task very effectively, and
several methods exist for making a machine do the same. In this project, we develop an encoder-decoder model to caption images
and use a visual attention mechanism to improve its accuracy. We use the Local/Bahdanau attention mechanism, which
attends to only a subset of the input positions and is computationally simpler than global attention. We use BLEU as the metric to
evaluate the quality of the generated captions.
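
To make the evaluation metric concrete, the following is a minimal sketch of sentence-level BLEU: clipped n-gram precision for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. It is not the evaluation code used in this project. The function names `ngrams` and `sentence_bleu`, the whitespace tokenization, the absence of smoothing, and the example captions are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights (illustrative)."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # no smoothing: one zero precision gives a zero score
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


# Hypothetical captions, tokenized by whitespace.
references = [
    "a dog runs on the grass".split(),
    "a dog is running across the lawn".split(),
]
exact_match = sentence_bleu("a dog runs on the grass".split(), references)
short_match = sentence_bleu("a dog runs".split(), references[:1], max_n=2)
no_overlap = sentence_bleu("cat".split(), references)
```

In this sketch a caption identical to a reference scores 1, a caption sharing no unigram with any reference scores 0, and a correct but too-short caption is reduced only by the brevity penalty. Because a single zero n-gram precision zeroes the whole score, short captions are usually scored with a smoothed or corpus-level variant in practice.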