Captioning an image is a concept of producing a succinct content description for an input image in single sentence
considering all the objects in an image in the form of description. It can be done using deep learning architectures with the help
of CNN (Convolution Neural Network) and RNN (Recurrent Neural Network). A particular kind of RNN called long short-term
memory (LSTM) is used. The image from the dataset is taken as the input and accordingly caption is produced as an output from
the given set in the form of text. It has numerous applications in various fields namely Image Indexing, Application
Recommendation, Social media etc. It also aids the visually impaired and short sightedness people by automatically decoding the
image and describing it in the form of text in a large format