To recognise the context of an image and describe it in a natural language like English, the fundamental task of
creating image captions uses computer vision and natural language processing techniques. To create a natural language
description from an input image, image caption generation is used. Convolutional Neural Network (CNN) model and Long
Short-Term Memory (LSTM) model are the two parts of this Python project that are used to implement it.