Object detection is one of the most important and challenging branches of computer vision, widely used to locate instances of semantic objects of a given class in everyday applications such as security monitoring and autonomous driving. The performance of object detectors has improved significantly with the rapid growth of deep learning networks for detection tasks.
The proposed architecture uses pre-trained networks such as AlexNet and VGG-16 to detect specific objects in the PASCAL VOC 2007 dataset. AlexNet consists of 25 layers, while VGG-16 has 41. Two principal directions are explored: supervised learning and semi-supervised learning. The drawbacks of supervised learning approaches motivate the exploration of unsupervised pre-training. By learning strong representations in the early layers, the networks can be trained faster and more effectively.