This paper reviews the existing YOLO (You Only Look Once) architecture, its implementation, and how it works. YOLO is an
object detection architecture that frames detection as a regression problem. It comes in several versions, including YOLOv1,
YOLOv2, and YOLOv3. The feature extractor for YOLO is Darknet.
The network looks at the entire image only once. In one evaluation, a single neural network predicts bounding boxes and class
probabilities directly from full images. The unified architecture is extremely fast. The base YOLO model processes images in
real time at 45 frames per second.
Fast YOLO, an extremely fast version of YOLO, processes 155 frames per second. Because YOLO looks at the entire image when
predicting objects for individual grid cells, it is less likely to predict false positives on background, although it makes more
localization errors than other state-of-the-art detectors. This paper also provides details on the evolution and evaluation of
the architecture.
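
As an illustration of how a single evaluation can yield boxes and class probabilities for every grid cell, the following minimal sketch computes the size of the prediction tensor and converts one cell-relative box to pixel coordinates. The grid size S = 7, B = 2 boxes per cell, C = 20 classes, and the 448 x 448 input are taken from the original YOLOv1 configuration and are assumptions here, as are the function names `output_shape` and `decode_box`; this is not the Darknet implementation.

```python
def output_shape(S=7, B=2, C=20):
    """Shape of the prediction tensor (assumed YOLOv1-style layout).

    Each of the S x S grid cells predicts B boxes, each with
    (x, y, w, h, confidence), plus C class probabilities.
    """
    return (S, S, B * 5 + C)


def decode_box(row, col, x, y, w, h, S=7, img_w=448, img_h=448):
    """Convert one cell-relative box to pixel-space corners.

    Assumes x, y are offsets within the cell (0..1) and w, h are
    fractions of the whole image, as in YOLOv1.
    Returns (x_min, y_min, x_max, y_max) in pixels.
    """
    cx = (col + x) / S * img_w   # box centre, x in pixels
    cy = (row + y) / S * img_h   # box centre, y in pixels
    bw = w * img_w               # box width in pixels
    bh = h * img_h               # box height in pixels
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


if __name__ == "__main__":
    print(output_shape())
    print(decode_box(row=3, col=3, x=0.5, y=0.5, w=0.5, h=0.5))
```

Under these assumptions the whole image maps to one fixed-size tensor in a single forward pass, which is why detection can be treated as regression rather than as a pipeline of region proposals followed by classification.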