Over the past few years, extensive research has been devoted to enhancing
YOLO object detectors. Since its introduction, eight major versions of YOLO
have been released, each aiming to improve accuracy and efficiency. While
YOLO's evident merits have led to its widespread use in many areas, deploying
it on resource-limited devices poses challenges. To address
this issue, various neural network compression methods have been developed,
which fall under three main categories, namely network pruning, quantization,
and knowledge distillation. The fruitful outcomes of utilizing model
compression methods, such as lowering memory usage and inference time, make
them favorable, if not necessary, for deploying large neural networks on
hardware-constrained edge devices. In this review paper, our focus is on
pruning and quantization due to their comparative modularity. We categorize
them and analyze the practical results of applying those methods to YOLOv5. By
doing so, we identify gaps in adapting pruning and quantization for compressing
YOLOv5, and provide future directions in this area for further exploration.
Among several versions of YOLO, we specifically choose YOLOv5 for its
excellent trade-off between recency and popularity in the literature. This is the first
specific review paper that surveys pruning and quantization methods from an
implementation point of view on YOLOv5. Our study also extends to newer
versions of YOLO, since deploying them on resource-limited devices poses the
same challenges that persist today. This paper targets those interested in
the practical deployment of model compression methods on YOLOv5, and in
exploring different compression techniques that can be used for subsequent
versions of YOLO.

Comment: 18 pages, 7 figures
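To make the two compression families discussed above concrete, the following is a minimal, illustrative sketch (plain Python, not taken from the paper or from any YOLOv5 implementation) of magnitude-based weight pruning and symmetric per-tensor int8 quantization, the simplest representatives of each category:

```python
# Illustrative sketch: magnitude pruning and symmetric int8 quantization
# applied to a flat list of weights. Real pipelines operate on framework
# tensors (e.g. PyTorch) layer by layer; this only shows the core idea.

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_int8(weights):
    """Symmetric per-tensor quantization to int8; returns (q, scale)."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate floats for comparison."""
    return [v * scale for v in q]

weights = [0.9, -0.05, 0.4, -1.2, 0.01, 0.3]
pruned = magnitude_prune(weights, 0.5)   # half the weights become zero
q, s = quantize_int8(weights)            # 8-bit codes plus one float scale
recovered = dequantize(q, s)             # close to the originals
```

Pruning reduces the number of effective parameters (and, with structured variants, actual compute), while quantization shrinks the storage and arithmetic precision per parameter; the two are complementary and are often combined in practice.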