Search CORE

2 research outputs found

HW-FlowQ: A Multi-Abstraction Level HW-CNN Co-design Quantization Methodology

Author: Doan Nguyen Anh Vu
Fasfous Nael
Frickenstein Alexander
Martina Maurizio
Nagaraja Naveen Shankar
Salihu Driton
Stechele Walter
Unger Christian
Valpreda Emanuele
Vemparala Manoj Rohit
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2021
Field of study

Model compression through quantization is commonly applied to convolutional neural networks (CNNs) deployed on compute and memory-constrained embedded platforms. Different layers of the CNN can have varying degrees of numerical precision for both weights and activations, resulting in a large search space. Together with the hardware (HW) design space, the challenge of finding the globally optimal HW-CNN combination for a given application becomes daunting. To this end, we propose HW-FlowQ, a systematic approach that enables the co-design of the target hardware platform and the compressed CNN model through quantization. The search space is viewed at three levels of abstraction, allowing for an iterative approach for narrowing down the solution space before reaching a high-fidelity CNN hardware modeling tool, capable of capturing the effects of mixed-precision quantization strategies on different hardware architectures (processing unit counts, memory levels, cost models, dataflows) and two types of computation engines (bit-parallel vectorized, bit-serial). To combine both worlds, a multi-objective non-dominated sorting genetic algorithm (NSGA-II) is leveraged to establish a Pareto-optimal set of quantization strategies for the target HW-metrics at each abstraction level. HW-FlowQ detects optima in a discrete search space and maximizes the task-related accuracy of the underlying CNN while minimizing hardware-related costs. The Pareto-front approach keeps the design space open to a range of non-dominated solutions before refining the design to a more detailed level of abstraction. With equivalent prediction accuracy, we improve the energy and latency by 20% and 45% respectively for ResNet56 compared to existing mixed-precision search methods

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

AnaCoNGA: Analytical HW-CNN Co-Design Using Nested Genetic Algorithms

Author: Becker Juergen
Fasfous Nael
Frickenstein Alexander
Hufer Julian
Martina Maurizio
Nagaraja Naveen-Shankar
Salihu Driton
Singh Anmol
Stechele Walter
Valpreda Emanuele
Vemparala Manoj Rohit
Voegel Hans-Joerg
Vu Doan Nguyen Anh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022
Field of study

We present AnaCoNGA, an analytical co-design methodology, which enables two genetic algorithms to evaluate the fitness of design decisions on layer-wise quantization of a neural network and hardware (HW) resource allocation. We embed a hardware architecture search (HAS) algorithm into a quantization strategy search (QSS) algorithm to evaluate the hardware design Pareto-front of each considered quantization strategy. We harness the speed and flexibility of analytical HW-modeling to enable parallel HW-CNN co-design. With this approach, the QSS is focused on seeking high-accuracy quantization strategies which are guaranteed to have efficient hardware designs at the end of the search. Through AnaCoNGA, we improve the accuracy by 2.88 p.p. with respect to a uniform 2-bit ResNet20 on CIFAR-10, and achieve a 35% and 37% improvement in latency and DRAM accesses, while reducing LUT and BRAM resources by 9% and 59% respectively, when compared to a standard edge variant of the accelerator. The nested genetic algorithm formulation also reduces the search time by 51% compared to an equivalent, sequential co-design formulation

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)