Searching Toward Pareto-Optimal Device-Aware Neural Architectures
Recent breakthroughs in Neural Architecture Search (NAS) have achieved
state-of-the-art performance on many tasks such as image classification and
language understanding. However, most existing works optimize only for model
accuracy and largely ignore other important factors imposed by the underlying
hardware and devices, such as latency and energy consumption during inference. In
this paper, we first introduce the problem of NAS and survey
recent works. Then we examine in depth two recent advances that extend NAS
into multiple-objective frameworks: MONAS and DPP-Net. Both MONAS and DPP-Net
are capable of optimizing accuracy and other objectives imposed by devices,
searching for neural architectures that can be best deployed on a wide spectrum
of devices: from embedded systems and mobile devices to workstations.
Experimental results show that the architectures found by MONAS and
DPP-Net achieve Pareto optimality w.r.t. the given objectives for various
devices.

Comment: ICCAD'18 Invited Paper
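The Pareto optimality claimed above can be made concrete with a small sketch. The objectives (accuracy, latency, energy) and all numbers below are illustrative assumptions, not values from the paper:

```python
# Pareto dominance over (accuracy %, latency ms, energy mJ):
# maximize accuracy, minimize latency and energy.

def dominates(a, b):
    """True if architecture a is no worse than b on every objective
    and strictly better on at least one."""
    acc_a, lat_a, en_a = a
    acc_b, lat_b, en_b = b
    no_worse = acc_a >= acc_b and lat_a <= lat_b and en_a <= en_b
    strictly_better = acc_a > acc_b or lat_a < lat_b or en_a < en_b
    return no_worse and strictly_better

def pareto_front(candidates):
    """Keep only candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

# Made-up trade-off points: a large accurate net, a small fast net,
# and a point that is strictly worse than the small net.
archs = [(92.0, 30.0, 15.0), (90.0, 10.0, 8.0), (89.0, 12.0, 9.0)]
front = pareto_front(archs)
```

A multi-objective NAS such as MONAS or DPP-Net aims to return only architectures on this front, so that no returned model can be improved on one device-imposed objective without sacrificing another.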
Optimality Assessment of Memory-Bounded ConvNets Deployed on Resource-Constrained RISC Cores
A cost-effective implementation of Convolutional Neural Nets on the mobile edge of the Internet of Things (IoT) requires smart optimizations to fit large models into memory-constrained cores. Reduction methods that jointly combine filter pruning and weight quantization have proven efficient at finding the compression that ensures minimum model size without accuracy loss. However, other optimal configurations exist that stem from the memory constraint. The objective of this work is to assess such memory-bounded implementations and to show that most of them are centred on specific parameter settings that are difficult to implement on a low-power RISC core. Hence, the focus is on quantifying the distance to optimality of the closest implementations that can actually be deployed on hardware. The analysis is powered by a two-stage framework that efficiently explores the memory-accuracy space using a lightweight, hardware-conscious heuristic optimization. Results are collected from three realistic IoT tasks (Image Classification on CIFAR-10, Keyword Spotting on the Speech Commands Dataset, Facial Expression Recognition on Fer2013) run on RISC cores (Cortex-M by ARM) with a few hundred KB of on-chip RAM.
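The memory bound at the core of this abstract can be sketched as a simple feasibility check: after pruning filters and quantizing weights, does the model still fit the on-chip RAM budget? The layer sizes, keep ratio, and budget below are illustrative assumptions, not the paper's actual configurations or framework:

```python
# Estimate weight storage of a pruned + quantized ConvNet and check it
# against an on-chip RAM budget (hypothetical numbers throughout).

def model_bytes(layer_params, keep_ratio, bits):
    """Total weight storage in bytes after keeping a fraction of the
    parameters per layer (filter pruning) and quantizing to `bits` bits."""
    total = 0
    for n_params in layer_params:
        kept = int(n_params * keep_ratio)  # parameters surviving pruning
        total += kept * bits // 8          # storage at the chosen bit width
    return total

# Made-up per-layer parameter counts for a small ConvNet.
layers = [4_608, 73_728, 294_912, 65_536]
budget = 128 * 1024  # e.g. 128 KB of on-chip RAM on a Cortex-M class core

# Sweep a few (bit width, keep ratio) settings and flag feasible ones.
for bits in (8, 4):
    for keep in (1.0, 0.75, 0.5):
        size = model_bytes(layers, keep, bits)
        print(f"bits={bits} keep={keep}: {size} B, fits={size <= budget}")
```

In this toy sweep the full-precision 8-bit model exceeds the budget, while aggressive joint pruning and 4-bit quantization bring it under; the paper's framework explores this memory-accuracy space with a hardware-conscious heuristic rather than the exhaustive sweep shown here.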