
15 research outputs found

Latent Field Discovery In Interacting Dynamical Systems With Neural Fields

Author: Bekkers Erik J.
Gavves Efstratios
Kofinas Miltiadis
Nagaraja Naveen Shankar
Publication date: 31/10/2023

Systems of interacting objects often evolve under the influence of field effects that govern their dynamics, yet previous works have abstracted away from such effects, and assume that systems evolve in a vacuum. In this work, we focus on discovering these fields, and infer them from the observed dynamics alone, without directly observing them. We theorize the presence of latent force fields, and propose neural fields to learn them. Since the observed dynamics constitute the net effect of local object interactions and global field effects, recently popularized equivariant networks are inapplicable, as they fail to capture global information. To address this, we propose to disentangle local object interactions -- which are $\mathrm{SE}(n)$ equivariant and depend on relative states -- from external global field effects -- which depend on absolute states. We model interactions with equivariant graph networks, and combine them with neural fields in a novel graph network that integrates field forces. Our experiments show that we can accurately discover the underlying fields in charged particles settings, traffic scenes, and gravitational n-body problems, and effectively use them to learn the system and forecast future trajectories. Comment: NeurIPS 2023. https://github.com/mkofinas/aethe
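
The split between relative-state interactions and absolute-state field effects described in this abstract can be illustrated with a toy example. The sketch below is not the paper's model: the spring-like pairwise force and the origin-directed field are hand-written stand-ins (assumptions) for the learned graph network and neural field, and all function names are hypothetical. It only shows translation, one part of $\mathrm{SE}(n)$; rotations are not covered.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def pairwise_force(positions: List[Vec], i: int) -> Vec:
    """Toy local interaction: spring-like pull of every other object on object i.

    Uses only position differences, so shifting all objects together
    leaves the result unchanged.
    """
    xi, yi = positions[i]
    fx = fy = 0.0
    for j, (xj, yj) in enumerate(positions):
        if j != i:
            fx += xj - xi
            fy += yj - yi
    return fx, fy


def field_force(position: Vec) -> Vec:
    """Toy global field (assumed): pulls toward the origin.

    Depends on the absolute position, so it changes under a global shift.
    """
    x, y = position
    return -x, -y


def net_force(positions: List[Vec], i: int) -> Vec:
    """Observed net effect: local interaction plus global field."""
    px, py = pairwise_force(positions, i)
    gx, gy = field_force(positions[i])
    return px + gx, py + gy


def translate(positions: List[Vec], dx: float, dy: float) -> List[Vec]:
    """Shift every object by the same offset."""
    return [(x + dx, y + dy) for x, y in positions]


scene = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
shifted = translate(scene, 3.0, -1.0)

# The interaction term is identical in both scenes; the net force is not.
print(pairwise_force(scene, 0), pairwise_force(shifted, 0))
print(net_force(scene, 0), net_force(shifted, 0))
```

A model that sees only relative states reproduces the first term but has no way to express the second, which is the motivation the abstract gives for adding a separate field component.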

arXiv.org e-Print Archive

HW-Flow: A Multi-Abstraction Level HW-CNN Codesign Pruning Methodology

Author: Alexander Frickenstein
Christian Unger
Emanuele Valpreda
Manfredi Camalleri
Manoj Rohit Vemparala
Maurizio Martina
Nael Fasfous
Naveen Shankar Nagaraja
Qi Zhao
Walter Stechele
Publication venue: European Design and Automation Association (EDAA)
Publication date: 01/01/2022

Convolutional neural networks (CNNs) have produced unprecedented accuracy for many computer vision problems in the recent past. In power and compute-constrained embedded platforms, deploying modern CNNs can present many challenges. Most CNN architectures do not run in real-time due to the high number of computational operations involved during the inference phase. This emphasizes the role of CNN optimization techniques in early design space exploration. To estimate their efficacy in satisfying the target constraints, existing techniques are either hardware (HW) agnostic, pseudo-HW-aware by considering parameter and operation counts, or HW-aware through inflexible hardware-in-the-loop (HIL) setups. In this work, we introduce HW-Flow, a framework for optimizing and exploring CNN models based on three levels of hardware abstraction: Coarse, Mid and Fine. Through these levels, CNN design and optimization can be iteratively refined towards efficient execution on the target hardware platform. We present HW-Flow in the context of CNN pruning by augmenting a reinforcement learning agent with key metrics to understand the influence of its pruning actions on the inference hardware. With a 2× reduction in energy and latency, we prune ResNet56, ResNet50, and DeepLabv3 with minimal accuracy degradation on the CIFAR-10, ImageNet, and CityScapes datasets, respectively.
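
The "pseudo-HW-aware" estimates mentioned in this abstract rely on parameter and operation counts. As a rough sketch of what such a count looks like for one convolutional layer before and after channel pruning, the snippet below uses the standard formulas for a bias-free 2D convolution. The layer sizes and the function name `conv_cost` are illustrative assumptions, and this is not HW-Flow's own cost model.

```python
from typing import Tuple


def conv_cost(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> Tuple[int, int]:
    """Return (parameters, multiply-accumulates) for a k x k convolution.

    Bias terms are ignored. Each output pixel of each output channel
    needs k * k * c_in multiply-accumulate operations.
    """
    params = k * k * c_in * c_out
    macs = params * h_out * w_out
    return params, macs


# Hypothetical layer: 64 -> 64 channels, 3x3 kernel, 32x32 output map.
baseline = conv_cost(c_in=64, c_out=64, k=3, h_out=32, w_out=32)

# Same layer after pruning half of its output channels.
pruned = conv_cost(c_in=64, c_out=32, k=3, h_out=32, w_out=32)

print("baseline params/MACs:", baseline)
print("pruned   params/MACs:", pruned)
```

Counts like these halve when half the output channels are removed, but they say nothing about memory traffic or dataflow on a specific device, which is the gap the abstract attributes to pseudo-HW-aware methods.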

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Accelerating and pruning CNNs for semantic segmentation on FPGA

Author: Fasfous Nael
Frickenstein Alexander
Frickenstein Lukas
Helms Domenik
Mitra Saptarshi
Mori Pierpaolo
Nagaraja Naveen-Shankar
Passerone Claudio
Sarkar Sreetama
Stechele Walter
Vemparala Manoj-Rohit
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2022

Semantic segmentation is one of the popular tasks in computer vision, providing pixel-wise annotations for scene understanding. However, segmentation-based convolutional neural networks require tremendous computational power. In this work, a fully-pipelined hardware accelerator with support for dilated convolution is introduced, which cuts down the redundant zero multiplications. Furthermore, we propose a genetic-algorithm-based automated channel pruning technique to jointly optimize computational complexity and model accuracy. Finally, hardware heuristics and an accurate model of the custom accelerator design enable a hardware-aware pruning framework. We achieve 2.44× lower latency with minimal degradation in semantic prediction quality (1.98 pp lower mean intersection over union) compared to the baseline DeepLabV3+ model, evaluated on an Arria-10 FPGA. The binary files of the FPGA design, baseline and pruned models can be found at github.com/pierpaolomori/SemanticSegmentationFPGA
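
The "redundant zero multiplications" in this abstract arise when a dilated convolution is executed as a dense convolution with zeros inserted between the kernel taps. The following 1D sketch, written in plain Python with hypothetical function names and example values, compares that dense form with one that strides over the input directly. It is only meant to show where the saving comes from and does not describe the accelerator's actual design.

```python
from typing import List, Tuple


def dilate_kernel(kernel: List[float], dilation: int) -> List[float]:
    """Insert (dilation - 1) zeros between neighbouring kernel taps."""
    out: List[float] = []
    for idx, w in enumerate(kernel):
        out.append(w)
        if idx < len(kernel) - 1:
            out.extend([0.0] * (dilation - 1))
    return out


def conv1d_dense(signal: List[float], kernel: List[float]) -> Tuple[List[float], int]:
    """Valid-mode correlation; multiplies every tap, zeros included."""
    n_out = len(signal) - len(kernel) + 1
    out, mults = [], 0
    for i in range(n_out):
        acc = 0.0
        for j, w in enumerate(kernel):
            acc += signal[i + j] * w
            mults += 1
        out.append(acc)
    return out, mults


def conv1d_dilated(signal: List[float], kernel: List[float], dilation: int) -> Tuple[List[float], int]:
    """Same result, but only the non-inserted taps are multiplied."""
    span = (len(kernel) - 1) * dilation + 1
    n_out = len(signal) - span + 1
    out, mults = [], 0
    for i in range(n_out):
        acc = 0.0
        for j, w in enumerate(kernel):
            acc += signal[i + j * dilation] * w
            mults += 1
        out.append(acc)
    return out, mults


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [1.0, 2.0, 3.0]

dense_out, dense_mults = conv1d_dense(signal, dilate_kernel(kernel, 2))
fast_out, fast_mults = conv1d_dilated(signal, kernel, 2)

print(dense_out == fast_out, dense_mults, fast_mults)
```

With a dilation of 2 and three taps, the dense form performs five multiplications per output and the strided form three; the gap widens as the dilation rate grows.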

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

HW-FlowQ: A Multi-Abstraction Level HW-CNN Co-design Quantization Methodology

Author: Doan Nguyen Anh Vu
Fasfous Nael
Frickenstein Alexander
Martina Maurizio
Nagaraja Naveen Shankar
Salihu Driton
Stechele Walter
Unger Christian
Valpreda Emanuele
Vemparala Manoj Rohit
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2021

Model compression through quantization is commonly applied to convolutional neural networks (CNNs) deployed on compute and memory-constrained embedded platforms. Different layers of the CNN can have varying degrees of numerical precision for both weights and activations, resulting in a large search space. Together with the hardware (HW) design space, the challenge of finding the globally optimal HW-CNN combination for a given application becomes daunting. To this end, we propose HW-FlowQ, a systematic approach that enables the co-design of the target hardware platform and the compressed CNN model through quantization. The search space is viewed at three levels of abstraction, allowing for an iterative approach for narrowing down the solution space before reaching a high-fidelity CNN hardware modeling tool, capable of capturing the effects of mixed-precision quantization strategies on different hardware architectures (processing unit counts, memory levels, cost models, dataflows) and two types of computation engines (bit-parallel vectorized, bit-serial). To combine both worlds, a multi-objective non-dominated sorting genetic algorithm (NSGA-II) is leveraged to establish a Pareto-optimal set of quantization strategies for the target HW-metrics at each abstraction level. HW-FlowQ detects optima in a discrete search space and maximizes the task-related accuracy of the underlying CNN while minimizing hardware-related costs. The Pareto-front approach keeps the design space open to a range of non-dominated solutions before refining the design to a more detailed level of abstraction. With equivalent prediction accuracy, we improve the energy and latency by 20% and 45%, respectively, for ResNet56 compared to existing mixed-precision search methods.
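
The "non-dominated solutions" this abstract refers to can be made concrete with a small filter. The sketch below keeps only candidates that no other candidate beats on every objective, which corresponds to the first front in a non-dominated sort. It is not a full NSGA-II (there is no crowding distance, selection, crossover, or mutation), and the candidate tuples of (error rate, energy) are made-up illustrative numbers, not results from the paper.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere.

    All objectives are treated as quantities to minimize.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical quantization strategies as (error rate, energy) pairs.
candidates = [
    (0.10, 5.0),  # most accurate, expensive
    (0.12, 3.0),  # slightly less accurate, cheaper
    (0.11, 6.0),  # worse than the first on both objectives
    (0.20, 3.5),  # worse than the second on both objectives
]

front = pareto_front(candidates)
print(front)
```

Keeping the whole front, instead of a single best point, preserves several trade-offs between accuracy and hardware cost for the next, more detailed abstraction level to refine.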

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)