21 research outputs found

QuantPipe: Applying Adaptive Post-Training Quantization for Distributed Transformer Pipelines in Dynamic Edge Environments

Author: Beerel Peter A.
Crago Stephen P.
Imes Connor
Kundu Souvik
Walters John Paul
Wang Haonan
Publication date: 08/11/2022

Pipeline parallelism has achieved great success in deploying large-scale transformer models in cloud environments, but has received less attention in edge environments. Unlike in cloud scenarios with high-speed and stable network interconnects, dynamic bandwidth in edge systems can degrade distributed pipeline performance. We address this issue with QuantPipe, a communication-efficient distributed edge system that introduces post-training quantization (PTQ) to compress the communicated tensors. QuantPipe uses adaptive PTQ to change bitwidths in response to bandwidth dynamics, maintaining transformer pipeline performance while incurring limited inference accuracy loss. We further improve the accuracy with a directed-search analytical clipping for integer quantization method (DS-ACIQ), which bridges the gap between estimated and real data distributions. Experimental results show that QuantPipe adapts to dynamic bandwidth to maintain pipeline performance while achieving a practical model accuracy using a wide range of quantization bitwidths, e.g., improving accuracy under 2-bit quantization by 15.85% on ImageNet compared to naive quantization.

arXiv.org e-Print Archive
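
The adaptive-bitwidth idea described in the QuantPipe abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the candidate bitwidths, and the latency-budget rule are assumptions made for illustration, and the quantizer is plain min-max uniform quantization (the kind of naive baseline the abstract compares against), not DS-ACIQ.

```python
# Hypothetical sketch: pick a bitwidth from the measured bandwidth, then
# quantize the tensor sent between pipeline stages. All names are illustrative.

def choose_bitwidth(num_elements, bandwidth_bps, budget_s,
                    candidates=(32, 16, 8, 6, 4, 2)):
    """Return the largest candidate bitwidth whose transfer time fits the budget."""
    for bits in sorted(candidates, reverse=True):
        if num_elements * bits / bandwidth_bps <= budget_s:
            return bits
    # Nothing fits: fall back to the most aggressive compression available.
    return min(candidates)


def quantize(values, bits):
    """Naive min-max uniform quantization to unsigned integer codes."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Map integer codes back to approximate floating-point values."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    activations = [-1.0, -0.2, 0.3, 1.0]
    # Assumed scenario: 1M-element tensor, 100 Mbit/s link, 100 ms budget.
    bits = choose_bitwidth(1_000_000, 100e6, 0.1)
    codes, lo, scale = quantize(activations, bits)
    print(bits, codes, dequantize(codes, lo, scale))
```

As the link slows down, `choose_bitwidth` steps to smaller bitwidths, trading reconstruction error for transfer time; the abstract's clipping method targets exactly the accuracy lost at the low-bitwidth end of that trade.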

Neuromorphic-P2M: Processing-in-Pixel-in-Memory Paradigm for Neuromorphic Image Sensors

Author: Beerel Peter A.
Datta Gourav
Jacob Ajey P.
Jaiswal Akhilesh R.
Kaiser Md Abdullah-Al
Wang Zixu
Publication date: 22/01/2023

Edge devices equipped with computer vision must deal with vast amounts of sensory data with limited computing resources. Hence, researchers have been exploring different energy-efficient solutions such as near-sensor processing, in-sensor processing, and in-pixel processing, bringing the computation closer to the sensor. In particular, in-pixel processing embeds the computation capabilities inside the pixel array and achieves high energy efficiency by generating low-level features instead of the raw data stream from CMOS image sensors. Many different in-pixel processing techniques and approaches have been demonstrated on conventional frame-based CMOS imagers; however, the processing-in-pixel approach for neuromorphic vision sensors has not been explored so far. In this work, for the first time, we propose an asynchronous non-von-Neumann analog processing-in-pixel paradigm to perform convolution operations by integrating in-situ multi-bit multi-channel convolution inside the pixel array, performing analog multiply and accumulate (MAC) operations that consume significantly less energy than their digital MAC alternative. To make this approach viable, we incorporate the circuit's non-ideality, leakage, and process variations into a novel hardware-algorithm co-design framework that leverages extensive HSpice simulations of our proposed circuit using the GF22nm FD-SOI technology node. We verified our framework on state-of-the-art neuromorphic vision sensor datasets and show that, on the IBM DVS128-Gesture dataset, our solution consumes ~2x lower backend-processor energy than the state of the art while maintaining nearly the same front-end (sensor) energy and a high test accuracy of 88.36%.

Comment: 17 pages, 11 figures, 2 tables

arXiv.org e-Print Archive

Technology-Circuit-Algorithm Tri-Design for Processing-in-Pixel-in-Memory (P2M)

Author: Beerel Peter A.
Datta Gourav
Garg Manas
Jacob Ajey P.
Jaiswal Akhilesh R.
Kaiser Md Abdullah-Al
Kundu Souvik
Sarkar Sreetama
Yin Zihan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/04/2023

The massive amounts of data generated by camera sensors motivate data processing inside pixel arrays, i.e., at the extreme edge. Several critical developments have fueled recent interest in the processing-in-pixel-in-memory paradigm for a wide range of visual machine intelligence tasks, including (1) advances in 3D integration technology to enable complex processing inside each pixel in a 3D integrated manner while maintaining pixel density, (2) analog processing circuit techniques for massively parallel low-energy in-pixel computations, and (3) algorithmic techniques to mitigate non-idealities associated with analog processing through hardware-aware training schemes. This article presents a comprehensive technology-circuit-algorithm landscape that connects technology capabilities, circuit design strategies, and algorithmic optimizations to power, performance, area, bandwidth reduction, and application-level accuracy metrics. We present our results using a comprehensive co-design framework incorporating hardware and algorithmic optimizations for various complex real-life visual intelligence tasks mapped onto our P2M paradigm.

arXiv.org e-Print Archive

Neuromorphic-P2M: processing-in-pixel-in-memory paradigm for neuromorphic image sensors

Author: Ajey P. Jacob
Akhilesh R. Jaiswal
Gourav Datta
Md Abdullah-Al Kaiser
Peter A. Beerel
Zixu Wang
Publication venue: 'Frontiers Media SA'
Publication date: 01/05/2023

Edge devices equipped with computer vision must deal with vast amounts of sensory data with limited computing resources. Hence, researchers have been exploring different energy-efficient solutions such as near-sensor, in-sensor, and in-pixel processing, bringing the computation closer to the sensor. In particular, in-pixel processing embeds the computation capabilities inside the pixel array and achieves high energy efficiency by generating low-level features instead of the raw data stream from CMOS image sensors. Many different in-pixel processing techniques and approaches have been demonstrated on conventional frame-based CMOS imagers; however, the processing-in-pixel approach for neuromorphic vision sensors has not been explored so far. In this work, for the first time, we propose an asynchronous non-von-Neumann analog processing-in-pixel paradigm to perform convolution operations by integrating in-situ multi-bit multi-channel convolution inside the pixel array, performing analog multiply and accumulate (MAC) operations that consume significantly less energy than their digital MAC alternative. To make this approach viable, we incorporate the circuit's non-ideality, leakage, and process variations into a novel hardware-algorithm co-design framework that leverages extensive HSpice simulations of our proposed circuit using the GF22nm FD-SOI technology node. We verified our framework on state-of-the-art neuromorphic vision sensor datasets and show that, on the IBM DVS128-Gesture dataset, our solution consumes ~2× lower backend-processor energy than the state of the art while maintaining nearly the same front-end (sensor) energy and a high test accuracy of 88.36%.

Directory of Open Access Journals

Automatic verification of timed circuits

Author: C. J. Myers
P. A. Beerel
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Effect of Steroid Therapy on Exercise Performance in Patients with Irreversible Chronic Obstructive Pulmonary Disease

Author: Beerel
Colton
Daniel S. Strain
Darling
David P. Franco
Eaton
Evans
Freedman
Fuleihan
Gary T. Kinasewitz
Harding
Mendella
Morgan
Morris
Ogilvie
Oppenheimer
Ronald B. George
Shim
Sue
Publication venue: 'American College of Chest Physicians'

Crossref

A domain selection and evaluation framework for introducing knowledge-based systems in smaller businesses

Author: Beerel A.
Benchimol G.
Chow P.
Harmon P.
Helton T.
Hui D.
Martinsons M.G.
Osins A.
Waterman D.
Publication venue: 'Wiley'

Crossref

2 ps resolution, fine‐grained delay element in 28 nm FDSOI

Author: Bowman K.
Chiang J.‐S.
Flatresse P.
Hand D.
Heck G.
Jasielski J.
Mahapatra N.
Maymandi‐Nejad M.
P. Beerel
R.N. Tadros
Singhvi A.
Tschanz J.
W. Hua
Yao C.
Publication venue: 'Institution of Engineering and Technology (IET)'

Crossref

P2M-DeTrack: Processing-in-Pixel-in-Memory for Energy-efficient and Real-Time Multi-Object Detection and Tracking

Author: Abd-Almageed Wael
Beerel Peter A.
Datta Gourav
Jacob Ajey P.
Jaiswal Akhilesh R.
Kundu Souvik
Lakkireddy Ravi T.
Liu Zeyu
Lu Shunlin
Mathai Joe
Schmidt Andrew
Tian Mulin
Wang Zixu
Yin Zihan
Publication date: 27/05/2022

Today's high-resolution, high-frame-rate cameras in autonomous vehicles generate a large volume of data that needs to be transferred and processed by a downstream processor or machine learning (ML) accelerator to enable intelligent computing tasks, such as multi-object detection and tracking. The massive amount of data transfer incurs significant energy, latency, and bandwidth bottlenecks, which hinder real-time processing. To mitigate this problem, we propose an algorithm-hardware co-design framework called Processing-in-Pixel-in-Memory-based object Detection and Tracking (P2M-DeTrack). P2M-DeTrack is based on a custom Faster R-CNN-based model that is distributed partly inside the pixel array (front-end) and partly in a separate FPGA/ASIC (back-end). The proposed front-end in-pixel processing down-samples the input feature maps significantly with judiciously optimized strided convolution and pooling. Compared to a conventional baseline design that transfers frames of RGB pixels to the back-end, the resulting P2M-DeTrack designs reduce the data bandwidth between sensor and back-end by up to 24x. The designs also reduce the sensor and total energy (obtained from in-house circuit simulations at the GlobalFoundries 22nm technology node) per frame by 5.7x and 1.14x, respectively. Lastly, they reduce the sensing and total frame latency by an estimated 1.7x and 3x, respectively. We evaluate our approach on the multi-object detection (tracking) task of the large-scale BDD100K dataset and observe only a 0.5% reduction in the mean average precision (0.8% reduction in the identification F1 score) compared to the state-of-the-art.

Comment: 6 pages, 4 figures, 4 tables

arXiv.org e-Print Archive
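
The bandwidth saving from strided convolution and pooling mentioned in the P2M-DeTrack abstract above follows from simple output-size arithmetic, sketched below. The frame size, bit depths, channel count, and layer settings are hypothetical values chosen only to make the arithmetic concrete; they are not the paper's configuration and are not meant to reproduce its 24x figure.

```python
# Hypothetical sketch: how in-pixel down-sampling shrinks the data sent
# from the sensor (front-end) to the back-end. All numbers are illustrative.

def out_size(n, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


def bandwidth_reduction(height, width, in_channels, in_bits,
                        layers, out_channels, out_bits):
    """Ratio of raw-frame bits to bits leaving the front-end layers.

    `layers` is a list of (kernel, stride) pairs applied in order,
    e.g. one strided convolution followed by one pooling layer.
    """
    raw_bits = height * width * in_channels * in_bits
    h, w = height, width
    for kernel, stride in layers:
        h = out_size(h, kernel, stride)
        w = out_size(w, kernel, stride)
    reduced_bits = h * w * out_channels * out_bits
    return raw_bits / reduced_bits


if __name__ == "__main__":
    # Assumed example: 512x512 RGB frame at 12 bits per sample, a 4x4
    # stride-4 convolution, then 2x2 stride-2 pooling, with 8 output
    # channels quantized to 8 bits each.
    ratio = bandwidth_reduction(512, 512, 3, 12, [(4, 4), (2, 2)], 8, 8)
    print(f"bandwidth reduction: {ratio:.1f}x")
```

The ratio grows with the product of the strides squared and shrinks with the number and precision of output channels, which is the trade-off a front-end design of this kind has to balance against detection accuracy.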