QuantPipe: Applying Adaptive Post-Training Quantization for Distributed Transformer Pipelines in Dynamic Edge Environments
Pipeline parallelism has achieved great success in deploying large-scale
transformer models in cloud environments, but has received less attention in
edge environments. Unlike in cloud scenarios with high-speed and stable network
interconnects, dynamic bandwidth in edge systems can degrade distributed
pipeline performance. We address this issue with QuantPipe, a
communication-efficient distributed edge system that introduces post-training
quantization (PTQ) to compress the communicated tensors. QuantPipe uses
adaptive PTQ to change bitwidths in response to bandwidth dynamics, maintaining
transformer pipeline performance while incurring limited inference accuracy
loss. We further improve the accuracy with a directed-search analytical
clipping for integer quantization method (DS-ACIQ), which bridges the gap
between estimated and real data distributions. Experimental results show that
QuantPipe adapts to dynamic bandwidth to maintain pipeline performance while
achieving a practical model accuracy using a wide range of quantization
bitwidths, e.g., improving accuracy under 2-bit quantization by 15.85% on
ImageNet compared to naive quantization.
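The bandwidth-driven bitwidth choice described above can be sketched as follows. This is a minimal illustration, not the QuantPipe implementation: the function names, the candidate bitwidths, and the per-stage deadline are assumptions, and the clipping threshold is passed in by the caller instead of being derived with DS-ACIQ.

```python
import numpy as np

def quantize(x, bits, clip):
    """Symmetric uniform quantization of x to `bits` bits within [-clip, clip]."""
    levels = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits, 1 for 2 bits
    scale = clip / levels                     # width of one quantization step
    q = np.clip(np.round(x / scale), -levels, levels).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to floating-point values."""
    return q.astype(np.float32) * scale

def choose_bitwidth(num_elements, bandwidth_bps, deadline_s,
                    candidates=(16, 8, 6, 4, 2)):
    """Pick the largest candidate bitwidth whose transfer time fits the deadline.

    `deadline_s` is a hypothetical per-stage communication budget in seconds.
    Falls back to the smallest candidate when none fits.
    """
    for bits in sorted(candidates, reverse=True):
        if num_elements * bits / bandwidth_bps <= deadline_s:
            return bits
    return min(candidates)

# Example: a 1M-element activation tensor sent over a 100 Mbit/s link
# with a 50 ms budget per pipeline stage (all values are illustrative).
activations = np.random.default_rng(0).uniform(-1.0, 1.0, 1_000_000)
bits = choose_bitwidth(activations.size, 100e6, 0.05)
codes, scale = quantize(activations, bits, clip=1.0)
restored = dequantize(codes, scale)
```

Re-running `choose_bitwidth` whenever the measured bandwidth changes gives the adaptive behavior: a slower link yields a lower bitwidth, trading accuracy for a bounded transfer time.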
Neuromorphic-P2M: Processing-in-Pixel-in-Memory Paradigm for Neuromorphic Image Sensors
Edge devices equipped with computer vision must deal with vast amounts of
sensory data with limited computing resources. Hence, researchers have been
exploring different energy-efficient solutions such as near-sensor processing,
in-sensor processing, and in-pixel processing, bringing the computation closer
to the sensor. In particular, in-pixel processing embeds the computation
capabilities inside the pixel array and achieves high energy efficiency by
generating low-level features instead of the raw data stream from CMOS image
sensors. Many different in-pixel processing techniques and approaches have been
demonstrated on conventional frame-based CMOS imagers; however, the
processing-in-pixel approach for neuromorphic vision sensors has not been
explored so far. In this work, we propose, for the first time, an asynchronous
non-von-Neumann analog processing-in-pixel paradigm to perform convolution
operations by integrating in-situ multi-bit multi-channel convolution inside
the pixel array performing analog multiply and accumulate (MAC) operations that
consume significantly less energy than their digital MAC alternative. To make
this approach viable, we incorporate the circuit's non-ideality, leakage, and
process variations into a novel hardware-algorithm co-design framework that
leverages extensive HSpice simulations of our proposed circuit using the GF22nm
FD-SOI technology node. We verified our framework on state-of-the-art
neuromorphic vision sensor datasets and show that, on the IBM DVS128-Gesture
dataset, our solution consumes ~2x less backend-processor energy than the
state of the art at nearly the same front-end (sensor) energy, while
maintaining a high test accuracy of 88.36%.
Comment: 17 pages, 11 figures, 2 tables
Technology-Circuit-Algorithm Tri-Design for Processing-in-Pixel-in-Memory (P2M)
The massive amounts of data generated by camera sensors motivate data
processing inside pixel arrays, i.e., at the extreme-edge. Several critical
developments have fueled recent interest in the processing-in-pixel-in-memory
paradigm for a wide range of visual machine intelligence tasks, including (1)
advances in 3D integration technology to enable complex processing inside each
pixel in a 3D integrated manner while maintaining pixel density, (2) analog
processing circuit techniques for massively parallel low-energy in-pixel
computations, and (3) algorithmic techniques to mitigate non-idealities
associated with analog processing through hardware-aware training schemes. This
article presents a comprehensive technology-circuit-algorithm landscape that
connects technology capabilities, circuit design strategies, and algorithmic
optimizations to power, performance, area, bandwidth reduction, and
application-level accuracy metrics. We present our results using a
comprehensive co-design framework incorporating hardware and algorithmic
optimizations for various complex real-life visual intelligence tasks mapped
onto our P2M paradigm.
Neuromorphic-P2M: processing-in-pixel-in-memory paradigm for neuromorphic image sensors
Edge devices equipped with computer vision must deal with vast amounts of sensory data with limited computing resources. Hence, researchers have been exploring different energy-efficient solutions such as near-sensor, in-sensor, and in-pixel processing, bringing the computation closer to the sensor. In particular, in-pixel processing embeds the computation capabilities inside the pixel array and achieves high energy efficiency by generating low-level features instead of the raw data stream from CMOS image sensors. Many different in-pixel processing techniques and approaches have been demonstrated on conventional frame-based CMOS imagers; however, the processing-in-pixel approach for neuromorphic vision sensors has not been explored so far. In this work, for the first time, we propose an asynchronous non-von-Neumann analog processing-in-pixel paradigm to perform convolution operations by integrating in-situ multi-bit multi-channel convolution inside the pixel array performing analog multiply and accumulate (MAC) operations that consume significantly less energy than their digital MAC alternative. To make this approach viable, we incorporate the circuit's non-ideality, leakage, and process variations into a novel hardware-algorithm co-design framework that leverages extensive HSpice simulations of our proposed circuit using the GF22nm FD-SOI technology node. We verified our framework on state-of-the-art neuromorphic vision sensor datasets and show that, on the IBM DVS128-Gesture dataset, our solution consumes ~2× less backend-processor energy than the state of the art at nearly the same front-end (sensor) energy, while maintaining a high test accuracy of 88.36%.
Effect of Steroid Therapy on Exercise Performance in Patients with Irreversible Chronic Obstructive Pulmonary Disease
A domain selection and evaluation framework for introducing knowledge-based systems in smaller businesses
P2M-DeTrack: Processing-in-Pixel-in-Memory for Energy-efficient and Real-Time Multi-Object Detection and Tracking
Today's high resolution, high frame rate cameras in autonomous vehicles
generate a large volume of data that needs to be transferred and processed by a
downstream processor or machine learning (ML) accelerator to enable intelligent
computing tasks, such as multi-object detection and tracking. The massive
amount of data transfer incurs significant energy, latency, and bandwidth
bottlenecks, which hinders real-time processing. To mitigate this problem, we
propose an algorithm-hardware co-design framework called
Processing-in-Pixel-in-Memory-based object Detection and Tracking
(P2M-DeTrack). P2M-DeTrack is based on a custom Faster R-CNN-based model that
is distributed partly inside the pixel array (front-end) and partly in a
separate FPGA/ASIC (back-end). The proposed front-end in-pixel processing
down-samples the input feature maps significantly with judiciously optimized
strided convolution and pooling. Compared to a conventional baseline design
that transfers frames of RGB pixels to the back-end, the resulting P2M-DeTrack
designs reduce the data bandwidth between sensor and back-end by up to 24x. The
designs also reduce the sensor and total energy (obtained from in-house circuit
simulations at the GlobalFoundries 22nm technology node) per frame by 5.7x and
1.14x, respectively. Lastly, they reduce the sensing and total frame latency by
an estimated 1.7x and 3x, respectively. We evaluate our approach on the
multi-object detection (tracking) task of the large-scale BDD100K
dataset and observe only a 0.5% reduction in the mean average precision (0.8%
reduction in the identification F1 score) compared to the state of the art.
Comment: 6 pages, 4 figures, 4 tables
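To make the bandwidth argument in the P2M-DeTrack abstract concrete, the sketch below computes how much a strided convolution (optionally followed by pooling) inside the pixel array shrinks the data sent to the back-end. All layer parameters here (frame size, kernel, stride, channel count, bit depths) are hypothetical values chosen for illustration; they are not the configuration used in the paper and are not meant to reproduce its reported 24x figure.

```python
def conv_out_size(size, kernel, stride, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def bandwidth_reduction(h, w, in_channels, in_bits,
                        kernel, stride, out_channels, out_bits, pool=1):
    """Ratio of raw-frame bits to in-pixel feature-map bits.

    `pool` is the side length of a non-overlapping pooling window applied
    after the convolution (1 means no pooling).
    """
    out_h = conv_out_size(h, kernel, stride) // pool
    out_w = conv_out_size(w, kernel, stride) // pool
    raw_bits = h * w * in_channels * in_bits                  # baseline: full RGB frame
    feature_bits = out_h * out_w * out_channels * out_bits    # what the back-end receives
    return raw_bits / feature_bits

# Hypothetical example: 720x1280 RGB frame at 12 bits per sample,
# 5x5 convolution with stride 5 producing 8 channels at 8 bits.
ratio = bandwidth_reduction(720, 1280, 3, 12,
                            kernel=5, stride=5, out_channels=8, out_bits=8)
print(f"bandwidth reduction: {ratio:.2f}x")
```

The ratio grows with the stride and pooling window and shrinks with the number and precision of output channels, which is the trade-off the front-end design has to balance against detection accuracy.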