17 research outputs found
A Context-Switching/Dual-Context ROM Augmented RAM using Standard 8T SRAM
The landscape of emerging applications has been continually widening,
encompassing various data-intensive applications like artificial intelligence,
machine learning, secure encryption, Internet-of-Things, etc. A sustainable
approach toward creating dedicated hardware platforms that can cater to
multiple applications often requires the underlying hardware to context-switch
or support more than one context simultaneously. This paper presents a
context-switching and dual-context memory based on the standard 8T SRAM
bit-cell. Specifically, we exploit the availability of multi-VT transistors by
selectively choosing the read-port transistors of the 8T SRAM cell to be either
high-VT or low-VT. The 8T SRAM cell is thus augmented to store ROM data
(represented as the VT of the transistors constituting the read-port) while
simultaneously storing RAM data. Further, we propose specific sensing
methodologies such that the memory array can support RAM-only or ROM-only mode
(context-switching (CS) mode) or RAM and ROM mode simultaneously (dual-context
(DC) mode). Extensive Monte-Carlo simulations have verified the robustness of
our proposed ROM-augmented CS/DC memory on the GlobalFoundries 22nm-FDX
technology node.
Neuromorphic-P2M: Processing-in-Pixel-in-Memory Paradigm for Neuromorphic Image Sensors
Edge devices equipped with computer vision must deal with vast amounts of
sensory data with limited computing resources. Hence, researchers have been
exploring different energy-efficient solutions such as near-sensor processing,
in-sensor processing, and in-pixel processing, bringing the computation closer
to the sensor. In particular, in-pixel processing embeds the computation
capabilities inside the pixel array and achieves high energy efficiency by
generating low-level features instead of the raw data stream from CMOS image
sensors. Many different in-pixel processing techniques and approaches have been
demonstrated on conventional frame-based CMOS imagers; however, the
processing-in-pixel approach for neuromorphic vision sensors has not been
explored so far. In this work, for the first time, we propose an asynchronous
non-von-Neumann analog processing-in-pixel paradigm that performs in-situ
multi-bit, multi-channel convolution inside the pixel array using analog
multiply-and-accumulate (MAC) operations, which consume significantly less
energy than their digital counterparts. To make
this approach viable, we incorporate the circuit's non-ideality, leakage, and
process variations into a novel hardware-algorithm co-design framework that
leverages extensive HSpice simulations of our proposed circuit using the GF22nm
FD-SOI technology node. We verify our framework on state-of-the-art
neuromorphic vision sensor datasets and show that, compared with the state of
the art on the IBM DVS128-Gesture dataset, our solution consumes ~2x lower
backend-processor energy at nearly the same front-end (sensor) energy, while
maintaining a high test accuracy of 88.36%.
Comment: 17 pages, 11 figures, 2 tables
Technology-Circuit-Algorithm Tri-Design for Processing-in-Pixel-in-Memory (P2M)
The massive amounts of data generated by camera sensors motivate data
processing inside pixel arrays, i.e., at the extreme edge. Several critical
developments have fueled recent interest in the processing-in-pixel-in-memory
paradigm for a wide range of visual machine intelligence tasks, including (1)
advances in 3D integration technology to enable complex processing inside each
pixel in a 3D integrated manner while maintaining pixel density, (2) analog
processing circuit techniques for massively parallel low-energy in-pixel
computations, and (3) algorithmic techniques to mitigate non-idealities
associated with analog processing through hardware-aware training schemes. This
article presents a comprehensive technology-circuit-algorithm landscape that
connects technology capabilities, circuit design strategies, and algorithmic
optimizations to power, performance, area, bandwidth reduction, and
application-level accuracy metrics. We present our results using a
comprehensive co-design framework incorporating hardware and algorithmic
optimizations for various complex real-life visual intelligence tasks mapped
onto our P2M paradigm.
TREBUCHET: Fully Homomorphic Encryption Accelerator for Deep Computation
Secure computation is critically important not only to the DoD but also to
financial institutions, healthcare, and any setting where personally
identifiable information (PII) is accessed. Traditional security techniques
require data to be decrypted before any computation is performed. When
processed on untrusted systems, the decrypted data is vulnerable to attacks
that extract the sensitive information. To address these vulnerabilities, Fully
Homomorphic Encryption (FHE) keeps the data encrypted during computation and
secures the results, even in these untrusted environments. However, FHE
requires significantly more computation than the equivalent unencrypted
operations. To be practical, encrypted processing must significantly close
this computation gap (to within 10x). To accomplish this ambitious goal, the TREBUCHET project
is leading research and development in FHE processing hardware to accelerate
deep computations on encrypted data, as part of the DARPA MTO Data Privacy for
Virtual Environments (DPRIVE) program. We accelerate the major secure
standardized FHE schemes (BGV, BFV, CKKS, FHEW, etc.) at >=128-bit security
while integrating with the open-source PALISADE and OpenFHE libraries currently
used in the DoD and in industry. We utilize a novel tile-based chip design with
highly parallel ALUs optimized for vectorized 128b modulo arithmetic. The
TREBUCHET coprocessor design provides a highly modular, flexible, and
extensible FHE accelerator for easy reconfiguration, deployment, integration,
and application on other hardware form factors, such as System-on-Chip or
alternate chip areas.
Comment: 6 pages, 5 figures, 2 tables
Single crystalline Ge1−xMnx nanowires as building blocks for nanoelectronics
Magnetically doped Si and Ge nanowires have potential application in future nanowire spin-based devices. Here, we report a supercritical fluid method for producing single crystalline Mn-doped Ge nanowires with a Mn-doping concentration between 0.5 and 1.0 atomic %. The nanowires display ferromagnetism above 300 K and superior performance, with a hole mobility of around 340 cm²/Vs, demonstrating their potential as building blocks for electronic devices.
Neuromorphic-P2M: processing-in-pixel-in-memory paradigm for neuromorphic image sensors
Edge devices equipped with computer vision must deal with vast amounts of sensory data with limited computing resources. Hence, researchers have been exploring different energy-efficient solutions such as near-sensor, in-sensor, and in-pixel processing, bringing the computation closer to the sensor. In particular, in-pixel processing embeds the computation capabilities inside the pixel array and achieves high energy efficiency by generating low-level features instead of the raw data stream from CMOS image sensors. Many different in-pixel processing techniques and approaches have been demonstrated on conventional frame-based CMOS imagers; however, the processing-in-pixel approach for neuromorphic vision sensors has not been explored so far. In this work, for the first time, we propose an asynchronous non-von-Neumann analog processing-in-pixel paradigm that performs in-situ multi-bit, multi-channel convolution inside the pixel array using analog multiply-and-accumulate (MAC) operations, which consume significantly less energy than their digital counterparts. To make this approach viable, we incorporate the circuit's non-ideality, leakage, and process variations into a novel hardware-algorithm co-design framework that leverages extensive HSpice simulations of our proposed circuit using the GF22nm FD-SOI technology node. We verify our framework on state-of-the-art neuromorphic vision sensor datasets and show that, compared with the state of the art on the IBM DVS128-Gesture dataset, our solution consumes ~2× lower backend-processor energy at nearly the same front-end (sensor) energy, while maintaining a high test accuracy of 88.36%.
TREBUCHET: Fully Homomorphic Encryption Accelerator for Deep Computation
Secure computation is critically important not only to the DoD but also to financial institutions, healthcare, and any setting where personally identifiable information (PII) is accessed. Traditional security techniques require data to be decrypted before any computation is performed. When processed on untrusted systems, the decrypted data is vulnerable to attacks that extract the sensitive information. To address these vulnerabilities, Fully Homomorphic Encryption (FHE) keeps the data encrypted during computation and secures the results, even in these untrusted environments. However, FHE requires significantly more computation than the equivalent unencrypted operations. To be practical, encrypted processing must significantly close this computation gap (to within 10x).
To accomplish this ambitious goal, the TREBUCHET project is leading research and development in FHE processing hardware to accelerate deep computations on encrypted data, as part of the DARPA MTO Data Privacy for Virtual Environments (DPRIVE) program. We accelerate the major secure standardized FHE schemes (BGV, BFV, CKKS, FHEW, etc.) at >=128-bit security while integrating with the open-source PALISADE and OpenFHE libraries currently used in the DoD and in industry. We utilize a novel tile-based chip design with highly parallel ALUs optimized for vectorized 128b modulo arithmetic. The TREBUCHET coprocessor design provides a highly modular, flexible, and extensible FHE accelerator for easy reconfiguration, deployment, integration, and application on other hardware form factors, such as System-on-Chip or alternate chip areas.