Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications
NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following topics: algorithms, programming languages, architecture, programming environments, the role of theory, and performance measurement. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems not only executed much faster on the MPP than on conventional computers, but were also found easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, as well as recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for the space station, EOS, and the Great Observatories era.
AI/ML Algorithms and Applications in VLSI Design and Technology
An evident challenge ahead for the integrated circuit (IC) industry in the
nanometer regime is the investigation and development of methods that can
reduce the design complexity ensuing from growing process variations and
curtail the turnaround time of chip manufacturing. Conventional methodologies
employed for such tasks are largely manual; thus, time-consuming and
resource-intensive. In contrast, the unique learning strategies of artificial
intelligence (AI) provide numerous exciting automated approaches for handling
complex and data-intensive tasks in very-large-scale integration (VLSI) design
and testing. Employing AI and machine learning (ML) algorithms in VLSI design
and manufacturing reduces the time and effort for understanding and processing
the data within and across different abstraction levels via automated learning
algorithms. This, in turn, improves IC yield and reduces the manufacturing
turnaround time. This paper thoroughly reviews the AI/ML-based automated
approaches introduced to date for VLSI design and manufacturing. Moreover, we
discuss the scope of future AI/ML applications at various abstraction levels
to revolutionize the field of VLSI design, aiming for high-speed, highly
intelligent, and efficient implementations.
Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey
In the modern era of technology, a paradigm shift has been witnessed in areas involving applications of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Specifically, Deep Neural Networks (DNNs) have emerged as a popular field of interest in most AI applications such as computer vision, image and video processing, robotics, etc. Given mature digital technologies and the availability of authentic data and data-handling infrastructure, DNNs have become a credible choice for solving complex real-life problems. In certain situations, the performance and accuracy of a DNN can be far better than that of human intelligence. However, DNNs are computationally demanding in terms of both the resources and the time required to handle these computations. Furthermore, general-purpose architectures like CPUs struggle with such computationally intensive algorithms. Therefore, the research community has invested considerable interest and effort in specialized hardware architectures such as the Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), and Coarse Grained Reconfigurable Array (CGRA) for the effective implementation of such algorithms. This paper brings forward the various research works carried out on the development and deployment of DNNs using the aforementioned specialized hardware architectures and embedded AI accelerators. The review provides a detailed description of the specialized hardware-based accelerators used for DNN training and/or inference. A comparative study of the discussed accelerators, based on factors such as power, area, and throughput, is also presented. Finally, future research and development directions are discussed, such as trends in DNN implementation on specialized hardware accelerators. This review article is intended to serve as a guide to hardware architectures for accelerating and improving the effectiveness of deep learning research.
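The power, area, and throughput comparison criteria mentioned above reduce to a few common figures of merit. As a minimal sketch of how such metrics are typically derived (our illustration, not material from the survey; all names and numbers are hypothetical):

    # Hypothetical sketch of the figures of merit behind power/area/throughput
    # comparisons of accelerators; names and numbers are illustrative only.
    from dataclasses import dataclass

    @dataclass
    class Accelerator:
        name: str
        throughput_gops: float  # sustained giga-operations per second
        power_w: float          # average power in watts
        area_mm2: float         # die area in square millimetres

        def tops_per_watt(self) -> float:
            # energy efficiency: TOPS/W = (GOPS / 1000) / W
            return (self.throughput_gops / 1000.0) / self.power_w

        def gops_per_mm2(self) -> float:
            # area efficiency
            return self.throughput_gops / self.area_mm2

    # Illustrative comparison with made-up numbers:
    for acc in (Accelerator("gpu_like", 5000, 50, 300),
                Accelerator("asic_like", 2000, 2, 10)):
        print(acc.name, round(acc.tops_per_watt(), 2), round(acc.gops_per_mm2(), 1))

A higher TOPS/W figure indicates better energy efficiency, while GOPS/mm2 captures how well a design uses silicon area; both are routinely reported alongside raw throughput and power.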
TinyVers: A Tiny Versatile System-on-chip with State-Retentive eMRAM for ML Inference at the Extreme Edge
Extreme edge devices or Internet-of-Things nodes require both ultra-low power
always-on processing and the ability to perform on-demand sampling and
processing. Moreover, support for IoT applications like voice recognition,
machine monitoring, etc., requires the ability to execute a wide range of ML
workloads. This poses hardware design challenges in building flexible
processors operating in the ultra-low power regime. This paper presents TinyVers, a
tiny versatile ultra-low power ML system-on-chip to enable enhanced
intelligence at the Extreme Edge. TinyVers exploits dataflow reconfiguration to
enable multi-modal support and aggressive on-chip power management for
duty-cycling to enable smart sensing applications. The SoC combines a RISC-V
host processor, a 17 TOPS/W dataflow reconfigurable ML accelerator, a 1.7
μW deep sleep wake-up controller, and an eMRAM for boot code and ML
parameter retention. The SoC can perform up to 17.6 GOPS while achieving a
power consumption range from 1.7 μW to 20 mW. Multiple ML workloads aimed at
diverse applications are mapped on the SoC to showcase its flexibility and
efficiency. All the models achieve 1-2 TOPS/W of energy efficiency with power
consumption below 230 μW in continuous operation. In a duty-cycling use
case for machine monitoring, this power is reduced to below 10 μW.
Comment: Accepted in IEEE Journal of Solid-State Circuits
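As a rough back-of-envelope check on the figures above (our sketch, not a calculation from the paper), power, throughput, and energy efficiency are related by power ≈ throughput / efficiency:

    # Back-of-envelope relation (not from the paper): power = throughput / efficiency.
    def power_watts(throughput_gops: float, efficiency_tops_per_w: float) -> float:
        return (throughput_gops * 1e9) / (efficiency_tops_per_w * 1e12)

    # Peak accelerator throughput at the quoted peak efficiency:
    print(power_watts(17.6, 17.0))  # ~1.0e-3 W, i.e. roughly 1 mW for the accelerator alone
    # Conversely, a workload at 1-2 TOPS/W staying below 230 uW implies a sustained
    # throughput on the order of 0.23e-3 W * 1.5e12 op/J ~ 350 MOPS.

This is a consistency sketch only; the 1.7 μW end of the quoted range corresponds to the deep-sleep wake-up controller.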
A Construction Kit for Efficient Low Power Neural Network Accelerator Designs
Implementing embedded neural network processing at the edge requires
efficient hardware acceleration that couples high computational performance
with low power consumption. Driven by the rapid evolution of network
architectures and their algorithmic features, accelerator designs are
constantly updated and improved. To evaluate and compare hardware design
choices, designers can refer to a myriad of accelerator implementations in the
literature. Surveys provide an overview of these works but are often limited to
system-level and benchmark-specific performance metrics, making it difficult to
quantitatively compare the individual effect of each utilized optimization
technique. This complicates the evaluation of optimizations for new accelerator
designs, slowing down research progress. This work provides a survey of
neural network accelerator optimization approaches that have been used in
recent works and reports their individual effects on edge processing
performance. It presents the list of optimizations and their quantitative
effects as a construction kit, allowing designers to assess the design choices
for each building block separately. Reported optimizations range from up to
10'000x memory savings to 33x energy reductions, providing chip designers with
an overview of design choices for implementing efficient low power neural
network accelerators.
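To make the construction-kit idea concrete, one way to use such a catalogue is to combine the reported per-optimization factors multiplicatively as a first-order estimate for a candidate design. The sketch below is our illustration of that usage, not the survey's methodology; optimization names and factors are placeholders:

    # First-order composition of per-optimization effects; names and factors are
    # placeholders, not values taken from the survey.
    from math import prod

    candidate = {
        "quantization":        {"memory_factor": 4.0, "energy_factor": 2.0},
        "pruning":             {"memory_factor": 3.0, "energy_factor": 1.5},
        "dataflow_reordering": {"memory_factor": 1.0, "energy_factor": 1.3},
    }

    def combined(opts: dict, key: str) -> float:
        # Assumes effects compose multiplicatively and independently, which real
        # designs only approximate (optimizations often interact).
        return prod(o[key] for o in opts.values())

    print(combined(candidate, "memory_factor"))  # 12.0
    print(combined(candidate, "energy_factor"))  # ~3.9

In practice the independence assumption is the weak point, which is exactly why having each building block's effect reported separately is useful.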
Always-On 674uW @ 4GOP/s Error Resilient Binary Neural Networks with Aggressive SRAM Voltage Scaling on a 22nm IoT End-Node
Binary Neural Networks (BNNs) have been shown to be robust to random
bit-level noise, making aggressive voltage scaling attractive as a power-saving
technique for both logic and SRAMs. In this work, we introduce the first fully
programmable IoT end-node system-on-chip (SoC) capable of executing
software-defined, hardware-accelerated BNNs at ultra-low voltage. Our SoC
exploits a hybrid memory scheme where error-vulnerable SRAMs are complemented
by reliable standard-cell memories to safely store critical data under
aggressive voltage scaling. On a prototype in 22nm FDX technology, we
demonstrate that both the logic and SRAM voltage can be dropped to 0.5V without
any accuracy penalty on a BNN trained for the CIFAR-10 dataset, improving
energy efficiency by 2.2X w.r.t. nominal conditions. Furthermore, we show that
the supply voltage can be dropped to 0.42V (50% of nominal) while keeping more
than 99% of the nominal accuracy (with a bit error rate ~1/1000). At this
operating point, our prototype performs 4 Gop/s (15.4 Inference/s on the CIFAR-10
dataset) by computing up to 13 binary ops per pJ, achieving 22.8 Inference/s/mW
while keeping within a peak power envelope of 674uW - low enough to enable
always-on operation in ultra-low power smart cameras, long-lifetime
environmental sensors, and insect-sized pico-drones.
Comment: Submitted to ISICAS2020 journal special issue
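The quoted efficiency figures follow directly from the throughput and power numbers; as a quick sanity check (ours, not the authors'):

    # Consistency check of the figures quoted above (not from the paper).
    inferences_per_s = 15.4
    peak_power_mw = 0.674                    # 674 uW peak power envelope
    print(inferences_per_s / peak_power_mw)  # ~22.8 Inference/s/mW, matching the abstract

    # 4 Gop/s within a 674 uW envelope corresponds to about
    # 4e9 / 674e-6 ~ 5.9e12 op/J, i.e. ~5.9 binary ops per pJ at peak power;
    # the "up to 13 binary ops per pJ" figure presumably refers to the most
    # energy-efficient operating conditions.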