63 research outputs found
Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks
Deep convolutional neural networks (CNNs) are widely used in modern AI systems for their superior accuracy but at the cost of high computational complexity. The complexity comes from the need to simultaneously process hundreds of filters and channels in the high-dimensional convolutions, which involve a significant amount of data movement. Although highly-parallel compute paradigms, such as SIMD/SIMT, effectively address the computation requirement to achieve high throughput, energy consumption still remains high as data movement can be more expensive than computation. Accordingly, finding a dataflow that supports parallel processing with minimal data movement cost is crucial to achieving energy-efficient CNN processing without compromising accuracy. In this paper, we present a novel dataflow, called row-stationary (RS), that minimizes data movement energy consumption on a spatial architecture. This is realized by exploiting local data reuse of filter weights and feature map pixels, i.e., activations, in the high-dimensional convolutions, and minimizing data movement of partial sum accumulations. Unlike dataflows used in existing designs, which only reduce certain types of data movement, the proposed RS dataflow can adapt to different CNN shape configurations and reduces all types of data movement through maximally utilizing the processing engine (PE) local storage, direct inter-PE communication, and spatial parallelism. To evaluate the energy efficiency of the different dataflows, we propose an analysis framework that compares energy cost under the same hardware area and processing parallelism constraints. Experiments using the CNN configurations of AlexNet show that the proposed RS dataflow is more energy efficient than existing dataflows in both convolutional (1.4x to 2.5x) and fully-connected layers (at least 1.3x for batch sizes larger than 16). The RS dataflow has also been demonstrated on a fabricated chip, which verifies our energy analysis.
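The intuition behind the RS dataflow's savings can be sketched with a toy data-movement energy model: total movement energy is each storage level's access count weighted by its per-access cost. The hierarchy levels, normalized costs, and access counts below are illustrative assumptions for the sketch, not the paper's measured figures.

```python
# Toy data-movement energy model for comparing dataflows. Costs are
# normalized per-access energies (register file = 1x); the specific
# values and access counts are assumptions for illustration only.
COST = {"RF": 1.0, "inter_PE": 2.0, "buffer": 6.0, "DRAM": 200.0}

def movement_energy(accesses):
    """accesses: dict mapping storage level -> number of data accesses."""
    return sum(n * COST[level] for level, n in accesses.items())

# A reuse-heavy dataflow shifts accesses away from costly DRAM and toward
# cheap PE-local storage, cutting total movement energy even though the
# total number of accesses goes up.
naive = {"DRAM": 1_000_000}
reuse = {"DRAM": 50_000, "buffer": 200_000, "inter_PE": 300_000, "RF": 2_000_000}
print(movement_energy(naive) / movement_energy(reuse))  # energy ratio > 1
```

The same skeleton underlies the paper's analysis framework idea: hold area and parallelism fixed, vary only where accesses land in the hierarchy.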
Penetrating Shields: A Systematic Analysis of Memory Corruption Mitigations in the Spectre Era
This paper provides the first systematic analysis of a synergistic threat model encompassing memory corruption vulnerabilities and microarchitectural side-channel vulnerabilities. We study speculative shield bypass attacks that leverage speculative execution attacks to leak secrets that are critical to the security of memory corruption mitigations (i.e., the shields), and then use the leaked secrets to bypass the mitigation mechanisms and successfully conduct memory corruption exploits, such as control-flow hijacking. We start by systematizing a taxonomy of the state-of-the-art memory corruption mitigations, focusing on hardware-software co-design solutions. The taxonomy helps us identify 10 likely vulnerable defense schemes out of the 20 schemes that we analyze. Next, we develop a graph-based model to analyze the 10 likely vulnerable defenses and reason about possible countermeasures. Finally, we present three proof-of-concept attacks targeting an already-deployed mitigation mechanism and two state-of-the-art academic proposals.
Towards Closing the Energy Gap Between HOG and CNN Features for Embedded Vision
Computer vision enables a wide range of applications in robotics/drones, self-driving cars, smart Internet of Things, and portable/wearable electronics. For many of these applications, local embedded processing is preferred due to privacy and/or latency concerns. Accordingly, energy-efficient embedded vision hardware delivering real-time and robust performance is crucial. While deep learning is gaining popularity in several computer vision algorithms, a significant energy consumption difference exists compared to traditional hand-crafted approaches. In this paper, we provide an in-depth analysis of the computation, energy, and accuracy trade-offs between learned features such as deep Convolutional Neural Networks (CNN) and hand-crafted features such as Histogram of Oriented Gradients (HOG). This analysis is supported by measurements from two chips that implement these algorithms. Our goal is to understand the source of the energy discrepancy between the two approaches and to provide insight about the potential areas where CNNs can be improved and eventually approach the energy efficiency of HOG while maintaining outstanding accuracy.
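The hand-crafted side of this comparison can be illustrated with a minimal sketch of the core HOG idea: a gradient-orientation histogram for a single cell, with gradient-magnitude voting. This omits the block normalization of the full descriptor, and the patch and bin count are arbitrary examples.

```python
import math

def hog_cell_histogram(patch, bins=9):
    """Orientation histogram for one cell of a grayscale patch (a list of
    rows of intensities). A minimal sketch of the HOG idea only: no block
    normalization, no multi-cell descriptor."""
    h = [0.0] * bins
    rows, cols = len(patch), len(patch[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180  # unsigned orientation
            h[int(ang // (180 / bins)) % bins] += mag     # vote by magnitude
    return h

# A vertical edge yields strong horizontal gradients (orientation ~0 degrees),
# so all the magnitude lands in the first bin.
print(hog_cell_histogram([[0, 0, 10, 10]] * 4))
```

Each intensity value in the cell contributes a handful of multiply-accumulate-scale operations, which is one way to see why HOG's arithmetic cost per pixel is far below a deep CNN's.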
Hardware for Machine Learning: Challenges and Opportunities
Machine learning plays a critical role in extracting meaningful information out of the zettabytes of sensor data collected every day. For some applications, the goal is to analyze and understand the data to identify trends (e.g., surveillance, portable/wearable electronics); in other applications, the goal is to take immediate action based on the data (e.g., robotics/drones, self-driving cars, smart Internet of Things). For many of these applications, local embedded processing near the sensor is preferred over the cloud due to privacy or latency concerns, or limitations in the communication bandwidth. However, at the sensor there are often stringent constraints on energy consumption and cost in addition to throughput and accuracy requirements. Furthermore, flexibility is often required such that the processing can be adapted for different applications or environments (e.g., update the weights and model in the classifier). In many applications, machine learning often involves transforming the input data into a higher dimensional space, which, along with programmable weights, increases data movement and consequently energy consumption. In this paper, we will discuss how these challenges can be addressed at various levels of hardware design, ranging from architecture and hardware-friendly algorithms to mixed-signal circuits and advanced technologies (including memories and sensors).
Application of the Problem Based Learning (PBL) Model to Improve Student Learning Outcomes on the Topic of Solubility and Solubility Product in Class XI IPA of SMA Negeri 1 Kampar
This research aimed to improve student learning outcomes on the topic of solubility and solubility product in class XI science of SMAN 1 Kampar. It was an experimental study with a pretest-posttest design. The samples were the students of class XI science 2 as the experimental class and class XI science 3 as the control class. The experimental class was taught with the problem based learning model, while the control class used the discussion method. The data were analyzed with a t-test, which showed t_count > t_table (1.6923 > 1.68). This means the problem based learning model can improve student learning outcomes on solubility and solubility product in class XI science of SMAN 1 Kampar, with the improvement falling in the high category.
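The comparison reported above rests on an independent two-sample t-test. A minimal sketch with pooled variance, run on hypothetical scores (not the study's data), shows the shape of the computation against the one-tailed critical value the abstract cites:

```python
from statistics import mean, variance

def two_sample_t(a, b):
    """Independent two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

# Hypothetical post-test scores for illustration only: the experimental
# class's gain is significant if t exceeds the critical value (1.68 in
# the abstract, for the study's degrees of freedom).
pbl     = [85, 88, 78, 90, 84, 86]
control = [75, 80, 72, 78, 74, 79]
print(two_sample_t(pbl, control) > 1.68)
```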
TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators
Over the past few years, the explosion in sparse tensor algebra workloads has led to a corresponding rise in domain-specific accelerators to service them. Due to the irregularity present in sparse tensors, these accelerators employ a wide variety of novel solutions to achieve good performance. At the same time, prior work on design-flexible sparse accelerator modeling does not express this full range of design features, making it difficult to understand the impact of each design choice and compare or extend the state-of-the-art.

To address this, we propose TeAAL: a language and compiler for the concise and precise specification and evaluation of sparse tensor algebra architectures. We use TeAAL to represent and evaluate four disparate state-of-the-art accelerators (ExTensor, Gamma, OuterSPACE, and SIGMA) and verify that it reproduces their performance with high accuracy. Finally, we demonstrate the potential of TeAAL as a tool for designing new accelerators by showing how it can be used to speed up Graphicionado--by on BFS and on SSSP.
CAMP: A technique to estimate per-structure power at run-time using a few simple parameters
Microprocessor power has become a first-order constraint at run-time. Designers must employ aggressive power-management techniques to keep a processor's ballooning power requirements under control. Effective power management benefits from knowledge of run-time microprocessor power consumption in both the core and individual microarchitectural structures, such as caches, queues, and execution units. Increasingly feasible per-structure power-control techniques, such as fine-grain clock gating, power gating, and dynamic voltage/frequency scaling (DVFS), become more effective with run-time estimates of per-structure power. However, run-time computation of per-structure power estimates based on utilization requires daunting numbers of input statistics, which makes per-structure monitoring of run-time power a challenging problem.
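The general shape of utilization-based per-structure power estimation can be sketched as a linear model over a few activity counters plus leakage; the coefficients, counter names, and interval below are assumptions for illustration, not CAMP's actual parameters.

```python
def structure_power(counters, coeff_nj, leakage_w, interval_s):
    """Estimate one structure's power (W) over a sampling interval:
    dynamic energy = sum of (event count * energy per event), then
    divide by the interval and add static leakage power."""
    dynamic_j = sum(counters[name] * nj * 1e-9 for name, nj in coeff_nj.items())
    return dynamic_j / interval_s + leakage_w

# Hypothetical L1 cache parameters: 0.5 nJ per access, 0.1 W leakage,
# 4M accesses observed in a 1 ms sampling window.
p = structure_power({"l1_access": 4_000_000}, {"l1_access": 0.5},
                    leakage_w=0.1, interval_s=0.001)
print(round(p, 2))  # watts for this structure over the window
```

The point of a technique like CAMP is to keep `coeff_nj` and `counters` small: a handful of cheap parameters per structure rather than the daunting full set of utilization statistics.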