932 research outputs found
DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car
We present DeepPicar, a low-cost deep neural network based autonomous car
platform. DeepPicar is a small scale replication of a real self-driving car
called DAVE-2 by NVIDIA. DAVE-2 uses a deep convolutional neural network (CNN),
which takes images from a front-facing camera as input and produces car
steering angles as output. DeepPicar uses the same network architecture---9
layers, 27 million connections and 250K parameters---and can drive itself in
real-time using a web camera and a Raspberry Pi 3 quad-core platform. Using
DeepPicar, we analyze the Pi 3's computing capabilities to support end-to-end
deep learning based real-time control of autonomous vehicles. We also
systematically compare other contemporary embedded computing platforms using
the DeepPicar's CNN-based real-time control workload. We find that all tested
platforms, including the Pi 3, are capable of supporting the CNN-based
real-time control, from 20 Hz up to 100 Hz, depending on hardware platform.
However, we find that shared resource contention remains an important issue
that must be considered in applying CNN models on shared memory based embedded
computing platforms; we observe up to 11.6X execution time increase in the CNN
based control loop due to shared resource contention. To protect the CNN
workload, we also evaluate state-of-the-art cache partitioning and memory
bandwidth throttling techniques on the Pi 3. We find that cache partitioning is
ineffective, while memory bandwidth throttling is an effective solution.Comment: To be published as a conference paper at RTCSA 201
Space Generic Open Avionics Architecture (SGOAA) reference model technical guide
This report presents a full description of the Space Generic Open Avionics Architecture (SGOAA). The SGOAA consists of a generic system architecture for the entities in spacecraft avionics, a generic processing architecture, and a six class model of interfaces in a hardware/software system. The purpose of the SGOAA is to provide an umbrella set of requirements for applying the generic architecture interface model to the design of specific avionics hardware/software systems. The SGOAA defines a generic set of system interface points to facilitate identification of critical interfaces and establishes the requirements for applying appropriate low level detailed implementation standards to those interface points. The generic core avionics system and processing architecture models provided herein are robustly tailorable to specific system applications and provide a platform upon which the interface model is to be applied
Security of Electrical, Optical and Wireless On-Chip Interconnects: A Survey
The advancement of manufacturing technologies has enabled the integration of
more intellectual property (IP) cores on the same system-on-chip (SoC).
Scalable and high throughput on-chip communication architecture has become a
vital component in today's SoCs. Diverse technologies such as electrical,
wireless, optical, and hybrid are available for on-chip communication with
different architectures supporting them. Security of the on-chip communication
is crucial because exploiting any vulnerability would be a goldmine for an
attacker. In this survey, we provide a comprehensive review of threat models,
attacks, and countermeasures over diverse on-chip communication technologies as
well as sophisticated architectures.Comment: 41 pages, 24 figures, 4 table
DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices
© 2016 IEEE. Breakthroughs from the field of deep learning are radically changing how sensor data are interpreted to extract the high-level information needed by mobile apps. It is critical that the gains in inference accuracy that deep models afford become embedded in future generations of mobile apps. In this work, we present the design and implementation of DeepX, a software accelerator for deep learning execution. DeepX signif- icantly lowers the device resources (viz. memory, computation, energy) required by deep learning that currently act as a severe bottleneck to mobile adoption. The foundation of DeepX is a pair of resource control algorithms, designed for the inference stage of deep learning, that: (1) decompose monolithic deep model network architectures into unit- blocks of various types, that are then more efficiently executed by heterogeneous local device processors (e.g., GPUs, CPUs); and (2), perform principled resource scaling that adjusts the architecture of deep models to shape the overhead each unit-blocks introduces. Experiments show, DeepX can allow even large-scale deep learning models to execute efficently on modern mobile processors and significantly outperform existing solutions, such as cloud-based offloading
BeeFlow: Behavior Tree-based Serverless Workflow Modeling and Scheduling for Resource-Constrained Edge Clusters
Serverless computing has gained popularity in edge computing due to its
flexible features, including the pay-per-use pricing model, auto-scaling
capabilities, and multi-tenancy support. Complex Serverless-based applications
typically rely on Serverless workflows (also known as Serverless function
orchestration) to express task execution logic, and numerous application- and
system-level optimization techniques have been developed for Serverless
workflow scheduling. However, there has been limited exploration of optimizing
Serverless workflow scheduling in edge computing systems, particularly in
high-density, resource-constrained environments such as system-on-chip clusters
and single-board-computer clusters. In this work, we discover that existing
Serverless workflow scheduling techniques typically assume models with limited
expressiveness and cause significant resource contention. To address these
issues, we propose modeling Serverless workflows using behavior trees, a novel
and fundamentally different approach from existing directed-acyclic-graph- and
state machine-based models. Behavior tree-based modeling allows for easy
analysis without compromising workflow expressiveness. We further present
observations derived from the inherent tree structure of behavior trees for
contention-free function collections and awareness of exact and empirical
concurrent function invocations. Based on these observations, we introduce
BeeFlow, a behavior tree-based Serverless workflow system tailored for
resource-constrained edge clusters. Experimental results demonstrate that
BeeFlow achieves up to 3.2X speedup in a high-density, resource-constrained
edge testbed and 2.5X speedup in a high-profile cloud testbed, compared with
the state-of-the-art.Comment: Accepted by Journal of Systems Architectur
Test Scheduling of SoC by using Dynamic Voltage Frequency Scaling (DVFS) Technique
High temperature gradients in System on Chip (SoC) lowered the performances, reliability and leakage power. In addition, temperature during testing gain more compared to normal operation. Therefore, the investigation of the impact dynamic voltage frequency scaling (DVFS) on the thermal aware test scheduling performance will be the main contribution of this work. The test scheduling algorithm which embeds frequency scaling effect with dynamic voltage supply is tested on ITC’02 benchmark. The formulation of ILP is to minimize the group of the test session in SoC and continued with DVFS formulation. Compared to the conventional thermal-aware scheduling approach based purely on a frequency scaling, this technique provides shorter overall test times and greatly improved flexibility to satisfy strict thermal constraints. The proposed DVFS with thermal aware task scheduling allows to minimize test time more than 46%
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
- …