Towards a Scalable Hardware/Software Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device
Currently, most designers face the daunting task of
researching different design flows and learning the intricacies of
specific software from various manufacturers in
hardware/software co-design. Creating a
scalable hardware/software co-design platform has become a key
strategic element in developing hardware/software integrated
systems. In this paper, we propose a new design flow for building
a scalable co-design platform on an FPGA-based system-on-chip.
We employ an integrated approach to implement a histogram of
oriented gradients (HOG) descriptor and a support vector machine (SVM)
classifier on a programmable device for pedestrian tracking.
We report a hardware resource analysis and also analyse the
precision and success rates of pedestrian tracking on nine open
access image data sets. Finally, our proposed
design flow can be used for any real-time image processing-related
product on programmable ZYNQ-based embedded
systems, which benefits from a reduced design time and provides a
scalable solution for embedded image processing products.
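The HOG-plus-SVM pipeline named in this abstract can be illustrated with a minimal sketch. It is not the authors' FPGA implementation: the 9-bin unsigned-orientation layout, the hard bin assignment (no interpolation between bins), and the helper names `cell_histogram`, `l2_normalize` and `linear_svm_score` are all assumptions chosen for illustration.

```python
import math

def cell_histogram(cell, bins=9):
    """Gradient-orientation histogram for one grayscale cell (a list of rows).

    Assumption: unsigned orientations in [0, 180) degrees and hard bin
    assignment; production HOG variants usually interpolate between bins.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Central differences, so border pixels of the cell are skipped.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The final modulo guards against rounding that yields bin == bins.
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

def l2_normalize(vec, eps=1e-6):
    """Scale a feature vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
    return [v / norm for v in vec]

def linear_svm_score(features, weights, bias):
    """Decision value of a linear SVM; a positive score means 'pedestrian'."""
    return sum(f * w for f, w in zip(features, weights)) + bias

# An 8x8 cell containing a vertical edge: left half dark, right half bright.
edge_cell = [[0, 0, 0, 0, 10, 10, 10, 10] for _ in range(8)]
hist = cell_histogram(edge_cell)
features = l2_normalize(hist)
```

For the vertical edge, all gradient energy points along the horizontal axis, so it falls into the first orientation bin. In a full detector, the histograms of many cells would be concatenated and block-normalised before being scored by the trained SVM weights.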
An Experimental Nexos Laboratory Using Virtual Xinu
The Nexos Project is a joint effort between Marquette University, the University of Buffalo, and the University of Mississippi to build curriculum materials and a supporting experimental laboratory for hands-on projects in computer systems courses. The approach focuses on inexpensive, flexible, commodity embedded hardware, freely available development and debugging tools, and a fresh implementation of a classic operating system, Embedded Xinu, that is ideal for student exploration. This paper describes an extension to the Nexos laboratory that includes a new target platform composed of Qemu virtual machines. Virtual Xinu addresses two challenges that limit the effectiveness of Nexos. First, potential faculty adopters have clearly indicated that even with the current minimal monetary cost of installation, the hardware modifications and time investment remain troublesome factors that scare off interested educators. Second, the shared subnet introduces inherent complications: students' projects interfere with each other in ways that are difficult to recreate, debug, and understand. Specifically, this paper discusses porting the Xinu operating system to Qemu virtual hardware, developing the virtual networking platform, and results showing success using Virtual Xinu in the classroom during one semester of Operating Systems at the University of Mississippi.
Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles
The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions, ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphics processing units (GPUs), while applying them to a challenging application: torque vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle, benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose an NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and an FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform.
Consequently, having shown the applicability of the proposed solutions, this work also contributes valuable enablers for further developments following similar fundamental principles.
Some of the results presented in this work are related to activities within the 3Ccar project, which has
received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking
received support from the European Union’s Horizon 2020 research and innovation programme and Germany,
Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy,
Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL
Joint Undertaking under grant agreement No. 692455-2
Multi-band sub-GHz technology recognition on NVIDIA’s Jetson Nano
Low power wide area networks support the success of long range Internet of things applications such as agriculture, security, smart cities and homes. This enormous popularity, however, breeds challenging new problems as the wireless spectrum gets saturated, which increases the probability of collisions and performance degradation. To this end, smart spectrum decisions are needed and will be supported by wireless technology recognition, allowing networks to adapt dynamically to an ever-changing environment in which fair co-existence with other wireless technologies becomes essential. In contrast to existing research that assesses technology recognition using machine learning on powerful graphics processing units, this work proposes a deep learning solution using convolutional neural networks, cheap software defined radios and efficient embedded platforms such as NVIDIA’s Jetson Nano. More specifically, this paper presents low-complexity, near-real-time multi-band sub-GHz technology recognition and supports a wide variety of technologies using multiple settings. Results show accuracies around 99%, which are comparable with state-of-the-art solutions, while the classification time on an NVIDIA Jetson Nano remains small and offers real-time execution. These results will enable smart spectrum management without the need for expensive, high-power-consuming hardware.
OPEB: Open Physical Environment Benchmark for Artificial Intelligence
Artificial Intelligence methods to solve continuous-control tasks have made
significant progress in recent years. However, these algorithms have important
limitations and still need significant improvement to be used in industry and
real-world applications. This means that this area is still in an active
research phase. To involve a large number of research groups, standard
benchmarks are needed to evaluate and compare proposed algorithms. In this
paper, we propose a physical environment benchmark framework to facilitate
collaborative research in this area by enabling different research groups to
integrate their designed benchmarks in a unified cloud-based repository and
also share their actual implemented benchmarks via the cloud. We demonstrate
the proposed framework using an actual implementation of the classical
mountain-car example and present the results obtained using a Reinforcement
Learning algorithm.
Comment: Accepted in 3rd IEEE International Forum on Research and Technologies
for Society and Industry 201
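For readers unfamiliar with the classical mountain-car example mentioned in this abstract, the sketch below simulates its usual textbook dynamics. The constants follow the common textbook formulation and are an assumption here; the paper describes a physical implementation whose parameters are not given in the abstract. The `push_with_motion` policy is a hand-written heuristic used only to exercise the simulator; it is not the Reinforcement Learning algorithm the authors report.

```python
import math

# Assumed constants from the common textbook formulation of mountain car.
MIN_POS, MAX_POS, GOAL_POS = -1.2, 0.5, 0.5
MAX_SPEED = 0.07

def step(position, velocity, action):
    """Advance one time step; action is -1 (reverse), 0 (coast) or +1 (forward)."""
    velocity += 0.001 * action - 0.0025 * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    if position <= MIN_POS:
        # Hitting the left wall stops the car.
        position, velocity = MIN_POS, 0.0
    position = min(position, MAX_POS)
    return position, velocity, position >= GOAL_POS

def rollout(policy, position=-0.5, velocity=0.0, max_steps=1000):
    """Run one episode; return (steps taken, whether the goal was reached)."""
    for t in range(1, max_steps + 1):
        action = policy(position, velocity)
        position, velocity, done = step(position, velocity, action)
        if done:
            return t, True
    return max_steps, False

def push_with_motion(position, velocity):
    """Heuristic: always accelerate in the current direction of travel."""
    return 1 if velocity >= 0 else -1

steps, reached = rollout(push_with_motion)
```

The engine is too weak to climb the right-hand hill directly, so the car has to swing back and forth to build momentum. A learning agent has to discover this behaviour from reward alone, which is what makes the task a popular benchmark.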
MLPerf Inference Benchmark
Machine-learning (ML) hardware and software system demand is burgeoning.
Driven by ML applications, the number of different ML inference systems has
exploded. Over 100 organizations are building ML inference chips, and the
systems that incorporate existing models span at least three orders of
magnitude in power consumption and five orders of magnitude in performance;
they range from embedded devices to data-center solutions. Fueling the hardware
are a dozen or more software frameworks and libraries. The myriad combinations
of ML hardware and ML software make assessing ML-system performance in an
architecture-neutral, representative, and reproducible manner challenging.
There is a clear need for industry-wide standard ML benchmarking and evaluation
criteria. MLPerf Inference answers that call. In this paper, we present our
benchmarking method for evaluating ML inference systems. Driven by more than 30
organizations as well as more than 200 ML engineers and practitioners, MLPerf
prescribes a set of rules and best practices to ensure comparability across
systems with wildly differing architectures. The first call for submissions
garnered more than 600 reproducible inference-performance measurements from 14
organizations, representing over 30 systems that showcase a wide range of
capabilities. The submissions attest to the benchmark's flexibility and
adaptability.
Comment: ISCA 202
EmBench: Quantifying Performance Variations of Deep Neural Networks across Modern Commodity Devices
In recent years, advances in deep learning have resulted in unprecedented
leaps in diverse tasks spanning from speech and object recognition to context
awareness and health monitoring. As a result, an increasing number of
AI-enabled applications are being developed targeting ubiquitous and mobile
devices. While deep neural networks (DNNs) are getting bigger and more complex,
they also impose a heavy computational and energy burden on the host devices,
which has led to the integration of various specialized processors in commodity
devices. Given the broad range of competing DNN architectures and the
heterogeneity of the target hardware, there is an emerging need to understand
the compatibility between DNN-platform pairs and the expected performance
benefits on each platform. This work attempts to demystify this landscape by
systematically evaluating a collection of state-of-the-art DNNs on a wide
variety of commodity devices. In this respect, we identify potential
bottlenecks in each architecture and provide important guidelines that can
assist the community in the co-design of more efficient DNNs and accelerators.
Comment: Accepted at MobiSys 2019: 3rd International Workshop on Embedded and
Mobile Deep Learning (EMDL), 201