47,306 research outputs found

    Towards a Scalable Hardware/Software Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device

    Get PDF
    Currently, most designers face a daunting task to research different design flows and learn the intricacies of specific software from various manufacturers in hardware/software co-design. An urgent need of creating a scalable hardware/software co-design platform has become a key strategic element for developing hardware/software integrated systems. In this paper, we propose a new design flow for building a scalable co-design platform on FPGA-based system-on-chip. We employ an integrated approach to implement a histogram oriented gradients (HOG) and a support vector machine (SVM) classification on a programmable device for pedestrian tracking. Not only was hardware resource analysis reported, but the precision and success rates of pedestrian tracking on nine open access image data sets are also analysed. Finally, our proposed design flow can be used for any real-time image processingrelated products on programmable ZYNQ-based embedded systems, which benefits from a reduced design time and provide a scalable solution for embedded image processing products

    An Experimental Nexos Laboratory Using Virtual Xinu

    Get PDF
    The Nexos Project is a joint effort between Marquette University, the University of Buffalo, and the University of Mississippi to build curriculum materials and a supporting experimental laboratory for hands-on projects in computer systems courses. The approach focuses on inexpensive, flexible, commodity embedded hardware, freely available development and debugging tools, and a fresh implementation of a classic operating system, Embedded Xinu, that is ideal for student exploration. This paper describes an extension to the Nexos laboratory that includes a new target platform composed of Qemu virtual machines. Virtual Xinu addresses two challenges that limit the effectiveness of Nexos. First, potential faculty adopters have clearly indicated that even with the current minimal monetary cost of installation, the hardware modifications, and time investment remain troublesome factors that scare off interested educators. Second, overcoming the inherent complications that arise due to the shared subnet that result in students\u27 projects interfering with each other in ways that are difficult to recreate, debug, and understand. Specifically, this paper discusses porting the Xinu operating systems to Qemu virtual hardware, developing the virtual networking platform, and results showing success using Virtual Xinu in the classroom during one semester of Operating Systems at the University of Mississippi

    Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles

    Get PDF
    The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions -ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphical processing units (GPUs)- while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose a NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and a FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed the expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform. Consequently, having shown the applicability of the proposed solutions, this work contributes valuable enablers also for further developments following similar fundamental principles.Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union’s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL Joint Undertaking under grant agreement No. 692455-2

    Multi-band sub-GHz technology recognition on NVIDIA’s Jetson Nano

    Get PDF
    Low power wide area networks support the success of long range Internet of things applications such as agriculture, security, smart cities and homes. This enormous popularity, however, breeds new challenging problems as the wireless spectrum gets saturated which increases the probability of collisions and performance degradation. To this end, smart spectrum decisions are needed and will be supported by wireless technology recognition to allow the networks to dynamically adapt to the ever changing environment where fair co-existence with other wireless technologies becomes essential. In contrast to existing research that assesses technology recognition using machine learning on powerful graphics processing units, this work aims to propose a deep learning solution using convolutional neural networks, cheap software defined radios and efficient embedded platforms such as NVIDIA’s Jetson Nano. More specifically, this paper presents low complexity near-real time multi-band sub-GHz technology recognition and supports a wide variety of technologies using multiple settings. Results show accuracies around 99%, which are comparable with state of the art solutions, while the classification time on a NVIDIA Jetson Nano remains small and offers real-time execution. These results will enable smart spectrum management without the need of expensive and high power consuming hardware

    OPEB: Open Physical Environment Benchmark for Artificial Intelligence

    Full text link
    Artificial Intelligence methods to solve continuous- control tasks have made significant progress in recent years. However, these algorithms have important limitations and still need significant improvement to be used in industry and real- world applications. This means that this area is still in an active research phase. To involve a large number of research groups, standard benchmarks are needed to evaluate and compare proposed algorithms. In this paper, we propose a physical environment benchmark framework to facilitate collaborative research in this area by enabling different research groups to integrate their designed benchmarks in a unified cloud-based repository and also share their actual implemented benchmarks via the cloud. We demonstrate the proposed framework using an actual implementation of the classical mountain-car example and present the results obtained using a Reinforcement Learning algorithm.Comment: Accepted in 3rd IEEE International Forum on Research and Technologies for Society and Industry 201

    MLPerf Inference Benchmark

    Full text link
    Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. In this paper, we present our benchmarking method for evaluating ML inference systems. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures. The first call for submissions garnered more than 600 reproducible inference-performance measurements from 14 organizations, representing over 30 systems that showcase a wide range of capabilities. The submissions attest to the benchmark's flexibility and adaptability.Comment: ISCA 202

    EmBench: Quantifying Performance Variations of Deep Neural Networks across Modern Commodity Devices

    Full text link
    In recent years, advances in deep learning have resulted in unprecedented leaps in diverse tasks spanning from speech and object recognition to context awareness and health monitoring. As a result, an increasing number of AI-enabled applications are being developed targeting ubiquitous and mobile devices. While deep neural networks (DNNs) are getting bigger and more complex, they also impose a heavy computational and energy burden on the host devices, which has led to the integration of various specialized processors in commodity devices. Given the broad range of competing DNN architectures and the heterogeneity of the target hardware, there is an emerging need to understand the compatibility between DNN-platform pairs and the expected performance benefits on each platform. This work attempts to demystify this landscape by systematically evaluating a collection of state-of-the-art DNNs on a wide variety of commodity devices. In this respect, we identify potential bottlenecks in each architecture and provide important guidelines that can assist the community in the co-design of more efficient DNNs and accelerators.Comment: Accepted at MobiSys 2019: 3rd International Workshop on Embedded and Mobile Deep Learning (EMDL), 201
    • …
    corecore