71 research outputs found

    DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car

    Full text link
    We present DeepPicar, a low-cost deep neural network based autonomous car platform. DeepPicar is a small scale replication of a real self-driving car called DAVE-2 by NVIDIA. DAVE-2 uses a deep convolutional neural network (CNN), which takes images from a front-facing camera as input and produces car steering angles as output. DeepPicar uses the same network architecture---9 layers, 27 million connections and 250K parameters---and can drive itself in real-time using a web camera and a Raspberry Pi 3 quad-core platform. Using DeepPicar, we analyze the Pi 3's computing capabilities to support end-to-end deep learning based real-time control of autonomous vehicles. We also systematically compare other contemporary embedded computing platforms using the DeepPicar's CNN-based real-time control workload. We find that all tested platforms, including the Pi 3, are capable of supporting the CNN-based real-time control, from 20 Hz up to 100 Hz, depending on hardware platform. However, we find that shared resource contention remains an important issue that must be considered in applying CNN models on shared memory based embedded computing platforms; we observe up to 11.6X execution time increase in the CNN based control loop due to shared resource contention. To protect the CNN workload, we also evaluate state-of-the-art cache partitioning and memory bandwidth throttling techniques on the Pi 3. We find that cache partitioning is ineffective, while memory bandwidth throttling is an effective solution.Comment: To be published as a conference paper at RTCSA 201

    An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

    Full text link
    Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load - but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline. Using encryption to protect sensitive data at the boundary of the on-chip analytics engine is a way to address data security issues. To cope with the combined workload of analytics and encryption in a tight power envelope, we propose Fulmine, a System-on-Chip based on a tightly-coupled multi-core cluster augmented with specialized blocks for compute-intensive data processing and encryption functions, supporting software programmability for regular computing tasks. The Fulmine SoC, fabricated in 65nm technology, consumes less than 20mW on average at 0.8V achieving an efficiency of up to 70pJ/B in encryption, 50pJ/px in convolution, or up to 25MIPS/mW in software. As a strong argument for real-life flexible application of our platform, we show experimental results for three secure analytics use cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with secured remote recognition in 5.74pJ/op; and seizure detection with encrypted data collection from EEG within 12.7pJ/op.Comment: 15 pages, 12 figures, accepted for publication to the IEEE Transactions on Circuits and Systems - I: Regular Paper

    Low-Power Computer Vision: Improve the Efficiency of Artificial Intelligence

    Get PDF
    Energy efficiency is critical for running computer vision on battery-powered systems, such as mobile phones or UAVs (unmanned aerial vehicles, or drones). This book collects the methods that have won the annual IEEE Low-Power Computer Vision Challenges since 2015. The winners share their solutions and provide insight on how to improve the efficiency of machine learning systems

    ADBench: benchmarking autonomous driving systems

    Get PDF
    Driven by the improvements in a variety of domains, autonomous driving is becoming a reality and today, industry aims at moving toward fully autonomous vehicles. High-tech chip manufacturers are designing high-performance and energy-efficient platforms in accordance with safety standard requirements. However, the software used to implement advanced functionalities in autonomous vehicles challenges real-time constraints on those platforms. Hence, there is a clear need for industry-level autonomous driving benchmarks to evaluate platforms and systems. In this paper, we propose ADBench, a benchmarking approach and benchmark suite for state-of-the-art autonomous driving platforms, in accordance with the key modules, structural design and functions of AD systems, building on several industry-level autonomous driving systems. The use of standard benchmarks facilitates the design, verification and validation process of autonomous systems.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under Grant TIN2015-65316-P, the SuPerCom European Research Council (ERC) project under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 772773), and the HiPEAC Network of Excellence.Peer ReviewedPostprint (author's final draft

    LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

    Get PDF
    LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.Peer ReviewedPostprint (author's final draft

    Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey

    Get PDF
    In the modern-day era of technology, a paradigm shift has been witnessed in the areas involving applications of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Specifically, Deep Neural Networks (DNNs) have emerged as a popular field of interest in most AI applications such as computer vision, image and video processing, robotics, etc. In the context of developed digital technologies and the availability of authentic data and data handling infrastructure, DNNs have been a credible choice for solving more complex real-life problems. The performance and accuracy of a DNN is a way better than human intelligence in certain situations. However, it is noteworthy that the DNN is computationally too cumbersome in terms of the resources and time to handle these computations. Furthermore, general-purpose architectures like CPUs have issues in handling such computationally intensive algorithms. Therefore, a lot of interest and efforts have been invested by the research fraternity in specialized hardware architectures such as Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), and Coarse Grained Reconfigurable Array (CGRA) in the context of effective implementation of computationally intensive algorithms. This paper brings forward the various research works carried out on the development and deployment of DNNs using the aforementioned specialized hardware architectures and embedded AI accelerators. The review discusses the detailed description of the specialized hardware-based accelerators used in the training and/or inference of DNN. A comparative study based on factors like power, area, and throughput, is also made on the various accelerators discussed. Finally, future research and development directions are discussed, such as future trends in DNN implementation on specialized hardware accelerators. This review article is intended to serve as a guide for hardware architectures for accelerating and improving the effectiveness of deep learning research.publishedVersio

    Energy Aware Runtime Systems for Elastic Stream Processing Platforms

    Get PDF
    Following an invariant growth in the required computational performance of processors, the multicore revolution started around 20 years ago. This revolution was mainly an answer to power dissipation constraints restricting the increase of clock frequency in single-core processors. The multicore revolution not only brought in the challenge of parallel programming, i.e. being able to develop software exploiting the entire capabilities of manycore architectures, but also the challenge of programming heterogeneous platforms. The question of “on which processing element to map a specific computational unit?”, is well known in the embedded community. With the introduction of general-purpose graphics processing units (GPGPUs), digital signal processors (DSPs) along with many-core processors on different system-on-chip platforms, heterogeneous parallel platforms are nowadays widespread over several domains, from consumer devices to media processing platforms for telecom operators. Finding mapping together with a suitable hardware architecture is a process called design-space exploration. This process is very challenging in heterogeneous many-core architectures, which promise to offer benefits in terms of energy efficiency. The main problem is the exponential explosion of space exploration. With the recent trend of increasing levels of heterogeneity in the chip, selecting the parameters to take into account when mapping software to hardware is still an open research topic in the embedded area. For example, the current Linux scheduler has poor performance when mapping tasks to computing elements available in hardware. The only metric considered is CPU workload, which as was shown in recent work does not match true performance demands from the applications. Doing so may produce an incorrect allocation of resources, resulting in a waste of energy. The origin of this research work comes from the observation that these approaches do not provide full support for the dynamic behavior of stream processing applications, especially if these behaviors are established only at runtime. This research will contribute to the general goal of developing energy-efficient solutions to design streaming applications on heterogeneous and parallel hardware platforms. Streaming applications are nowadays widely spread in the software domain. Their distinctive characiteristic is the retrieving of multiple streams of data and the need to process them in real time. The proposed work will develop new approaches to address the challenging problem of efficient runtime coordination of dynamic applications, focusing on energy and performance management.Efter en oföränderlig tillväxt i prestandakrav hos processorer, började den flerkärniga processor-revolutionen för ungefär 20 år sedan. Denna revolution skedde till största del som en lösning till begränsningar i energieffekten allt eftersom klockfrekvensen kontinuerligt höjdes i en-kärniga processorer. Den flerkärniga processor-revolutionen medförde inte enbart utmaningen gällande parallellprogrammering, m.a.o. förmågan att utveckla mjukvara som använder sig av alla delelement i de flerkärniga processorerna, men också utmaningen med programmering av heterogena plattformar. Frågeställningen ”på vilken processorelement skall en viss beräkning utföras?” är väl känt inom ramen för inbyggda datorsystem. Efter introduktionen av grafikprocessorer för allmänna beräkningar (GPGPU), signalprocesserings-processorer (DSP) samt flerkärniga processorer på olika system-on-chip plattformar, är heterogena parallella plattformar idag omfattande inom många domäner, från konsumtionsartiklar till mediaprocesseringsplattformar för telekommunikationsoperatörer. Processen att placera beräkningarna på en passande hårdvaruplattform kallas för utforskning av en designrymd (design-space exploration). Denna process är mycket utmanande för heterogena flerkärniga arkitekturer, och kan medföra fördelar när det gäller energieffektivitet. Det största problemet är att de olika valmöjligheterna i designrymden kan växa exponentiellt. Enligt den nuvarande trenden som förespår ökad heterogeniska aspekter i processorerna är utmaningen att hitta den mest passande placeringen av beräkningarna på hårdvaran ännu en forskningsfråga inom ramen för inbyggda datorsystem. Till exempel, den nuvarande schemaläggaren i Linux operativsystemet är inkapabel att hitta en effektiv placering av beräkningarna på den underliggande hårdvaran. Det enda mätsättet som används är processorns belastning vilket, som visats i tidigare forskning, inte motsvarar den verkliga prestandan i applikationen. Användning av detta mätsätt vid resursallokering resulterar i slöseri med energi. Denna forskning härstammar från observationerna att dessa tillvägagångssätt inte stöder det dynamiska beteendet hos ström-processeringsapplikationer (stream processing applications), speciellt om beteendena bara etableras vid körtid. Denna forskning kontribuerar till det allmänna målet att utveckla energieffektiva lösningar för ström-applikationer (streaming applications) på heterogena flerkärniga hårdvaruplattformar. Ström-applikationer är numera mycket vanliga i mjukvarudomän. Deras distinkta karaktär är inläsning av flertalet dataströmmar, och behov av att processera dem i realtid. Arbetet i denna forskning understöder utvecklingen av nya sätt för att lösa det utmanade problemet att effektivt koordinera dynamiska applikationer i realtid och fokus på energi- och prestandahantering

    Envisioning a Safety Island to Enable HPC Devices in Safety-Critical Domains

    Full text link
    HPC (High Performance Computing) devices increasingly become the only alternative to deliver the performance needed in safety-critical autonomous systems (e.g., autonomous cars, unmanned planes) due to deploying large and powerful multicores along with accelerators such as GPUs. However, the support that those HPC devices offer to realize safety-critical systems on top is heterogeneous. Safety islands have been devised to be coupled to HPC devices and complement them to meet the safety requirements of an increased set of applications, yet the variety of concepts and realizations is large. This paper presents our own concept of a safety island with two goals in mind: (1) offering a wide set of features to enable the broadest set of safety applications for each HPC device, and (2) being realized with open source components based on RISC-V ISA to ease its use and adoption. In particular, we present our safety island concept, the key features we foresee it should include, and its potential application beyond safety.Comment: White pape

    Design and management of image processing pipelines within CPS : Acquired experience towards the end of the FitOptiVis ECSEL Project

    Get PDF
    Cyber-Physical Systems (CPSs) are dynamic and reactive systems interacting with processes, environment and, sometimes, humans. They are often distributed with sensors and actuators, characterized for being smart, adaptive, predictive and react in real-time. Indeed, image- and video-processing pipelines are a prime source for environmental information for systems allowing them to take better decisions according to what they see. Therefore, in FitOptiVis, we are developing novel methods and tools to integrate complex image- and video-processing pipelines. FitOptiVis aims to deliver a reference architecture for describing and optimizing quality and resource management for imaging and video pipelines in CPSs both at design- and run-time. The architecture is concretized in low-power, high-performance, smart components, and in methods and tools for combined design-time and run-time multi-objective optimization and adaptation within system and environment constraints.Peer reviewe
    corecore