3,828 research outputs found

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    Get PDF
    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

    On-Device Deep Learning Inference for System-on-Chip (SoC) Architectures

    Get PDF
    As machine learning becomes ubiquitous, the need to deploy models on real-time, embedded systems will become increasingly critical. This is especially true for deep learning solutions, whose large models pose interesting challenges for target architectures at the “edge” that are resource-constrained. The realization of machine learning, and deep learning, is being driven by the availability of specialized hardware, such as system-on-chip solutions, which provide some alleviation of constraints. Equally important, however, are the operating systems that run on this hardware, and specifically the ability to leverage commercial real-time operating systems which, unlike general purpose operating systems such as Linux, can provide the low-latency, deterministic execution required for embedded, and potentially safety-critical, applications at the edge. Despite this, studies considering the integration of real-time operating systems, specialized hardware, and machine learning/deep learning algorithms remain limited. In particular, better mechanisms for real-time scheduling in the context of machine learning applications will prove to be critical as these technologies move to the edge. In order to address some of these challenges, we present a resource management framework designed to provide a dynamic on-device approach to the allocation and scheduling of limited resources in a real-time processing environment. These types of mechanisms are necessary to support the deterministic behavior required by the control components contained in the edge nodes. To validate the effectiveness of our approach, we applied rigorous schedulability analysis to a large set of randomly generated simulated task sets and then verified the most time critical applications, such as the control tasks which maintained low-latency deterministic behavior even during off-nominal conditions. The practicality of our scheduling framework was demonstrated by integrating it into a commercial real-time operating system (VxWorks) then running a typical deep learning image processing application to perform simple object detection. The results indicate that our proposed resource management framework can be leveraged to facilitate integration of machine learning algorithms with real-time operating systems and embedded platforms, including widely-used, industry-standard real-time operating systems

    Energy Aware Runtime Systems for Elastic Stream Processing Platforms

    Get PDF
    Following an invariant growth in the required computational performance of processors, the multicore revolution started around 20 years ago. This revolution was mainly an answer to power dissipation constraints restricting the increase of clock frequency in single-core processors. The multicore revolution not only brought in the challenge of parallel programming, i.e. being able to develop software exploiting the entire capabilities of manycore architectures, but also the challenge of programming heterogeneous platforms. The question of “on which processing element to map a specific computational unit?”, is well known in the embedded community. With the introduction of general-purpose graphics processing units (GPGPUs), digital signal processors (DSPs) along with many-core processors on different system-on-chip platforms, heterogeneous parallel platforms are nowadays widespread over several domains, from consumer devices to media processing platforms for telecom operators. Finding mapping together with a suitable hardware architecture is a process called design-space exploration. This process is very challenging in heterogeneous many-core architectures, which promise to offer benefits in terms of energy efficiency. The main problem is the exponential explosion of space exploration. With the recent trend of increasing levels of heterogeneity in the chip, selecting the parameters to take into account when mapping software to hardware is still an open research topic in the embedded area. For example, the current Linux scheduler has poor performance when mapping tasks to computing elements available in hardware. The only metric considered is CPU workload, which as was shown in recent work does not match true performance demands from the applications. Doing so may produce an incorrect allocation of resources, resulting in a waste of energy. The origin of this research work comes from the observation that these approaches do not provide full support for the dynamic behavior of stream processing applications, especially if these behaviors are established only at runtime. This research will contribute to the general goal of developing energy-efficient solutions to design streaming applications on heterogeneous and parallel hardware platforms. Streaming applications are nowadays widely spread in the software domain. Their distinctive characiteristic is the retrieving of multiple streams of data and the need to process them in real time. The proposed work will develop new approaches to address the challenging problem of efficient runtime coordination of dynamic applications, focusing on energy and performance management.Efter en oförĂ€nderlig tillvĂ€xt i prestandakrav hos processorer, började den flerkĂ€rniga processor-revolutionen för ungefĂ€r 20 Ă„r sedan. Denna revolution skedde till största del som en lösning till begrĂ€nsningar i energieffekten allt eftersom klockfrekvensen kontinuerligt höjdes i en-kĂ€rniga processorer. Den flerkĂ€rniga processor-revolutionen medförde inte enbart utmaningen gĂ€llande parallellprogrammering, m.a.o. förmĂ„gan att utveckla mjukvara som anvĂ€nder sig av alla delelement i de flerkĂ€rniga processorerna, men ocksĂ„ utmaningen med programmering av heterogena plattformar. FrĂ„gestĂ€llningen ”pĂ„ vilken processorelement skall en viss berĂ€kning utföras?” Ă€r vĂ€l kĂ€nt inom ramen för inbyggda datorsystem. Efter introduktionen av grafikprocessorer för allmĂ€nna berĂ€kningar (GPGPU), signalprocesserings-processorer (DSP) samt flerkĂ€rniga processorer pĂ„ olika system-on-chip plattformar, Ă€r heterogena parallella plattformar idag omfattande inom mĂ„nga domĂ€ner, frĂ„n konsumtionsartiklar till mediaprocesseringsplattformar för telekommunikationsoperatörer. Processen att placera berĂ€kningarna pĂ„ en passande hĂ„rdvaruplattform kallas för utforskning av en designrymd (design-space exploration). Denna process Ă€r mycket utmanande för heterogena flerkĂ€rniga arkitekturer, och kan medföra fördelar nĂ€r det gĂ€ller energieffektivitet. Det största problemet Ă€r att de olika valmöjligheterna i designrymden kan vĂ€xa exponentiellt. Enligt den nuvarande trenden som förespĂ„r ökad heterogeniska aspekter i processorerna Ă€r utmaningen att hitta den mest passande placeringen av berĂ€kningarna pĂ„ hĂ„rdvaran Ă€nnu en forskningsfrĂ„ga inom ramen för inbyggda datorsystem. Till exempel, den nuvarande schemalĂ€ggaren i Linux operativsystemet Ă€r inkapabel att hitta en effektiv placering av berĂ€kningarna pĂ„ den underliggande hĂ„rdvaran. Det enda mĂ€tsĂ€ttet som anvĂ€nds Ă€r processorns belastning vilket, som visats i tidigare forskning, inte motsvarar den verkliga prestandan i applikationen. AnvĂ€ndning av detta mĂ€tsĂ€tt vid resursallokering resulterar i slöseri med energi. Denna forskning hĂ€rstammar frĂ„n observationerna att dessa tillvĂ€gagĂ„ngssĂ€tt inte stöder det dynamiska beteendet hos ström-processeringsapplikationer (stream processing applications), speciellt om beteendena bara etableras vid körtid. Denna forskning kontribuerar till det allmĂ€nna mĂ„let att utveckla energieffektiva lösningar för ström-applikationer (streaming applications) pĂ„ heterogena flerkĂ€rniga hĂ„rdvaruplattformar. Ström-applikationer Ă€r numera mycket vanliga i mjukvarudomĂ€n. Deras distinkta karaktĂ€r Ă€r inlĂ€sning av flertalet dataströmmar, och behov av att processera dem i realtid. Arbetet i denna forskning understöder utvecklingen av nya sĂ€tt för att lösa det utmanade problemet att effektivt koordinera dynamiska applikationer i realtid och fokus pĂ„ energi- och prestandahantering

    Cognition-Based Networks: A New Perspective on Network Optimization Using Learning and Distributed Intelligence

    Get PDF
    IEEE Access Volume 3, 2015, Article number 7217798, Pages 1512-1530 Open Access Cognition-based networks: A new perspective on network optimization using learning and distributed intelligence (Article) Zorzi, M.a , Zanella, A.a, Testolin, A.b, De Filippo De Grazia, M.b, Zorzi, M.bc a Department of Information Engineering, University of Padua, Padua, Italy b Department of General Psychology, University of Padua, Padua, Italy c IRCCS San Camillo Foundation, Venice-Lido, Italy View additional affiliations View references (107) Abstract In response to the new challenges in the design and operation of communication networks, and taking inspiration from how living beings deal with complexity and scalability, in this paper we introduce an innovative system concept called COgnition-BAsed NETworkS (COBANETS). The proposed approach develops around the systematic application of advanced machine learning techniques and, in particular, unsupervised deep learning and probabilistic generative models for system-wide learning, modeling, optimization, and data representation. Moreover, in COBANETS, we propose to combine this learning architecture with the emerging network virtualization paradigms, which make it possible to actuate automatic optimization and reconfiguration strategies at the system level, thus fully unleashing the potential of the learning approach. Compared with the past and current research efforts in this area, the technical approach outlined in this paper is deeply interdisciplinary and more comprehensive, calling for the synergic combination of expertise of computer scientists, communications and networking engineers, and cognitive scientists, with the ultimate aim of breaking new ground through a profound rethinking of how the modern understanding of cognition can be used in the management and optimization of telecommunication network
    • 

    corecore