68 research outputs found

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    Get PDF
    The increasing demand of processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations for the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends

    Running deep learning applications on resource constrained devices

    Get PDF
    The high accuracy of Deep Neural Networks (DNN) come at the expense of high computational cost and memory requirements. During inference, the data is often collected on the edge device which are resource-constrained. The existing solutions for edge deployment include i) executing the entire DNN on the edge (EDGE-ONLY), ii) sending the input from edge to cloud where the DNN is processed (CLOUD-ONLY), and iii) splitting the DNN to execute partially on the edge and partially on the cloud (SPLIT). The choice of deployment between EDGE-ONLY, CLOUD-ONLY and SPLIT is determined by several operating constraints such as device resources and network speed, and application constraints such as latency and accuracy. The EDGE-ONLY approach requires compact DNN with low compute and memory requirements. Thus, the emerging class of DNNs employ low-rank convolutions (LRCONVs) which reduce one or more dimensions compared to the spatial convolutions (CONV). Prior research in hardware accelerators has largely focused on CONVs. The LRCONVs such as depthwise and pointwise convolutions exhibit lower arithmetic intensity and lower data reuse. Thus, LRCONVs result in low hardware utilization and high latency. In our first work, we systematically explore the design space of Cross-layer dataflows to exploit data reuse across layers for emerging DNNs in EDGE-ONLY scenarios. We develop novel fine-grain cross-layer dataflows for LRCONVs that support partial loop dimension completion. Our tool, X-Layer decouples the nested loops in a pipeline and combines them to create a common outer dataflow and several inner dataflows. The CLOUD-ONLY approach can suffer from high latency due to the high transmission cost of large input data from the edge to the cloud. This could be a problem, especially for latency-critical applications. Thankfully, the SPLIT approach reduces latency compared to the CLOUD-ONLY approach. However, existing solutions only split the DNN in floating-point precision. Executing floating-point precision on the edge device can occupy large memory and reduce the potential options for SPLIT solutions. In our second work, we expand and explore the search space of SPLIT solutions by jointly applying mixed-precision post-training quantization and DNN graph split. Our work, Auto-Split finds a balance in the trade-off among the model accuracy, edge device capacity, transmission cost, and the overall latency

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters

    Architecture FPGA améliorée et flot de conception pour une reconfiguration matérielle en ligne efficace

    Get PDF
    The self-reconfiguration capabilities of modern FPGA architectures pave the way for dynamic applications able to adapt to transient events. The CAD flows of modern architectures are nowadays mature but limited by the constraints induced by the complexity of FPGA circuits. In this thesis, multiple contributions are developed to propose an FPGA architecture supporting the dynamic placement of hardware tasks. First, an intermediate representation of these tasks configuration data, independent from their final position, is presented. This representation allows to compress the task data up to 11x with regard to its conventional raw counterpart. An accompanying CAD flow, based on state-of-the-art tools, is proposed to generate relocatable tasks from a high-level description. Then, the online behavior of this mechanism is studied. Two algorithms allowing to decode and create in real-time the conventional bit-stream are described. In addition, an enhancement of the FPGA interconnection network is proposedto increase the placement flexibility of heterogeneous tasks, at the cost of a 10% increase in average of the critical path delay. Eventually, a configurable substitute to the configuration memory found in FPGAs is studied to ease their partial reconfiguration.Les capacités d'auto-reconfiguration des architectures FPGA modernes ouvrent la voie à des applications dynamiques capables d'adapter leur fonctionnement pour répondre à des évÚnements ponctuels. Les flots de reconfiguration des architectures commerciales sont aujourd'hui aboutis mais limités par des contraintes inhérentes à la complexité de ces circuits. Dans cette thÚse, plusieurs contributions sont avancées afin de proposer une architecture FPGA reconfigurable permettant le placement dynamique de tùches matérielles. Dans un premier temps, une représentation intermédiaire des données de configuration de ces tùches, indépendante de leur positionnement final, est présentée. Cette représentation permet notamment d'atteindre des taux de compression allant jusqu'à 11x par rapport à la représentation brute d'une tùche. Un flot de conception basé sur des outils de l'état de l'art accompagne cette représentation et génÚre des tùches relogeables à partir d'une description haut-niveau. Ensuite, le comportement en ligne de ce mécanisme est étudié. Deux algorithmes permettant le décodage de ces tùches et la génération en temps-réel des données de configuration propres à l'architectures son décrits. Par ailleurs, une amélioration du réseau d'interconnexion d'une architecture FPGA est proposée pour accroßtre la flexibilité du placement de tùches hétérogÚnes, avec une augmentation de 10% en moyenne du délai du chemin critique. Enfin, une alternative programmable aux mémoires de configuration de ces circuits est étudiée pour faciliter leur reconfiguration partielle

    Semiannual report, 1 October 1990 - 31 March 1991

    Get PDF
    Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science is summarized

    Dependable Embedded Systems

    Get PDF
    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from today’s points of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level such circuit level or system level alone, the focus of this book is to deal with the different reliability challenges across different levels starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solution can be proposed to ef-fectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. Provides readers with latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; Describes cross-layer approaches that can leverage reliability through techniques that are pro-actively designed with respect to techniques at other layers; Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many core systems

    Internet of Things. Information Processing in an Increasingly Connected World

    Get PDF
    This open access book constitutes the refereed post-conference proceedings of the First IFIP International Cross-Domain Conference on Internet of Things, IFIPIoT 2018, held at the 24th IFIP World Computer Congress, WCC 2018, in Poznan, Poland, in September 2018. The 12 full papers presented were carefully reviewed and selected from 24 submissions. Also included in this volume are 4 WCC 2018 plenary contributions, an invited talk and a position paper from the IFIP domain committee on IoT. The papers cover a wide range of topics from a technology to a business perspective and include among others hardware, software and management aspects, process innovation, privacy, power consumption, architecture, applications

    Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015) Krakow, Poland

    Get PDF
    Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015

    Dynamically reconfigurable bio-inspired hardware

    Get PDF
    During the last several years, reconfigurable computing devices have experienced an impressive development in their resource availability, speed, and configurability. Currently, commercial FPGAs offer the possibility of self-reconfiguring by partially modifying their configuration bitstream, providing high architectural flexibility, while guaranteeing high performance. These configurability features have received special interest from computer architects: one can find several reconfigurable coprocessor architectures for cryptographic algorithms, image processing, automotive applications, and different general purpose functions. On the other hand we have bio-inspired hardware, a large research field taking inspiration from living beings in order to design hardware systems, which includes diverse topics: evolvable hardware, neural hardware, cellular automata, and fuzzy hardware, among others. Living beings are well known for their high adaptability to environmental changes, featuring very flexible adaptations at several levels. Bio-inspired hardware systems require such flexibility to be provided by the hardware platform on which the system is implemented. In general, bio-inspired hardware has been implemented on both custom and commercial hardware platforms. These custom platforms are specifically designed for supporting bio-inspired hardware systems, typically featuring special cellular architectures and enhanced reconfigurability capabilities; an example is their partial and dynamic reconfigurability. These aspects are very well appreciated for providing the performance and the high architectural flexibility required by bio-inspired systems. However, the availability and the very high costs of such custom devices make them only accessible to a very few research groups. Even though some commercial FPGAs provide enhanced reconfigurability features such as partial and dynamic reconfiguration, their utilization is still in its early stages and they are not well supported by FPGA vendors, thus making their use difficult to include in existing bio-inspired systems. In this thesis, I present a set of architectures, techniques, and methodologies for benefiting from the configurability advantages of current commercial FPGAs in the design of bio-inspired hardware systems. Among the presented architectures there are neural networks, spiking neuron models, fuzzy systems, cellular automata and random boolean networks. For these architectures, I propose several adaptation techniques for parametric and topological adaptation, such as hebbian learning, evolutionary and co-evolutionary algorithms, and particle swarm optimization. Finally, as case study I consider the implementation of bio-inspired hardware systems in two platforms: YaMoR (Yet another Modular Robot) and ROPES (Reconfigurable Object for Pervasive Systems); the development of both platforms having been co-supervised in the framework of this thesis
    • 

    corecore