
    A methodology to implement real-time applications on reconfigurable circuits

    Special Issue: Engineering of Configurable Systems. This paper presents an extension of our AAA rapid prototyping methodology for the optimized implementation of real-time applications onto reconfigurable circuits. This extension is based on a unified model of factorized data dependence graphs, used both to specify the application algorithm and to deduce its possible implementations onto reconfigurable hardware in terms of graph transformations. This transformation flow has been implemented in SynDEx, a system-level CAD software tool.
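
    As a rough illustration of the data model behind such a flow, the sketch below shows a factorized data dependence graph as a plain dictionary, with a repetition factor on a node standing for a factorized sub-graph, and one possible transformation (unrolling toward a parallel hardware mapping). Node names, fields, and the unroll rule are illustrative assumptions, not SynDEx's actual representation.

        # Hypothetical factorized data dependence graph; names and fields are
        # illustrative, not SynDEx's actual model.
        graph = {
            "nodes": {
                "in":  {"op": "input",   "repeat": 1},
                "mac": {"op": "mul_add", "repeat": 8},   # factorized: 8 identical instances
                "out": {"op": "output",  "repeat": 1},
            },
            "edges": [("in", "mac"), ("mac", "out")],
        }

        def unroll(g, name):
            """Defactorize one node: replace it by 'repeat' explicit copies,
            duplicating its incoming and outgoing dependences (spatial, i.e.
            parallel, hardware implementation)."""
            node = g["nodes"].pop(name)
            copies = [f"{name}_{i}" for i in range(node["repeat"])]
            for c in copies:
                g["nodes"][c] = {"op": node["op"], "repeat": 1}
            new_edges = []
            for (u, v) in g["edges"]:
                if u == name:
                    new_edges += [(c, v) for c in copies]
                elif v == name:
                    new_edges += [(u, c) for c in copies]
                else:
                    new_edges.append((u, v))
            g["edges"] = new_edges
            return g

        unroll(graph, "mac")  # one candidate graph transformation toward an FPGA mapping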

    XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference

    Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully digital configurable hardware accelerator IP for BNNs, integrated within a microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid SRAM / standard cell memory. The XNE is able to fully compute convolutional and dense layers in autonomy or in cooperation with the core in the MCU to realize more complex behaviors. We show post-synthesis results in 65nm and 22nm technology for the XNE IP and post-layout results in 22nm for the full MCU, indicating that this system can drop the energy cost to 21.6 fJ per binary operation at 0.4 V, while remaining flexible and performant enough to execute state-of-the-art BNN topologies such as ResNet-34 in less than 2.2 mJ per frame at 8.9 fps. Comment: 11 pages, 8 figures, 2 tables, 3 listings. Accepted for presentation at CODES'18 and for publication in IEEE Transactions on Computer-Aided Design of Circuits and Systems (TCAD) as part of the ESWEEK-TCAD special issue.
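
    The arithmetic such an engine accelerates can be sketched in software: with weights and activations constrained to +1/-1 and packed as bits, a dot product reduces to an XNOR followed by a popcount. The sketch below is a minimal software illustration of that principle only; it does not reproduce the XNE's datapath, word widths, or memory organization.

        def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
            """Dot product of two n-element {-1,+1} vectors packed as bits
            (bit 1 encodes +1, bit 0 encodes -1): XNOR then popcount."""
            mask = (1 << n) - 1
            xnor = ~(a_bits ^ w_bits) & mask      # 1 wherever the signs agree
            matches = bin(xnor).count("1")        # popcount
            return 2 * matches - n                # agreements minus disagreements

        # Example: a = [+1, -1, +1, -1], w = [+1, +1, -1, -1]  ->  dot product 0
        print(binary_dot(0b0101, 0b0011, 4))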

    Sensor selection for energy-efficient ambulatory medical monitoring

    Epilepsy affects over three million Americans of all ages. Despite recent advances, more than 20% of individuals with epilepsy never achieve adequate control of their seizures. The use of a small, portable, non-invasive seizure monitor could benefit these individuals tremendously. However, in order for such a device to be suitable for long-term wear, it must be both comfortable and lightweight. Typical state-of-the-art non-invasive seizure onset detection algorithms require 21 scalp electrodes to be placed on the head. These electrodes are used to generate 18 data streams, called channels. The large number of electrodes is inconvenient for the patient and processing 18 channels can consume a considerable amount of energy, a problem for a battery-powered device. In this paper, we describe an automated way to construct detectors that use fewer channels, and thus fewer electrodes. Starting from an existing technique for constructing 18 channel patient-specific detectors, we use machine learning to automatically construct reduced channel detectors. We evaluate our algorithm on data from 16 patients used in an earlier study. On average, our algorithm reduced the number of channels from 18 to 4.6 while decreasing the mean fraction of seizure onsets detected from 99% to 97%. For 12 out of the 16 patients, there was no degradation in the detection rate. While the average detection latency increased from 7.8 s to 11.2 s, the average rate of false alarms per hour decreased from 0.35 to 0.19. We also describe a prototype implementation of a single channel EEG monitoring device built using off-the-shelf components, and use this implementation to derive an energy consumption model. Using fewer channels reduced the average energy consumption by 69%, which amounts to a 3.3x increase in battery lifetime. Finally, we show how additional energy savings can be realized by using a low-power screening detector to rule out segments of data that are obviously not seizures. Though this technique does not reduce the number of electrodes needed, it does reduce the energy consumption by an additional 16%.
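
    The battery-lifetime figure follows directly from the reported energy reduction: cutting average energy by 69% leaves 31% of the original draw, so the same battery lasts roughly 1/0.31 times as long. The snippet below is just that arithmetic; only the 69% figure comes from the abstract.

        baseline_energy = 1.0            # normalized average energy with 18 channels
        reduction = 0.69                 # 69% reduction reported for the reduced-channel detector
        reduced_energy = baseline_energy * (1 - reduction)

        lifetime_gain = baseline_energy / reduced_energy
        # prints ~3.2x, consistent with the ~3.3x reported (the exact value
        # depends on rounding of the 69% figure)
        print(f"battery lifetime gain = {lifetime_gain:.1f}x")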

    Design and implementation of NoC routers and their application to PRDT-based NoCs

    With a communication-centric design style, Networks-on-Chips (NoCs) emerge as a new paradigm of Systems-on-Chips (SoCs) to overcome the limitations of bus-based communication infrastructure. An important problem in the design of NoCs is the router design, which has great impact on the cost and performance of a NoC system. This thesis is focused on the design and implementation of an optimized parameterized router which can be applied in mesh/torus-based and Perfect Recursive Diagonal Torus (PRDT)-based NoCs. Specifically, the router design includes the design and implementation of two routing algorithms (vector routing and circular coded vector routing), the wormhole switching scheme, the scheduling scheme, the buffering strategy, and the flow control scheme. Correspondingly, the following components are designed and implemented: input controller, output controller, crossbar switch, and scheduler. Verilog HDL code is generated and synthesized on ASIC platforms. Most components are designed in a parameterized way. Performance evaluation of each component of the router in terms of timing, area, and power consumption is conducted. The efficiency of the two routing algorithms and the tradeoff between computational time (tsetup) and area are analyzed. To reduce the area cost of the router design, the two major components, the crossbar switch and the scheduler, are optimized. In particular, for the crossbar switch, a comparative study of two crossbar designs is performed with the aid of the Magic layout editor, Synopsys CosmosSE, and Awaves. Based on the router design, the PRDT network composed of 4x4 routers is designed and synthesized on ASIC platforms.
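
    As background for readers unfamiliar with NoC routing, the sketch below shows a deterministic dimension-order (XY) route computation on a 2D mesh. It is a generic reference point only; it is not the vector routing or circular coded vector routing algorithms developed in the thesis.

        def xy_route(src, dst):
            """Dimension-order (XY) routing on a 2D mesh: traverse the X dimension
            first, then Y. Returns the output port taken at each hop."""
            (x, y), (dx, dy) = src, dst
            hops = []
            while x != dx:
                hops.append("EAST" if dx > x else "WEST")
                x += 1 if dx > x else -1
            while y != dy:
                hops.append("NORTH" if dy > y else "SOUTH")
                y += 1 if dy > y else -1
            return hops

        print(xy_route((0, 0), (2, 3)))   # ['EAST', 'EAST', 'NORTH', 'NORTH', 'NORTH']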

    Development of a 6-bit 15.625 MHz CMOS two-step flash analog-to-digital converter for a low dead time sub-nanosecond time measurement system

    The development of a 6-bit 15.625 MHz CMOS two-step analog-to-digital converter (ADC) is presented. The ADC was developed for use in a low dead time, high-performance, sub-nanosecond time-to-digital converter (TDC). The TDC is part of a new custom CMOS application specific integrated circuit (ASIC) that will be incorporated in the next generation of front-end electronics for high-performance positron emission tomography imaging. The ADC is based upon a two-step flash architecture that reduces the comparator count by a factor of two compared to a traditional flash ADC architecture, and thus achieves a significant reduction in the area, power dissipation, and input capacitance of the converter. The converter contains time-interleaved auto-zeroed CMOS comparators. These comparators utilize offset correction in both the preamplifier and the subsequent regenerative latch stage to guarantee good integral and differential non-linearity performance of the converter over extreme process conditions. Also, digital error correction was employed to overcome most of the major metastability problems inherent in flash converters and to guarantee a completely monotonic transfer function. Corrected comparator offset measurements reveal that the CMOS comparator design maintains a worst-case input-referred offset of less than 1 mV at conversion rates up to 8 MHz and less than a 2 mV offset at conversion rates as high as 16 MHz while dissipating less than 2.6 mW. Extensive laboratory measurements indicate that the ADC achieves differential and integral non-linearity performance of less than ±1/2 LSB with a 20 mV/LSB resolution. The ADC dissipates 90 mW from a single 5 V supply and occupies a die area of 1.97 mm x 1.13 mm in 0.8 μm CMOS technology.
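
    The two-step (subranging) principle can be summarized behaviorally: a coarse flash stage resolves the upper bits, the corresponding analog value is subtracted, and a fine stage resolves the lower bits from the residue. The ideal model below, with an assumed 3+3 bit split and no offsets or error correction, is purely illustrative and is not the implemented converter.

        def two_step_adc(vin, vref=1.0, coarse_bits=3, fine_bits=3):
            """Ideal behavioral model of a two-step flash ADC
            (no comparator offsets, no digital error correction)."""
            levels_c = 2 ** coarse_bits
            levels_f = 2 ** fine_bits
            vin = min(max(vin, 0.0), vref - 1e-12)

            coarse = int(vin / vref * levels_c)                  # first flash stage (MSBs)
            residue = vin - coarse * vref / levels_c             # subtract the coarse estimate
            fine = int(residue / (vref / levels_c) * levels_f)   # second flash stage (LSBs)
            return (coarse << fine_bits) | fine

        print(two_step_adc(0.400))   # 25: the 6-bit code for 0.400 V with a 1 V reference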

    HW/SW Co-design and Prototyping Approach for Embedded Smart Camera: ADAS Case Study

    In 1968, Volkswagen integrated an electronic circuit as a new fuel injection control system, called the "Little Black Box"; it is considered the first embedded system in the automotive industry. Currently, automobile manufacturers integrate several embedded systems into every new vehicle model. Behind these automotive electronic systems lies a sophisticated Hardware/Software (HW/SW) architecture built from heterogeneous components and multiple CPUs. At present, these systems are increasingly oriented toward vision-based approaches using tiny embedded smart cameras. Meeting real-time constraints in such vision-based systems is one of the most challenging issues, especially in the domain of automotive applications. On the design side, one of the optimal solutions adopted by embedded systems designers for system performance is to associate CPUs and hardware accelerators in the same design, in order to reduce the computational burden on the CPU and to speed up the data processing. In this paper, we present a hardware platform-based design approach for fast embedded smart Advanced Driver Assistance System (ADAS) design and prototyping, as an alternative to the purely simulation-based technique, which is time-consuming. Based on a Multi-CPU/FPGA platform, we introduce a new methodology/flow to design the different HW and SW parts of the ADAS. We then share our experience in designing and prototyping a HW/SW vision-based smart embedded system as an ADAS that helps to increase the safety of car drivers. We present a real HW/SW prototype of the vision-based ADAS on a Zynq FPGA. The system detects the fatigue/drowsiness state of the driver by monitoring eye closure and generates a real-time alert. A new HW skin segmentation step to locate the eyes/face is proposed. Our approach migrates the skin segmentation step from the processing system (SW) to the programmable logic (HW), taking advantage of a High-Level Synthesis (HLS) tool flow to accelerate the implementation and prototyping of the vision-based ADAS on a hardware platform.
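
    As background for the skin segmentation step, a common software formulation classifies pixels by simple box thresholds in the YCbCr color space. The sketch below uses commonly cited illustrative threshold values; it is not the paper's HLS implementation and the bounds are assumptions.

        def is_skin_ycbcr(y, cb, cr):
            """Classify a pixel as skin using simple YCbCr box thresholds.
            The bounds are illustrative literature-style values, not the paper's."""
            return 80 <= cb <= 135 and 135 <= cr <= 180 and y > 80

        # Example: a typical skin-tone pixel vs. a blue-ish background pixel
        print(is_skin_ycbcr(150, 110, 150))   # True
        print(is_skin_ycbcr(120, 160, 100))   # False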

    Programming Languages For Hard Real-Time Embedded Systems

    Hard real-time embedded systems have traditionally been implemented using low-level programming languages (such as Ada or C) at a level very close to the underlying operating system. However, for several years now the industry has started using higher-level modelling languages, at least for early simulation and verification steps. The objective of this paper is to study existing formal languages that include high-level real-time primitives. Our review is built on the case study of an aerospace automated transfer vehicle, whose particularity is to be composed of several multi-periodic communicating processes. In this paper, we emphasize the strengths and weaknesses of existing programming approaches when implementing this kind of system. As a result, the choice of the base rate of the program appears to have a major influence, not only on the difficulty of programming the system correctly but also on the execution platform required to execute the program (operating system, scheduler, ...).
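
    The base rate question can be made concrete: for a set of multi-periodic processes, the natural base (tick) period is the greatest common divisor of the process periods, and the schedule repeats every hyperperiod (their least common multiple); a faster base rate multiplies the activations the platform must handle. The periods below are invented for illustration and are not those of the automated transfer vehicle case study.

        from math import gcd
        from functools import reduce

        def lcm(a, b):
            return a * b // gcd(a, b)

        periods_ms = [10, 40, 100]                      # hypothetical process periods
        base = reduce(gcd, periods_ms)                  # base tick period: 10 ms
        hyperperiod = reduce(lcm, periods_ms)           # schedule repeats every 200 ms
        activations = sum(hyperperiod // p for p in periods_ms)

        print(base, hyperperiod, activations)           # 10 200 27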

    Towards 5G Software-Defined Ecosystems: Technical Challenges, Business Sustainability and Policy Issues

    Techno-economic drivers are creating the conditions for a radical change of paradigm in the design and operation of future telecommunications infrastructures. In fact, SDN, NFV, Cloud and Edge-Fog Computing are converging together into a single systemic transformation termed “Softwarization” that will find concrete exploitations in 5G systems. The IEEE SDN Initiative has elaborated a vision, an evolutionary path and some techno-economic scenarios of this transformation: specifically, the major technical challenges, business sustainability and policy issues have been investigated. This white paper presents: 1) an overview of the main techno-economic drivers steering the “Softwarization” of telecommunications; 2) an introduction to the Open Mobile Edge Cloud vision (covered in a companion white paper); 3) the main technical challenges in terms of operations, security and policy; 4) an analysis of the potential role of open source software; 5) some use case proposals for proofs-of-concept; and 6) a short description of the main socio-economic impacts being produced by “Softwarization”. Along these directions, IEEE SDN is also developing an open catalogue of software platforms, toolkits, and functionalities aiming at a step-by-step development and aggregation of test-beds/field-trials on SDN-NFV-5G.

    Addressing Prolonged Restore Challenges in Further Scaling DRAMs

    As the de facto memory technology, DRAM has enjoyed continuous scaling over the past decades to sustain performance growth and capacity enhancement. However, further DRAM scaling into the deep sub-micron regime faces significant challenges. Among the induced issues, prolonged restore time is expected to be one of the major concerns, yet it has received little attention. Targeting the restore issue, this thesis performs pioneering studies to characterize the problems and presents techniques from different perspectives to overcome them. First, our experimental studies quantify significant restore process variations, which cause serious degradations in yield and/or performance. To solve the problem, we propose schemes to expose the variations to the architectural level. Fast restore chunks can thus be constructed utilizing the DRAM organization, and they can be exposed to the memory controller to effectively compensate for the performance loss. Further, we maximize the improvement by applying restore-time-aware rank construction and hotness-aware page allocation schemes to fully utilize the fast regions. Second, beyond simply exposing the variations to higher levels, we investigate DRAM cell structures and behaviors, finding that refresh and restore are two strongly correlated operations. Whereas DRAM cells are fully restored after each read or write access, they are also fully recharged by periodic refresh operations, providing an opportunity to terminate restore early. With this insight, we first propose to truncate a restore operation based on its time distance to the next refresh. Further, to provide more truncation opportunities, we integrate the multirate-refresh concept to shorten this distance by increasing the refresh rate of recently accessed regions. Lastly, we move up to the application level, inspired by the observation that a large set of applications can well tolerate output accuracy loss and runtime errors, which enables us to exploit approximate computing to mitigate prolonged restore. By utilizing the variance in restore timing exhibited at different row segments, we reduce the restore time such that only some segments are fully reliable. We then map the critical data onto the reliable segments to keep the application-level errors low. On top of the approximation-aware technique, we further generalize it to support precise computing as well.
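
    The refresh-aware truncation idea can be sketched as a controller-side decision: the sooner a row will be refreshed again, the less charge the cell needs to retain data until then, so the restore phase can be cut short. The timing values and the piecewise rule below are illustrative assumptions, not the thesis's characterized numbers.

        def restore_time(ns_full, time_to_next_refresh_ms, refresh_interval_ms=64.0):
            """Pick a (possibly truncated) restore time based on how soon the row
            will be refreshed again. Illustrative piecewise rule, not measured data."""
            fraction_left = time_to_next_refresh_ms / refresh_interval_ms
            if fraction_left < 0.25:      # refresh is imminent: aggressive truncation
                return 0.6 * ns_full
            if fraction_left < 0.5:       # moderately close: mild truncation
                return 0.8 * ns_full
            return ns_full                # far from refresh: full restore required

        print(restore_time(45.0, 10.0))   # 27.0 ns instead of a full 45.0 ns restore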