A methodology to implement real-time applications on reconfigurable circuits
Special Issue: Engineering of Configurable Systems. International audience. This paper presents an extension of our AAA rapid prototyping methodology for the optimized implementation of real-time applications onto reconfigurable circuits. This extension is based on a unified model of factorized data dependence graphs, used both to specify the application algorithm and to deduce the possible implementations onto reconfigurable hardware in terms of graph transformations. This transformation flow has been implemented in SynDEx, a system-level CAD software tool.
XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference
Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to
conventional deep neural networks at a fraction of the cost in terms of memory
and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully
digital configurable hardware accelerator IP for BNNs, integrated within a
microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid
SRAM / standard cell memory. The XNE is able to fully compute convolutional and
dense layers in autonomy or in cooperation with the core in the MCU to realize
more complex behaviors. We show post-synthesis results in 65nm and 22nm
technology for the XNE IP and post-layout results in 22nm for the full MCU
indicating that this system can drop the energy cost per binary operation to
21.6fJ per operation at 0.4V, and at the same time is flexible and performant
enough to execute state-of-the-art BNN topologies such as ResNet-34 in less
than 2.2mJ per frame at 8.9 fps. Comment: 11 pages, 8 figures, 2 tables, 3 listings. Accepted for presentation
at CODES'18 and for publication in IEEE Transactions on Computer-Aided Design
of Circuits and Systems (TCAD) as part of the ESWEEK-TCAD special issue
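The core arithmetic of a binary neural network layer can be illustrated with an XNOR-and-popcount dot product. The following is a minimal software sketch of that general technique, not the XNE datapath itself; the helper names `pack_bits` and `binary_dot` and the example vectors are hypothetical and chosen only for illustration.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int, one bit each (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two length-n {-1, +1} vectors given as packed bit words."""
    mask = (1 << n) - 1
    agree = ~(a_word ^ b_word) & mask   # XNOR: bit is 1 where the signs match
    matches = bin(agree).count("1")     # popcount of the matching positions
    return 2 * matches - n              # matches minus mismatches


# Hypothetical 8-element activation and weight vectors.
a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))          # ordinary dot product
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
```

Because each multiply-accumulate collapses into one XNOR bit and a shared popcount, the work per binary operation is very small, which is the property the abstract's energy-per-operation figure refers to.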
Sensor selection for energy-efficient ambulatory medical monitoring
Epilepsy affects over three million Americans of all ages. Despite recent advances, more than 20% of individuals with epilepsy never achieve adequate control of their seizures. The use of a small, portable, non-invasive seizure monitor could benefit these individuals tremendously. However, in order for such a device to be suitable for long-term wear, it must be both comfortable and lightweight.
Typical state-of-the-art non-invasive seizure onset detection algorithms require 21 scalp electrodes to be placed on the head. These electrodes are used to generate 18 data streams, called channels. The large number of electrodes is inconvenient for the patient and processing 18 channels can consume a considerable amount of energy, a problem for a battery-powered device.
In this paper, we describe an automated way to construct detectors that use fewer channels, and thus fewer electrodes. Starting from an existing technique for constructing 18 channel patient-specific detectors, we use machine learning to automatically construct reduced channel detectors. We evaluate our algorithm on data from 16 patients used in an earlier study. On average, our algorithm reduced the number of channels from 18 to 4.6 while decreasing the mean fraction of seizure onsets detected from 99% to 97%. For 12 out of the 16 patients, there was no degradation in the detection rate. While the average detection latency increased from 7.8 s to 11.2 s, the average rate of false alarms per hour decreased from 0.35 to 0.19.
We also describe a prototype implementation of a single channel EEG monitoring device built using off-the-shelf components, and use this implementation to derive an energy consumption model. Using fewer channels reduced the average energy consumption by 69%, which amounts to a 3.3x increase in battery lifetime.
Finally, we show how additional energy savings can be realized by using a low-power screening detector to rule out segments of data that are obviously not seizures. Though this technique does not reduce the number of electrodes needed, it does reduce the energy consumption by an additional 16%
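The relation between the reported energy reduction and battery lifetime can be checked with a short calculation. This sketch assumes a fixed battery capacity and that lifetime is inversely proportional to average energy consumption; the function name `lifetime_gain` is hypothetical. Under that assumption a 69% reduction gives a factor of roughly 3.2, in line with the reported 3.3x if the published percentage is rounded.

```python
def lifetime_gain(energy_reduction):
    """Battery lifetime multiplier for a fractional reduction in average energy use.

    Assumes fixed battery capacity and lifetime inversely proportional to consumption.
    """
    if not 0 <= energy_reduction < 1:
        raise ValueError("energy_reduction must be in [0, 1)")
    return 1.0 / (1.0 - energy_reduction)


# 69% is the average reduction quoted in the abstract.
gain = lifetime_gain(0.69)
```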
Design and implementation of NoC routers and their application to Prdt-based NoC's
With a communication-centric design style, Networks-on-Chip (NoCs) have emerged as a new paradigm for Systems-on-Chip (SoCs) to overcome the limitations of bus-based communication infrastructure. An important problem in NoC design is the router design, which has a great impact on the cost and performance of a NoC system. This thesis focuses on the design and implementation of an optimized, parameterized router that can be applied in mesh/torus-based and Perfect Recursive Diagonal Torus (PRDT)-based NoCs. Specifically, the router design includes the design and implementation of two routing algorithms (vector routing and circular coded vector routing), the wormhole switching scheme, the scheduling scheme, the buffering strategy, and the flow control scheme. Correspondingly, the following components are designed and implemented: input controller, output controller, crossbar switch, and scheduler. Verilog HDL code is generated and synthesized on ASIC platforms. Most components are designed in a parameterized way. Each component of the router is evaluated in terms of timing, area, and power consumption. The efficiency of the two routing algorithms and the tradeoff between computational time (tsetup) and area are analyzed. To reduce the area cost of the router design, its two major components, the crossbar switch and the scheduler, are optimized. In particular, for the crossbar switch, a comparative study of two crossbar designs is performed with the aid of the Magic layout editor, Synopsys CosmosSE and Awaves. Based on the router design, a PRDT network composed of 4x4 routers is designed and synthesized on ASIC platforms.
Development of a 6-bit 15.625 MHz CMOS two-step flash analog-to-digital converter for a low dead time sub-nanosecond time measurement system
The development of a 6-bit 15.625 MHz CMOS two-step analog-to-digital converter (ADC) is presented. The ADC was developed for use in a low dead time, high-performance, sub-nanosecond time-to-digital converter (TDC). The TDC is part of a new custom CMOS application-specific integrated circuit (ASIC) that will be incorporated in the next generation of front-end electronics for high-performance positron emission tomography imaging. The ADC is based upon a two-step flash architecture that reduces the comparator count by a factor of two compared to a traditional flash ADC architecture, and thus achieves a significant reduction in the area, power dissipation, and input capacitance of the converter. The converter contains time-interleaved auto-zeroed CMOS comparators. These comparators utilize offset correction in both the preamplifier and the subsequent regenerative latch stage to guarantee good integral and differential non-linearity performance of the converter over extreme process conditions. In addition, digital error correction is employed to overcome most of the major metastability problems inherent in flash converters and to guarantee a completely monotonic transfer function. Corrected comparator offset measurements reveal that the CMOS comparator design maintains a worst-case input-referred offset of less than 1 mV at conversion rates up to 8 MHz and less than 2 mV at conversion rates as high as 16 MHz while dissipating less than 2.6 mW. Extensive laboratory measurements indicate that the ADC achieves differential and integral non-linearity performance of less than ±1/2 LSB with a 20 mV/LSB resolution. The ADC dissipates 90 mW from a single 5 V supply and occupies a die area of 1.97 mm x 1.13 mm in 0.8 μm CMOS technology.
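The idea of a two-step conversion, a coarse decision followed by a fine decision on the residue, can be sketched behaviourally. This is an idealized model only: the 3-bit/3-bit split, the function name `two_step_convert`, and the integer-millivolt input are assumptions made for illustration and are not stated in the abstract. Only the 6-bit width and the 20 mV/LSB resolution come from the text.

```python
LSB_MV = 20        # resolution reported in the abstract: 20 mV/LSB
COARSE_BITS = 3    # assumed split, not stated in the abstract
FINE_BITS = 3      # assumed split, not stated in the abstract


def two_step_convert(v_mv):
    """Ideal two-step conversion of an input in integer millivolts to a 6-bit code."""
    full_scale = LSB_MV << (COARSE_BITS + FINE_BITS)   # 1280 mV for these values
    v_mv = max(0, min(v_mv, full_scale - 1))           # clamp to the input range
    coarse_step = LSB_MV << FINE_BITS                  # 160 mV per coarse code
    coarse = v_mv // coarse_step                       # first step: upper bits
    residue = v_mv - coarse * coarse_step              # remainder for the second step
    fine = residue // LSB_MV                           # second step: lower bits
    return (coarse << FINE_BITS) | fine


code = two_step_convert(500)   # 500 mV input
```

In this ideal model the result equals a direct division by the LSB; a real converter differs through comparator offsets and residue errors, which is what the offset correction and digital error correction described above address.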
HW/SW Co-design and Prototyping Approach for Embedded Smart Camera: ADAS Case Study
In 1968, Volkswagen integrated an electronic circuit as a new fuel injection control system, called the “Little Black Box”, which is considered the first embedded system in the automotive industry. Today, automobile manufacturers integrate several embedded systems into each new vehicle model. Behind these automotive electronic systems lies a sophisticated Hardware/Software (HW/SW) architecture built on heterogeneous components and multiple CPUs. At present, these systems are increasingly oriented toward vision-based systems using tiny embedded smart cameras. Meeting real-time constraints in such vision-based systems is one of the most challenging issues, especially in automotive applications. On the design side, one of the solutions adopted by embedded system designers to improve system performance is to combine CPUs and hardware accelerators in the same design, in order to reduce the computational burden on the CPU and to speed up data processing. In this paper, we present a hardware platform-based design approach for fast design and prototyping of an embedded smart Advanced Driver Assistant System (ADAS), as an alternative to time-consuming pure simulation. Based on a Multi-CPU/FPGA platform, we introduce a new methodology/flow to design the different HW and SW parts of the ADAS. We then share our experience in designing and prototyping a HW/SW vision-based smart embedded system as an ADAS that helps to increase driver safety. We present a real HW/SW prototype of the vision ADAS based on a Zynq FPGA. The system detects the fatigue/drowsiness state of the driver by monitoring eye closure and generates a real-time alert. A new HW skin segmentation step to locate the eyes/face is proposed.
Our new approach migrates the skin segmentation step from the processing system (SW) to the programmable logic (HW), taking advantage of the High-Level Synthesis (HLS) tool flow to accelerate the implementation and prototyping of the vision-based ADAS on a hardware platform
Programming Languages For Hard Real-Time Embedded Systems
International audience. Hard real-time embedded systems have traditionally been implemented using low-level programming languages (such as Ada or C) at a level very close to the underlying operating system. However, for several years now the industry has been using higher-level modelling languages, at least for early simulation and verification steps. The objective of this paper is to study existing formal languages that include high-level real-time primitives. Our review is built on the case study of an aerospace automated transfer vehicle, the particularity of which is that it is composed of several multi-periodic communicating processes. In this paper, we emphasize the strengths and weaknesses of existing programming approaches when implementing this kind of system. As a result, the choice of the base rate of the program appears to have a major influence, not only on the difficulty of programming the system correctly but also on the execution platform required to execute the program (operating system, scheduler, ...)
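To illustrate why the base rate matters for multi-periodic processes, the sketch below assumes one common choice: the base period is the greatest common divisor of the process periods, and the release pattern repeats every hyperperiod (their least common multiple). The periods and the function names are hypothetical, and the paper itself may consider other choices of base rate.

```python
from functools import reduce
from math import gcd


def base_period(periods_ms):
    """Largest tick length that divides every process period."""
    return reduce(gcd, periods_ms)


def hyperperiod(periods_ms):
    """Least common multiple of the periods: the release pattern repeats after this."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods_ms)


def release_table(periods_ms):
    """For each base tick in one hyperperiod, list the process indices released then."""
    tick = base_period(periods_ms)
    ticks = hyperperiod(periods_ms) // tick
    return [
        [i for i, p in enumerate(periods_ms) if (k * tick) % p == 0]
        for k in range(ticks)
    ]


periods = [10, 20, 50]           # hypothetical process periods in milliseconds
table = release_table(periods)   # one entry per 10 ms tick over a 100 ms hyperperiod
```

A smaller base period means more ticks per hyperperiod and more scheduling activity, while a larger one forces slower processes to be expressed as counters over the base tick, which is one way the choice can affect both programming effort and platform requirements.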
Towards 5G Software-Defined Ecosystems: Technical Challenges, Business Sustainability and Policy Issues
Techno-economic drivers are creating the conditions for a radical change of paradigm in the design and operation of future telecommunications infrastructures. In fact, SDN, NFV, Cloud and Edge-Fog Computing are converging into a single systemic transformation termed “Softwarization” that will find concrete exploitations in 5G systems. The IEEE SDN Initiative has elaborated a vision, an evolutionary path and some techno-economic scenarios of this transformation: specifically, the major technical challenges, business sustainability and policy issues have been investigated. This white paper presents: 1) an overview of the main techno-economic drivers steering the “Softwarization” of telecommunications; 2) an introduction to the Open Mobile Edge Cloud vision (covered in a companion white paper); 3) the main technical challenges in terms of operations, security and policy; 4) an analysis of the potential role of open source software; 5) some use case proposals for proof-of-concepts; and 6) a short description of the main socio-economic impacts being produced by “Softwarization”. Along these directions, IEEE SDN is also developing an open catalogue of software platforms, toolkits, and functionalities aiming at a step-by-step development and aggregation of test-beds/field-trials on SDN-NFV-5G
Addressing Prolonged Restore Challenges in Further Scaling DRAMs
As the de facto memory technology, DRAM has enjoyed continuous scaling over the past decades to sustain performance growth and capacity enhancement. However, further scaling DRAM into the deep sub-micron regime faces significant challenges. Among the induced issues, prolonged restore time is expected to be one of the major concerns, yet it has received little attention. Targeting the restore issue, this thesis performs pioneering studies to characterize the problems and presents techniques from different perspectives to overcome them.
First, our experimental studies quantify the significant restore process variations, which cause serious degradation in yield and/or performance. To solve the problem, we propose schemes to expose the variations to the architectural level. Fast restore chunks can thus be constructed utilizing the DRAM organization, and they can be exposed to the memory controller to effectively compensate for the performance loss. Further, we maximize the improvement by applying restore-time-aware rank construction and hotness-aware page allocation schemes to fully utilize the fast regions.
Second, in addition to simply exposing the variations to higher levels, we investigate DRAM cell structures and behaviors, finding that refresh and restore are two strongly correlated operations. While DRAM cells are fully restored after each read or write access, they are also fully recharged by periodic refresh operations, which provides an opportunity to terminate restore early. With this insight, we first propose to truncate a restore using the time distance to the next refresh. Further, to provide more truncation opportunities, we integrate multirate-refresh concepts to shorten the distance by increasing the refresh rate of recently accessed regions.
Lastly, we move up to the application level, inspired by the observation that a large set of applications can tolerate output accuracy loss and runtime errors, which enables us to exploit approximate computing to mitigate prolonged restore. By utilizing the variance in restore timing exhibited at different row segments, we reduce the restore time such that only some segments are fully reliable. We then map the critical data onto the reliable segments to keep the application-level errors low. On top of this approximation-aware technique, we further generalize it to support precise computing as well
Multimedia delivery in the future internet
The term “Networked Media” implies that all kinds of media including text, image, 3D graphics, audio
and video are produced, distributed, shared, managed and consumed on-line through various networks,
like the Internet, Fiber, WiFi, WiMAX, GPRS, 3G and so on, in a convergent manner [1]. This white
paper is the contribution of the Media Delivery Platform (MDP) cluster and aims to cover the
challenges of Networked Media in the transition to the Future Internet.
The Internet has evolved and changed the way we work and live. End users of the Internet have been confronted
with a bewildering range of media, services and applications and of technological innovations concerning
media formats, wireless networks, terminal types and capabilities. And there is little evidence that the pace
of this innovation is slowing. Today, over one billion users access the Internet on a regular basis, more
than 100 million users have downloaded at least one (multi)media file and over 47 million of them do so
regularly, searching in more than 160 Exabytes of content. In the near future these numbers are expected
to rise exponentially. Internet content is expected to increase by at least a factor of 6, rising
to more than 990 Exabytes before 2012, fuelled mainly by the users themselves. Moreover, it is envisaged
that in a near- to mid-term future, the Internet will provide the means to share and distribute (new)
multimedia content and services with superior quality and striking flexibility, in a trusted and personalized
way, improving citizens’ quality of life, working conditions, edutainment and safety.
In this evolving environment, new transport protocols, new multimedia encoding schemes, cross-layer
in-the-network adaptation, machine-to-machine communication (including RFIDs), rich 3D content as well as
community networks and the use of peer-to-peer (P2P) overlays are expected to generate new models of
interaction and cooperation, and be able to support enhanced perceived quality-of-experience (PQoE) and
innovative applications “on the move”, like virtual collaboration environments, personalised services/
media, virtual sport groups, on-line gaming, edutainment. In this context, the interaction with content
combined with interactive/multimedia search capabilities across distributed repositories, opportunistic P2P
networks and the dynamic adaptation to the characteristics of diverse mobile terminals are expected to
contribute towards such a vision.
Based on work that has taken place in a number of EC co-funded projects, in Framework Program 6 (FP6)
and Framework Program 7 (FP7), a group of experts and technology visionaries have voluntarily
contributed to this white paper, aiming to describe the status, the state of the art, the challenges and the way
ahead in the area of Content Aware media delivery platforms