
    A methodology to implement real-time applications on reconfigurable circuits

    Special Issue: Engineering of Configurable Systems. This paper presents an extension of our AAA rapid prototyping methodology for the optimized implementation of real-time applications onto reconfigurable circuits. This extension is based on a unified model of factorized data dependence graphs, used both to specify the application algorithm and to deduce its possible implementations onto reconfigurable hardware in terms of graph transformations. This transformation flow has been implemented in SynDEx, a system-level CAD software tool.
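
    As a rough illustration of the data model behind such a flow, the sketch below shows a factorized data dependence graph as a plain dictionary, with a repetition factor on a node standing for a factorized sub-graph, and one possible transformation (unrolling toward a parallel hardware mapping). Node names, fields, and the unroll rule are illustrative assumptions, not SynDEx's actual representation.

        # Hypothetical factorized data dependence graph; names and fields are
        # illustrative, not SynDEx's actual model.
        graph = {
            "nodes": {
                "in":  {"op": "input",   "repeat": 1},
                "mac": {"op": "mul_add", "repeat": 8},   # factorized: 8 identical instances
                "out": {"op": "output",  "repeat": 1},
            },
            "edges": [("in", "mac"), ("mac", "out")],
        }

        def unroll(g, name):
            """Defactorize one node: replace it by 'repeat' explicit copies,
            duplicating its incoming and outgoing dependences (spatial, i.e.
            parallel, hardware implementation)."""
            node = g["nodes"].pop(name)
            copies = [f"{name}_{i}" for i in range(node["repeat"])]
            for c in copies:
                g["nodes"][c] = {"op": node["op"], "repeat": 1}
            new_edges = []
            for (u, v) in g["edges"]:
                if u == name:
                    new_edges += [(c, v) for c in copies]
                elif v == name:
                    new_edges += [(u, c) for c in copies]
                else:
                    new_edges.append((u, v))
            g["edges"] = new_edges
            return g

        unroll(graph, "mac")  # one candidate graph transformation toward an FPGA mapping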

    XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference

    Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully digital configurable hardware accelerator IP for BNNs, integrated within a microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid SRAM / standard cell memory. The XNE is able to fully compute convolutional and dense layers in autonomy or in cooperation with the core in the MCU to realize more complex behaviors. We show post-synthesis results in 65nm and 22nm technology for the XNE IP and post-layout results in 22nm for the full MCU, indicating that this system can drop the energy cost to 21.6 fJ per binary operation at 0.4 V, while remaining flexible and performant enough to execute state-of-the-art BNN topologies such as ResNet-34 in less than 2.2 mJ per frame at 8.9 fps. Comment: 11 pages, 8 figures, 2 tables, 3 listings. Accepted for presentation at CODES'18 and for publication in IEEE Transactions on Computer-Aided Design of Circuits and Systems (TCAD) as part of the ESWEEK-TCAD special issue.
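
    The arithmetic such an engine accelerates can be sketched in software: with weights and activations constrained to +1/-1 and packed as bits, a dot product reduces to an XNOR followed by a popcount. The sketch below is a minimal software illustration of that principle only; it does not reproduce the XNE's datapath, word widths, or memory organization.

        def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
            """Dot product of two n-element {-1,+1} vectors packed as bits
            (bit 1 encodes +1, bit 0 encodes -1): XNOR then popcount."""
            mask = (1 << n) - 1
            xnor = ~(a_bits ^ w_bits) & mask      # 1 wherever the signs agree
            matches = bin(xnor).count("1")        # popcount
            return 2 * matches - n                # agreements minus disagreements

        # Example: a = [+1, -1, +1, -1], w = [+1, +1, -1, -1]  ->  dot product 0
        print(binary_dot(0b0101, 0b0011, 4))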

    Sensor selection for energy-efficient ambulatory medical monitoring

    Epilepsy affects over three million Americans of all ages. Despite recent advances, more than 20% of individuals with epilepsy never achieve adequate control of their seizures. The use of a small, portable, non-invasive seizure monitor could benefit these individuals tremendously. However, in order for such a device to be suitable for long-term wear, it must be both comfortable and lightweight. Typical state-of-the-art non-invasive seizure onset detection algorithms require 21 scalp electrodes to be placed on the head. These electrodes are used to generate 18 data streams, called channels. The large number of electrodes is inconvenient for the patient and processing 18 channels can consume a considerable amount of energy, a problem for a battery-powered device. In this paper, we describe an automated way to construct detectors that use fewer channels, and thus fewer electrodes. Starting from an existing technique for constructing 18 channel patient-specific detectors, we use machine learning to automatically construct reduced channel detectors. We evaluate our algorithm on data from 16 patients used in an earlier study. On average, our algorithm reduced the number of channels from 18 to 4.6 while decreasing the mean fraction of seizure onsets detected from 99% to 97%. For 12 out of the 16 patients, there was no degradation in the detection rate. While the average detection latency increased from 7.8 s to 11.2 s, the average rate of false alarms per hour decreased from 0.35 to 0.19. We also describe a prototype implementation of a single channel EEG monitoring device built using off-the-shelf components, and use this implementation to derive an energy consumption model. Using fewer channels reduced the average energy consumption by 69%, which amounts to a 3.3x increase in battery lifetime. Finally, we show how additional energy savings can be realized by using a low-power screening detector to rule out segments of data that are obviously not seizures. Though this technique does not reduce the number of electrodes needed, it does reduce the energy consumption by an additional 16%.
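
    The battery-lifetime figure follows directly from the reported energy reduction: cutting average energy by 69% leaves 31% of the original draw, so the same battery lasts roughly 1/0.31 times as long. The snippet below is just that arithmetic; only the 69% figure comes from the abstract.

        baseline_energy = 1.0            # normalized average energy with 18 channels
        reduction = 0.69                 # 69% reduction reported for the reduced-channel detector
        reduced_energy = baseline_energy * (1 - reduction)

        lifetime_gain = baseline_energy / reduced_energy
        # prints ~3.2x, consistent with the ~3.3x reported (the exact value
        # depends on rounding of the 69% figure)
        print(f"battery lifetime gain = {lifetime_gain:.1f}x")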

    Design and implementation of NoC routers and their application to PRDT-based NoCs

    With a communication-centric design style, Networks-on-Chips (NoCs) emerge as a new paradigm of Systems-on-Chips (SoCs) to overcome the limitations of bus-based communication infrastructure. An important problem in the design of NoCs is the router design, which has great impact on the cost and performance of a NoC system. This thesis is focused on the design and implementation of an optimized parameterized router which can be applied in mesh/torus-based and Perfect Recursive Diagonal Torus (PRDT)-based NoCs. Specifically, the router design includes the design and implementation of two routing algorithms (vector routing and circular coded vector routing), the wormhole switching scheme, the scheduling scheme, the buffering strategy, and the flow control scheme. Correspondingly, the following components are designed and implemented: input controller, output controller, crossbar switch, and scheduler. Verilog HDL code is generated and synthesized on ASIC platforms. Most components are designed in a parameterized way. Performance evaluation of each component of the router in terms of timing, area, and power consumption is conducted. The efficiency of the two routing algorithms and the tradeoff between computational time (tsetup) and area are analyzed. To reduce the area cost of the router design, the two major components, the crossbar switch and the scheduler, are optimized. In particular, for the crossbar switch, a comparative study of two crossbar designs is performed with the aid of the Magic layout editor, Synopsys CosmosSE, and Awaves. Based on the router design, the PRDT network composed of 4x4 routers is designed and synthesized on ASIC platforms.
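
    As background for readers unfamiliar with NoC routing, the sketch below shows a deterministic dimension-order (XY) route computation on a 2D mesh. It is a generic reference point only; it is not the vector routing or circular coded vector routing algorithms developed in the thesis.

        def xy_route(src, dst):
            """Dimension-order (XY) routing on a 2D mesh: traverse the X dimension
            first, then Y. Returns the output port taken at each hop."""
            (x, y), (dx, dy) = src, dst
            hops = []
            while x != dx:
                hops.append("EAST" if dx > x else "WEST")
                x += 1 if dx > x else -1
            while y != dy:
                hops.append("NORTH" if dy > y else "SOUTH")
                y += 1 if dy > y else -1
            return hops

        print(xy_route((0, 0), (2, 3)))   # ['EAST', 'EAST', 'NORTH', 'NORTH', 'NORTH']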

    Development of a 6-bit 15.625 MHz CMOS two-step flash analog-to-digital converter for a low dead time sub-nanosecond time measurement system

    The development of a 6-bit 15.625 MHz CMOS two-step analog-to-digital converter (ADC) is presented. The ADC was developed for use in a low dead time, high-performance, sub-nanosecond time-to-digital converter (TDC). The TDC is part of a new custom CMOS application specific integrated circuit (ASIC) that will be incorporated in the next generation of front-end electronics for high-performance positron emission tomography imaging. The ADC is based upon a two-step flash architecture that reduces the comparator count by a factor of two compared to a traditional flash ADC architecture, and thus achieves a significant reduction in the area, power dissipation, and input capacitance of the converter. The converter contains time-interleaved auto-zeroed CMOS comparators. These comparators utilize offset correction in both the preamplifier and the subsequent regenerative latch stage to guarantee good integral and differential non-linearity performance of the converter over extreme process conditions. Also, digital error correction was employed to overcome most of the major metastability problems inherent in flash converters and to guarantee a completely monotonic transfer function. Corrected comparator offset measurements reveal that the CMOS comparator design maintains a worst-case input-referred offset of less than 1 mV at conversion rates up to 8 MHz and less than a 2 mV offset at conversion rates as high as 16 MHz while dissipating less than 2.6 mW. Extensive laboratory measurements indicate that the ADC achieves differential and integral non-linearity performance of less than ±1/2 LSB with a 20 mV/LSB resolution. The ADC dissipates 90 mW from a single 5 V supply and occupies a die area of 1.97 mm x 1.13 mm in 0.8 μm CMOS technology.
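
    The two-step (subranging) principle can be summarized behaviorally: a coarse flash stage resolves the upper bits, the corresponding analog value is subtracted, and a fine stage resolves the lower bits from the residue. The ideal model below, with an assumed 3+3 bit split and no offsets or error correction, is purely illustrative and is not the implemented converter.

        def two_step_adc(vin, vref=1.0, coarse_bits=3, fine_bits=3):
            """Ideal behavioral model of a two-step flash ADC
            (no comparator offsets, no digital error correction)."""
            levels_c = 2 ** coarse_bits
            levels_f = 2 ** fine_bits
            vin = min(max(vin, 0.0), vref - 1e-12)

            coarse = int(vin / vref * levels_c)                  # first flash stage (MSBs)
            residue = vin - coarse * vref / levels_c             # subtract the coarse estimate
            fine = int(residue / (vref / levels_c) * levels_f)   # second flash stage (LSBs)
            return (coarse << fine_bits) | fine

        print(two_step_adc(0.400))   # 25: the 6-bit code for 0.400 V with a 1 V reference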

    HW/SW Co-design and Prototyping Approach for Embedded Smart Camera: ADAS Case Study

    In 1968, Volkswagen integrated an electronic circuit as a new fuel injection control system, called the "Little Black Box"; it is considered the first embedded system in the automotive industry. Currently, automobile manufacturers integrate several embedded systems into every new vehicle model. Behind these automotive electronic systems lies a sophisticated Hardware/Software (HW/SW) architecture built from heterogeneous components and multiple CPUs. At present, these systems are increasingly oriented toward vision-based approaches using tiny embedded smart cameras. Meeting real-time constraints in such vision-based systems is one of the most challenging issues, especially in the domain of automotive applications. On the design side, one of the optimal solutions adopted by embedded systems designers for system performance is to associate CPUs and hardware accelerators in the same design, in order to reduce the computational burden on the CPU and to speed up the data processing. In this paper, we present a hardware platform-based design approach for fast embedded smart Advanced Driver Assistance System (ADAS) design and prototyping, as an alternative to the purely simulation-based technique, which is time-consuming. Based on a Multi-CPU/FPGA platform, we introduce a new methodology/flow to design the different HW and SW parts of the ADAS. We then share our experience in designing and prototyping a HW/SW vision-based smart embedded system as an ADAS that helps to increase the safety of car drivers. We present a real HW/SW prototype of the vision-based ADAS on a Zynq FPGA. The system detects the fatigue/drowsiness state of the driver by monitoring eye closure and generates a real-time alert. A new HW skin segmentation step to locate the eyes/face is proposed. Our approach migrates the skin segmentation step from the processing system (SW) to the programmable logic (HW), taking advantage of a High-Level Synthesis (HLS) tool flow to accelerate the implementation and prototyping of the vision-based ADAS on a hardware platform.
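
    As background for the skin segmentation step, a common software formulation classifies pixels by simple box thresholds in the YCbCr color space. The sketch below uses commonly cited illustrative threshold values; it is not the paper's HLS implementation and the bounds are assumptions.

        def is_skin_ycbcr(y, cb, cr):
            """Classify a pixel as skin using simple YCbCr box thresholds.
            The bounds are illustrative literature-style values, not the paper's."""
            return 80 <= cb <= 135 and 135 <= cr <= 180 and y > 80

        # Example: a typical skin-tone pixel vs. a blue-ish background pixel
        print(is_skin_ycbcr(150, 110, 150))   # True
        print(is_skin_ycbcr(120, 160, 100))   # False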

    Programming Languages For Hard Real-Time Embedded Systems

    Hard real-time embedded systems have traditionally been implemented using low-level programming languages (such as Ada or C) at a level very close to the underlying operating system. However, for several years now the industry has started using higher-level modelling languages, at least for early simulation and verification steps. The objective of this paper is to study existing formal languages that include high-level real-time primitives. Our review is built on the case study of an aerospace automated transfer vehicle, whose particularity is to be composed of several multi-periodic communicating processes. In this paper, we emphasize the strengths and weaknesses of existing programming approaches when implementing this kind of system. As a result, the choice of the base rate of the program appears to have a major influence, not only on the difficulty of programming the system correctly but also on the execution platform required to execute the program (operating system, scheduler, ...).
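
    The base rate question can be made concrete: for a set of multi-periodic processes, the natural base (tick) period is the greatest common divisor of the process periods, and the schedule repeats every hyperperiod (their least common multiple); a faster base rate multiplies the activations the platform must handle. The periods below are invented for illustration and are not those of the automated transfer vehicle case study.

        from math import gcd
        from functools import reduce

        def lcm(a, b):
            return a * b // gcd(a, b)

        periods_ms = [10, 40, 100]                      # hypothetical process periods
        base = reduce(gcd, periods_ms)                  # base tick period: 10 ms
        hyperperiod = reduce(lcm, periods_ms)           # schedule repeats every 200 ms
        activations = sum(hyperperiod // p for p in periods_ms)

        print(base, hyperperiod, activations)           # 10 200 27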

    Towards 5G Software-Defined Ecosystems: Technical Challenges, Business Sustainability and Policy Issues

    Techno-economic drivers are creating the conditions for a radical change of paradigm in the design and operation of future telecommunications infrastructures. In fact, SDN, NFV, Cloud and Edge-Fog Computing are converging together into a single systemic transformation termed “Softwarization” that will find concrete exploitations in 5G systems. The IEEE SDN Initiative has elaborated a vision, an evolutionary path and some techno-economic scenarios of this transformation: specifically, the major technical challenges, business sustainability and policy issues have been investigated. This white paper presents: 1) an overview of the main techno-economic drivers steering the “Softwarization” of telecommunications; 2) an introduction to the Open Mobile Edge Cloud vision (covered in a companion white paper); 3) the main technical challenges in terms of operations, security and policy; 4) an analysis of the potential role of open source software; 5) some use case proposals for proofs-of-concept; and 6) a short description of the main socio-economic impacts being produced by “Softwarization”. Along these directions, IEEE SDN is also developing an open catalogue of software platforms, toolkits, and functionalities aiming at a step-by-step development and aggregation of test-beds/field-trials on SDN-NFV-5G.

    Addressing Prolonged Restore Challenges in Further Scaling DRAMs

    As the de facto memory technology, DRAM has enjoyed continuous scaling over the past decades to sustain performance growth and capacity enhancement. However, further DRAM scaling into the deep sub-micron regime faces significant challenges. Among the induced issues, prolonged restore time is expected to be one of the major concerns, yet it has received little attention. Targeting the restore issue, this thesis performs pioneering studies to characterize the problems and presents techniques from different perspectives to overcome them. First, our experimental studies quantify significant restore process variations, which cause serious degradations in yield and/or performance. To solve the problem, we propose schemes to expose the variations to the architectural level. Fast restore chunks can thus be constructed utilizing the DRAM organization, and they can be exposed to the memory controller to effectively compensate for the performance loss. Further, we maximize the improvement by applying restore-time-aware rank construction and hotness-aware page allocation schemes to fully utilize the fast regions. Second, beyond simply exposing the variations to higher levels, we investigate DRAM cell structures and behaviors, finding that refresh and restore are two strongly correlated operations. Whereas DRAM cells are fully restored after each read or write access, they are also fully recharged by periodic refresh operations, providing an opportunity to terminate restore early. With this insight, we first propose to truncate a restore operation based on its time distance to the next refresh. Further, to provide more truncation opportunities, we integrate the multirate-refresh concept to shorten this distance by increasing the refresh rate of recently accessed regions. Lastly, we move up to the application level, inspired by the observation that a large set of applications can well tolerate output accuracy loss and runtime errors, which enables us to exploit approximate computing to mitigate prolonged restore. By utilizing the variance in restore timing exhibited at different row segments, we reduce the restore time such that only some segments are fully reliable. We then map the critical data onto the reliable segments to keep the application-level errors low. On top of the approximation-aware technique, we further generalize it to support precise computing as well.
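
    The refresh-aware truncation idea can be sketched as a controller-side decision: the sooner a row will be refreshed again, the less charge the cell needs to retain data until then, so the restore phase can be cut short. The timing values and the piecewise rule below are illustrative assumptions, not the thesis's characterized numbers.

        def restore_time(ns_full, time_to_next_refresh_ms, refresh_interval_ms=64.0):
            """Pick a (possibly truncated) restore time based on how soon the row
            will be refreshed again. Illustrative piecewise rule, not measured data."""
            fraction_left = time_to_next_refresh_ms / refresh_interval_ms
            if fraction_left < 0.25:      # refresh is imminent: aggressive truncation
                return 0.6 * ns_full
            if fraction_left < 0.5:       # moderately close: mild truncation
                return 0.8 * ns_full
            return ns_full                # far from refresh: full restore required

        print(restore_time(45.0, 10.0))   # 27.0 ns instead of a full 45.0 ns restore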