Search CORE

320 research outputs found

Macromodular Computer Design, Part 1, Volume 4, The Synchronizer Glitch Problem

Author: Computer Systems Laboratory Washington University
Publication venue: Digital Commons@Becker
Publication date: 01/02/1974
Field of study

Hardware/Software Approach for Code Synchronization in Low-Power Multi-Core Sensor Nodes

Author: Ansaloni Giovanni
Atienza Alonso David
Beretta Ivan
Braojos Lopez Ruben
Publication venue: New York, IEEE/ACM Press
Publication date: 09/12/2013
Field of study

Latest embedded bio-signal analysis applications, targeting low-power Wireless Body Sensor Nodes (WBSNs), present conflicting requirements. On one hand, bio-signal analysis applications are continuously increasing their demand for high computing capabilities. On the other hand, long-term signal processing in WBSNs must be provided within their highly constrained energy budget. In this context, parallel processing effectively increases the power efficiency of WBSNs, but only if the execution can be properly synchronized among computing elements. To address this challenge, in this work we propose a hardware/software approach to synchronize the execution of bio-signal processing applications in multi-core WBSNs. This new approach requires little hardware resources and very few adaptations in the source code. Moreover, it provides the necessary flexibility to execute applications with an arbitrarily large degree of complexity and parallelism, enabling considerable reductions in power consumption for all multi-core WBSN execution conditions. Experimental results show that a multi-core WBSN architecture using the illustrated approach can obtain energy savings of up to 40%, with respect to an equivalent singlecore architecture, when performing advanced bio-signal analysi

Infoscience - École polytechnique fédérale de Lausanne

The 30/20 GHz flight experiment system, phase 2. Volume 2: Experiment system description

Author: Bergman S. G.
Bronstein L.
Forman B. J.
Kawamoto Y.
Reisenfeld S.
Ribarich J. J.
Scope J. R.
Publication venue
Publication date
Field of study

A detailed technical description of the 30/20 GHz flight experiment system is presented. The overall communication system is described with performance analyses, communication operations, and experiment plans. Hardware descriptions of the payload are given with the tradeoff studies that led to the final design. The spacecraft bus which carries the payload is discussed and its interface with the launch vehicle system is described. Finally, the hardwares and the operations of the terrestrial segment are presented

NASA Technical Reports Server

Hardware/Software Co-Design of Ultra-Low Power Biomedical Monitors

Author: Braojos Lopez Ruben
Publication venue: Lausanne, EPFL
Publication date: 29/11/2016
Field of study

Ongoing changes in world demographics and the prevalence of unhealthy lifestyles are imposing a paradigm shift in healthcare delivery. Nowadays, chronic ailments such as cardiovascular diseases, hypertension and diabetes, represent the most common causes of death according to the World Health Organization. It is estimated that 63% of deaths worldwide are directly or indirectly related to these non-communicable diseases (NCDs), and by 2030 it is predicted that the health delivery cost will reach an amount comparable to 75% of the current GDP. In this context, technologies based on Wireless Sensor Nodes (WSNs) effectively alleviate this burden enabling the conception of wearable biomedical monitors composed of one or several devices connected through a Wireless Body Sensor Network (WBSN). Energy efficiency is of paramount importance for these devices, which must operate for prolonged periods of time with a single battery charge. In this thesis I propose a set of hardware/software co-design techniques to drastically increase the energy efficiency of bio-medical monitors. To this end, I jointly explore different alternatives to reduce the required computational effort at the software level while optimizing the power consumption of the processing hardware by employing ultra-low power multi-core architectures that exploit DSP application characteristics. First, at the sensor level, I study the utilization of a heartbeat classifier to perform selective advanced DSP on state-of-the-art ECG bio-medical monitors. To this end, I developed a framework to design and train real-time, lightweight heartbeat neuro-fuzzy classifiers, detail- ing the required optimizations to efficiently execute them on a resource-constrained platform. Then, at the network level I propose a more complex transmission-aware WBSN for activity monitoring that provides different tradeoffs between classification accuracy and transmission volume. In this work, I study the combination of a minimal set of WSNs with a smartphone, and propose two classification schemes that trade accuracy for transmission volume. The proposed method can achieve accuracies ranging from 88% to 97% and can save up to 86% of wireless transmissions, outperforming the state-of-the-art alternatives. Second, I propose a synchronization-based low-power multi-core architecture for bio-signal processing. I introduce a hardware/software synchronization mechanism that allows to achieve high energy efficiency while parallelizing the execution of multi-channel DSP applications. Then, I generalize the methodology to support bio-signal processing applications with an arbitrarily high degree of parallelism. Due to the benefits of SIMD execution and software pipelining, the architecture can reduce its power consumption by up 38% when compared to an equivalent low-power single-core alternative. Finally, I focused on the optimization of the multi-core memory subsystem, which is the major contributor to the overall system power consumption. First I considered a hybrid memory subsystem featuring a small reliable partition that can operate at ultra-low voltage enabling low-power buffering of data and obtaining up to 50% energy savings. Second, I explore a two-level memory hierarchy based on non-volatile memories (NVM) that allows for aggressive fine-grained power gating enabled by emerging low-power NVM technologies and monolithic 3D integration. Experimental results show that, by adopting this memory hierarchy, power consumption can be reduced by 5.42x in the DSP stage

Infoscience - École polytechnique fédérale de Lausanne

Many-core and heterogeneous architectures: programming models and compilation toolchains

Author
Publication venue: Politecnico di Torino
Publication date: 30/10/2020
Field of study

1noL'abstract è presente nell'allegato / the abstract is in the attachmentopen677. INGEGNERIA INFORMATInopartially_openembargoed_20211002Barchi, Francesc

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

The Open AUC Project

Progress in analytical ultracentrifugation (AUC) has been hindered by obstructions to hardware innovation and by software incompatibility. In this paper, we announce and outline the Open AUC Project. The goals of the Open AUC Project are to stimulate AUC innovation by improving instrumentation, detectors, acquisition and analysis software, and collaborative tools. These improvements are needed for the next generation of AUC-based research. The Open AUC Project combines on-going work from several different groups. A new base instrument is described, one that is designed from the ground up to be an analytical ultracentrifuge. This machine offers an open architecture, hardware standards, and application programming interfaces for detector developers. All software will use the GNU Public License to assure that intellectual property is available in open source format. The Open AUC strategy facilitates collaborations, encourages sharing, and eliminates the chronic impediments that have plagued AUC innovation for the last 20 years. This ultracentrifuge will be equipped with multiple and interchangeable optical tracks so that state-of-the-art electronics and improved detectors will be available for a variety of optical systems. The instrument will be complemented by a new rotor, enhanced data acquisition and analysis software, as well as collaboration software. Described here are the instrument, the modular software components, and a standardized database that will encourage and ease integration of data analysis and interpretation software

KOPS - The Institutional Repository of the University of Konstanz

Crossref

Springer - Publisher Connector

PubMed Central

UNH Scholars' Repository

MPG.PuRe

Design of variation-tolerant synchronizers for multiple clock and voltage domains

Author: Alshaikh Mohammed Saleh Abdullah
Publication venue: Newcastle University
Publication date: 01/01/2014
Field of study

PhD ThesisParametric variability increasingly affects the performance of electronic circuits as the fabrication technology has reached the level of 32nm and beyond. These parameters may include transistor Process parameters (such as threshold voltage), supply Voltage and Temperature (PVT), all of which could have a significant impact on the speed and power consumption of the circuit, particularly if the variations exceed the design margins. As systems are designed with more asynchronous protocols, there is a need for highly robust synchronizers and arbiters. These components are often used as interfaces between communication links of different timing domains as well as sampling devices for asynchronous inputs coming from external components. These applications have created a need for new robust designs of synchronizers and arbiters that can tolerate process, voltage and temperature variations. The aim of this study was to investigate how synchronizers and arbiters should be designed to tolerate parametric variations. All investigations focused mainly on circuit-level and transistor level designs and were modeled and simulated in the UMC90nm CMOS technology process. Analog simulations were used to measure timing parameters and power consumption along with a “Monte Carlo” statistical analysis to account for process variations. Two main components of synchronizers and arbiters were primarily investigated: flip-flop and mutual-exclusion element (MUTEX). Both components can violate the input timing conditions, setup and hold window times, which could cause metastability inside their bistable elements and possibly end in failures. The mean-time between failures is an important reliability feature of any synchronizer delay through the synchronizer. The MUTEX study focused on the classical circuit, in addition to a number of tolerance, based on increasing internal gain by adding current sources, reducing the capacitive loading, boosting the transconductance of the latch, compensating the existing Miller capacitance, and adding asymmetry to maneuver the metastable point. The results showed that some circuits had little or almost no improvements, while five techniques showed significant improvements by reducing τ and maintaining high tolerance. Three design approaches are proposed to provide variation-tolerant synchronizers. wagging synchronizer proposed to First, the is significantly increase reliability over that of the conventional two flip-flop synchronizer. The robustness of the wagging technique can be enhanced by using robust τ latches or adding one more cycle of synchronization. The second approach is the Metastability Auto-Detection and Correction (MADAC) latch which relies on swiftly detecting a metastable event and correcting it by enforcing the previously stored logic value. This technique significantly reduces the resolution time down from uncertain synchronization technique is proposed to transfer signals between Multiple- Voltage Multiple-Clock Domains (MVD/MCD) that do not require conventional level-shifters between the domains or multiple power supplies within each domain. This interface circuit uses a synchronous set and feedback reset protocol which provides level-shifting and synchronization of all signals between the domains, from a wide range of voltage-supplies and clock frequencies. Overall, synchronizer circuits can tolerate variations to a greater extent by employing the wagging technique or using a MADAC latch, while MUTEX tolerance can suffice with small circuit modifications. Communication between MVD/MCD can be achieved by an asynchronous handshake without a need for adding level-shifters.The Saudi Arabian Embassy in London, Umm Al-Qura University, Saudi Arabi

Newcastle University eTheses

Energy-Efficient Hardware-Accelerated Synchronization for Shared-L1-Memory Multiprocessor Clusters

Author: Davide Rossi
Florian Glaser
Germain Haugoug
Giuseppe Tagliavini
Luca Benini
Qiuting Huang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/04/2020
Field of study

The steeply growing performance demands for highly power- and energy-constrained processing systems such as end-nodes of the Internet-of-Things (IoT) have led to parallel near-threshold computing (NTC), joining the energy-efficiency benefits of low-voltage operation with the performance typical of parallel systems. Shared-L1-memory multiprocessor clusters are a promising architecture, delivering performance in the order of GOPS and over 100 GOPS/W of energy-efficiency. However, this level of computational efficiency can only be reached by maximizing the effective utilization of the processing elements (PEs) available in the clusters. Along with this effort, the optimization of PE-to-PE synchronization and communication is a critical factor for performance. In this article, we describe a light-weight hardware-accelerated synchronization and communication unit (SCU) for tightly-coupled clusters of processors. We detail the architecture, which enables fine-grain per-PE power management, and its integration into an eight-core cluster of RISC-V processors. To validate the effectiveness of the proposed solution, we implemented the eight-core cluster in advanced 22 nm FDX technology and evaluated performance and energy-efficiency with tunable microbenchmarks and a set of rea-life applications and kernels. The proposed solution allows synchronization-free regions as small as 42 cycles, over 41 smaller than the baseline implementation based on fast test-and-set access to L1 memory when constraining the microbenchmarks to 10 percent synchronization overhead. When evaluated on the real-life DSP-applications, the proposed SCU improves performance by up to 92 and 23 percent on average and energy efficiency by up to 98 and 39 percent on average

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna