Search CORE

358 research outputs found

Synchronizer-Free Digital Link Controller

Author: Bund Johannes
Függer Matthias
Lenzen Christoph
Medina Moti
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020
Field of study

This work presents a producer-consumer link between two independent clock domains. The link allows for metastability-free, low-latency, high-throughput communication by slight adjustments to the clock frequencies of the producer and consumer domains steered by a controller circuit. Any such controller cannot deterministically avoid, detect, nor resolve metastability. Typically, this is addressed by synchronizers, incurring a larger dead time in the control loop. We follow the approach of Friedrichs et al. (TC 2018) who proposed metastability-containing circuits. The result is a simple control circuit that may become metastable, yet deterministically avoids buffer underrun or overflow. More specifically, the controller output may become metastable, but this may only affect oscillator speeds within specific bounds. In contrast, communication is guaranteed to remain metastability-free. We formally prove correctness of the producer-consumer link and a possible implementation that has only small overhead. With SPICE simulations of the proposed implementation we further substantiate our claims. The simulation uses 65nm process running at roughly 2GHz.Comment: 12 page journal articl

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

MPG.PuRe

Hal-Diderot

Hazard-free clock synchronization

Author: Bund Johannes
Publication venue: Saarländische Universitäts- und Landesbibliothek
Publication date: 01/01/2022
Field of study

The growing complexity of microprocessors makes it infeasible to distribute a single clock source over the whole processor with a small clock skew. Hence, chips are split into multiple clock regions, each covered by a single clock source. This poses a problem for communication between these clock regions. Clock synchronization algorithms promise an advantage over state-of-the-art solutions, such as GALS systems. When clock regions are synchronous the communication latency improves significantly over handshake-based solutions. We focus on the implementation of clock synchronization algorithms. A major obstacle when implementing circuits on clock domain crossings are hazardous signals. We can formally define hazards by extending the Boolean logic by a third value u. In this thesis, we describe a theory for designing and analyzing hazard-free circuits. We develop strategies for hazard-free encoding and construction of hazard-free circuits from finite state machines. Furthermore, we discuss clock synchronization algorithms and a possible combination of them. In the end, we present two implementations of the GCS algorithm by Lenzen, Locher, and Wattenhofer (JACM 2010). We prove by rigorous analysis that the systems implement the algorithm. The theory described above is used to prove that our clock synchronization circuits are hazard-free (in the sense that they compute the most precise output possible). Simulation of our GCS system shows that it achieves a skew between neighboring clock regions that is smaller than a few inverter delays.Aufgrund der zunehmenden Komplexität von Mikroprozessoren ist es unmöglich, mit einer einzigen Taktquelle den gesamten Prozessor ohne großen Versatz zu takten. Daher werden Chips in mehrere Regionen aufgeteilt, die jeweils von einer einzelnen Taktquelle abgedeckt werden. Dies stellt ein Problem für die Kommunikation zwischen diesen Taktregionen dar. Algorithmen zur Taktsynchronisation bieten einen Vorteil gegenüber aktuellen Lösungen, wie z.B. GALS-Systemen. Synchronisiert man die Taktregionen, so verbessert sich die Latenz der Kommunikation erheblich. In Schaltkreisen zwischen zwei Taktregionen können undefinierte Signale, sogenannte Hazards auftreten. Indem wir die boolesche Algebra um einen dritten Wert u erweitern, können wir diese Hazards formal definieren. In dieser Arbeit zeigen wir eine Methode zum Entwurf und zur Analyse von hazard-freien Schaltungen. Wir entwickeln Strategien für Kodierungen die Hazards vermeiden und zur Konstruktion von hazard-freien Schaltungen. Darüber hinaus stellen wir Algorithmen Taktsynchronisation vor und wie diese kombiniert werden können. Zum Schluss stellen wir zwei Implementierungen des GCS-Algorithmus von Lenzen, Locher und Wattenhofer (JACM 2010) vor. Oben genannte Mechanismen werden verwendet, um formal zu beweisen, dass diese Implementierungen korrekt sind. Die Implementierung hat keine Hazards, das heißt sie berechnet die bestmo ̈gliche Ausgabe. Anschließende Simulation der GCS Implementierung erzielt einen Versatz zwischen benachbarten Taktregionen, der kleiner als ein paar Gatter-Laufzeiten ist

Universaar

Acronym

Recommended from our members

Network-on-Chip Synchronization

Author: Buckler Mark
Publication venue: ScholarWorks@UMass Amherst
Publication date: 07/11/2014
Field of study

Technology scaling has enabled the number of cores within a System on Chip (SoC) to increase significantly. Globally Asynchronous Locally Synchronous (GALS) systems using Dynamic Voltage and Frequency Scaling (DVFS) operate each of these cores on distinct and dynamic clock domains. The main communication method between these cores is increasingly more likely to be a Network-on-Chip (NoC). Typically, the interfaces between these clock domains experience multi-cycle synchronization latencies due to their use of “brute-force” synchronizers. This dissertation aims to improve the performance of NoCs and thereby SoCs as a whole by reducing this synchronization latency. First, a survey of NoC improvement techniques is presented. One such improvement technique: a multi-layer NoC, has been successfully simulated. Given how one of the most commonly used techniques is DVFS, a thorough analysis and simulation of brute-force synchronizer circuits in both current and future process technologies is presented. Unfortunately, a multi-cycle latency is unavoidable when using brute-force synchronizers, so predictive synchronizers which require only a single cycle of latency have been proposed. To demonstrate the impact of these predictive synchronizer circuits at a high level, multi-core system simulations incorporating these circuits have been completed. Multiple forms of GALS NoC configurations have been simulated, including multi-synchronous, NoC-synchronous, and single-synchronizer. Speedup on the SPLASH benchmark suite was measured to directly quantify the performance benefit of predictive synchronizers in a full system. Additionally, Mean Time Between Failures (MTBF) has been calculated for each NoC synchronizer configuration to determine the reliability benefit possible when using predictive synchronizers

ScholarWorks@UMass Amherst

PALS: Distributed Gradient Clocking on Chip

Author: Bund Johannes
Függer Matthias
Medina Moti
Publication venue
Publication date: 29/08/2023
Field of study

Consider an arbitrary network of communicating modules on a chip, each requiring a local signal telling it when to execute a computational step. There are three common solutions to generating such a local clock signal: (i) by deriving it from a single, central clock source, (ii) by local, free-running oscillators, or (iii) by handshaking between neighboring modules. Conceptually, each of these solutions is the result of a perceived dichotomy in which (sub)systems are either clocked or asynchronous. We present a solution and its implementation that lies between these extremes. Based on a distributed gradient clock synchronization algorithm, we show a novel design providing modules with local clocks, the frequency bounds of which are almost as good as those of free-running oscillators, yet neighboring modules are guaranteed to have a phase offset substantially smaller than one clock cycle. Concretely, parameters obtained from a 15nm ASIC simulation running at 2GHz yield mathematical worst-case bounds of 20ps on the phase offset for a

32 \times 32

node grid network

arXiv.org e-Print Archive

Petri nets based components within globally asynchronous locally synchronous systems

Author: Ferreira Henrique Afonso
Publication venue: Faculdade de Ciências e Tecnologia
Publication date: 01/01/2010
Field of study

Dissertação apresentada na Faculdade de Ciências e Tecnologias da Universidade Nova de Lisboa para a obtenção do grau de Mestre em Engenharia Electrotécnica e ComputadoresThe main goal is to develop a solution for the interconnection of components constituent of a GALS - Globally Asynchronous, Locally Synchronous – system. The components are implemented in parallel obtained as a result of the partition of a model expressed a Petri net (PN), performed using the PNs editor SNOOPY-IOPT in conjunction with the Split tool and the tools to automatically generate the VHDL code from the representations of the PNML models resulting from the partition (these tools were developed under the project FORDESIGN and are available at http://www.uninova.pt/FORDESIGN). Typical solutions will be analyzed to ensure proper communication between components of the GALS system, as well as characterized and developed an appropriate solution for the interconnection of the components associated with the PN sub-models. The final goal (not attained with this thesis) would be to acquire a tool that allows generation of code for the interconnection solution from the associated components, considering a specific application. The solution proposed for componentes interconnection was coded in VHDL and the implementation platforms used for testing include the Xilinx FPGA Spartan-3 and Virtex-II

Repositório da Universidade Nova de Lisboa

On time, time synchronization and noise in time measurement systems

Author: Kinali-Dogan Attila
Publication venue: Saarländische Universitäts- und Landesbibliothek
Publication date: 01/01/2022
Field of study

Time plays an important role in our modern lives. Especially having accurate time, which in turn depends on having clocks being synchronized to each other. This thesis is split into three distinct parts. The first part deals with the mathematical description of noise that is required to model clocks and electronics accurately. In particular we will address the problem that the generally used tools from signal theory fail for noise signals which are neither of finite energy nor periodic in nature. For this we will introduce a new function space based on the Pp-seminorm that is an extension of the Lp-norm for functions of potentially infinite energy but limited power. Using this new semi-norm we will modify the Fourier transform to work on signals from this P p-space. And last but not least, we will introduce, based on the above, a new mathematical model of noise that captures all the properties associated with 1/f -noise. In the second part, we will look at how noise propagates in a few classes of electronics, especially how the non-linear behavior of electronics leads to an amplification of noise and how it could be miti-gated. Lastly, in the third part we will look at one approach of fault-tolerant clock synchronization. After explaining its working principle and showing an implementation in an FPGA we will focus on meta-stability, the problems it can cause and how to handle them on two different circuit levels.Zeit spielt eine wichtige Rolle in unserem Leben. Insbesondere die Verfügbarkeit einer genauen Zeit. Welches wiederum davon abhängt, dass man Uhren hat die auf einander synchronisiert laufen. Diese Arbeit ist in drei Teile aufgeteilt: Im ersten Teil betrachten wir die mathematische Beschreibung von Rauschen um elektronische Systeme und Uhren korrekt beschreiben zu können. Im Besonderen betrachten wir die Probleme die die generell benutzten Methoden der Signalverarbeitung beim Umgang mit Rauschsignalen haben, die weder energiebegrenzt noch periodisch sind. Dafür erweitern wir den Funktionenraum der Lp-Norm auf leistungslimiterte Funktionene und führen die Pp-Halbnorm ein und modifizieren die Fouriertransformation zur Verwendung auf diesen Raum. Und letztlich führen wir ein neues mathematisches Model zur Beschreibung von Rauschen ein, welches alle üblicherweise angenommenen Eigenschaften gleichzeitig erfüllt. Im zweiten Teil analysieren wir wie sich einige Klassen von elektronischen Schaltungem im Bezug auf Rauschen verhalten. Insbesondere im Bezug auf das nicht-lineare Verhalten der elektronischen Elemente, welches zu einer Verstärkung des Rauschens führt. Im dritten Teil betrachten wir eine Möglichkeit um fehlertolerante Synchronization von Uhren zu erreichen. Nach einem Überblick über den verwendeten Algorithmus und wie dieser einem FPGA implementiert werden kann, schauen wir uns den Einfluss von Metastabilität an und wie dieser eingedämmt werden kann

Universaar

Acronym

MPG.PuRe

Online Timing Slack Measurement and its Application in Field-Programmable Gate Arrays

Author: Levine Joshua M.
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/02/2015
Field of study

Reliability, power consumption and timing performance are key concerns for today's integrated circuits. Measurement techniques capable of quantifying the timing characteristics of a circuit, while it is operating, facilitate a range of benefits. Delay variation due to environmental and operational conditions, and degradation can be monitored by tracking changes in timing performance. Using the measurements in a closed-loop to control power supply voltage or clock frequency allows for the reduction of timing safety margins, leading to improvements in power consumption or throughput performance through the exploitation of better-than worst-case operation. This thesis describes a novel online timing slack measurement method which can directly measure the timing performance of a circuit, accurately and with minimal overhead. Enhancements allow for the improvement of absolute accuracy and resolution. A compilation flow is reported that can automatically instrument arbitrary circuits on FPGAs with the measurement circuitry. On its own this measurement method is able to track the "health" of an integrated circuit, from commissioning through its lifetime, warning of impending failure or instigating pre-emptive degradation mitigation techniques. The use of the measurement method in a closed-loop dynamic voltage and frequency scaling scheme has been demonstrated, achieving significant improvements in power consumption and throughput performance.Open Acces

Spiral - Imperial College Digital Repository

An Application of Quantum Networks for Secure Video Surveillance

Author: Alan Mink
Barry Hershman
Lijun Ma
Xiao Tang
Publication venue: 'IntechOpen'
Publication date: 03/02/2011
Field of study

IntechOpen

Timing speculation and adaptive reliable overclocking techniques for aggressive computer systems

Author: Subramanian Viswanathan
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2009
Field of study

Computers have changed our lives beyond our own imagination in the past several decades. The continued and progressive advancements in VLSI technology and numerous micro-architectural innovations have played a key role in the design of spectacular low-cost high performance computing systems that have become omnipresent in today\u27s technology driven world. Performance and dependability have become key concerns as these ubiquitous computing machines continue to drive our everyday life. Every application has unique demands, as they run in diverse operating environments. Dependable, aggressive and adaptive systems improve efficiency in terms of speed, reliability and energy consumption. Traditional computing systems run at a fixed clock frequency, which is determined by taking into account the worst-case timing paths, operating conditions, and process variations. Timing speculation based reliable overclocking advocates going beyond worst-case limits to achieve best performance while not avoiding, but detecting and correcting a modest number of timing errors. The success of this design methodology relies on the fact that timing critical paths are rarely exercised in a design, and typical execution happens much faster than the timing requirements dictated by worst-case design methodology. Better-than-worst-case design methodology is advocated by several recent research pursuits, which exploit dependability techniques to enhance computer system performance. In this dissertation, we address different aspects of timing speculation based adaptive reliable overclocking schemes, and evaluate their role in the design of low-cost, high performance, energy efficient and dependable systems. We visualize various control knobs in the design that can be favorably controlled to ensure different design targets. As part of this research, we extend the SPRIT3E, or Superscalar PeRformance Improvement Through Tolerating Timing Errors, framework, and characterize the extent of application dependent performance acceleration achievable in superscalar processors by scrutinizing the various parameters that impact the operation beyond worst-case limits. We study the limitations imposed by short-path constraints on our technique, and present ways to exploit them to maximize performance gains. We analyze the sensitivity of our technique\u27s adaptiveness by exploring the necessary hardware requirements for dynamic overclocking schemes. Experimental analysis based on SPEC2000 benchmarks running on a SimpleScalar Alpha processor simulator, augmented with error rate data obtained from hardware simulations of a superscalar processor, are presented. Even though reliable overclocking guarantees functional correctness, it leads to higher power consumption. As a consequence, reliable overclocking without considering on-chip temperatures will bring down the lifetime reliability of the chip. In this thesis, we analyze how reliable overclocking impacts the on-chip temperature of a microprocessor and evaluate the effects of overheating, due to such reliable dynamic frequency tuning mechanisms, on the lifetime reliability of these systems. We then evaluate the effect of performing thermal throttling, a technique that clamps the on-chip temperature below a predefined value, on system performance and reliability. Our study shows that a reliably overclocked system with dynamic thermal management achieves 25% performance improvement, while lasting for 14 years when being operated within 353K. Over the past five decades, technology scaling, as predicted by Moore\u27s law, has been the bedrock of semiconductor technology evolution. The continued downscaling of CMOS technology to deep sub-micron gate lengths has been the primary reason for its dominance in today\u27s omnipresent silicon microchips. Even as the transition to the next technology node is indispensable, the initial cost and time associated in doing so presents a non-level playing field for the competitors in the semiconductor business. As part of this thesis, we evaluate the capability of speculative reliable overclocking mechanisms to maximize performance at a given technology level. We evaluate its competitiveness when compared to technology scaling, in terms of performance, power consumption, energy and energy delay product. We present a comprehensive comparison for integer and floating point SPEC2000 benchmarks running on a simulated Alpha processor at three different technology nodes in normal and enhanced modes. Our results suggest that adopting reliable overclocking strategies will help skip a technology node altogether, or be competitive in the market, while porting to the next technology node. Reliability has become a serious concern as systems embrace nanometer technologies. In this dissertation, we propose a novel fault tolerant aggressive system that combines soft error protection and timing error tolerance. We replicate both the pipeline registers and the pipeline stage combinational logic. The replicated logic receives its inputs from the primary pipeline registers while writing its output to the replicated pipeline registers. The organization of redundancy in the proposed Conjoined Pipeline system supports overclocking, provides concurrent error detection and recovery capability for soft errors, intermittent faults and timing errors, and flags permanent silicon defects. The fast recovery process requires no checkpointing and takes three cycles. Back annotated post-layout gate-level timing simulations, using 45nm technology, of a conjoined two-stage arithmetic pipeline and a conjoined five-stage DLX pipeline processor, with forwarding logic, show that our approach, even under a severe fault injection campaign, achieves near 100% fault coverage and an average performance improvement of about 20%, when dynamically overclocked

Digital Repository @ Iowa State University (ISU)