24 research outputs found

    Reconfigurable time interval measurement circuit incorporating a programmable gain time difference amplifier

    PhD Thesis. As further advances are made in semiconductor manufacturing technology, the performance of circuits continues to increase. Unfortunately, as the technology node descends deeper into the nanometre region, achieving the potential performance gain is becoming more of a challenge, due not only to the effects of process variation but also to the reduced timing margins between signals within the circuit, which create timing problems. Production-standard Automatic Test Equipment (ATE) is incapable of performing internal timing measurements, first because of the lack of accessibility and second because the overall timing accuracy of the tester is grossly inadequate. To address these issues, 'on-chip' time measurement circuits have been developed in much the same way that built-in self-test (BIST) evolved for 'on-chip' logic testing. This thesis describes the design and analysis of three time amplifier circuits. The analysis considers the operational aspects related to gain and input dynamic range, together with the robustness of the circuits to process, voltage and temperature (PVT) variations. The design with the best overall performance was subsequently compared to a benchmark design, which used the 'buffer delay offset' technique for time amplification, and showed a marked 6.5 times improvement in dynamic range, extending it from 40 ps to 300 ps. The new design was also more robust to the effects of PVT variations. The new time amplifier design was further developed to include an adjustable gain capability, which could be varied in steps of approximately 7.5 from 4 to 117. The time amplifier was then connected to a 32-stage tapped delay line to create a reconfigurable time measurement circuit with an adjustable resolution from 15 ps down to 0.5 ps and a dynamic range from 480 ps down to 16 ps, depending on the gain setting. The overall footprint of the measurement circuit, together with its calibration module, occupies an area of 0.026 mm². Overall, the final circuit satisfied the main design criteria for 'on-chip' time measurement circuitry: it has a wide dynamic range and high resolution, is robust to the effects of PVT variations, and has a small area overhead.
    Umm Al-Qura University
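The resolution and dynamic-range figures quoted in this abstract are mutually consistent under a simple model, sketched here as an assumption rather than as the thesis's own analysis: if the time amplifier stretches the input interval by a gain G before the 32-stage delay line digitises it, the input-referred resolution is the stage delay divided by G, and the dynamic range is 32 times that resolution. The 60 ps stage delay is a hypothetical value inferred from 15 ps × 4; it is not stated in the abstract.

```python
# Hypothetical model: input-referred resolution = stage delay / amplifier gain.
STAGES = 32             # taps in the delay line (from the abstract)
STAGE_DELAY_PS = 60.0   # assumed value, inferred from 15 ps resolution at gain 4


def resolution_ps(gain: float) -> float:
    """Smallest input interval distinguishable at a given amplifier gain."""
    return STAGE_DELAY_PS / gain


def dynamic_range_ps(gain: float) -> float:
    """Largest input interval the delay line can cover at a given gain."""
    return STAGES * resolution_ps(gain)


if __name__ == "__main__":
    for gain in (4, 117):  # lowest and highest gain settings quoted in the abstract
        print(f"gain {gain}: resolution {resolution_ps(gain):.2f} ps, "
              f"range {dynamic_range_ps(gain):.1f} ps")
```

Under this assumed model, the lowest gain reproduces the 15 ps and 480 ps figures exactly, and the highest gain lands close to the quoted 0.5 ps and 16 ps.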

    Solutions and application areas of flip-flop metastability

    PhD Thesis. The state space of every continuous multi-stable system is bound to contain one or more metastable regions where the net attraction to the stable states can be infinitely small. Flip-flops are among these systems and can take an unbounded amount of time to decide which logic state to settle to once they become metastable. This problematic behavior is usually prevented by imposing setup and hold time conditions on the flip-flop's input. However, in applications such as clock domain crossing, where these constraints cannot be imposed, flip-flops can become metastable and induce catastrophic failures. These events are fundamentally impossible to prevent, but their probability can be significantly reduced by employing synchronizer circuits. Synchronizers grant flip-flops a longer decision time at the expense of introducing latency in processing the synchronized input. This thesis presents a collection of research work on the phenomenon of flip-flop metastability in digital systems. The main contributions include three novel solutions to the problem of synchronization. Two of these solutions are speculative methods that rely on duplicate state machines to pre-compute data-dependent states ahead of the completion of synchronization. Speculation is a core theme of this thesis and is investigated in terms of its functional correctness, cost efficacy and suitability for automation by electronic design automation tools. It is shown that speculation can outperform conventional synchronization solutions in practical terms and is a viable option for future technologies. The third solution addresses the problem of synchronization in the more specific context of variable supply voltages. Finally, the thesis identifies a novel application of metastability as a means of quantifying intra-chip physical parameters. A digital sensor is proposed based on the sensitivity of metastable flip-flops to changes in their environmental parameters and is shown to have better precision while being more compact than conventional digital sensors.

    Design of variation-tolerant synchronizers for multiple clock and voltage domains

    PhD Thesis. Parametric variability increasingly affects the performance of electronic circuits as fabrication technology has reached the level of 32 nm and beyond. These parameters include transistor Process parameters (such as threshold voltage), supply Voltage and Temperature (PVT), all of which can have a significant impact on the speed and power consumption of the circuit, particularly if the variations exceed the design margins. As systems are designed with more asynchronous protocols, there is a need for highly robust synchronizers and arbiters. These components are often used as interfaces between communication links of different timing domains, as well as sampling devices for asynchronous inputs coming from external components. These applications have created a need for new robust designs of synchronizers and arbiters that can tolerate process, voltage and temperature variations. The aim of this study was to investigate how synchronizers and arbiters should be designed to tolerate parametric variations. All investigations focused mainly on circuit-level and transistor-level designs, which were modeled and simulated in the UMC 90 nm CMOS technology process. Analog simulations were used to measure timing parameters and power consumption, along with a Monte Carlo statistical analysis to account for process variations. Two main components of synchronizers and arbiters were primarily investigated: the flip-flop and the mutual-exclusion element (MUTEX). In both components the input timing conditions (setup and hold window times) can be violated, which can cause metastability inside their bistable elements and possibly end in failures. The mean time between failures is an important reliability feature of any synchronizer [...] delay through the synchronizer. The MUTEX study focused on the classical circuit, in addition to a number of [...] tolerance, based on increasing internal gain by adding current sources, reducing the capacitive loading, boosting the transconductance of the latch, compensating the existing Miller capacitance, and adding asymmetry to maneuver the metastable point. The results showed that some circuits had little or almost no improvement, while five techniques showed significant improvements by reducing τ and maintaining high tolerance. Three design approaches are proposed to provide variation-tolerant synchronizers. First, the wagging synchronizer is proposed to significantly increase reliability over that of the conventional two-flip-flop synchronizer. The robustness of the wagging technique can be enhanced by using robust τ latches or by adding one more cycle of synchronization. The second approach is the Metastability Auto-Detection and Correction (MADAC) latch, which relies on swiftly detecting a metastable event and correcting it by enforcing the previously stored logic value. This technique significantly reduces the resolution time down from uncertain [...] synchronization technique is proposed to transfer signals between Multiple-Voltage Multiple-Clock Domains (MVD/MCD) that does not require conventional level-shifters between the domains or multiple power supplies within each domain. This interface circuit uses a synchronous set and feedback reset protocol, which provides level-shifting and synchronization of all signals between the domains over a wide range of supply voltages and clock frequencies. Overall, synchronizer circuits can tolerate variations to a greater extent by employing the wagging technique or using a MADAC latch, while MUTEX tolerance can suffice with small circuit modifications. Communication between MVD/MCD can be achieved by an asynchronous handshake without the need to add level-shifters.
    The Saudi Arabian Embassy in London, Umm Al-Qura University, Saudi Arabi
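To make the role of τ and of an extra synchronization cycle concrete, here is a minimal sketch of the widely used textbook synchronizer reliability model, MTBF = exp(t_r / τ) / (T_w · f_c · f_d). The thesis may use a different formulation; the function name, parameter names and all numeric values below are illustrative assumptions, not measurements from the abstract.

```python
import math


def mtbf_seconds(t_resolve: float, tau: float, t_window: float,
                 f_clock: float, f_data: float) -> float:
    """Textbook synchronizer model: MTBF = exp(t_r / tau) / (T_w * f_c * f_d).

    t_resolve: time allowed for metastability to resolve (s)
    tau:       resolution time constant of the latch (s)
    t_window:  metastability window around the clock edge (s)
    f_clock:   sampling clock frequency (Hz)
    f_data:    rate of asynchronous data transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clock * f_data)


if __name__ == "__main__":
    # Illustrative (assumed) values only.
    tau, t_window = 20e-12, 20e-12
    f_clock, f_data = 1e9, 1e8
    one_cycle = mtbf_seconds(1e-9, tau, t_window, f_clock, f_data)
    two_cycles = mtbf_seconds(2e-9, tau, t_window, f_clock, f_data)
    print(f"one cycle of resolution time:  {one_cycle:.3e} s")
    print(f"two cycles of resolution time: {two_cycles:.3e} s")
```

Because the resolution time appears in the exponent, both a smaller τ and an additional cycle of synchronization improve the MTBF exponentially in this model, at the cost of extra latency.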

    A Scalable Parallel Architecture with FPGA-Based Network Processor for Scientific Computing

    This thesis discusses the design and implementation of an FPGA-based Network Processor (NWP) for scientific computing, such as Lattice Quantum Chromodynamics (LQCD) and fluid-dynamics applications based on Lattice Boltzmann Methods (LBM). State-of-the-art programs in these (and other similar) applications have a large degree of available parallelism that can be easily exploited on massively parallel systems, provided the underlying communication network has not only high bandwidth but also low latency. I have designed in detail, built and tested in hardware, firmware and software an implementation of a Network Processor tailored for the most recent families of multi-core processors. The implementation has been developed on an FPGA device to easily interface the logic of the NWP with the CPU I/O sub-system. In this work I have assessed several ways to move data between the main memory of the CPU and the I/O sub-system to achieve high data throughput and low latency, enabling the use of "Programmed Input Output" (PIO), "Direct Memory Access" (DMA) and "Write Combining" memory settings. On the software side, I developed and tested a device driver for the Linux operating system to access the NWP device, as well as a system library to efficiently access the network device from user applications. This thesis demonstrates the feasibility of a network infrastructure that saturates the maximum bandwidth of the I/O sub-systems available on recent CPUs and reduces communication latencies to values very close to those needed by the processor to move data across the chip boundary.

    Multi-resource approach to asynchronous SoC : design and tool support

    As silicon cost falls, the demands for higher performance and lower power consumption are ever increasing. The ability to dynamically control the number of resources employed can help balance and optimise a system in terms of its throughput, power consumption, and resilience to errors. Managing multiple resources requires more advanced resource allocation logic than traditional 1-of-N arbiters, which creates the need for an efficient design flow supporting both the design and the verification of such systems. Networks-on-Chip provide a good application example of distributed arbitration, in which the processor cores needing to transmit data are the clients and the point-to-point links are the resources managed by routers. Building fast and smart arbiters can greatly benefit such systems by providing an efficient and reliable communication service. In this thesis, a multi-resource arbiter was developed based on the Signal Transition Graph (STG) development flow. The arbiter distributes multiple active interchangeable resources that initiate requests when they are ready to be used. It supports concurrent resource utilization, which benefits the creation of asynchronous Multiple-Input-Multiple-Output (MIMO) queues. In order to deal with designs of higher complexity, an arbiter-oriented design flow is proposed. The flow is based on digital circuit components that are represented internally as STGs. This allows circuits to be designed without working directly with STGs, while still allowing STGs to be used for synthesis and formal verification. The interfaces for modelling, simulation, and visual model representation of the flow were implemented on top of the existing modelling framework. As a result, the verification phase of the flow has helped to find hazards in existing Priority arbiter implementations. Finally, based on the logic-gate flow, the structure of a low-latency general-purpose arbiter was developed. This design supports a wide variety of arbitration problems, including multi-resource management, which can benefit the building of NoCs that employ complex and adaptive routing techniques.
    EThOS - Electronic Theses Online Service; EPSRC grant GR/E044662/1 (STEP); GB; United Kingdo

    Hazard-free clock synchronization

    The growing complexity of microprocessors makes it infeasible to distribute a single clock source over the whole processor with a small clock skew. Hence, chips are split into multiple clock regions, each covered by a single clock source. This poses a problem for communication between these clock regions. Clock synchronization algorithms promise an advantage over state-of-the-art solutions, such as GALS systems. When clock regions are synchronous, the communication latency improves significantly over handshake-based solutions. We focus on the implementation of clock synchronization algorithms. A major obstacle when implementing circuits on clock domain crossings is hazardous signals. We can formally define hazards by extending Boolean logic with a third value u. In this thesis, we describe a theory for designing and analyzing hazard-free circuits. We develop strategies for hazard-free encoding and for the construction of hazard-free circuits from finite state machines. Furthermore, we discuss clock synchronization algorithms and a possible combination of them. In the end, we present two implementations of the GCS algorithm by Lenzen, Locher, and Wattenhofer (JACM 2010). We prove by rigorous analysis that the systems implement the algorithm. The theory described above is used to prove that our clock synchronization circuits are hazard-free (in the sense that they compute the most precise output possible). Simulation of our GCS system shows that it achieves a skew between neighboring clock regions that is smaller than a few inverter delays.
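The idea of extending Boolean logic with a third value u can be illustrated with a small sketch. It uses Kleene's three-valued logic and the standard multiplexer example; this is a common illustration of hazards and is assumed here, not taken from the thesis. The function names are hypothetical.

```python
# Three-valued logic with 0, 1 and U ("unknown", e.g. a metastable or transitioning signal).
U = "u"


def t_not(a):
    return U if a == U else 1 - a


def t_and(a, b):
    if a == 0 or b == 0:      # a definite 0 forces the output regardless of the other input
        return 0
    if a == 1 and b == 1:
        return 1
    return U


def t_or(a, b):
    if a == 1 or b == 1:      # a definite 1 forces the output regardless of the other input
        return 1
    if a == 0 and b == 0:
        return 0
    return U


def mux_naive(s, a, b):
    """Selects a when s = 1 and b when s = 0: (s AND a) OR (NOT s AND b)."""
    return t_or(t_and(s, a), t_and(t_not(s), b))


def mux_hazard_free(s, a, b):
    """Same Boolean function with the extra consensus term (a AND b)."""
    return t_or(mux_naive(s, a, b), t_and(a, b))


if __name__ == "__main__":
    # With both data inputs at 1, the output should be 1 whatever the select is.
    print("naive:      ", mux_naive(U, 1, 1))
    print("hazard-free:", mux_hazard_free(U, 1, 1))
```

Both circuits agree on all Boolean inputs, but only the second one keeps a definite output when the select line is u and both data inputs agree, which is what "computing the most precise output possible" means in this setting.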

    Jadis synchrones, désormais GALS, les architectures de FPGA [FPGA architectures: formerly synchronous, now GALS]

    It is increasingly difficult to meet the conflicting demand for larger and faster circuits through advances in semiconductor technology alone. At some point, designers and manufacturers are expected to have to abandon the traditional synchronous design methodology in favour of a globally asynchronous, locally synchronous (GALS) methodology. Such changes bring more synchronization constraints, but also more flexibility. Accordingly, a methodology for implementing GALS components on traditional synchronous FPGAs is presented first. The objectives are to define a minimal set of basic asynchronous components, to enable their implementation, and to establish the constraints and limitations of such circuits. Simulation results confirm that GALS designs built from FPGA resources (look-up tables and flip-flops) with standard place-and-route tools allow the implementation of asynchronous components such as the delay line, the Muller C-element and the arbiter. These components can be implemented in traditional synchronous FPGAs as long as the designs are subject to appropriate constraints and are used within the limitations of the circuit. To achieve better performance, a new FPGA architecture is presented that is compatible with existing synchronous devices and inherently supports GALS designs. The main objective is simple: the proposed architecture must appear unchanged to synchronous designs, but must include a minimal set of basic components to prevent metastability during asynchronous communication. Simulation results for a stoppable clock generator are presented. All these results demonstrate that, with very few adapted circuits, a standard FPGA cell can become suitable for GALS methodologies. Finally, a timing-hazard masking circuit is presented to mask metastability and synchronization problems. The goal is to define a circuit capable of physically enforcing the constraints that mask the sources of metastability, so that synchronization appears transparent. Simulation results confirm that such a circuit can completely mask all sources of metastability without performance degradation, but with a latency related to the time needed for a memory flip-flop to settle.
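Of the asynchronous components named in this abstract, the Muller C-element has the simplest behaviour to state: its output follows the inputs when they agree and holds its previous value when they differ. The sketch below is a purely behavioural model under that standard definition; it says nothing about the look-up-table implementation studied in the thesis, and the class name is hypothetical.

```python
class CElement:
    """Behavioural model of a two-input Muller C-element."""

    def __init__(self, initial: int = 0):
        self.out = initial

    def update(self, a: int, b: int) -> int:
        # Output changes only when both inputs agree; otherwise it keeps its state.
        if a == b:
            self.out = a
        return self.out


if __name__ == "__main__":
    c = CElement()
    for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
        print(f"a={a} b={b} -> out={c.update(a, b)}")
```

This state-holding behaviour is what lets a C-element act as the join point in asynchronous handshake circuits: the output rises only after both inputs have risen and falls only after both have fallen.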