Reconfigurable time interval measurement circuit incorporating a programmable gain time difference amplifier
PhD Thesis
As semiconductor manufacturing technology advances, circuit performance continues to increase. Unfortunately, as the technology node descends deeper into the nanometre region, achieving the potential performance gain is becoming more of a challenge, due not only to the effects of process variation but also to the reduced timing margins between signals within the circuit, which create timing problems. Production-standard Automatic Test Equipment (ATE) is incapable of performing internal timing measurements, first because of the lack of accessibility and second because the overall timing accuracy of the tester is grossly inadequate. To address these issues, "on-chip" time measurement circuits have been developed, much as built-in self-test (BIST) evolved for "on-chip" logic testing.
This thesis describes the design and analysis of three time amplifier circuits. The analysis considers the operational aspects related to gain and input dynamic range, together with the robustness of the circuits to process, voltage and temperature (PVT) variations. The design with the best overall performance was subsequently compared to a benchmark design, which used the "buffer delay offset" technique for time amplification, and showed a marked 6.5-times improvement in dynamic range, extending it from 40 ps to 300 ps. The new design was also more robust to the effects of PVT variations.
The new time amplifier design was further developed to include an adjustable gain, which could be varied in steps of approximately 7.5 from 4 to 117. The time amplifier was then connected to a 32-stage tapped delay line to create a reconfigurable time measurement circuit with a resolution adjustable from 15 ps down to 0.5 ps and a dynamic range from 480 ps down to 16 ps, depending on the gain setting. The overall footprint of the measurement circuit, together with its calibration module, is 0.026 mm².
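The quoted figures are consistent with a simple model in which the tapped delay line digitizes the amplified interval, so one tap referred back to the input equals the tap delay divided by the gain. A minimal sketch in Python, assuming a per-tap delay of 60 ps (inferred from 15 ps at gain 4; the abstract does not state it) and ignoring calibration and nonlinearity:

```python
# Back-of-envelope model of a time amplifier (TA) feeding a tapped delay line.
# ASSUMPTION: the input interval is amplified by gain G, then measured by an
# N-stage delay line with per-tap delay T_TAP_PS. The 60 ps tap delay is
# inferred from the reported figures (15 ps x gain 4), not taken from the thesis.

N_STAGES = 32      # delay-line length stated in the abstract
T_TAP_PS = 60.0    # hypothetical per-stage delay, in picoseconds


def resolution_ps(gain: float) -> float:
    """Smallest resolvable input interval: one tap, referred back to the input."""
    return T_TAP_PS / gain


def dynamic_range_ps(gain: float) -> float:
    """Largest input interval before the amplified interval overruns the line."""
    return N_STAGES * T_TAP_PS / gain


for gain in (4, 117):
    print(f"G={gain:>3}: resolution {resolution_ps(gain):5.2f} ps, "
          f"range {dynamic_range_ps(gain):6.1f} ps")
```

Under these assumptions the model reproduces the reported end points to within rounding (about 0.51 ps and 16.4 ps at the maximum gain).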
Overall, the final circuit satisfied the main design criteria for "on-chip" time measurement circuitry: it has a wide dynamic range and high resolution, is robust to the effects of PVT variations, and has a small area overhead.
Umm Al-Qura University
TIME-DIFFERENCE CIRCUITS: METHODOLOGY, DESIGN, AND DIGITAL REALIZATION
This thesis presents innovations for a special class of circuits called Time Difference (TD) circuits. We introduce a signal processing methodology for TD signals, in which information is carried not by a signal's magnitude but by the time interval between two events, and we systematically organize the primary TD functions abstracted from existing TD circuits and systems. TD circuits attract attention from a broad range of application fields. Advanced complementary metal-oxide-semiconductor (CMOS) technology suffers from various problems with signal processing based on voltage and current amplitude. Compared to traditional analog and digital circuits, TD circuits offer several compelling features: high resolution, high throughput, and low design complexity with digital integration capability. Furthermore, as fabrication technology advances into the nanometer regime, the reduced voltage headroom limits the performance of traditional analog/mixed-signal designs. All-digital design of TD circuits therefore deserves emphasis for low-cost, low-power, and highly portable applications.
We focus on Time-to-Digital Converters (TDCs), one of the crucial building blocks of TD circuits. A novel algorithmic architecture based on a binary search algorithm is proposed and validated with both simulation and fabricated silicon. An all-digital Time-difference Amplifier (TDA) is designed and implemented to make FPGA and other all-digital implementations of TDCs and related TD circuits feasible. In addition, we propose PVTMC, an all-digital timing measurement circuit based on the process variation from CMOS fabrication, which achieves a high measurement resolution:
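A binary-search (successive-approximation) TDC resolves one output bit per step by comparing the input interval against a trial delay and halving the remaining search range. The behavioural sketch below assumes an ideal arbiter and an ideal programmable delay with a hypothetical full-scale range; it illustrates the general algorithm, not the specific architecture or circuit proposed in the thesis:

```python
def binary_search_tdc(interval_ps: float, full_scale_ps: float, bits: int) -> int:
    """Behavioural successive-approximation TDC.

    At each step a hypothetical programmable delay equal to the current trial
    code (in LSBs) is raced against the input interval by an ideal arbiter.
    The trial bit is kept if the interval is at least as long as that delay.
    Inputs beyond full scale saturate at the maximum code.
    """
    lsb = full_scale_ps / (1 << bits)   # time represented by one code step
    code = 0
    for bit in reversed(range(bits)):   # MSB first
        trial = code | (1 << bit)
        if interval_ps >= trial * lsb:  # ideal "which edge came first" decision
            code = trial
    return code


print(binary_search_tdc(37.3, 64.0, 6))
```

Here the 6-bit conversion with a 1 ps LSB returns 37; an N-bit result needs only N comparisons, rather than the roughly 2^N stages of a flash-style tapped delay line.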
Solutions and application areas of flip-flop metastability
PhD Thesis
The state space of every continuous multi-stable system is bound to contain one or more
metastable regions where the net attraction to the stable states can be infinitely-small.
Flip-flops are among these systems and can take an unbounded amount of time to decide
which logic state to settle to once they become metastable. This problematic behavior is
often prevented by imposing setup and hold time constraints on the flip-flop's input.
However, in applications such as clock domain crossing, where these constraints cannot
be imposed, flip-flops can become metastable and induce catastrophic failures. These
events are fundamentally impossible to prevent, but their probability can be significantly
reduced by employing synchronizer circuits. The latter grant flip-flops longer decision
time at the expense of introducing latency in processing the synchronized input.
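The trade-off between decision time and reliability is commonly quantified with the first-order synchronizer MTBF model below, a textbook model rather than one taken from the thesis. Because the resolution time appears in an exponent, each extra clock cycle of synchronization multiplies the MTBF by a large factor while adding only one cycle of latency. All parameter values in this Python sketch are hypothetical:

```python
import math


def synchronizer_mtbf(t_resolve_s: float, tau_s: float, t_window_s: float,
                      f_clock_hz: float, f_data_hz: float) -> float:
    """First-order synchronizer MTBF in seconds:

        MTBF = exp(t_resolve / tau) / (t_window * f_clock * f_data)

    tau is the flip-flop's metastability resolution time constant and t_window
    the effective input window in which a data change can cause metastability.
    """
    return math.exp(t_resolve_s / tau_s) / (t_window_s * f_clock_hz * f_data_hz)


# Hypothetical device and traffic parameters, chosen only to show the trend.
TAU, T_WINDOW, F_CLOCK, F_DATA = 20e-12, 20e-12, 1e9, 100e6

for cycles in (1, 2):
    # Simplification: each extra cycle of synchronization adds one full clock
    # period of resolution time (setup and clock-to-Q overheads are ignored).
    t_r = cycles / F_CLOCK
    mtbf = synchronizer_mtbf(t_r, TAU, T_WINDOW, F_CLOCK, F_DATA)
    print(f"{cycles} cycle(s) of resolution time: MTBF ~ {mtbf:.3g} s")
```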
This thesis presents a collection of research work involving the phenomenon of
flip-flop metastability in digital systems. The main contributions include three novel
solutions for the problem of synchronization. Two of these solutions are speculative
methods that rely on duplicate state machines to pre-compute data-dependent states
ahead of the completion of synchronization. Speculation is a core theme of this thesis
and is investigated in terms of its functional correctness, cost efficacy and fitness for
being automated by electronic design automation tools. It is shown that speculation
can outperform conventional synchronization solutions in practical terms and is a viable
option for future technologies. The third solution attempts to address the problem of
synchronization in the more specific context of variable supply voltages. Finally, the
thesis also identifies a novel application of metastability as a means of quantifying
intra-chip physical parameters. A digital sensor is proposed based on the sensitivity
of metastable flip-flops to changes in their environmental parameters and is shown to
have better precision while being more compact than conventional digital sensors.
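The speculative idea can be illustrated at the behavioural level: while a single asynchronous input bit is still being synchronized, two copies of the next-state logic compute the successor state for both possible values, and the synchronized bit then only drives a final selection. This Python sketch assumes a one-bit input and a hypothetical counter FSM; the function names are illustrative, and the thesis's actual schemes, timing and cost analysis are not modelled:

```python
from typing import Callable, Tuple

State = int
NextStateFn = Callable[[State, int], State]


def speculate(next_state: NextStateFn, state: State) -> Tuple[State, State]:
    """Evaluate two copies of the next-state logic, one per possible input value.

    In hardware both copies run while the synchronizer is still resolving,
    so neither has to wait for the synchronized bit.
    """
    return next_state(state, 0), next_state(state, 1)


def commit(candidates: Tuple[State, State], synced_bit: int) -> State:
    """When synchronization completes, a 2:1 multiplexer picks the right candidate."""
    return candidates[synced_bit]


def counter(state: State, bit: int) -> State:
    """Hypothetical example machine: a 3-bit counter that advances when bit == 1."""
    return (state + bit) % 8


# Functional-correctness check: speculation plus selection must match the
# conventional "wait for the synchronized bit, then compute" behaviour.
for s in range(8):
    for b in (0, 1):
        assert commit(speculate(counter, s), b) == counter(s, b)
```

After synchronization only the multiplexer delay remains on the critical path instead of the full next-state logic; the price is duplicated logic, which grows quickly if several unsynchronized bits must be speculated at once.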
Design of variation-tolerant synchronizers for multiple clock and voltage domains
PhD Thesis
Parametric variability increasingly affects the performance of electronic circuits as
fabrication technology reaches 32 nm and beyond. These
parameters include transistor Process parameters (such as threshold
voltage), supply Voltage and Temperature (PVT), all of which could have a
significant impact on the speed and power consumption of the circuit, particularly
if the variations exceed the design margins. As systems are designed with more
asynchronous protocols, there is a need for highly robust synchronizers and
arbiters. These components are often used as interfaces between communication
links of different timing domains as well as sampling devices for asynchronous
inputs coming from external components. These applications have created a need
for new robust designs of synchronizers and arbiters that can tolerate process,
voltage and temperature variations.
The aim of this study was to investigate how synchronizers and arbiters should be
designed to tolerate parametric variations. All investigations focused mainly on
circuit-level and transistor-level designs, which were modeled and simulated in the
UMC 90 nm CMOS technology process. Analog simulations were used to measure
timing parameters and power consumption, along with "Monte Carlo" statistical
analysis to account for process variations.
Two main components of synchronizers and arbiters were primarily investigated:
the flip-flop and the mutual-exclusion element (MUTEX). In both components the
input timing conditions (the setup and hold window times) can be violated, which could cause
metastability inside their bistable elements and possibly end in failures. The
mean time between failures is an important reliability feature of any synchronizer
[…] delay through the synchronizer.
The MUTEX study focused on the classical circuit, in addition to a number of
[…] tolerance, based on increasing internal gain by adding current sources, reducing
the capacitive loading, boosting the transconductance of the latch, compensating
the existing Miller capacitance, and adding asymmetry to maneuver the metastable
point. The results showed that some circuits had little or almost no improvement,
while five techniques showed significant improvements by reducing τ and
maintaining high tolerance.
Three design approaches are proposed to provide variation-tolerant
synchronizers. First, the wagging synchronizer is proposed to significantly
increase reliability over that of the conventional two-flip-flop synchronizer. The
robustness of the wagging technique can be enhanced by using robust τ latches or
adding one more cycle of synchronization. The second approach is the
Metastability Auto-Detection and Correction (MADAC) latch, which relies on swiftly
detecting a metastable event and correcting it by enforcing the previously stored
logic value. This technique significantly reduces the resolution time down from
uncertain […]
synchronization technique is proposed to transfer signals between Multiple-Voltage
Multiple-Clock Domains (MVD/MCD) without requiring conventional level-shifters
between the domains or multiple power supplies within each domain. This interface
circuit uses a synchronous set and feedback reset protocol, which provides
level-shifting and synchronization of all signals between the domains across a wide
range of supply voltages and clock frequencies.
Overall, synchronizer circuits can tolerate variations to a greater extent by
employing the wagging technique or using a MADAC latch, while MUTEX tolerance
can be achieved with small circuit modifications. Communication between MVD/MCD
can be achieved by an asynchronous handshake
without the need for level-shifters.
The Saudi Arabian Embassy in London, Umm Al-Qura University, Saudi Arabia
A Scalable Parallel Architecture with FPGA-Based Network Processor for Scientific Computing
This thesis discusses the design and implementation of an FPGA-based
Network Processor for scientific computing, such as Lattice Quantum ChromoDynamics
(LQCD) and fluid-dynamics applications based on Lattice Boltzmann
Methods (LBM). State-of-the-art programs in these (and similar)
applications have a large degree of available parallelism, which can be easily
exploited on massively parallel systems, provided the underlying communication
network has not only high bandwidth but also low latency.
I have designed in detail, built, and tested in hardware, firmware and
software an implementation of a Network Processor (NWP), tailored for the most
recent families of multi-core processors. The implementation has been developed
on an FPGA device to easily interface the logic of the NWP with the CPU
I/O sub-system.
In this work I have assessed several ways to move data between the main
memory of the CPU and the I/O sub-system to achieve high data throughput
and low latency, enabling the use of "Programmed Input Output" (PIO),
"Direct Memory Access" (DMA) and "Write Combining" memory settings.
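Whether PIO or DMA is preferable depends largely on message size: PIO (with write-combining, which lets the CPU merge stores into larger bus bursts) has a small fixed cost, whereas DMA pays a setup cost for descriptors and doorbells but then streams at a higher rate. A minimal linear cost model in Python, with entirely hypothetical numbers rather than measurements from the thesis, shows how the crossover size follows from two parameters per path:

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Hypothetical linear cost model of one CPU-to-device transfer mechanism."""
    name: str
    setup_ns: float       # fixed cost paid once per transfer
    bytes_per_ns: float   # sustained throughput once the transfer is running

    def time_ns(self, n_bytes: int) -> float:
        return self.setup_ns + n_bytes / self.bytes_per_ns


# Made-up figures: PIO+WC starts fast but streams slowly; DMA is the opposite.
PIO = Path("PIO+WC", setup_ns=100.0, bytes_per_ns=1.0)
DMA = Path("DMA", setup_ns=1000.0, bytes_per_ns=4.0)


def crossover_bytes(low_setup: Path, high_rate: Path) -> float:
    """Message size at which both paths take the same time."""
    return (high_rate.setup_ns - low_setup.setup_ns) / (
        1.0 / low_setup.bytes_per_ns - 1.0 / high_rate.bytes_per_ns)


for size in (64, 1200, 4096):
    print(f"{size:>5} B: PIO {PIO.time_ns(size):7.1f} ns, DMA {DMA.time_ns(size):7.1f} ns")
```

With these made-up figures the paths break even at 1200 bytes; real values depend on the CPU, the I/O link and the NWP firmware.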
On the software side, I developed and tested a device driver for the Linux
operating system to access the NWP device, as well as a system library to
efficiently access the network device from user applications.
This thesis demonstrates the feasibility of a network infrastructure that
saturates the maximum bandwidth of the I/O sub-systems available on recent
CPUs, and reduces communication latencies to values very close to those
needed by the processor to move data across the chip boundary.
On Multicast in Asynchronous Networks-on-Chip: Techniques, Architectures, and FPGA Implementation
In this era of exascale computing, conventional synchronous design techniques are facing unprecedented challenges. The consumer electronics market is replete with many-core systems ranging from 16 cores to thousands of cores on chip, integrating billions of transistors. However, with this ever-increasing complexity, traditional design approaches face key issues such as increasing chip power, process variability, aging, thermal problems, and scalability. An alternative paradigm that has gained significant interest in the last decade is asynchronous design. Asynchronous designs have several potential advantages: they are naturally energy-proportional, consuming power only when active; they do not require complex clock distribution; they are robust to different forms of variability; and they are easy to compose into heterogeneous platforms. Networks-on-chip (NoCs) are an interconnect paradigm introduced to deal with ever-increasing system complexity. NoCs provide a distributed, scalable, and efficient interconnect solution for today's many-core systems. Moreover, NoCs are a natural match for asynchronous design techniques, as they separate the communication infrastructure and its timing from the computational elements. To this end, globally-asynchronous locally-synchronous (GALS) systems, which interconnect multiple processing cores operating at different clock speeds through an asynchronous NoC, have gained significant interest. While asynchronous NoCs have several advantages, they also face the key challenge of supporting new types of traffic patterns. One such pattern is multicast communication, where a source sends packets to an arbitrary number of destinations. Multicast is common not only in parallel computing, for example for cache coherency, but also in emerging areas such as neuromorphic computing. This important capability has been largely missing from asynchronous NoCs.
This thesis introduces several efficient multicast solutions for these interconnects. In particular, techniques and network architectures are introduced to support high-performance and low-power multicast. Two leading network topologies are the focus: a variant mesh-of-trees (MoT) and a 2D mesh. In addition, for a more realistic implementation and analysis, as well as to significantly advance the field of asynchronous NoCs, this thesis also targets the synthesis of these NoCs on commercial FPGAs. While there have been significant advances in FPGA technologies, there has been only limited research on implementing asynchronous NoCs on FPGAs. To this end, a systematic computer-aided design (CAD) methodology is introduced to efficiently and safely map asynchronous NoCs onto FPGAs. Overall, this thesis makes the following three contributions. The first contribution is a multicast solution for a variant MoT network topology. This topology consists of simple low-radix switches and has been used in high-performance computing platforms. A novel local speculation technique is introduced, in which a subset of the network's switches are speculative and always broadcast every packet. These switches are very simple and have high performance. Speculative switches are surrounded by non-speculative ones that route packets based on their destinations and also throttle any redundant copies created by the former. This hybrid network architecture achieved significant performance and power benefits over other multicast approaches. The second contribution is a multicast solution for a 2D-mesh topology, which is more complex, with higher-radix switches, and is also more commonly used. A novel continuous-time replication strategy is introduced to optimize the critical multi-way forking operation of a multicast transmission.
In this technique, a multicast packet is first stored in an input port of a switch, from where it is sent through distinct output ports towards different destinations concurrently, at each output's own rate and in continuous time. This strategy is shown to have significant latency and energy benefits over an approach that performs multicast as multiple serial unicasts, one to each destination. Finally, a systematic CAD methodology is introduced to synthesize asynchronous NoCs on commercial FPGAs. It targets a two-fold goal: correctness and high performance. For ease of implementation, only existing FPGA synthesis tools are used. Moreover, since asynchronous NoCs involve special asynchronous components, a comprehensive guide is provided to map these elements correctly and efficiently. Two asynchronous NoC switches are synthesized using the proposed approach on a leading 28 nm Xilinx FPGA: one that only handles unicast, and one that also supports multicast. Both showed significant energy benefits, with some performance gains, over a state-of-the-art synchronous switch.
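The local speculation rule for the MoT fan-out tree can be sketched behaviourally. The Python model below assumes a binary fan-out tree, follows a single packet without timing or flow control, and treats destination membership as a set lookup; it illustrates the routing rule described above, not the thesis's switch design:

```python
from typing import List, Set, Tuple


def subtree_has_dest(level: int, idx: int, depth: int, dests: Set[int]) -> bool:
    """True if the subtree rooted at switch (level, idx) contains a destination output."""
    span = 1 << (depth - level)          # number of outputs below this node
    return any(idx * span <= d < (idx + 1) * span for d in dests)


def multicast(depth: int, dests: Set[int],
              speculative_levels: Set[int]) -> Tuple[Set[int], int]:
    """Send one multicast packet down a binary fan-out tree with 2**depth outputs.

    Level 0 is the root switch; switches at level l have indices 0 .. 2**l - 1.
    Speculative switches copy the packet to both children unconditionally.
    Non-speculative switches forward only towards children that lead to a
    destination, and drop ("throttle") a copy whose subtree has none.
    Returns (outputs reached, number of redundant copies dropped).
    """
    reached: Set[int] = set()
    dropped = 0
    pending: List[Tuple[int, int]] = [(0, 0)]
    while pending:
        level, idx = pending.pop()
        if level == depth:               # the copy leaves the tree at output idx
            reached.add(idx)
            continue
        speculative = level in speculative_levels
        if not speculative and not subtree_has_dest(level, idx, depth, dests):
            dropped += 1                 # redundant copy created upstream
            continue
        for child in (2 * idx, 2 * idx + 1):
            if speculative or subtree_has_dest(level + 1, child, depth, dests):
                pending.append((level + 1, child))
    return reached, dropped


print(multicast(3, {1, 2}, {0}))
```

With only the root speculative, the non-speculative switch on the unused branch drops one redundant copy; making every level speculative degenerates into a broadcast to all outputs, which is why speculative switches need non-speculative neighbours to throttle copies.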
Multi-resource approach to asynchronous SoC : design and tool support
As silicon cost falls, the demands for higher performance and lower power consumption keep increasing. The ability to dynamically control the number of resources employed can help balance and optimise a system in terms of its throughput, power consumption, and resilience to errors. Managing multiple resources requires more advanced resource allocation logic than traditional 1-of-N arbiters, which creates the need for an efficient design flow supporting both the design and the verification of such systems. Networks-on-Chip provide a good application example of distributed arbitration, in which the processor cores needing to transmit data are the clients and the point-to-point links are the resources managed by routers. Fast and smart arbiters can greatly benefit such systems by providing efficient and reliable communication service. In this thesis, a multi-resource arbiter was developed based on the Signal Transition Graph (STG) development flow. The arbiter distributes multiple active interchangeable resources that initiate requests when they are ready to be used. It supports concurrent resource utilization, which benefits the creation of asynchronous Multiple-Input-Multiple-Output (MIMO) queues. To deal with designs of higher complexity, an arbiter-oriented design flow is proposed. The flow is based on digital circuit components that are represented internally as STGs. This allows circuits to be designed without working directly with STGs, while still using STGs for synthesis and formal verification. The interfaces for modelling, simulation, and visual model representation of the flow were implemented on top of the existing modelling framework. The verification phase of the flow helped to find hazards in existing priority arbiter implementations. Finally, based on the logic-gate flow, the structure of a low-latency general-purpose arbiter was developed.
This design supports a wide variety of arbitration problems, including multi-resource management, which can benefit NoCs employing complex and adaptive routing techniques.
EThOS - Electronic Theses Online Service; EPSRC grant GR/E044662/1 (STEP); United Kingdom
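At its core, a multi-resource arbiter solves a matching problem: requesting clients are paired with ready, interchangeable resources, several pairs may be granted at once, and no resource is granted twice. The Python sketch below abstracts away the asynchronous handshakes and mutual-exclusion elements a real circuit needs, and uses a hypothetical lowest-index-first priority:

```python
from typing import Dict, List


def allocate(requests: List[bool], ready: List[bool]) -> Dict[int, int]:
    """Grant ready interchangeable resources to requesting clients in one step.

    Several grants may be issued at once (concurrent resource use); each client
    and each resource appears in at most one grant. Priority is simply lowest
    index first, a hypothetical policy chosen for brevity.
    Returns a mapping client -> resource.
    """
    free = [r for r, ok in enumerate(ready) if ok]
    waiting = [c for c, req in enumerate(requests) if req]
    # zip stops at the shorter list: surplus clients wait, surplus resources idle.
    return dict(zip(waiting, free))


print(allocate([True, False, True, True], [False, True, True]))
```

Client 3 remains waiting in this example because only two resources are ready; a fairer circuit would rotate priority between steps.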
Hazard-free clock synchronization
The growing complexity of microprocessors makes it infeasible to distribute a single clock source over the whole processor with a small clock skew. Hence, chips are split into multiple clock regions, each covered by a single clock source. This poses a problem for communication between these clock regions. Clock synchronization algorithms promise an advantage over state-of-the-art solutions, such as GALS systems: when clock regions are synchronized, the communication latency improves significantly over handshake-based solutions. We focus on the implementation of clock synchronization algorithms. Hazardous signals are a major obstacle when implementing circuits at clock domain crossings. We can formally define hazards by extending Boolean logic with a third value u. In this thesis, we describe a theory for designing and analyzing hazard-free circuits. We develop strategies for hazard-free encoding and for the construction of hazard-free circuits from finite state machines. Furthermore, we discuss clock synchronization algorithms and a possible combination of them. Finally, we present two implementations of the GCS algorithm by Lenzen, Locher, and Wattenhofer (JACM 2010). We prove by rigorous analysis that the systems implement the algorithm. The theory described above is used to prove that our clock synchronization circuits are hazard-free (in the sense that they compute the most precise output possible). Simulation of our GCS system shows that it achieves a skew between neighboring clock regions smaller than a few inverter delays.
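The third value u can be modelled with Kleene's strong three-valued logic, where u marks a signal that may be 0, 1 or in transition. The Python sketch below (an illustration, not the thesis's formalism or code) evaluates a two-level multiplexer gate by gate and compares it with the most precise output, obtained by resolving every u both ways. The classic static hazard appears when the select input is u while both data inputs are 1; adding the consensus term a AND b removes it:

```python
from itertools import product
from typing import Callable, Union

Val = Union[int, str]
U = "u"   # the third value: unstable / unknown


def t_not(a: Val) -> Val:
    return U if a == U else 1 - a


def t_and(a: Val, b: Val) -> Val:
    if a == 0 or b == 0:
        return 0          # a controlling 0 masks any u
    if a == 1 and b == 1:
        return 1
    return U


def t_or(a: Val, b: Val) -> Val:
    if a == 1 or b == 1:
        return 1          # a controlling 1 masks any u
    if a == 0 and b == 0:
        return 0
    return U


def mux_naive(s: Val, a: Val, b: Val) -> Val:
    """s ? a : b as the sum of products s*a + (not s)*b."""
    return t_or(t_and(s, a), t_and(t_not(s), b))


def mux_hazard_free(s: Val, a: Val, b: Val) -> Val:
    """Same function with the consensus term a*b added."""
    return t_or(mux_naive(s, a, b), t_and(a, b))


def closure(f: Callable[..., Val], *args: Val) -> Val:
    """Most precise output: try every 0/1 resolution of the u inputs."""
    choices = [(0, 1) if x == U else (x,) for x in args]
    outs = {f(*c) for c in product(*choices)}
    return outs.pop() if len(outs) == 1 else U


print(mux_naive(U, 1, 1), mux_hazard_free(U, 1, 1), closure(mux_naive, U, 1, 1))
```

The naive gate-level evaluation yields u although every resolution of the select input gives 1, whereas the version with the consensus term matches the closure on all 27 ternary inputs.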
FPGA architectures: once synchronous, now GALS
It is increasingly difficult to meet the conflicting demand for larger and faster circuits through advances in semiconductor technology alone. At some point, designers and manufacturers are expected to have to abandon the traditional synchronous design methodology for a globally asynchronous, locally synchronous (GALS) methodology. Such changes bring more synchronization constraints, but also more flexibility.
Accordingly, a methodology for implementing GALS components on traditional synchronous FPGAs is presented first. The objectives are to define a minimal set of basic asynchronous components, to enable their implementation, and to establish the constraints and limitations of such circuits. Simulation results confirm that GALS designs implemented with FPGA resources (look-up tables and flip-flops) and standard place-and-route tools allow the implementation of asynchronous components such as the delay line, the Muller C-element and the arbiter. These components can be implemented in traditional synchronous FPGAs as long as the designs are subject to appropriate constraints and are used within the limitations of the circuit.
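The Muller C-element has a compact behavioural definition: its output follows the inputs when they agree and holds its previous value otherwise. Equivalently, it is the 3-input majority of a, b and the previous output, which is one common way of mapping it onto an FPGA look-up table whose output is fed back to one of its inputs. This Python sketch shows only the logic function; it does not model the timing constraints such a combinational loop requires:

```python
class CElement:
    """Behavioural two-input Muller C-element."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def step(self, a: int, b: int) -> int:
        if a == b:          # inputs agree: output follows them
            self.out = a
        return self.out     # inputs disagree: output holds its state


def c_element_lut(a: int, b: int, prev: int) -> int:
    """Same behaviour as the majority function a*b + prev*(a + b),
    i.e. what a 3-input LUT with its output fed back as `prev` computes."""
    return (a & b) | (prev & (a | b))


c = CElement()
print([c.step(a, b) for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]])
```

Static timing tools normally flag such a feedback loop, which is one reason the designs need the appropriate constraints mentioned above.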
To achieve better performance, a new FPGA architecture that is compatible with existing synchronous devices and intrinsically supports GALS designs is presented. The main objective is simple: the proposed architecture must appear unchanged to synchronous designs, but must include a minimal set of basic components to prevent metastability during asynchronous communication. Simulation results for a stoppable clock generator are presented. All these results show that, with very few adapted circuits, a standard FPGA cell can become suitable for GALS methodologies.
Finally, a timing-hazard masking circuit is presented to mask metastability and synchronization problems. The goal is to define a circuit capable of physically enforcing the constraints that mask the sources of metastability, so that synchronization appears transparent. Simulation results confirm that such a circuit can completely mask all sources of metastability without performance degradation, but with a latency related to the time needed for a storage flip-flop to stabilize.