949 research outputs found
DFT for fast testing of self-timed control circuits
Journal ArticleIn this paper, we present a methodology to perform fast testing of the control path of self-timed circuits [91]. The speedup is achieved by testing all the execution paths in the control simultaneously. The circuits considered in this paper are those designed using an OCCAM based circuit compiler 121. This Compiler translates an OCCAM program description into an interconnection of pre-existing self-timed macro-modules (2, 10]. The method proposed involves modifying certain modules and structures in such a way that the circuits obtained by translation using these modified modules are testable in above mentioned way
Doctor of Philosophy
dissertationThe design of integrated circuit (IC) requires an exhaustive verification and a thorough test mechanism to ensure the functionality and robustness of the circuit. This dissertation employs the theory of relative timing that has the advantage of enabling designers to create designs that have significant power and performance over traditional clocked designs. Research has been carried out to enable the relative timing approach to be supported by commercial electronic design automation (EDA) tools. This allows asynchronous and sequential designs to be designed using commercial cad tools. However, two very significant holes in the flow exist: the lack of support for timing verification and manufacturing test. Relative timing (RT) utilizes circuit delay to enforce and measure event sequencing on circuit design. Asynchronous circuits can optimize power-performance product by adjusting the circuit timing. A thorough analysis on the timing characteristic of each and every timing path is required to ensure the robustness and correctness of RT designs. All timing paths have to conform to the circuit timing constraints. This dissertation addresses back-end design robustness by validating full cyclical path timing verification with static timing analysis and implementing design for testability (DFT). Circuit reliability and correctness are necessary aspects for the technology to become commercially ready. In this study, scan-chain, a commercial DFT implementation, is applied to burst-mode RT designs. In addition, a novel testing approach is developed along with scan-chain to over achieve 90% fault coverage on two fault models: stuck-at fault model and delay fault model. This work evaluates the cost of DFT and its coverage trade-off then determines the best implementation. Designs such as a 64-point fast Fourier transform (FFT) design, an I2C design, and a mixed-signal design are built to demonstrate power, area, performance advantages of the relative timing methodology and are used as a platform for developing the backend robustness. Results are verified by performing post-silicon timing validation and test. This work strengthens overall relative timed circuit flow, reliability, and testability
Test exploration and validation using transaction level models
The complexity of the test infrastructure and test strategies in systems-on-chip approaches the complexity of the functional design space. This paper presents test design space exploration and validation of test strategies and schedules using transaction level models (TLMs). Since many aspects of testing involve the transfer of a significant amount of test stimuli and responses, the communication-centric view of TLMs suits this purpose exceptionally wel
An asynchronous instruction length decoder
Journal ArticleAbstract-This paper describes an investigation of potential advantages and pitfalls of applying an asynchronous design methodology to an advanced microprocessor architecture. A prototype complex instruction set length decoding and steering unit was implemented using self-timed circuits. [The Revolving Asynchronous Pentium® Processor Instruction Decoder (RAPPID) design implemented the complete Pentium II® 32-bit MMX instruction set.] The prototype chip was fabricated on a 0.25-CMOS process and tested successfully. Results show significant advantages-in particular, performance of 2.5-4.5 instructions per nanosecond-with manageable risks using this design technology. The prototype achieves three times the throughput and half the latency, dissipating only half the power and requiring about the same area as the fastest commercial 400-MHz clocked circuit fabricated on the same process
Recommended from our members
Scalable algorithms for software based self test using formal methods
textTransistor scaling has kept up with Moore's law with a doubling of the number of transistors on a chip. More logic on a chip means more opportunities for manufacturing defects to slip in. This, in turn, has made processor testing after manufacturing a significant challenge. At-speed functional testing, being completely non-intrusive, has been seen as the ideal way of testing chips. However for processor testing, generating instruction level tests for covering all faults is a challenge given the issue of scalability. Data-path faults are relatively easier to control and observe compared to control-path faults. In this research we present a novel method to generate instruction level tests for hard to detect control-path faults in a processor. We initially map the gate level stuck-at fault to the Register Transfer Level (RTL) and build an equivalent faulty RTL model. The fault activation and propagation constraints are captured using Control and Data Flow Graphs of the RTL as a Liner Temporal Logic (LTL) property. This LTL property is then negated and given to a Bounded Model Checker based on a Bit-Vector Satisfiability Module Theories (SMT) solver. From the counter-example to the property we can extract a sequence of instructions that activates the gate level fault and propagates the fault effect to one of the observable points in the design. Other than the user supplying instruction constraints, this approach is completely automatic and does not require any manual intervention. Not all the design behaviors are required to generate a test for a fault. We use this insight to scale our previous methodology further. Underapproximations are design abstractions that only capture a subset of the original design behaviors. The use of RTL for test generation affords us two types of under-approximations: bit-width reduction and operator approximation. These are abstractions that perform reductions based on semantics of the RTL design. We also explore structural reductions of the RTL, called path based search, where we search through error propagation paths incrementally. This approach increases the size of the test generation problem step by step. In this way the SMT solver searches through the state space piecewise rather than doing the entire search at once. Experimental results show that our methods are robust and scalable for generating functional tests for hard to detect faults.Electrical and Computer Engineerin
Design of asynchronous microprocessor for power proportionality
PhD ThesisMicroprocessors continue to get exponentially cheaper for end users following Moore’s
law, while the costs involved in their design keep growing, also at an exponential rate.
The reason is the ever increasing complexity of processors, which modern EDA tools
struggle to keep up with. This makes further scaling for performance subject to a high
risk in the reliability of the system. To keep this risk low, yet improve the performance,
CPU designers try to optimise various parts of the processor. Instruction Set Architecture
(ISA) is a significant part of the whole processor design flow, whose optimal design
for a particular combination of available hardware resources and software requirements
is crucial for building processors with high performance and efficient energy utilisation.
This is a challenging task involving a lot of heuristics and high-level design decisions.
Another issue impacting CPU reliability is continuous scaling for power consumption. For
the last decades CPU designers have been mainly focused on improving performance, but
“keeping energy and power consumption in mind”. The consequence of this was a development
of energy-efficient systems, where energy was considered as a resource whose
consumption should be optimised. As CMOS technology was progressing, with feature
size decreasing and power delivered to circuit components becoming less stable, the
energy resource turned from an optimisation criterion into a constraint, sometimes a critical
one. At this point power proportionality becomes one of the most important aspects
in system design. Developing methods and techniques which will address the problem
of designing a power-proportional microprocessor, capable to adapt to varying operating
conditions (such as low or even unstable voltage levels) and application requirements in
the runtime, is one of today’s grand challenges. In this thesis this challenge is addressed
by proposing a new design flow for the development of an ISA for microprocessors, which
can be altered to suit a particular hardware platform or a specific operating mode. This
flow uses an expressive and powerful formalism for the specification of processor instruction
sets called the Conditional Partial Order Graph (CPOG). The CPOG model captures
large sets of behavioural scenarios for a microarchitectural level in a computationally
efficient form amenable to formal transformations for synthesis, verification and automated
derivation of asynchronous hardware for the CPU microcontrol. The feasibility of
the methodology, novel design flow and a number of optimisation techniques was proven
in a full size asynchronous Intel 8051 microprocessor and its demonstrator silicon. The
chip showed the ability to work in a wide range of operating voltage and environmental
conditions. Depending on application requirements and power budget our ASIC supports
several operating modes: one optimised for energy consumption and the other one for
performance. This was achieved by extending a traditional datapath structure with an
auxiliary control layer for adaptable and fault tolerant operation. These and other optimisations
resulted in a reconfigurable and adaptable implementation, which was proven
by measurements, analysis and evaluation of the chip.EPSR
An asynchronous instruction length decoder
Journal ArticleThis paper describes an investigation of potential advantages and pitfalls of applying an asynchronous design methodology to an advanced microprocessor architecture. A prototype complex instruction set length decoding and steering unit was implemented using self-timed circuits. [The Revolving Asynchronous Pentium® Processor Instruction Decoder (RAPPID) design implemented the complete Pentium II® 32-bit MMX instruction set.] The prototype chip was fabricated on a 0.25-CMOS process and tested successfully. Results show significant advantages-in particular, performance of 2.5-4.5 instructions per nanosecond-with manageable risks using this design technology. The prototype achieves three times the throughput and half the latency, dissipating only half the power and requiring about the same area as the fastest commercial 400-MHz clocked circuit fabricated on the same process
Vernier Ring Based Pre-bond Through Silicon Vias Test in 3D ICs
Defects in TSV will lead to variations in the propagation delay of the net connected to the faulty TSV. A non-invasive Vernier Ring based method for TSV pre-bond testing is proposed to detect resistive open and leakage faults. TSVs are used as capacitive loads of their driving gates, then time interval compared with the fault-free TSVs will be detected. The time interval can be detected with picosecond level resolution, and digitized into a digital code to compare with an expected value of fault-free. Experiments on fault detection are presented through HSPICE simulations using realistic models for a 45 nm CMOS technology. The results show the effectiveness in the detection of time interval 10 ps, resistive open defects 0.2 kΩ above and equivalent leakage resistance less than 18 MΩ. Compared with existing methods, detection precision, area overhead, and test time are effectively improved, furthermore, the fault degree can be digitalized into digital code
RAPPID: an asynchronous instruction length decoder
Journal ArticleThis paper describes an investigation of potential advantages and risks of applying an aggressive asynchronous design methodology to Intel Architecture. RAPPID ("Revolving Asynchronous Pentium® Processor Instruction Decoder"), a prototype IA32 instruction length decoding and steering unit, was implemented using self-timed techniques. RAPPID chip was fabricated on a 0.25m CMOS process and tested successfully. Results show significant advantages-in particular, performance of 2.5-4.5 instructions/nS-with manageable risks using this design technology. RAPPID achieves three times the throughput and half the latency, dissipating only half the power and requiring about the same area as an existing 400MHz clocked circuit
RAPPID: an asynchronous instruction length decoder
Journal ArticleThis paper describes an investigation of potential advantages and risks of applying an aggressive asynchronous design methodology to Intel Architecture. RAPPID ("Revolving Asynchronous Pentium® Processor Instruction Decoder"), a prototype IA32 instruction length decoding and steering unit, was implemented using self-timed techniques. RAPPID chip was fabricated on a 0.25m CMOS process and tested successfully. Results show significant advantages-in particular, performance of 2.5-4.5 instructions/nS-with manageable risks using this design technology. RAPPID achieves three times the throughput and half the latency, dissipating only half the power and requiring about the same area as an existing 400MHz clocked circuit
- …