bBlocks: a flexible object-oriented microarchitecture simulation framework
This research develops an object-oriented approach to modeling microprocessor architecture. A generic modeling library, bBlocks, is proposed as a framework for constructing microprocessor simulations. bBlocks is a collection of predefined abstract components (blocks) implemented in Java, an object-oriented programming language. Blocks are the basic components used to model, simulate, and optimize a variety of microprocessor operations in bBlocks. The basic activities of a microprocessor are encapsulated into blocks, and blocks are integrated into the proposed modeling framework through standard interfaces. Unlike traditional software simulators, this framework provides cycle-based, concurrency-capable simulation, which can take advantage of multi-threading or parallel computation techniques. Using an object-oriented approach, bBlocks adopts an open-system policy, which gives end users the flexibility to incorporate user-defined components into end models and to integrate their preferred architecture. Another purpose of this research is to provide a mechanism to model and study future microprocessors. bBlocks is implemented in Java, a truly cross-platform object-oriented language, so it is conceivable that bBlocks could eventually run on any machine, regardless of architecture or operating system, as long as the machine runs a Java Virtual Machine. In addition, with Java GUI tools, bBlocks provides probes to investigate the activities inside the blocks. To illustrate the effectiveness of bBlocks, two architectures, SuperScalar and CDF, have been implemented.
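The abstract gives no code, so the following is only a minimal Java sketch of what a cycle-based block with a standard interface and a per-cycle method might look like; all class and method names (Block, FetchStage, clockTick, TinySimulator) are hypothetical illustrations and are not the actual bBlocks API.

```java
// Hypothetical sketch of a cycle-based simulation block; names are
// illustrative only and are not the actual bBlocks API.
import java.util.ArrayList;
import java.util.List;

abstract class Block {
    /** Downstream blocks this block is wired to through its "standard interface". */
    protected final List<Block> outputs = new ArrayList<>();

    void connect(Block downstream) { outputs.add(downstream); }

    /** Called once per simulated clock cycle; blocks may be ticked on separate threads. */
    abstract void clockTick(long cycle);
}

class FetchStage extends Block {
    @Override
    void clockTick(long cycle) {
        // A full model would push a fetched instruction into the connected blocks.
        System.out.println("cycle " + cycle + ": fetch");
    }
}

public class TinySimulator {
    public static void main(String[] args) {
        Block fetch = new FetchStage();
        for (long cycle = 0; cycle < 3; cycle++) {
            fetch.clockTick(cycle);   // a real model would tick every block in the design
        }
    }
}
```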
Elastic circuits
Elasticity in circuits and systems provides tolerance to variations in computation and communication delays. This paper presents a comprehensive overview of elastic circuits for designers who are mainly familiar with synchronous design. Elasticity can be implemented both synchronously and asynchronously, although it has traditionally been associated more often with asynchronous circuits. This paper shows that synchronous and asynchronous elastic circuits can be designed, analyzed, and optimized using similar techniques. Thus, choices between synchronous and asynchronous implementations are localized and deferred until late in the design process.
Adaptive Distributed Architectures for Future Semiconductor Technologies.
Year after year, semiconductor manufacturing has been able to integrate more components into a single computer chip. These improvements have been made possible by systematically shrinking the size of the basic computational element, the transistor. This trend has allowed computers to become progressively faster, more efficient, and less expensive. As the trend continues, experts foresee that current computer designs will face new challenges in utilizing the minuscule devices made available by future semiconductor technologies. Today's microprocessor designs are not fit to overcome these challenges, since they are constrained by their inability to handle component failures, by their lack of adaptability to a wide range of custom modules optimized for specific applications, and by their limited design modularity.
The focus of this thesis is to develop original computer architectures that can not only survive these new challenges but also leverage the vast number of transistors available to unlock better performance and efficiency. The work explores and evaluates new software and hardware techniques that enable the development of novel adaptive and modular computer designs. The thesis first presents an infrastructure to quantitatively assess the shortcomings of current systems and their inadequacy for operating on unreliable silicon. In light of these findings, specific solutions are then proposed to strengthen digital system architectures through both hardware and software techniques. The thesis culminates with the proposal of a radically new architecture that can fully adapt dynamically to the hardware resources available on chip, however limited or abundant those may be.
Timing speculation and adaptive reliable overclocking techniques for aggressive computer systems
Computers have changed our lives beyond our own imagination over the past several decades. Continued and progressive advancements in VLSI technology, together with numerous micro-architectural innovations, have played a key role in the design of spectacular low-cost, high-performance computing systems that have become omnipresent in today's technology-driven world. Performance and dependability have become key concerns as these ubiquitous computing machines continue to drive our everyday life. Every application has unique demands and runs in a diverse operating environment. Dependable, aggressive, and adaptive systems improve efficiency in terms of speed, reliability, and energy consumption.
Traditional computing systems run at a fixed clock frequency, which is determined by taking into account worst-case timing paths, operating conditions, and process variations. Timing-speculation-based reliable overclocking advocates going beyond worst-case limits to achieve the best performance, not by avoiding timing errors but by detecting and correcting a modest number of them. The success of this design methodology relies on the fact that timing-critical paths are rarely exercised in a design, and typical execution happens much faster than the timing requirements dictated by worst-case design methodology. Better-than-worst-case design is advocated by several recent research pursuits, which exploit dependability techniques to enhance computer system performance.
In this dissertation, we address different aspects of timing-speculation-based adaptive reliable overclocking schemes and evaluate their role in the design of low-cost, high-performance, energy-efficient, and dependable systems. We identify various control knobs in the design that can be tuned to meet different design targets.
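As a rough sketch of the kind of control knob mentioned above, and not the dissertation's actual mechanism, the Java fragment below adjusts a clock-frequency setting from an observed timing-error rate: frequency is raised while recoveries stay below a target rate and lowered once they exceed it. The class name, target rate, and step size are invented for illustration.

```java
// Conceptual sketch of error-rate-driven frequency tuning; the thresholds and
// step sizes are illustrative assumptions, not values from the dissertation.
public class AdaptiveOverclocker {
    private double frequencyGhz = 2.0;                      // worst-case-safe starting point
    private static final double TARGET_ERROR_RATE = 0.01;   // tolerated timing-error rate
    private static final double STEP_GHZ = 0.05;

    /** Called periodically with the fraction of cycles that required error recovery. */
    public void adjust(double observedErrorRate) {
        if (observedErrorRate < TARGET_ERROR_RATE) {
            frequencyGhz += STEP_GHZ;   // typical paths still have slack: speed up
        } else {
            frequencyGhz -= STEP_GHZ;   // too many recoveries: back off toward safety
        }
    }

    public double frequencyGhz() { return frequencyGhz; }
}
```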
As part of this research, we extend the SPRIT3E (Superscalar PeRformance Improvement Through Tolerating Timing Errors) framework and characterize the extent of application-dependent performance acceleration achievable in superscalar processors by scrutinizing the various parameters that affect operation beyond worst-case limits. We study the limitations imposed by short-path constraints on our technique and present ways to exploit them to maximize performance gains. We analyze the sensitivity of our technique's adaptiveness by exploring the hardware requirements for dynamic overclocking schemes. Experimental analysis based on SPEC2000 benchmarks running on a SimpleScalar Alpha processor simulator, augmented with error-rate data obtained from hardware simulations of a superscalar processor, is presented.
Even though reliable overclocking guarantees functional correctness, it leads to higher power consumption. As a consequence, reliable overclocking that does not account for on-chip temperatures will reduce the lifetime reliability of the chip. In this thesis, we analyze how reliable overclocking impacts the on-chip temperature of a microprocessor and evaluate the effects of overheating, caused by such reliable dynamic frequency tuning mechanisms, on the lifetime reliability of these systems. We then evaluate the effect of thermal throttling, a technique that clamps the on-chip temperature below a predefined value, on system performance and reliability. Our study shows that a reliably overclocked system with dynamic thermal management achieves a 25% performance improvement while lasting for 14 years when operated below 353 K.
Over the past five decades, technology scaling, as predicted by Moore's law, has been the bedrock of semiconductor technology evolution. The continued downscaling of CMOS technology to deep sub-micron gate lengths has been the primary reason for its dominance in today's omnipresent silicon microchips. Even though the transition to the next technology node is indispensable, the initial cost and time involved present a non-level playing field for competitors in the semiconductor business. As part of this thesis, we evaluate the capability of speculative reliable overclocking mechanisms to maximize performance at a given technology level. We evaluate its competitiveness with technology scaling in terms of performance, power consumption, energy, and energy-delay product. We present a comprehensive comparison for integer and floating-point SPEC2000 benchmarks running on a simulated Alpha processor at three different technology nodes in normal and enhanced modes. Our results suggest that adopting reliable overclocking strategies can help skip a technology node altogether, or remain competitive in the market while porting to the next technology node.
Reliability has become a serious concern as systems embrace nanometer technologies. In this dissertation, we propose a novel fault-tolerant aggressive system that combines soft-error protection and timing-error tolerance. We replicate both the pipeline registers and the pipeline-stage combinational logic. The replicated logic receives its inputs from the primary pipeline registers while writing its outputs to the replicated pipeline registers. The organization of redundancy in the proposed Conjoined Pipeline system supports overclocking; provides concurrent error detection and recovery for soft errors, intermittent faults, and timing errors; and flags permanent silicon defects. The fast recovery process requires no checkpointing and takes three cycles. Back-annotated post-layout gate-level timing simulations in 45 nm technology, of a conjoined two-stage arithmetic pipeline and a conjoined five-stage DLX pipeline processor with forwarding logic, show that our approach, even under a severe fault-injection campaign, achieves near 100% fault coverage and an average performance improvement of about 20% when dynamically overclocked.
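A behavioural sketch of the duplicate-and-compare idea is given below; it is an illustrative software model only, not the gate-level Conjoined Pipeline design, and the fault-injection argument stands in for the soft, timing, and intermittent errors that the real circuit would experience.

```java
// Illustrative duplicate-and-compare model of one pipeline stage; names and the
// fault-injection hook are invented for this sketch, not part of the real design.
import java.util.function.IntUnaryOperator;

class ConjoinedStage {
    private final IntUnaryOperator primaryLogic;
    private final IntUnaryOperator replicaLogic;

    ConjoinedStage(IntUnaryOperator stageLogic) {
        this.primaryLogic = stageLogic;
        this.replicaLogic = stageLogic;  // the same stage function serves as the replica here
    }

    /** Returns the stage result, or throws to model a detected error and replay. */
    int evaluate(int input, int injectedFaultMask) {
        int primary = primaryLogic.applyAsInt(input);
        int replica = replicaLogic.applyAsInt(input) ^ injectedFaultMask; // fault injection
        if (primary != replica) {
            // A mismatch flags an error; the real design replays within three cycles.
            throw new IllegalStateException("mismatch detected: trigger recovery");
        }
        return primary;
    }
}
```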
Programmable stochastic processors
As traditional approaches for reducing power in microprocessors are being exhausted, extreme power challenges call for unconventional approaches to power reduction. Recent research has shown substantial promise for application-specific stochastic computing, i.e., computing that exploits application error tolerance to carefully relax the correctness guarantees provided by hardware in order to reduce power. This dissertation explores the feasibility, challenges, and potential benefits of stochastic computing in the context of programmable general-purpose processors. Specifically, the dissertation describes design-level techniques that minimize the power of a processor for a non-zero error rate or allow a processor to fail gracefully when operated over a range of non-zero error rates. It presents microarchitectural design principles that allow a processor to trade off reliability and energy more efficiently, minimizing energy when exploiting error resilience. It demonstrates the benefit of compiler optimizations that optimize a binary to enable greater energy savings when operating at a non-zero error rate. It also demonstrates significant benefits for a programmable stochastic processor prototype that improves energy efficiency by carefully relaxing correctness and exposing errors in applications running on a commodity processor. This dissertation on programmable stochastic processors conclusively shows that the architecture and design of processors and applications should be approached differently in scenarios where errors are allowed to be exposed from the hardware to higher levels of the compute stack. Significant energy benefits are demonstrated for design-, architecture-, compiler-, and application-level optimizations for general-purpose programmable stochastic processors.
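As a toy illustration of the trade-off this abstract describes, and not a technique from the dissertation itself, one can model total energy as compute energy that falls with supply voltage plus recovery energy that grows with the voltage-dependent error rate, then sweep for the minimum-energy operating point. Every formula and constant in the sketch below is invented.

```java
// Toy model of the energy/error-rate trade-off; all formulas and constants are
// illustrative assumptions, not data from the dissertation.
public class VoltageSweep {
    // Dynamic energy scales roughly with V^2 (arbitrary units).
    static double computeEnergy(double volts) { return volts * volts; }

    // Assume errors become rapidly more frequent as voltage drops (made-up model).
    static double errorRate(double volts) { return Math.exp(-8.0 * (volts - 0.5)); }

    // Each error carries a fixed recovery-energy cost (made-up constant).
    static double recoveryEnergy(double volts) { return 5.0 * errorRate(volts); }

    public static void main(String[] args) {
        double bestV = 1.2, bestE = Double.MAX_VALUE;
        for (double v = 0.6; v <= 1.2; v += 0.01) {
            double total = computeEnergy(v) + recoveryEnergy(v);
            if (total < bestE) { bestE = total; bestV = v; }
        }
        System.out.printf("minimum-energy operating point ~%.2f V%n", bestV);
    }
}
```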
Methods to improve the reliability and resiliency of near/sub-threshold digital circuits
Energy consumption is one of the primary bottlenecks for both large- and small-scale modern compute platforms. Reducing the operating voltage of digital circuits to the point where the supply voltage is near or below the threshold voltage of the transistors has recently gained attention as a method to reduce the energy required for computation by as much as 6 times. However, when operating at such near/sub-threshold voltages, imperfections in transistor manufacturing, changes in temperature, and other difficult-to-predict factors cause wide variations in the timing of Complementary Metal-Oxide-Semiconductor (CMOS) circuits, due to the increased sensitivity at lower voltages. These increased variations result in poor aggregate performance and higher rates of error in computation.
This work introduces several new methods to improve the reliability of near/sub-threshold circuits. The first is a design-automation technique that aids low-voltage digital standard-cell synthesis. Second, two circuit-level techniques are introduced that aim to improve the reliability and resiliency of digital circuits by means of completion/error detection. These techniques are shown to improve speed and lower energy consumption at low overhead compared to previous methods. Most importantly, these circuit-level methods are specifically designed to operate at low voltages and can themselves tolerate variations and operation in harsh environments. Finally, a test-chip prototype designed in 65 nm CMOS demonstrates the practicality and feasibility of the proposed current-sensing error detector.
A hierarchical approach to formal modeling and verification of asynchronous circuits
The self-timed (or asynchronous) approach to circuit design has demonstrated benefits in a number of different areas for its low energy consumption, high operating speed, composability, and modularity. Nonetheless, the asynchronous paradigm exposes challenges that are not found in the synchronous (or clock-driven) paradigm. For the verification task, a challenge emerges from the large number of potential operational interleavings exhibited in the asynchronous paradigm. Simply exploring all interleavings is, in general, intractable because the number of interleavings can grow exponentially.

This dissertation focuses on developing scalable methods that are capable of reasoning effectively about the interleaving problem exhibited in self-timed systems. We specify and verify finite-state-machine representations of self-timed circuit designs using the DE system, a formal hardware description language defined using the ACL2 theorem-proving system. We apply a link-joint paradigm to model self-timed circuits as networks of channels that communicate with each other locally via handshake protocols. This link-joint model has been shown to be a universal model for various self-timed circuit families. In addition, this model has a clean formalization in the ACL2 logic and provides a protocol level that abstracts away timing constraints at the circuit level.

Unlike many efforts for validating timing and communication properties of self-timed systems, we are interested in verifying functional properties. Specifically, we verify the functional correctness of self-timed systems in terms of relationships between their input and output sequences. To avoid considering all interleavings simultaneously, we address the verification problem hierarchically and avoid exploring the internal structures of verified submodules as well as their operational interleavings. The input-output relationship of a verified submodule is determined from the communication signals at the submodule's input and output ports, while abstracting away all execution paths internal to that submodule.
Interpreted graph models
A model class called an Interpreted Graph Model (IGM) is defined. This class includes a large number of graph-based models that are used in asynchronous circuit design and other applications of concurrency. The defining characteristic of this model class is an underlying static graph-like structure to which behavioural semantics are attached using additional entities, such as tokens or node/arc states. The similarities in notation and expressive power allow a number of operations on these formalisms, such as visualisation, interactive simulation, serialisation, schematic entry, and model conversion, to be generalised. A software framework called Workcraft was developed to take advantage of these properties of IGMs. Workcraft provides an environment for rapid prototyping of graph-like models and related tools, and supplies a large set of standardised functions that considerably facilitate the task of providing tool support for any IGM.

The concept of Interpreted Graph Models is the result of research on methods of applying lower-level models, such as Petri nets, as a back-end for simulation and verification of higher-level models that are more easily manipulated; the goal is to achieve a high degree of automation of this process. In particular, a method for verification of speed-independence of asynchronous circuits is presented. Using this method, the circuit is specified as a gate netlist and its environment is specified as a Signal Transition Graph. The circuit is then automatically translated into a behaviourally equivalent Petri net model, which is composed with the specification of the environment. A number of important properties can be established on this compound model, such as the absence of deadlocks and hazards. If a trace is found that violates the required property, it is automatically interpreted in terms of the switching of the gates in the original gate-level circuit specification and may be presented visually to the circuit designer.

A similar technique is also used for the verification of a model called Static Data Flow Structure (SDFS). This high-level model describes the behaviour of an asynchronous data path. SDFS is particularly interesting because it models complex behaviours such as preemption, early evaluation, and speculation. Preemption is a technique that allows data objects in a computation pipeline to be destroyed if the result of the computation is no longer needed, reducing power consumption. Early evaluation allows a circuit to compute its output using a subset of its inputs, preempting the inputs that are not needed. In speculation, all conflicting branches of a computation run concurrently without waiting for the selecting condition; once the selecting condition is computed, the unneeded branches are preempted. The automated Petri-net-based verification technique is especially useful in this case because of the complex nature of these features.

As a result of this work, a number of cases are presented where the concept of IGMs and the Workcraft tool were instrumental. These include the design of two different types of arbiter circuits, the design and debugging of the SDFS model, the synthesis of asynchronous circuits from the Conditional Partial Order Graph model, and the modification of the workflow of the Balsa asynchronous circuit synthesis system.
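A minimal sketch of the "static graph structure with attached behavioural state" characterization is shown below; the types and methods are hypothetical illustrations, not Workcraft's actual API, and the token-moving rule is only a Petri-net-flavoured example of one possible attached interpretation.

```java
// Hypothetical sketch of an Interpreted Graph Model: a fixed graph structure with
// behavioural state (here, token counts) attached to nodes. Not Workcraft's API.
import java.util.HashMap;
import java.util.Map;

class InterpretedGraph {
    // Static structure: adjacency from a node name to its successor node names.
    private final Map<String, String[]> successors = new HashMap<>();
    // Attached interpretation: a token count per node, as in a Petri-net-like marking.
    private final Map<String, Integer> tokens = new HashMap<>();

    void addNode(String name, int initialTokens, String... succ) {
        successors.put(name, succ);
        tokens.put(name, initialTokens);
    }

    /** One simulation step: consume a token at `node` and add one to each successor. */
    boolean fire(String node) {
        if (tokens.getOrDefault(node, 0) == 0) return false;
        tokens.merge(node, -1, Integer::sum);
        for (String s : successors.getOrDefault(node, new String[0])) {
            tokens.merge(s, 1, Integer::sum);
        }
        return true;
    }
}
```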