86 research outputs found
Doctor of Philosophy
Elasticity is a design paradigm in which circuits can tolerate arbitrary latency/delay variations in their computation units as well as communication channels. Creating elastic (both synchronous and asynchronous) designs from clocked designs has potential benefits of increased modularity and robustness to variations. Several transformations have been suggested in the literature, and each of these requires a handshake control network (examples include synchronous elasticization and desynchronization). Elastic control network area and power overheads may become prohibitive. This dissertation investigates different optimization avenues to reduce these overheads without sacrificing the control network performance. First, an algorithm and a tool, CNG, are introduced that generate a control network with a minimal total number of join and fork control steering units. Synchronous Elastic Flow (SELF) is a handshake protocol used over synchronous elastic designs. Compared to its standard eager implementation (which uses eager forks, or EForks), lazy SELF can consume less power and area. However, it typically suffers from combinational cycles and can have inferior performance in some systems. Hence, lazy SELF has rarely been studied in the literature. This work formally and exhaustively investigates the specifications, different implementations, and verification of the lazy SELF protocol. Furthermore, several new and existing lazy designs are mapped to hybrid eager/lazy implementations that retain the performance advantage of the eager design but have the power and area advantages of lazy implementations, and are free of combinational cycles. This work also introduces a novel ultra simple fork (USFork) design. The USFork has two advantages over lazy forks: it is composed of simpler logic (just wires) and does not form combinational cycles. The conditions under which an EFork can be replaced by a USFork without any performance loss are formally derived.
The last optimization avenue discussed in this dissertation is Elastic Buffer Controller (EBC) merging. In a typical synchronous elastic control network, some EBCs may activate their corresponding latches at similar schedules. This work provides a framework for finding and merging such controllers in any control network, including open networks (i.e., when the environment abstract is not available or required to be flexible) as well as networks incorporating variable latency units. Replacing EForks with USForks under some equivalence conditions, as well as EBC merging, has been fully automated in a tool, HGEN. This work will help achieve elasticity at a reduced cost. It will broaden the class of circuits that can be elasticized with acceptable overhead (circuits that designers would otherwise find too expensive to elasticize). In a MiniMIPS processor case study, compared to a basic control network implementation, the optimization techniques of this dissertation cumulatively achieve reductions in the control network area, dynamic power, and leakage power of 73.2%, 68.6%, and 69.1%, respectively.
On the Semantics of Communicating Hardware Processes and their Translation into LOTOS for the Verification of Asynchronous Circuits with CADP
Hardware process calculi, such as CHP (Communicating Hardware Processes), Balsa, or Haste (formerly Tangram), are a natural approach for the description of asynchronous hardware architectures. These calculi are extensions of standard process calculi with particular synchronisation features implemented using handshake protocols. In this article, we first give a structural operational semantics for value-passing CHP. Compared to the existing semantics of CHP defined by translation into Petri nets, our semantics is general enough to handle value-passing CHP with communication channels open to the environment, and is also independent of any particular (2- or 4-phase) handshake protocol used for circuit implementation. We then describe the translation of CHP into the process calculus LOTOS (ISO standard 8807), in order to allow asynchronous hardware architectures expressed in CHP to be verified using the CADP verification toolbox for LOTOS. A translator from CHP to LOTOS has been implemented and successfully used for the compositional verification of two industrial case studies, namely an asynchronous implementation of the DES (Data Encryption Standard) and an asynchronous interconnect of a NoC (Network on Chip).
Test Quality Analysis and Improvement for an Embedded Asynchronous FIFO
Embedded First-In First-Out (FIFO) memories are increasingly used in many IC designs. We have created a new full-custom embedded FIFO module with asynchronous read and write clocks, which is at least a factor of two smaller and also faster than SRAM-based and standard-cell-based counterparts. The detection qualities of the FIFO test for both hard and weak resistive shorts and opens have been analyzed by an IFA-like method based on analog simulation. The defect coverage of the initial FIFO test for shorts in the bit-cell matrix has been improved by inclusion of an additional data background and low-voltage testing; for low-resistant shorts, 100% defect coverage is obtained. The defect coverage for opens has been improved by a new test procedure which includes waiting periods.
Low power predictable memory and processing architectures
Strong demand for power-optimized devices shows promising economic potential and draws considerable attention in industry and research. Due to the continuously shrinking CMOS process, not only dynamic power but also static power has emerged as a major concern in power reduction. Besides power optimization, average-case power estimation is important for power budget allocation but also challenging in terms of time and effort. In this thesis, we introduce a methodology to support modular quantitative analysis in order to estimate the average power of circuits, on the basis of two concepts named Random Bag Preserving and Linear Compositionality. It can shorten simulation time while sustaining high accuracy, increasing the feasibility of power estimation for large systems. For power saving, firstly, we take advantage of the low-power characteristics of adiabatic logic and asynchronous logic to achieve ultra-low dynamic and static power. We propose two memory cells, which can run in adiabatic and non-adiabatic mode. About 90% of dynamic power can be saved in adiabatic mode when compared to other up-to-date designs. About 90% of leakage power is saved. Secondly, a novel logic, named Asynchronous Charge Sharing Logic (ACSL), is introduced. The realization of completion detection is simplified considerably. Beyond the power reduction, ACSL brings another promising feature for average power estimation, called data independency, which makes power estimation effortless and is meaningful for modular quantitative average-case analysis. Finally, a new asynchronous Arithmetic Logic Unit (ALU) is presented, with a ripple carry adder implemented using the logically reversible/bidirectional characteristic and exhibiting ultra-low power dissipation with a sub-threshold operating point. The proposed adder is able to operate multi-functionally.
DeSyRe: on-Demand System Reliability
The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect- and fault-free system would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints.
Methods to improve the reliability and resiliency of near/sub-threshold digital circuits
Energy consumption is one of the primary bottlenecks to both large and small scale modern compute platforms. Reducing the operating voltage of digital circuits to levels where the supply voltage is near or below the threshold voltage of the transistors has recently gained attention as a method to reduce the energy required for computations by as much as 6 times. However, when operating at these near/sub-threshold voltages, imperfections in transistor manufacturing, changes in temperature, and other difficult-to-predict factors cause wide variations in the timing of Complementary Metal-Oxide Semiconductor (CMOS) circuits due to an increased sensitivity at lower voltages. These increased variations result in poor aggregate performance and cause increased rates of error occurrence in computation.
This work introduces several new methods to improve the reliability of near/sub-threshold circuits. The first is a design automation technique that aids low-voltage digital standard cell synthesis. Second, two circuit-level techniques are introduced that aim to improve the reliability and resiliency of digital circuits by means of completion/error detection. These techniques are shown to improve speed and lower energy consumption at low overheads compared to previous methods. Most importantly, these circuit-level methods are specifically designed to operate at low voltages and can themselves tolerate variations and operation in harsh environments. Finally, a test-chip prototype designed in 65 nm CMOS demonstrates the practicality and feasibility of a proposed current-sensing error detector.
System Design and Implementation for Hybrid Network Function Virtualization
With the application of virtualization technology in computer networks, many new research areas and techniques have been explored, such as network function virtualization (NFV). A significant benefit of virtualization is that it reduces the cost of a network system and increases its flexibility. Due to the increasing complexity of the network environment and constantly improving network scale and bandwidth, it is imperative to aim for higher performance, extensibility, and flexibility in the future network systems. In this dissertation, hybrid NFV platforms applying virtualization technology are proposed. We further explore the techniques used to improve the performance, scalability and resilience of these systems.
In the first part of this dissertation, we describe a new heterogeneous hardware-software NFV platform that provides scalability and programmability while supporting significant hardware-level parallelism and reconfiguration. Our computing platform takes advantage of both field-programmable gate arrays (FPGAs) and microprocessors to implement numerous virtual network functions (VNFs) that can be dynamically customized to specific network flow needs. Traffic management and hardware reconfiguration functions are performed by a global coordinator, which allows for the rapid sharing of network function states and continuous evaluation of network function needs. With the help of the state-sharing mechanism offered by the coordinator, customer-defined VNF instances can be easily migrated between heterogeneous middleboxes as the network environment changes. A resource allocation algorithm dynamically assesses resource deployments as network flows and conditions are updated.
In the second part of this dissertation, we explore a new session-level approach for NFV that implements distributed agents in heterogeneous middleboxes to steer packets belonging to different sessions through session-specific service chains. Our session-level approach supports inter-domain service chaining with both FPGA- and processor-based middleboxes, dynamic reconfiguration of service chains for ongoing sessions, and the application of session-level approaches for UDP-based protocols. To demonstrate our approach, we establish inter-domain service chains for QUIC sessions, and reconfigure the service chains across a range of FPGA- and processor-based middleboxes. We show that our session-level approach can successfully reconfigure service chains for individual QUIC sessions. Compared with software implementations, the distributed agents implemented on FPGAs show better performance in various test scenarios.
Microspacecraft and Earth observation: Electrical field (ELF) measurement project
The Utah State University space system design project for 1989 to 1990 focuses on the design of a global electrical field sensing system to be deployed in a constellation of microspacecraft. The design includes the selection of the sensor and the design of the spacecraft, the sensor support subsystems, the launch vehicle interface structure, on-board data storage and communications subsystems, and associated ground receiving stations. Optimization of satellite orbits and spacecraft attitude is critical to the overall mapping of the electrical field and, thus, is also included in the project. The spacecraft design incorporates a deployable sensor array (5 m booms) into a spinning oblate platform. Data is taken every 0.1 seconds by the electrical field sensors and stored on board. An omni-directional antenna communicates with a ground station twice per day to downlink the stored data. Wrap-around solar cells cover the exterior of the spacecraft to generate power. Nine Pegasus launches may be used to deploy fifty such satellites to orbits with inclinations greater than 45 deg. Piggyback deployment from other launch vehicles such as the DELTA 2 is also examined.
Minimizing and exploiting leakage in VLSI
Power consumption of VLSI (Very Large Scale Integrated) circuits has been growing at an alarmingly rapid rate. This increase in power consumption, coupled with the increasing demand for portable/hand-held electronics, has made power consumption a dominant concern in the design of VLSI circuits today. Traditionally dynamic (switching) power has dominated the total power consumption of VLSI circuits. However, due to process scaling trends, leakage power has now become a major component of the total power consumption in VLSI circuits. This dissertation explores techniques to reduce leakage, as well as techniques to exploit leakage currents through the use of sub-threshold circuits.
This dissertation consists of two studies. In the first study, techniques to reduce leakage are presented. These include a low leakage ASIC design methodology that uses high VT sleep transistors selectively, a methodology that combines input vector control and circuit modification, and a scheme to find the optimum reverse body bias voltage to minimize leakage.
As the minimum feature size of VLSI fabrication processes continues to shrink with each successive process generation (along with the value of supply voltage and therefore the threshold voltage of the devices), leakage currents increase exponentially. Leakage currents are hence seen as a necessary evil in traditional VLSI design methodologies. We present an approach to turn this problem into an opportunity. In the second study in this dissertation, we attempt to exploit leakage currents to perform computation. We use sub-threshold digital circuits and come up with ways to get around some of the pitfalls associated with sub-threshold circuit design. These include a technique that uses body biasing adaptively to compensate for Process, Voltage and Temperature (PVT) variations, a design approach that uses asynchronous micro-pipelined Network of Programmable Logic Arrays (NPLAs) to help improve the throughput of sub-threshold designs, and a method to find the optimum supply voltage that minimizes energy consumption in a circuit.
Design and Implementation of a High-Speed Readout and Control System for a Digital Tracking Calorimeter for proton CT
Particle therapy, a non-invasive technique for treating cancer using protons and light ions, has become more and more common. For example, a particle treatment facility is currently being built in Bergen, Norway. Proton beams deposit a large fraction of their energy at the end of their paths, i.e., the delivered dose can be focused on the tumor, sparing nearby tissue with a low entry and almost no exit dose. A novel imaging modality using protons promises to overcome some limitations of particle therapy and to allow its potential to be fully exploited. Being able to position the so-called Bragg peak accurately inside the tumor is a major advantage of charged particles, but incomplete knowledge about a crucial tissue property, the stopping power, limits its precision.
A proton CT scanner provides direct information about the stopping power. It has the potential to reduce range uncertainties significantly, but no proton CT system has yet been shown to be suitable for clinical use. The aim of the Bergen proton CT project is to design and build a proton CT scanner that overcomes most of the critical limitations of the currently existing prototypes and which can be operated in clinical settings. A proton CT prototype, the Digital Tracking Calorimeter, is being developed as a range telescope consisting of high-granularity pixel sensors. The prototype is a combined position-sensitive detector and residual energy-range detector which will allow a substantial rate of protons, speeding up the imaging process.
The detector is single-sided, meaning that it employs information from the beam delivery system to omit tracker layers in front of the phantom. The detector operates by tracking the charged particles traversing through the detector material behind the phantom. The proton CT prototype will be used to determine the feasibility of using proton CT to increase the dose planning accuracy for particle treatment of cancer cells.
The detector is designed as a telescope of 43 layers of sensors, where the two front layers act as the position-sensitive detector providing an accurate vector of each incoming particle. The remaining layers are used to measure the residual energy of each particle by observing in which layer they stop and by using the cluster size in each layer.
The Digital Tracking Calorimeter employs the ALPIDE sensor, a monolithic active pixel sensor, each utilizing a 1.2 Gb/s data link. Each layer of 18 × 27 cm consists of 108 ALPIDE sensors, roughly corresponding to the width and height of the head of a grown person. The sensors are connected to intermediary transition boards that route the data and control links to dedicated readout electronics and supply the sensors with power. The readout unit is the main component of both the data acquisition and the detector control system. The power control unit controls the power supply and monitors the current usage of the sensors. Both of these devices are mainly implemented in FPGAs.
The main purpose of this work has been to explore and implement possible design solutions for the proton CT electronics, including the front-end, as well as the readout electronics architecture. The resulting architecture is modular, allowing the further scale-up of the system in the future. A major obstacle to the design is the large number of sensors and the corresponding high-speed data links. Thus, a large emphasis has been placed on the signal integrity of the front-end electronics and a dynamic phase alignment sampling method of the readout electronics firmware. The readout FPGA employs regular I/O pins for the high-speed data interface, instead of high-speed transceiver pins, which significantly reduces the magnitude of the data acquisition system.
A consistent design approach with detailed and systematic verification of the FPGA firmware modules, along with a continuous integration build system, has resulted in a stable and highly adaptive system. Significant effort has been put into the testing of the various system components. This also includes the design and implementation of a set of production test tools for use during the manufacturing of the detector front-end.