3 research outputs found

    Robust low-power digital circuit design in nano-CMOS technologies

    Device scaling has resulted in large-scale integrated, high-performance, low-power, and low-cost systems. However, the move towards sub-100 nm technology nodes has increased variability in device characteristics due to large process variations. Variability has severe implications for digital circuit design: it causes timing uncertainties in combinational circuits, degrades the yield and reliability of memory elements, and increases power density due to the slow scaling of supply voltage. Conventional design methods add large, pessimistic safety margins to mitigate increased variability, but they incur large power and performance losses because the combination of worst cases occurs very rarely. In-situ monitoring of timing failures provides an opportunity to dynamically tune safety margins in proportion to on-chip variability, which can significantly reduce power and performance losses. We demonstrated by simulation two delay sensor designs that detect timing failures in advance and can be coupled with compensation techniques such as voltage scaling, body biasing, or frequency scaling to avoid actual timing failures. Our simulation results using 45 nm and 32 nm technology BSIM4 models indicate a significant reduction in total power consumption under temperature and statistical variations. Future work involves using dual sensing to avoid useless voltage scaling that incurs a speed loss. SRAM cache is the first victim of increased process variations and requires handcrafted design to meet area, power, and performance requirements. We have proposed novel six-transistor (6T), seven-transistor (7T), and eight-transistor (8T) SRAM cells that enable variability-tolerant, low-power SRAM cache designs. Increased sense-amplifier offset voltage, due to device mismatch arising from high variability, increases the delay and power consumption of SRAM designs. We have proposed two novel design techniques that reduce offset-voltage-dependent delays, providing a high-speed, low-power SRAM design. Increasing leakage currents in nano-CMOS technologies pose a major challenge to low-power, reliable design. We have investigated a novel segmented supply voltage architecture to reduce the leakage power of SRAM caches, since they occupy the bulk of the total chip area and power. Future work involves developing leakage reduction methods for combinational logic designs, including SRAM peripherals.

    Passively mode-locked semiconductor lasers for all-optical applications

    The recent increase in internet traffic is creating demand for higher bandwidth in telecommunication networks. To satisfy this ever-increasing demand, it is necessary to investigate new devices and technologies for all-optical signal processing that increase the transmission data rate and capacity of current and future optical networks. Optical time division multiplexing (OTDM) is a widely deployed technique for increasing the bit rate and capacity of optical networks. In OTDM networks, regeneration and demultiplexing of the data channels are two common and important functions. However, they require a clock signal, which is usually provided by optoelectronic components, making the system expensive, bulky, and difficult to implement. To address this issue, this thesis investigates all-optical clock recovery using external injection locking of passively mode-locked semiconductor lasers. In particular, quantum-dash mode-locked laser diodes (QDash-MLLDs) are studied. These lasers can generate optical pulses with durations on the order of picoseconds and femtoseconds using only a DC bias, with no need for external modulation. They are also attractive for their simplicity of operation, low power consumption, fast carrier dynamics, and compactness. Furthermore, they provide a narrow radio-frequency beating linewidth, resulting in low phase noise and low timing jitter. In this thesis, all-optical clock recovery of data signals at the base bit rate (40 Gb/s) and at high bit rates (up to 320 Gb/s) was achieved using QDash-MLLDs. The clocks recovered from the different input data signals considered in this thesis feature low timing jitter, compliant with the minimum requirements for practical applications. Furthermore, the clocks recovered at high speed are used to demultiplex signals to 40 Gb/s tributaries, achieving error-free performance. Finally, investigation of the QDash-MLLD dynamics demonstrated that the laser provides a very fast locking time (25 ns) when synchronised to data signals, which makes it suitable for optical burst/packet-switched networks. Together, these results demonstrate that the laser is an extremely reliable, cost-effective, and green solution for all-optical signal processing.

    From experiment to design – fault characterization and detection in parallel computer systems using computational accelerators

    This dissertation summarizes experimental validation and co-design studies conducted to optimize the fault detection capabilities and overheads in hybrid computer systems (e.g., using CPUs and Graphics Processing Units, or GPUs), and consequently to improve the scalability of parallel computer systems using computational accelerators. The experimental validation studies were conducted to help us understand the failure characteristics of CPU-GPU hybrid computer systems under various types of hardware faults. The main characterization targets were faults that are difficult to detect and/or recover from, e.g., faults that cause long latency failures (Ch. 3), faults in dynamically allocated resources (Ch. 4), faults in GPUs (Ch. 5), faults in MPI programs (Ch. 6), and microarchitecture-level faults with specific timing features (Ch. 7). The co-design studies were based on the characterization results. One of the co-designed systems has a set of source-to-source translators that customize and strategically place error detectors in the source code of target GPU programs (Ch. 5). Another co-designed system uses an extension card to learn the normal behavioral and semantic execution patterns of message-passing processes executing on CPUs, and to detect abnormal behaviors of those parallel processes (Ch. 6). The third co-designed system is a co-processor that has a set of new instructions to support software-implemented fault detection techniques (Ch. 7). The work described in this dissertation gains importance because heterogeneous processors have become an essential component of state-of-the-art supercomputers. GPUs were used in three of the five fastest supercomputers that were operating in 2011. Our work included comprehensive fault characterization studies in CPU-GPU hybrid computers. In CPUs, we monitored the target systems for a long period of time after injecting faults (a temporally comprehensive experiment), and injected faults into various types of program states that included dynamically allocated memory (to be spatially comprehensive). In GPUs, we used fault injection studies to demonstrate the importance of detecting silent data corruption (SDC) errors that are mainly due to the lack of fine-grained protections and the massive use of fault-insensitive data. This dissertation also presents transparent fault tolerance frameworks and techniques that are directly applicable to hybrid computers built using only commercial off-the-shelf hardware components. This dissertation shows that by developing an understanding of the failure characteristics and error propagation paths of target programs, we were able to create fault tolerance frameworks and techniques that can quickly detect and recover from hardware faults with low performance and hardware overheads.