
    Vector processing-aware advanced clock-gating techniques for low-power fused multiply-add

    The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need retailoring for the mobile market that they are entering now. Floating-point (FP) fused multiply-add (FMA), being a functional unit with high power consumption, deserves special attention. Although clock gating is a well-known method to reduce switching power in synchronous designs, there are unexplored opportunities for its application to vector processors, especially when considering active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector FMA units (VFUs). These techniques ensure power savings without jeopardizing the timing. We evaluate the proposed techniques using both synthetic and “real-world” application-based benchmarking. Using vector masking and vector multilane-aware clock gating, we report power reductions of up to 52%, assuming an active VFU operating at peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector FP instructions. Finally, when evaluating all techniques together, using “real-world” benchmarking, the power reductions are up to 80%. Additionally, in accordance with processor design trends, we perform this research in a fully parameterizable and automated fashion. The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA 321253 and is supported in part by the European Union (FEDER funds) under contract TTIN2015-65316-P. The work of I. Ratkovic was supported by a FPU research grant from the Spanish MECD. Peer reviewed. Postprint (author's final draft).

    Semiconductor optical amplifiers: performance and applications in optical packet switching [Invited]

    Semiconductor optical amplifiers (SOAs) are a versatile core technology and the basis for the implementation of a number of key functionalities central to the evolution of highly wavelength-agile all-optical networks. We present an overview of the state of the art of SOAs and summarize a range of applications such as power boosters, preamplifiers, optical linear (gain-clamped) amplifiers, optical gates, and modules based on the hybrid integration of SOAs to yield high-level functionalities such as all-optical wavelength converters/regenerators and small space switching matrices. Their use in a number of proposed optical packet switching situations is also highlighted.

    Design and Analysis of an Asynchronous Microcontroller

    This dissertation presents the design of the most complex MTNCL circuit to date. A fully functional MTNCL MSP430 microcontroller is designed and benchmarked against an open-source synchronous MSP430. The designs are compared in terms of area, active energy, and leakage energy. Techniques to reduce MTNCL pipeline activity and improve MTNCL register file area and power consumption are introduced. The results show the MTNCL design to have superior leakage power characteristics. The area and active energy comparisons highlight the need for better MTNCL logic synthesis techniques.

    Course grained low power design flow using UPF

    Increased system complexity has led to the substitution of the traditional bottom-up design flow by a systematic hierarchical design flow. The main motivation behind the evolution of such an approach is the increasing difficulty in hardware realization of complex systems. With decreasing channel lengths, a few key problems such as timing closure, design sign-off, routing complexity, signal integrity, and power dissipation arise in the design flows. Specifically, minimizing power dissipation is critical in several high-end processors. In high-end processors, the design complexity contributes to the overall dynamic power while the decreasing transistor size results in static power dissipation. This research aims at optimizing the design flow for power and timing using the unified power format (UPF). UPF provides a strategic format to specify power-aware design information at every stage in the flow. The power reduction techniques enforced in this research are multi-voltage, multi-threshold voltage (Vth), and power gating with state retention. An inherent design challenge addressed in this research is the choice of power optimization techniques as the flow advances from synthesis to physical design. A top-down digital design flow for a 32-bit MIPS RISC processor has been implemented with and without the UPF synthesis flow for 65 nm technology. The UPF synthesis is implemented with two voltages, 1.08 V and 0.864 V (Multi-VDD). Area, power and timing metrics are analyzed for the flows developed. Power savings of about 20% are achieved in the design flow with the 'multi-threshold' power technique compared to the design flow with no low-power techniques employed. Similarly, 30% power savings are achieved in the design flow with UPF implemented compared to the design flow with the 'multi-threshold' power technique employed.
Thus, cumulative power savings of 42% have been achieved in a complete power-efficient design flow (UPF) compared to the generic top-down standard flow with no power-saving techniques employed. This is substantiated by the low-voltage operation of modules in the design, the reduction in clock switching power by gating clocks in the design, and extensive use of HVT and LVT standard cells for implementation. The UPF synthesis flow saw the worst timing slack and more area when compared to the 'multi-threshold' or the generic flow. The percentage increase in area with UPF is approximately 15%, a significant source of this increase being the additional power-controlling logic added.
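Successive savings of this kind compose multiplicatively, not additively. The following is a minimal sketch of that arithmetic, assuming the rounded 20% and 30% figures from the abstract are applied one after the other to total power; the function name `cumulative_saving` is illustrative and does not come from the thesis.

```python
def cumulative_saving(*stage_savings):
    """Combine successive fractional power savings into one overall saving.

    Each stage removes a fraction of whatever power is left after the
    previous stage, so the remaining fractions multiply.
    """
    remaining = 1.0
    for saving in stage_savings:
        if not 0.0 <= saving < 1.0:
            raise ValueError("each saving must be a fraction in [0, 1)")
        remaining *= 1.0 - saving
    return 1.0 - remaining


# Rounded stage figures quoted in the abstract (assumed to compose in sequence).
multi_vth_saving = 0.20   # multi-threshold flow vs. flow with no low-power techniques
upf_saving = 0.30         # UPF flow vs. multi-threshold flow

total = cumulative_saving(multi_vth_saving, upf_saving)
print(f"combined saving from rounded inputs: {total:.0%}")
```

With the rounded inputs the combined figure comes out near 44%, which is in line with the reported 42% given that the first stage is quoted only as "about 20%".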

    Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

    Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time, and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity, without requiring special design skills or tools. In this paper, we first of all study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach, by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architecture.

    Optical code-division multiple access system and optical signal processing

    This thesis presents our recent research on the development of coding devices, the investigation of security, and the design of systems in optical code-division multiple access (OCDMA) systems. In addition, the nonlinear signal processing techniques used in OCDMA systems motivated further research on all-optical signal processing, which is also summarized in this thesis. Two fiber Bragg grating (FBG) based coding devices are proposed. The first coding device is a superstructured FBG (SSFBG) using ±π/2-phase shifts instead of conventional 0/π-phase shifts. The ±π/2-phase-shifted SSFBG en/decoders can not only conceal optical codes well in the encoded signals but also realize the reutilization of available codes by hybrid use with conventional 0/π-phase-shifted SSFBG en/decoders. The second FBG based coding device is synthesized by the layer-peeling method and can be used for simultaneous optical code recognition and chromatic dispersion compensation. Then, two eavesdropping schemes, one-bit delay interference detection and differential detection, are demonstrated to reveal the security vulnerability of differential phase-shift keying (DPSK) and code-shift keying (CSK) OCDMA systems. To address the security issue as well as increase the transmission capacity, an orthogonal modulation format based on DPSK and CSK is introduced into the OCDMA systems. A 2 bit/symbol 10 Gsymbol/s transmission system using the orthogonal modulation format is achieved. The security of the system can be partially guaranteed. Furthermore, a fully-asynchronous gigabit-symmetric OCDMA passive optical network (PON) is proposed, in which a self-clocked time gate is employed for signal regeneration. A remodulation scheme is used in the PON, which lets downstream and upstream share the same optical carrier, allowing the optical network units to be source-free. An error-free 4-user 10 Gbit/s/user duplex transmission over 50 km distance is realized.
A versatile waveform generation scheme is then studied. A theoretical model is established and a waveform prediction algorithm is summarized. In the demonstration, various waveforms are generated, including short pulse, trapezoidal, triangular and sawtooth waveforms and doublet pulse. In addition, an all-optical simultaneous half-addition and half-subtraction scheme is achieved at an operating rate of 10 GHz by using only two semiconductor optical amplifiers (SOAs) without any assist light. Lastly, two modulation format conversion schemes are demonstrated. The first conversion is from NRZ-OOK to PSK-Manchester coding format using an SOA based Mach-Zehnder interferometer. The second conversion is from RZ-DQPSK to RZ-OOK by employing a supercontinuum based optical thresholder.

    Design and Analysis of an Adaptive Asynchronous System Architecture for Energy Efficiency

    Power has become a critical design parameter for digital CMOS integrated circuits. With performance still garnering much concern, a central idea has emerged: minimizing power consumption while maintaining performance. The use of dynamic voltage scaling (DVS) with parallelism has been shown to be an effective way of saving power while maintaining performance. However, the potency of DVS and parallelism in traditional, clocked synchronous systems is limited because of the strict timing requirements such systems must comply with. Delay-insensitive (DI) asynchronous systems have the potential to benefit more from these techniques due to their flexible timing requirements and high modularity. This dissertation presents the design and analysis of a real-time adaptive DVS architecture for paralleled Multi-Threshold NULL Convention Logic (MTNCL) systems. Results show that energy-efficient systems with low area overhead can be created using this approach.
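The general idea of pairing voltage scaling with parallelism can be illustrated with the textbook first-order model of CMOS dynamic power, P = a·C·V²·f. The sketch below is not the dissertation's architecture. The capacitance, voltages, and the premise that the lower voltage still meets timing at half the frequency are all hypothetical assumptions, and leakage and the overhead of splitting and merging work are ignored.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """First-order CMOS dynamic power: activity * C * Vdd^2 * f."""
    return activity * c_eff * vdd ** 2 * freq


def parallel_power(c_eff, vdd_scaled, freq, n_units, activity=1.0):
    """Power of n_units copies, each clocked at freq / n_units.

    Total throughput matches one unit at `freq`; the slower per-unit
    rate is what (hypothetically) allows the reduced supply voltage.
    """
    per_unit = dynamic_power(c_eff, vdd_scaled, freq / n_units, activity)
    return n_units * per_unit


# Hypothetical numbers chosen only for illustration.
C_EFF = 1e-9      # effective switched capacitance in farads
F_TARGET = 1e9    # required throughput in operations per second

baseline = dynamic_power(C_EFF, vdd=1.2, freq=F_TARGET)
scaled = parallel_power(C_EFF, vdd_scaled=0.9, freq=F_TARGET, n_units=2)
ratio = scaled / baseline

print(f"two units at 0.9 V use about {ratio:.2f}x the baseline dynamic power")
```

In this model the frequency reduction and the duplication cancel out, so the saving comes entirely from the quadratic dependence on supply voltage.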

    Redundant Skewed Clocking of Pulse-Clocked Latches for Low Power Soft-Error Mitigation

    An integrated methodology combining redundant clock tree synthesis and pulse-clocked latches mitigates both single event upsets (SEU) and single event transients (SET) with reduced power consumption. This methodology makes it possible to change the hardness of the design on the fly. The approach, with minimal additional overhead circuitry, can work in three different modes of operation depending on the speed, hardness and power consumption required by the design. It was designed on a 90 nm low-standby-power (LSP) process and utilized commercial CAD tools for testing. Spatial separation of critical nodes in the physical design of this approach mitigates multi-node charge collection (MNCC) upsets. An advanced encryption system implemented with the proposed design is compared to a previous design with non-redundant clock trees and local delay generation. The proposed approach reduces energy per operation by up to 18% over an improved version of the prior approach, with negligible area impact. It can save up to two-thirds of the power consumption and reach the maximum possible frequency when used in the non-redundant mode of operation. Dissertation/Thesis. Masters Thesis, Electrical Engineering, 201