    High-Performance low-vcc in-order core

    Power density grows in new technology nodes, thus requiring Vcc to scale, especially in mobile platforms where energy is critical. This paper presents a novel approach to decrease Vcc while keeping operating frequency high. Our mechanism is referred to as immediate read after write (IRAW) avoidance. We propose an implementation of the mechanism for an Intel® Silverthorne™ in-order core. Furthermore, we show that our mechanism can be adapted dynamically to provide the highest performance and lowest energy-delay product (EDP) at each Vcc level. Results show that IRAW avoidance increases operating frequency by 57% at 500mV and 99% at 400mV with negligible area and power overhead (below 1%), which translates into large speedups (48% at 500mV and 90% at 400mV) and EDP reductions (0.61 EDP at 500mV and 0.33 at 400mV). Peer reviewed. Postprint (published version).

    Penelope: The NBTI-aware processor

    Transistors consist of fewer atoms with every technology generation. Such atoms may be displaced due to the stress caused by high temperature, frequency and current, leading to failures. NBTI (negative bias temperature instability) is one of the most important sources of failure affecting transistors. NBTI degrades PMOS transistors whenever the voltage at the gate is negative (logic input… Peer reviewed. Postprint (published version).

    Three-dimensional memory vectorization for high bandwidth media memory systems

    Vector processors have good performance, cost and adaptability when targeting multimedia applications. However, for a significant number of media programs, conventional memory configurations fail to deliver enough memory references per cycle to feed the SIMD functional units. This paper addresses the problem of memory bandwidth. We propose a novel mechanism suitable for 2-dimensional vector architectures and targeted at providing high effective bandwidth for SIMD memory instructions. The basis of this mechanism is the extension of the scope of vectorization at the memory level, so that 3-dimensional memory patterns can be fetched into a second-level register file. By fetching long blocks of data and by reusing 2-dimensional memory streams at this second-level register file, we obtain a significant increase in the effective memory bandwidth. As side benefits, the new 3-dimensional load instructions provide high robustness to memory latency and a significant reduction of the cache activity, thus reducing power and energy requirements. At the cost of 50% more area than a regular SIMD register file, we have measured an average speed-up of 13% and the potential for power savings of 30% in the L2 cache. Peer reviewed. Postprint (published version).

    Multi-port Memory Design for Advanced Computer Architectures

    In this thesis, we describe and evaluate novel memory designs for multi-port on-chip and off-chip use in advanced computer architectures. We focus on combining multi-porting and evaluating the performance over a range of design parameters. Multi-porting is essential for caches and shared-data systems, especially multi-core systems-on-chip (SoCs). It can significantly increase the memory access throughput. We evaluate FinFET voltage-mode multi-port SRAM cells using different metrics including leakage current, static noise margin and read/write performance. Simulation results show that single-ended multi-port FinFET SRAMs with isolated read ports offer improved read stability and flexibility over classical double-ended structures at the expense of write performance. By increasing the size of the access transistors, we show that the single-ended multi-port structures can achieve write performance equivalent to the classical double-ended multi-port structure for 9% area overhead. Moreover, compared with CMOS SRAM, FinFET SRAM has better stability and standby power. We also describe new methods for the design of FinFET current-mode multi-port SRAM cells. Current-mode SRAMs avoid the full swing of the bitline, reducing dynamic power and access time. However, that comes at the cost of voltage drop, which compromises stability. The design proposed in this thesis utilizes the feature of Independent Gate (IG) mode FinFET, which can leverage threshold voltage by controlling the back gate voltage, to merge two transistors into one through high-Vt and low-Vt transistors. This design not only reduces the voltage drop, but also reduces the area in multi-port current-mode SRAM design. For off-chip memory, we propose a novel two-port 1-read, 1-write (1R1W) phase-change memory (PCM) cell, which significantly reduces the probability of blocking at the bank level. Unlike the traditional PCM cell, the access transistors are at the top and connected to the bitline. 
We use Verilog-A to model the behavior of Ge2Sb2Te5 (GST: the storage component). We evaluate the performance of the two-port cell by transistor sizing and voltage pumping. Simulation results show that a pMOS transistor is more practical than an nMOS transistor as the access device when both area and power are considered. The estimated area overhead is 1.7×, compared to a single-port PCM cell. In brief, the contribution of this thesis is that we propose and evaluate three different kinds of multi-port memories that are favorable for advanced computer architectures.

    Concept design of a fast sail assisted feeder container ship

    An environmentally sustainable fast sail-assisted feeder-container ship concept, with a maximum speed of 25 knots, has been developed for the 2020 South East Asian and Caribbean container markets. The use of low-carbon and zero-sulphur fuel (liquefied natural gas) and improvements in operational efficiency (cargo handling and scheduling) mean predicted greenhouse gas emissions should fall by 42% and 40% in the two selected operational regions. The adoption of a Multi-wing sail system reduces power requirement by up to 6% at the lower ship speed of 15 knots. The predicted daily cost savings are respectively 27% and 33% in the South East Asian and Caribbean regions. Two hull forms with a cargo capacity of 1270 TEU utilising different propulsion combinations were initially developed to meet operational requirements. Analysis and tank testing of different hydrodynamic phenomena has enabled identification of efficiency gains for each design. The final propulsion chosen is a contra-rotating podded drive arrangement. Wind tunnel testing improved Multi-wing sail performance by investigating wing spacing, wing stagger and sail-container interactions. The associated lift coefficient was increased by 32%. Whilst savings in sail-assisted power requirement are lower than initially predicted, an unexpected benefit identified was motion damping. The fast feeder-container ship is proposed as a viable future method of container transhipment.

    Low-power Programmable Processor for Fast Fourier Transform Based on Transport Triggered Architecture

    This paper describes a low-power processor tailored for fast Fourier transform computations in which the transport triggering template is exploited. The processor is software-programmable while retaining an energy efficiency comparable to existing fixed-function implementations. The power savings are achieved by compressing the computation kernel into one instruction word. The word is stored in an instruction loop buffer, which is more power-efficient than regular instruction memory storage. The processor supports all power-of-two FFT sizes from 64 to 16384 and, given 1 mJ of energy, can compute 20916 transforms of size 1024. Comment: 5 pages, 4 figures, 1 table, ICASSP 2019 conference.
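The energy figure quoted in this abstract (20916 transforms of size 1024 per 1 mJ) can be restated as energy per transform. The short sketch below does only that arithmetic; the per-sample figure is a derived illustration that simply divides by the transform size, it is not a number reported in the abstract, and the function and constant names are hypothetical.

```python
# Figures quoted in the abstract: 20916 transforms of size 1024 per 1 mJ.
ENERGY_BUDGET_J = 1e-3          # 1 mJ
TRANSFORMS_PER_BUDGET = 20916   # 1024-point FFTs computed with that energy
FFT_SIZE = 1024


def energy_per_transform(budget_j: float, transforms: int) -> float:
    """Average energy in joules spent on one transform."""
    return budget_j / transforms


def energy_per_sample(budget_j: float, transforms: int, fft_size: int) -> float:
    """Average energy in joules per input sample.

    Assumption for illustration only: the transform energy is spread
    evenly over the fft_size input points.
    """
    return energy_per_transform(budget_j, transforms) / fft_size


e_fft = energy_per_transform(ENERGY_BUDGET_J, TRANSFORMS_PER_BUDGET)
e_sample = energy_per_sample(ENERGY_BUDGET_J, TRANSFORMS_PER_BUDGET, FFT_SIZE)

print(f"{e_fft * 1e9:.2f} nJ per 1024-point transform")
print(f"{e_sample * 1e12:.2f} pJ per input sample")
```

Under these assumptions the quoted figure works out to roughly 47.8 nJ per 1024-point transform.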

    Concept design of a fast sail assisted feeder container ship

    A fast sail assisted feeder container ship concept has been developed for the 2020 container market in the South East Asian and Caribbean regions. The design presented has met the requirements of an initial economic study, with a cargo capacity of 1270 twenty-foot equivalent unit containers, meeting the predictions of container throughput derived from historical data. In determining suitable vessel dimensions, account has also been taken of port and berthing restrictions and of hydrodynamic performance. The vessel has been designed for a maximum speed of 25 knots, allowing it to meet the demand for trade whilst reducing the number of ships operating on the routes considered. The design development of the fast feeder concept has involved rigorous analyses in a number of areas to improve the robustness of the final design. Model testing has been key to the development of the concept, increasing confidence in the final result, because other analysis techniques are not always appropriate or accurate. Two hull forms have been developed to meet requirements whilst utilising different propulsor combinations. This has enabled evaluation of efficiency gains resulting from different hydrodynamic phenomena for each design, including the hydrodynamic performance when utilising the sail system. This has been done using a combination of model test results and data from regression analysis. The final propulsor chosen is a contra-rotating podded drive arrangement. Wind tunnel testing has been used to maximise the performance of a Multi-wing sail system by investigating the effects of wing spacing, stagger and sail-container interactions. This has led to an increase in lift coefficient of 32% from initial predictions. The savings in power requirement due to the sail system are lower than initially predicted. However, another benefit of their installation, motion damping, has been identified. 
Whilst this has not been fully investigated, additional fuel savings are possible as well as improved seakeeping performance. The design is shown to be environmentally sustainable when compared to existing vessels operating on the proposed routes. This is largely due to the use of low-carbon and zero-sulphur fuel (liquefied natural gas) and improvements in operational efficiency, especially in cargo handling and scheduling. Greenhouse gas emissions have been predicted to fall by 42% and 40% in the two regions should the design be adopted. These savings are also due to the use of the Multi-wing sail system, which contributes to reductions in power requirement of up to 6% when the vessel operates at its lower speed of 15 knots. It is demonstrated that the fast feeder is also economically feasible, with predicted daily cost savings of 27% and 33% in the South East Asian and Caribbean regions respectively. Thus the fast feeder container ship concept is a viable solution for the future of container transhipment.

    Late allocation and early release of physical registers

    The register file is one of the critical components of current processors in terms of access time and power consumption. Among other things, the potential to exploit instruction-level parallelism is closely related to the size and number of ports of the register file. In conventional register renaming schemes, both register allocation and release are done conservatively: the former at the rename stage, before registers are loaded with values, and the latter at the commit stage of the instruction redefining the same register, once registers are not used any more. We introduce VP-LAER, a renaming scheme that allocates registers later and releases them earlier than conventional schemes. Specifically, physical registers are allocated at the end of the execution stage and released as soon as the processor realizes that there will be no further use of them. VP-LAER enhances register utilization, that is, the fraction of allocated registers having a value to be read in the future. Detailed cycle-level simulations show either a significant speedup for a given register file size or a reduction in the register file size for a given performance level, especially for floating-point codes, where the register file pressure is usually high. Peer reviewed. Postprint (published version).