11 research outputs found

    Interfacing synchronous and asynchronous modules within a high-speed pipeline

    This paper describes a new technique for integrating asynchronous modules within a high-speed synchronous pipeline. Our design eliminates potential metastability problems by using a clock generated by a stoppable ring oscillator, which is capable of driving the large clock load found in present-day microprocessors. Using the ATACS design tool, we designed highly optimized transistor-level circuits to control the ring oscillator and generate the clock and handshake signals with minimal overhead. Our interface architecture requires no redesign of the synchronous circuitry. Incorporating asynchronous modules in a high-speed pipeline improves performance by exploiting data-dependent delay variations. Since the speed of the synchronous circuitry tracks the speed of the ring oscillator under different processes, temperatures, and voltages, the entire chip operates at the speed dictated by the current operating conditions, rather than being governed by the worst-case conditions. These two factors together can lead to a significant improvement in average-case performance. The interface design is simulated using the 0.6-μm HP CMOS14B process in HSPICE.
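
    The average-case benefit described above comes from stalling the clock only while the asynchronous module is still busy. The following is a minimal behavioural sketch of that idea in Python, not the paper's transistor-level ATACS-generated circuit; the delay distribution, clock period, and function names are invented purely for illustration.

        # Toy model of a stoppable-clock interface: the next clock edge is delayed
        # whenever the asynchronous module's completion handshake has not yet arrived,
        # so the synchronous pipeline never samples an unfinished result.
        import random

        CLOCK_PERIOD = 1.0                       # nominal ring-oscillator period (arbitrary units)

        def async_stage_delay():
            """Data-dependent delay of the asynchronous module (illustrative only)."""
            return random.uniform(0.3, 1.8) * CLOCK_PERIOD

        def run_pipeline(n_items):
            t = 0.0
            for _ in range(n_items):
                done_at = t + async_stage_delay()     # asynchronous stage starts computing at t
                next_edge = t + CLOCK_PERIOD          # edge a free-running oscillator would produce
                t = max(next_edge, done_at)           # clock is stalled until the handshake completes
            return t

        if __name__ == "__main__":
            random.seed(0)
            elapsed = run_pipeline(10_000)
            worst_case = 10_000 * 1.8 * CLOCK_PERIOD  # fully synchronous design clocked for the worst case
            print(f"stoppable clock: {elapsed:.0f}   worst-case synchronous: {worst_case:.0f}")

    Under these made-up delays, the stoppable-clock total stays well below the worst-case synchronous figure, which is the average-case advantage the abstract refers to.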

    Using MCD-DVS for dynamic thermal management performance improvement

    With chip temperature being a major hurdle in microprocessor design, techniques to recover the performance lost to thermal-emergency mechanisms are crucial in order to sustain performance growth. Many techniques for power reduction in the past, and some on thermal management more recently, have contributed to alleviating this problem. Probably the most important thermal-control technique is dynamic voltage and frequency scaling (DVS), which allows for an almost cubic reduction in power with a worst-case performance penalty that is only linear. So far, DVS techniques for temperature control have been studied at the chip level. Finer-grain DVS is feasible if a globally-asynchronous locally-synchronous (GALS) design style is employed. GALS, also known as multiple-clock domain (MCD), allows independent voltage and frequency control for each of the clock domains that make up the chip. There are several studies on DVS for GALS that aim to improve energy and power efficiency, but not temperature. This paper proposes and analyses the use of DVS at the domain level to control temperature in a clustered MCD microarchitecture, with the goal of improving the performance of applications that do not meet the thermal constraints imposed by the designers.
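
    The "almost cubic" power reduction with only a linear performance penalty follows from the first-order dynamic-power model when supply voltage is scaled alongside frequency. A minimal sketch of that reasoning, assuming the activity factor and switched capacitance stay constant, is:

        % Dynamic CMOS power, with frequency scaled in proportion to supply voltage.
        % Scaling f (and V_dd with it) by a factor s therefore cuts dynamic power by
        % roughly s^3, while execution time grows only by 1/s.
        P_{dyn} = \alpha\, C\, V_{dd}^{2}\, f, \qquad f \propto V_{dd}
        \;\Longrightarrow\; P_{dyn} \propto f^{3}, \qquad T_{exec} \propto \frac{1}{f}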

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Technological evolution has significantly increased the number of transistors for a given die area and raised switching speeds from a few MHz to the GHz range. This decline in feature size and boost in performance consequently demand a shrinking supply voltage and effective control of power dissipation in chips with millions of transistors. This has triggered a substantial amount of research into power-reduction techniques for almost every aspect of the chip, and particularly for the processor cores it contains. This paper presents an overview of techniques for achieving power efficiency, mainly at the processor-core level, but also visits related domains such as buses and memories. Various processor parameters and features, such as supply voltage, clock frequency, caches and pipelining, can be optimized to reduce the power consumption of the processor, and this paper discusses the ways in which they can be optimized. Emerging power-efficient processor architectures are also overviewed, and research activities are discussed that should help the reader identify how these factors in a processor contribute to power consumption. Some of these concepts are already established, whereas others are still active research areas. © 2009 ACADEMY PUBLISHER
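
    The knobs surveyed here (supply voltage, clock frequency, caches, pipelining) all act on the standard first-order decomposition of CMOS power. Purely as the textbook reference model, rather than anything specific to this review:

        % First-order CMOS power model: switching, short-circuit and leakage terms.
        % \alpha is the activity factor, C the switched capacitance, f the clock frequency.
        P_{total} \;=\; \underbrace{\alpha\, C\, V_{dd}^{2}\, f}_{\text{dynamic switching}}
        \;+\; \underbrace{V_{dd}\, I_{sc}}_{\text{short-circuit}}
        \;+\; \underbrace{V_{dd}\, I_{leak}}_{\text{static leakage}}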

    Power efficient resource scaling in partitioned architectures through dynamic heterogeneity

    The ever-increasing demand for high clock speeds and the desire to exploit abundant transistor budgets have resulted in alarming increases in processor power dissipation. Partitioned (or clustered) architectures have been proposed in recent years to address scalability concerns in future billion-transistor microprocessors. Our analysis shows that increasing processor resources in a clustered architecture results in a linear increase in power consumption, while providing diminishing improvements in single-thread performance. To preserve high performance-to-power ratios, we claim that the power consumption of additional resources should be in proportion to the performance improvements they yield. Hence, in this paper, we propose the implementation of heterogeneous clusters that have varying delay and power characteristics. A cluster's performance and power characteristics are tuned by scaling its frequency, and novel policies dynamically assign frequencies to clusters while attempting either to meet a fixed power budget or to minimize a metric such as Energy×Delay² (ED²). By increasing resources in a power-efficient manner, we observe an 11% improvement in ED² and a 22.4% average reduction in peak temperature, when compared to a processor with homogeneous units. Our proposed processor model also provides strategies to handle thermal emergencies that have a relatively low impact on performance.
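
    To make the flavour of such a policy concrete, the sketch below is a toy greedy frequency-assignment loop in Python that lowers the frequency of whichever cluster costs the least ED² until an assumed chip-wide power budget is met. The cluster list, power model, and sensitivity model are all invented for illustration; they are not the paper's actual policies or numbers.

        # Toy dynamic frequency assignment for heterogeneous clusters under a power budget.
        from dataclasses import dataclass

        FREQ_STEPS = [1.0, 0.9, 0.8, 0.7]          # normalised frequency levels

        @dataclass
        class Cluster:
            name: str
            freq: float            # current normalised frequency
            base_power: float      # power at full frequency (W), assumed value
            sensitivity: float     # fraction of performance lost per fraction of frequency lost

        def power(c: Cluster) -> float:
            # Assume near-cubic scaling of power with frequency (voltage scaled alongside).
            return c.base_power * c.freq ** 3

        def relative_delay(clusters) -> float:
            # Crude model: delay grows with each cluster's sensitivity-weighted slowdown.
            return 1.0 + sum(c.sensitivity * (1.0 - c.freq) for c in clusters)

        def ed2(clusters) -> float:
            d = relative_delay(clusters)
            e = sum(power(c) for c in clusters) * d   # energy = power x delay
            return e * d * d

        def enforce_budget(clusters, budget):
            while sum(power(c) for c in clusters) > budget:
                # Try one frequency step down on each cluster; keep the cheapest in ED^2 terms.
                best = None
                for c in clusters:
                    i = FREQ_STEPS.index(c.freq)
                    if i + 1 == len(FREQ_STEPS):
                        continue
                    old = c.freq
                    c.freq = FREQ_STEPS[i + 1]
                    cost = ed2(clusters)
                    c.freq = old
                    if best is None or cost < best[0]:
                        best = (cost, c, FREQ_STEPS[i + 1])
                if best is None:
                    break                              # every cluster already at the lowest step
                best[1].freq = best[2]
            return clusters

        if __name__ == "__main__":
            chip = [Cluster("int0", 1.0, 6.0, 0.5), Cluster("int1", 1.0, 6.0, 0.2),
                    Cluster("fp0", 1.0, 5.0, 0.1), Cluster("mem", 1.0, 8.0, 0.7)]
            enforce_budget(chip, budget=18.0)
            print({c.name: c.freq for c in chip}, f"ED2={ed2(chip):.2f}")

    The greedy step is the design choice of interest: clusters whose slowdown hurts the running program least give up frequency first, which is how heterogeneity is created on demand.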

    Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling

    As clock frequency increases and feature size decreases, clock distribution and wire delays present a growing challenge to the designers of singly-clocked, globally synchronous systems. We describe an alternative approach, which we call a Multiple Clock Domain (MCD) processor, in which the chip is divided into several (coarse-grained) clock domains, within which independent voltage and frequency scaling can be performed. Boundaries between domains are chosen to exploit existing queues, thereby minimizing inter-domain synchronization costs. We propose four clock domains, corresponding to the front end (including the L1 instruction cache), integer units, floating-point units, and load-store units (including the L1 data cache and L2 cache). We evaluate this design using a simulation infrastructure based on SimpleScalar and Wattch. In an attempt to quantify potential energy savings independent of any particular on-line control strategy, we use off-line analysis of traces from a single-speed run of each of our benchmark applications to identify profitable reconfiguration points for a subsequent dynamic scaling run. Dynamic runs incorporate a detailed model of inter-domain synchronization delays, with latencies for intra-domain scaling similar to the whole-chip scaling latencies of Intel XScale and Transmeta LongRun technologies. Using applications from the MediaBench, Olden, and SPEC2000 benchmark suites, we obtain an average energy-delay product improvement of 20% with MCD, compared to a modest 3% savings from voltage scaling a single-clock-and-voltage system.
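
    The off-line step can be pictured as a pass over a per-interval trace that picks, for each domain, the lowest frequency whose predicted slowdown stays within a tolerance, and records the points where that choice changes. The Python sketch below is an illustrative assumption of how such an analysis might look; the trace format, slack metric, and frequency table are invented and are not the paper's actual SimpleScalar/Wattch tooling.

        # Illustrative off-line trace analysis for choosing per-domain frequencies.
        FREQS = [1.0, 0.85, 0.7, 0.55]        # normalised frequency levels
        SLOWDOWN_TOLERANCE = 0.02             # accept at most 2% predicted slowdown per interval

        def choose_frequency(utilisation: float) -> float:
            """Pick the lowest frequency whose predicted slowdown is within tolerance.

            A domain at frequency f absorbs its work without stalling the rest of the
            chip roughly when utilisation / f <= 1; any excess counts as slowdown."""
            for f in reversed(FREQS):                     # try the slowest settings first
                predicted_slowdown = max(0.0, utilisation / f - 1.0)
                if predicted_slowdown <= SLOWDOWN_TOLERANCE:
                    return f
            return FREQS[0]

        def reconfiguration_points(trace):
            """trace: list of dicts {domain: utilisation in [0, 1]}, one per fixed interval.

            Returns (interval_index, domain, new_frequency) tuples wherever the chosen
            frequency changes, i.e. the points a dynamic-scaling run would switch at."""
            points, current = [], {}
            for i, interval in enumerate(trace):
                for domain, util in interval.items():
                    f = choose_frequency(util)
                    if current.get(domain) != f:
                        points.append((i, domain, f))
                        current[domain] = f
            return points

        if __name__ == "__main__":
            trace = [{"fp": 0.1, "int": 0.9}, {"fp": 0.1, "int": 0.9}, {"fp": 0.8, "int": 0.6}]
            for p in reconfiguration_points(trace):
                print(p)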

    Profile-based Dynamic Voltage and Frequency Scaling for a Multiple Clock Domain Microprocessor

    A Multiple Clock Domain (MCD) processor addresses the challenges of clock distribution and power dissipation by dividing a chip into several (coarse-grained) clock domains, allowing frequency and voltage to be reduced in domains that are not currently on the application’s critical path. Given a reconfiguration mechanism capable of choosing appropriate times and values for voltage/frequency scaling, an MCD processor has the potential to achieve significant energy savings with low performance degradation. Early work on MCD processors evaluated the potential for energy savings by manually inserting reconfiguration instructions into applications, or by employing an oracle driven by off-line analysis of (identical) prior program runs. Subsequent work developed a hardware-based on-line mechanism that averages 75–85% of the energy-delay improvement achieved via off-line analysis. In this paper we consider the automatic insertion of reconfiguration instructions into applications, using profile-driven binary rewriting. Profile-based reconfiguration introduces the need for “training runs” prior to production use of a given application, but avoids the hardware complexity of on-line reconfiguration. It also has the potential to yield significantly greater energy savings. Experimental results (training on small data sets and then running on larger, alternative data sets) indicate that the profile-driven approach is more stable than hardware-based reconfiguration, and yields virtually all of the energy-delay improvement achieved via off-line analysis.

    Dynamic Frequency and Voltage Control for a Multiple Clock Domain Microarchitecture

    We describe the design, analysis, and performance of an on-line algorithm to dynamically control the frequency and voltage of a Multiple Clock Domain (MCD) microarchitecture. The MCD microarchitecture allows the frequency and voltage of microprocessor regions to be adjusted independently and dynamically, allowing energy savings when the frequency of some regions can be reduced without significantly impacting performance. Our algorithm achieves on average a 19.0% reduction in Energy Per Instruction (EPI), a 3.2% increase in Cycles Per Instruction (CPI), a 16.7% improvement in Energy-Delay Product, and a Power Savings to Performance Degradation ratio of 4.6. Traditional frequency/voltage scaling techniques, which apply reductions globally to a fully synchronous processor, achieve a Power Savings to Performance Degradation ratio of only 2-3. Our Energy-Delay Product improvement is 85.5% of what has been achieved using an off-line algorithm. These results were achieved using a broad range of applications from the MediaBench, Olden, and SPEC2000 benchmark suites, with an algorithm we show to require minimal hardware resources.
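
    For reference, the metrics quoted above can be read against their usual definitions; the form of the Power Savings to Performance Degradation ratio given here is an assumption consistent with how the numbers are reported, not a definition taken from the paper:

        % Common metric definitions for DVFS evaluations.
        \mathrm{EPI} = \frac{E_{total}}{N_{instr}}, \qquad
        \mathrm{CPI} = \frac{N_{cycles}}{N_{instr}}, \qquad
        \mathrm{EDP} = E_{total} \times T_{exec}, \qquad
        \mathrm{PS/PD} = \frac{\%\ \text{power saved}}{\%\ \text{performance lost}}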

    Alarm Forecasting in Natural Gas Pipelines

    This thesis examines alarm-forecasting methods for a natural gas production pipeline to assure the efficient transportation of high-quality natural gas. Natural gas production companies use pipelines to transport natural gas from the extraction well to a distribution point. Forecasting natural gas pipeline pressure alarms helps control-room operators maintain a functioning pipeline and avoid costly downtime. As gas enters the pipeline and travels to the distribution point, it is expected to meet certain specifications set in place by either state law or the customer receiving the gas. If the gas meets these standards and is accepted at the distribution point, the pipeline is referred to as being in a steady state. If the gas does not meet these standards, the production company runs the risk of being shut-in, or being unable to flow any more gas through the distribution point until the poor-quality gas is removed. Sensors are used to collect real-time gas-quality information from within the pipe, and alarms are used to alert the control operators when a threshold is exceeded. If operators fail to keep the pipeline’s gas quality within an acceptable range, the company risks being shut-in or rupturing the pipeline. Predicting gas-quality alarms enables operators to act earlier to avoid being shut-in and is a form of predictive maintenance. We forecast alarms by using a 10th-order autoregressive model, an autoregressive model with an exogenous variable, simple exponential smoothing with drift (the Theta method), and an artificial neural network, each combined with alarm thresholds. The alarm thresholds are defined by the production company and are occasionally adjusted to meet current environmental conditions. The results of the alarm-forecasting methods show that we accurately forecast natural gas pipeline alarms up to a 30-minute time horizon. This translates into sensitivity rates that drop from around 100% at one minute to 82.7% at a 30-minute forecast horizon; that is, at 30 minutes we correctly forecast 82.7% of the alarms. All alarm-forecasting models outperform the state-of-the-art forecaster used by the production company, with the artificial neural network performing the best.
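
    A minimal sketch of threshold-based alarm forecasting with a 10th-order autoregressive model, in the spirit of the simplest method above, is shown below in Python. The synthetic sensor signal, the threshold value, and the fitting code are illustrative assumptions only; the thesis additionally evaluates an ARX model, exponential smoothing with drift, and a neural network against company-defined thresholds.

        # Fit an AR(10) model by least squares, roll it forward 30 steps, and flag any
        # forecast value that exceeds a hypothetical gas-quality alarm threshold.
        import numpy as np

        ORDER = 10            # AR(10), matching the model order used in the thesis
        HORIZON = 30          # forecast 30 steps (e.g. minutes) ahead
        THRESHOLD = 1.5       # hypothetical alarm threshold

        def fit_ar(series, order=ORDER):
            """Estimate AR coefficients by ordinary least squares on lagged values."""
            X = np.column_stack([series[i:len(series) - order + i] for i in range(order)])
            y = series[order:]
            coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
            return coeffs

        def forecast(series, coeffs, steps=HORIZON):
            """Iteratively roll the fitted AR model forward 'steps' values."""
            history = list(series[-len(coeffs):])
            out = []
            for _ in range(steps):
                nxt = float(np.dot(coeffs, history[-len(coeffs):]))
                out.append(nxt)
                history.append(nxt)
            return np.array(out)

        if __name__ == "__main__":
            rng = np.random.default_rng(1)
            t = np.arange(600)
            # Synthetic sensor signal drifting slowly upward with noise (illustrative only).
            signal = 0.002 * t + 0.3 * np.sin(t / 15.0) + 0.05 * rng.standard_normal(600)
            future = forecast(signal, fit_ar(signal))
            alarms = np.nonzero(future > THRESHOLD)[0]
            if alarms.size:
                print(f"alarm forecast {alarms[0] + 1} minutes ahead")
            else:
                print("no alarm forecast within the horizon")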

    Design methodology and productivity improvement in high speed VLSI circuits

    2017 Spring. Includes bibliographical references. To view the abstract, please see the full text of the document.

    Interfacing Synchronous and Asynchronous Modules Within a High-Speed Pipeline

    This paper describes a new technique for integrating asynchronous modules within a high-speed synchronous pipeline. Our design eliminates potential metastability problems by using a clock generated by a stoppable ring oscillator, which is capable of driving the large clock load found in present-day microprocessors. Using the ATACS design tool, we designed highly optimized transistor-level circuits to control the ring oscillator and generate the clock and handshake signals with minimal overhead. Our interface architecture requires no redesign of the synchronous circuitry. Incorporating asynchronous modules in a high-speed pipeline improves performance by exploiting data-dependent delay variations. Since the speed of the synchronous circuitry tracks the speed of the ring oscillator under different processes, temperatures, and voltages, the entire chip operates at the speed dictated by the current operating conditions, rather than being governed by the worst-case conditions. These two factors together can lead to a significant improvement in average-case performance.