128 research outputs found
Power Reductions with Energy Recovery Using Resonant Topologies
The problem of power densities in system-on-chips (SoCs) and processors has become more exacerbated recently, resulting in high cooling costs and reliability issues. One of the largest components of power consumption is the low skew clock distribution network (CDN), driving large load capacitance. This can consume as much as 70% of the total dynamic power that is lost as heat, needing elaborate sensing and cooling mechanisms. To mitigate this, resonant clocking has been utilized in several applications over the past decade. An improved energy recovering reconfigurable generalized series resonance (GSR) solution with all the critical support circuitry is developed in this work. This LC resonant clock driver is shown to save about 50% driver power (\u3e40% overall), on a 22nm process node and has 50% less skew than a non-resonant driver at 2GHz. It can operate down to 0.2GHz to support other energy savings techniques like dynamic voltage and frequency scaling (DVFS).
As an example, GSR can be configured for the simpler pulse series resonance (PSR) operation to enable further power saving for double data rate (DDR) applications, by using de-skewing latches instead of flip-flop banks. A PSR based subsystem for 40% savings in clocking power with 40% driver active area reduction xii is demonstrated. This new resonant driver generates tracking pulses at each transition of clock for dual edge operation across DVFS. PSR clocking is designed to drive explicit-pulsed latches with negative setup time. Simulations using 45nm IBM/PTM device and interconnect technology models, clocking 1024 flip-flops show the reductions, compared to non-resonant clocking. DVFS range from 2GHz/1.3V to 200MHz/0.5V is obtained. The PSR frequency is set \u3e3× the clock rate, needing only 1/10th the inductance of prior-art LC resonance schemes. The skew reductions are achieved without needing to increase the interconnect widths owing to negative set-up times.
Applications in data circuits are shown as well with a 90nm example. Parallel resonant and split-driver non-resonant configurations as well are derived from GSR. Tradeoffs in timing performance versus power, based on theoretical analysis, are compared for the first time and verified. This enables synthesis of an optimal topology for a given application from the GSR
Circuit and System Level Design Optimization for Power Delivery And Management
As the VLSI technology scales to the nanometer scale, power consumption has become a critical design concern of VLSI circuits. Power gating and dynamic voltage and frequency scaling (DVFS) are two effective power management techniques that are widely utilized in modern chip designs. Various design challenges merge with these power management techniques in nanometer VLSI circuits. For example, power gating introduces unique power integrity issues and trade-offs between switching noise and rush current noise. Assuring power integrity and achieving power efficiency are two highly intertwined design challenges. In addition, these trade-offs significantly vary with the supply voltage. It is difficult to use conventional power-gated power delivery networks (PDNs) to fully meet the involved conflicting design constraints while maximizing power saving and minimizing supply noise. The DVFS controller and the DC-DC power converter are two highly intertwining enablers for DVFS-based systems. However, traditional DVFS techniques treat the design optimizations of the two as separate tasks, giving rise to sub-optimal designs.
To address the above research challenges, we propose several circuit and system level design optimization techniques in this dissertation. For power-gated PDN designs, we propose systemic decoupling capacitor (decap) optimization strategies that optimally trade-off between power integrity and leakage saving. First, new global decap and re-routable decap design concepts are proposed to relax the tight interaction between power integrity and leakage power saving of power-gated PDN at a single supply voltage level. Furthermore, we propose to leverage re-routable decaps to provide flexible decap allocation structures to better suit multiple supply voltage levels. The proposed strategies are implemented in an automatic design flow for choosing optimal amount of local decaps, global decaps and re-routable decaps. The proposed techniques significantly increase leakage saving without jeopardizing power integrity. The flexible decap allocations enabled by re-routable decaps lead to optimal design trade-offs for PDNs operating with two supply voltage levels.
To improve the effectiveness of DVFS, we analyze the drawbacks of circuit-level only and policy-level only optimizations and the promising opportunities resulted from the cross-layer co-optimization of the DC-DC converter and online learning based DVFS polices. We present a cross-layer approach that optimizes transition time, area, energy overhead of the DC-DC converter along with key parameters of an online learning DVFS controller. We systematically evaluate the benefits of the proposed co-optimization strategy based on several processor architectures, namely single and dual-core processors and processors with DVFS and power gating. Our results indicate that the co-optimization can introduce noticeable additional energy saving without significant performance degradation
Toward realizing power scalable and energy proportional high-speed wireline links
Growing computational demand and proliferation of cloud computing has placed high-speed
serial links at the center stage. Due to saturating energy efficiency improvements over the
last five years, increasing the data throughput comes at the cost of power consumption. Conventionally, serial link power can be reduced by optimizing individual building blocks such as
output drivers, receiver, or clock generation and distribution. However, this approach yields
very limited efficiency improvement. This dissertation takes an alternative approach toward
reducing the serial link power. Instead of optimizing the power of individual building blocks,
power of the entire serial link is reduced by exploiting serial link usage by the applications.
It has been demonstrated that serial links in servers are underutilized. On average, they
are used only 15% of the time, i.e. these links are idle for approximately 85% of the time.
Conventional links consume power during idle periods to maintain synchronization between
the transmitter and the receiver. However, by powering-off the link when idle and powering
it back when needed, power consumption of the serial link can be scaled proportionally to
its utilization. This approach of rapid power state transitioning is known as the rapid-on/off
approach. For the rapid-on/off to be effective, ideally the power-on time, off-state power,
and power state transition energy must all be close to zero. However, in practice, it is very
difficult to achieve these ideal conditions. Work presented in this dissertation addresses these
challenges.
When this research work was started (2011-12), there were only a couple of research papers
available in the area of rapid-on/off links. Systematic study or design of a rapid power state
transitioning in serial links was not available in the literature. Since rapid-on/off with
nanoseconds granularity is not a standard in any wireline communication, even the popular
test equipment does not support testing any such feature, neither any formal measurement methodology was available. All these circumstances made the beginning difficult. However,
these challenges provided a unique opportunity to explore new architectural techniques and
identify trade-offs. The key contributions of this dissertation are as follows.
The first and foremost contribution is understanding the underlying limitations of saturating energy efficiency improvements in serial links and why there is a compelling need to
find alternative ways to reduce the serial link power.
The second contribution is to identify potential power saving techniques and evaluate the
challenges they pose and the opportunities they present.
The third contribution is the design of a 5Gb/s transmitter with a rapid-on/off feature.
The transmitter achieves rapid-on/off capability in voltage mode output driver by using
a fast-digital regulator, and in the clock multiplier by accurate frequency pre-setting and
periodic reference insertion. To ease timing requirements, an improved edge replacement
logic circuit for the clock multiplier is proposed. Mathematical modeling of power-on time
as a function of various circuit parameters is also discussed. The proposed transmitter
demonstrates energy proportional operation over wide variations of link utilization, and is,
therefore, suitable for energy efficient links. Fabricated in 90nm CMOS technology, the
voltage mode driver, and the clock multiplier achieve power-on-time of only 2ns and 10ns,
respectively. This dissertation highlights key trade-off in the clock multiplier architecture,
to achieve fast power-on-lock capability at the cost of jitter performance.
The fourth contribution is the design of a 7GHz rapid-on/off LC-PLL based clock multi-
plier. The phase locked loop (PLL) based multiplier was developed to overcome the limita-
tions of the MDLL based approach. Proposed temperature compensated LC-PLL achieves
power-on-lock in 1ns.
The fifth and biggest contribution of this dissertation is the design of a 7Gb/s embedded
clock transceiver, which achieves rapid-on/off capability in LC-PLL, current-mode transmit-
ter and receiver. It was the first reported design of a complete transceiver, with an embedded
clock architecture, having rapid-on/off capability. Background phase calibration technique in
PLL and CDR phase calibration logic in the receiver enable instantaneous lock on power-on.
The proposed transceiver demonstrates power scalability with a wide range of link utiliza-
tion and, therefore, helps in improving overall system efficiency. Fabricated in 65nm CMOS technology, the 7Gb/s transceiver achieves power-on-lock in less than 20ns. The transceiver
achieves power scaling by 44x (63.7mW-to-1.43mW) and energy efficiency degradation by
only 2.2x (9.1pJ/bit-to-20.5pJ/bit), when the effective data rate (link utilization) changes
by 100x (7Gb/s-to-70Mb/s).
The sixth and final contribution is the design of a temperature sensor to compensate
the frequency drifts due to temperature variations, during long power-off periods, in the
fast power-on-lock LC-PLL. The proposed self-referenced VCO-based temperature sensor
is designed with all digital logic gates and achieves low supply sensitivity. This sensor is
suitable for integration in processor and DRAM environments. The proposed sensor works
on the principle of directly converting temperature information to frequency and finally
to digital bits. A novel sensing technique is proposed in which temperature information
is acquired by creating a threshold voltage difference between the transistors used in the
oscillators. Reduced supply sensitivity is achieved by employing junction capacitance, and
the overhead of voltage regulators and an external ideal reference frequency is avoided. The
effect of VCO phase noise on the sensor resolution is mathematically evaluated. Fabricated
in the 65nm CMOS process, the prototype can operate with a supply ranging from 0.85V
to 1.1V, and it achieves a supply sensitivity of 0.034oC/mV and an inaccuracy of ±0.9oC
and ±2.3oC from 0-100oC after 2-point calibration, with and without static nonlinearity
correction, respectively. It achieves a resolution of 0.3oC, resolution FoM of 0.3(nJ/conv)res2 ,
and measurement (conversion) time of 6.5μs
High-Speed and Low-Energy On-Chip Communication Circuits.
Continuous technology scaling sharply reduces transistor delays, while fixed-length global wire delays have increased due to less wiring pitch with higher resistance and coupling capacitance. Due to this ever growing gap, long on-chip interconnects pose well-known latency, bandwidth, and energy challenges to high-performance VLSI systems. Repeaters effectively mitigate wire RC effects but do little to improve their energy costs. Moreover, the increased complexity and high level of integration requires higher wire densities, worsening crosstalk noise and power consumption of conventionally repeated interconnects.
Such increasing concerns in global on-chip wires motivate circuits to improve wire performance and energy while reducing the number of repeaters. This work presents circuit techniques and investigation for high-performance and energy-efficient on-chip communication in the aspects of encoding, data compression, self-timed current injection, signal pre-emphasis, low-swing signaling, and technology mapping. The improved bus designs also consider the constraints of robust operation and performance/energy gains across process corners and design space. Measurement results from 5mm links on 65nm and 90nm prototype chips validate 2.5-3X improvement in energy-delay product.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/75800/1/jseo_1.pd
Digitally Assisted ADCS.
This work involves the development of digital calibration techniques for Analogto-
Digital Converters. According to the 2001 International Technology Roadmap for
Semiconductors, improved ADC technology is a key factor in the development of present
and future applications.
The switched-capacitor (SC) pipeline technique is the most popular method of
implementing moderate resolution ADCs. However the advantages of CMOS, which
originally made SC circuits feasible, are being eroding by process scaling. Good switches
and opamps are becoming increasingly difficult to design and the growing gate leakage
of deep submicron MOSFETs is causing difficulty. Traditional ADC schemes do not
work well with supply voltages of 1.8V and below. Furthermore, the performance required by present and future wireless and IT applications will not be met by the present
day ADC circuits techniques.
Bearing in mind the challenges associated with deep sub-micron analog circuitry
a new calibration technique for folding ADCs has been developed. Since digital circuitry
scales well, this calibration relies heavily on digital techniques. Hence it reduces the
amount of analog design involved. As this folding ADC is dominated, in terms of both
functionality and power, by digital circuitry, the performance of folding will improve
when implemented in smaller geometry processes.
An 8-bit, 500MS/s, digitally calibrated folding ADC was designed in TSMC
0.18mm. A second prototype, 9-bit 400MS/s, was designed in ST 90nm. This ADC uses
novel folders to reduce thermal noise.
The major accomplishments of this work are:
· The creation of a new folding ADC architecture that is digitally dominated
allowing large transistor mismatch to be tolerated so that small devices
can be utilized in the signal path.
· The development of modeling techniques, to investigate and analyze the
effects of transistor mismatch, folder linearity and redundancy in ADCs.
· The design of a new folder circuit topology that decreases the required
power consumption for a given noise budget.
· The design of a resistor ladder DAC that uses a unique resistor layout to
allow any shape ladder to be designed.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/58426/1/ibogue_1.pd
Power delivery mechanisms for asynchronous loads in energy harvesting systems
PhD ThesisFor systems depending on methods, a fundamental
contradiction in the power delivery chain has existed between conventional
to supply it. DC/DC conversion (e.g.)
has therefore been an integral part of such systems to resolve this contradiction.
be made tolerant to a much wider range of Vdd variance. This may open up
opportunities for much more energy efficient methods of power delivery.
performance of different power delivery mechanisms driving both asynchronous
and synchronous loads directly from a harvester source bypassing bulky energy
method, which employs a
energy from a EH circuit depending on load and source conditions, is developed.
through comprehensive comparative analysis.
Based on the novel CBB power delivery method, an asynchronous controller is
circuits to work with tasks. The successful asynchronous control design drives a
case study that is meant to explore relations between power path and task path.
To deal with different tasks with variable harvested power, systems may have a
range of operation conditions and thus dynamically call for CBB or SCC type power
set of capacitors to form CBB or SCC is implemented with economic system size.
This work presents an unconventional way of designing a compact-size, quick-
circuit
overcome large voltage variation in EH systems and implement smart power
management for harsh EH environment. The power delivery mechanisms (SCC,
employed to help asynchronous-
logic-based chip testing and micro-scale EH system demonstrations
Sincronização em sistemas integrados a alta velocidade
Doutoramento em Engenharia ElectrotécnicaA distribui ção de um sinal relógio, com elevada precisão espacial (baixo
skew) e temporal (baixo jitter ), em sistemas sí ncronos de alta velocidade tem-se revelado uma tarefa cada vez mais demorada e complexa devido ao escalonamento da tecnologia. Com a diminuição das dimensões dos dispositivos
e a integração crescente de mais funcionalidades nos Circuitos Integrados (CIs), a precisão associada as transições do sinal de relógio tem sido cada vez mais afectada por varia ções de processo, tensão e temperatura.
Esta tese aborda o problema da incerteza de rel ogio em CIs de alta velocidade, com o objetivo de determinar os limites do paradigma de desenho sí ncrono.
Na prossecu ção deste objectivo principal, esta tese propõe quatro novos modelos de incerteza com âmbitos de aplicação diferentes. O primeiro modelo permite estimar a incerteza introduzida por um inversor est atico CMOS, com base em parâmetros simples e su cientemente gen éricos para que possa ser usado na previsão das limitações temporais de circuitos mais complexos, mesmo na fase inicial do projeto. O segundo modelo, permite
estimar a incerteza em repetidores com liga ções RC e assim otimizar o dimensionamento da rede de distribui ção de relógio, com baixo esfor ço computacional. O terceiro modelo permite estimar a acumula ção de incerteza em cascatas de repetidores. Uma vez que este modelo tem em considera ção a correla ção entre fontes de ruí do, e especialmente util para promover t ecnicas de distribui ção de rel ogio e de alimentação que possam minimizar a acumulação de incerteza. O quarto modelo permite estimar a incerteza temporal em sistemas com m ultiplos dom ínios de sincronismo.
Este modelo pode ser facilmente incorporado numa ferramenta autom atica
para determinar a melhor topologia para uma determinada aplicação ou para avaliar a tolerância do sistema ao ru ído de alimentação.
Finalmente, usando os modelos propostos, são discutidas as tendências da precisão de rel ogio. Conclui-se que os limites da precisão do rel ogio são, em ultima an alise, impostos por fontes de varia ção dinâmica que se preveem crescentes na actual l ogica de escalonamento dos dispositivos. Assim sendo,
esta tese defende a procura de solu ções em outros ní veis de abstração, que não apenas o ní vel f sico, que possam contribuir para o aumento de desempenho dos CIs e que tenham um menor impacto nos pressupostos do paradigma de desenho sí ncrono.Distributing a the clock simultaneously everywhere (low skew) and periodically
everywhere (low jitter) in high-performance Integrated Circuits (ICs)
has become an increasingly di cult and time-consuming task, due to technology
scaling. As transistor dimensions shrink and more functionality is
packed into an IC, clock precision becomes increasingly a ected by Process,
Voltage and Temperature (PVT) variations. This thesis addresses the
problem of clock uncertainty in high-performance ICs, in order to determine
the limits of the synchronous design paradigm.
In pursuit of this main goal, this thesis proposes four new uncertainty models,
with di erent underlying principles and scopes. The rst model targets
uncertainty in static CMOS inverters. The main advantage of this model
is that it depends only on parameters that can easily be obtained. Thus,
it can provide information on upcoming constraints very early in the design
stage. The second model addresses uncertainty in repeaters with RC interconnects,
allowing the designer to optimise the repeater's size and spacing,
for a given uncertainty budget, with low computational e ort. The third
model, can be used to predict jitter accumulation in cascaded repeaters, like
clock trees or delay lines. Because it takes into consideration correlations
among variability sources, it can also be useful to promote
oorplan-based
power and clock distribution design in order to minimise jitter accumulation.
A fourth model is proposed to analyse uncertainty in systems with multiple
synchronous domains. It can be easily incorporated in an automatic tool
to determine the best topology for a given application or to evaluate the
system's tolerance to power-supply noise.
Finally, using the proposed models, this thesis discusses clock precision
trends. Results show that limits in clock precision are ultimately imposed
by dynamic uncertainty, which is expected to continue increasing with technology
scaling. Therefore, it advocates the search for solutions at other
abstraction levels, and not only at the physical level, that may increase
system performance with a smaller impact on the assumptions behind the
synchronous design paradigm
- …