85 research outputs found
Physical Design Methodologies for Low Power and Reliable 3D ICs
As the semiconductor industry struggles to maintain its momentum down the path following the Moore's Law, three dimensional integrated circuit (3D IC) technology has emerged as a promising solution to achieve higher integration density, better performance, and lower power consumption. However, despite its significant improvement in electrical performance, 3D IC presents several serious physical design challenges. In this dissertation, we investigate physical design methodologies for 3D ICs with primary focus on two areas: low power 3D clock tree design, and reliability degradation modeling and management.
Clock trees are essential parts for digital system which dissipate a large amount of power due to high capacitive loads. The majority of existing 3D clock tree designs focus on minimizing the total wire length, which produces sub-optimal results for power optimization. In this dissertation, we formulate a 3D clock tree design flow which directly optimizes for clock power. Besides, we also investigate the design methodology for clock gating a 3D clock tree, which uses shutdown gates to selectively turn off unnecessary clock activities. Different from the common assumption in 2D ICs that shutdown gates are cheap thus can be applied at every clock node, shutdown gates in 3D ICs introduce additional control TSVs, which compete with clock TSVs for placement resources. We explore the design methodologies to produce the optimal allocation and placement for clock and control TSVs so that the clock power is minimized. We show that the proposed synthesis flow saves significant clock power while accounting for available TSV placement area.
Vertical integration also brings new reliability challenges including TSV's electromigration (EM) and several other reliability loss mechanisms caused by TSV-induced stress. These reliability loss models involve complex inter-dependencies between electrical and thermal conditions, which have not been investigated in the past. In this dissertation we set up an electrical/thermal/reliability co-simulation framework to capture the transient of reliability loss in 3D ICs. We further derive and validate an analytical reliability objective function that can be integrated into the 3D placement design flow. The reliability aware placement scheme enables co-design and co-optimization of both the electrical and reliability property, thus improves both the circuit's performance and its lifetime. Our electrical/reliability co-design scheme avoids unnecessary design cycles or application of ad-hoc fixes that lead to sub-optimal performance.
Vertical integration also enables stacking DRAM on top of CPU, providing high bandwidth and short latency. However, non-uniform voltage fluctuation and local thermal hotspot in CPU layers are coupled into DRAM layers, causing a non-uniform bit-cell leakage (thereby bit flip) distribution. We propose a performance-power-resilience simulation framework to capture DRAM soft error in 3D multi-core CPU systems. In addition, a dynamic resilience management (DRM) scheme is investigated, which adaptively tunes CPU's operating points to adjust DRAM's voltage noise and thermal condition during runtime. The DRM uses dynamic frequency scaling to achieve a resilience borrow-in strategy, which effectively enhances DRAM's resilience without sacrificing performance.
The proposed physical design methodologies should act as important building blocks for 3D ICs and push 3D ICs toward mainstream acceptance in the near future
Algorithms for Circuit Sizing in VLSI Design
One of the key problems in the physical design of computer chips, also known as integrated circuits, consists of choosing a physical layout for the logic gates and memory circuits (registers) on the chip. The layouts have a high influence on the power consumption and area of the chip and the delay of signal paths. A discrete set of predefined layouts for each logic function and register type with different physical properties is given by a library. One of the most influential characteristics of a circuit defined by the layout is its size. In this thesis we present new algorithms for the problem of choosing sizes for the circuits and its continuous relaxation, and evaluate these in theory and practice. A popular approach is based on Lagrangian relaxation and projected subgradient methods. We show that seemingly heuristic modifications that have been proposed for this approach can be theoretically justified by applying the well-known multiplicative weights algorithm. Subsequently, we propose a new model for the sizing problem as a min-max resource sharing problem. In our context, power consumption and signal delays are represented by resources that are distributed to customers. Under certain assumptions we obtain a polynomial time approximation for the continuous relaxation of the sizing problem that improves over the Lagrangian relaxation based approach. The new resource sharing algorithm has been implemented as part of the BonnTools software package which is developed at the Research Institute for Discrete Mathematics at the University of Bonn in cooperation with IBM. Our experiments on the ISPD 2013 benchmarks and state-of-the-art microprocessor designs provided by IBM illustrate that the new algorithm exhibits more stable convergence behavior compared to a Lagrangian relaxation based algorithm. Additionally, better timing and reduced power consumption was achieved on almost all instances. A subproblem of the new algorithm consists of finding sizes minimizing a weighted sum of power consumption and signal delays. We describe a method that approximates the continuous relaxation of this problem in polynomial time under certain assumptions. For the discrete problem we provide a fully polynomial approximation scheme under certain assumptions on the topology of the chip. Finally, we present a new algorithm for timing-driven optimization of registers. Their sizes and locations on a chip are usually determined during the clock network design phase, and remain mostly unchanged afterwards although the timing criticalities on which they were based can change. Our algorithm permutes register positions and sizes within so-called clusters without impairing the clock network such that it can be applied late in a design flow. Under mild assumptions, our algorithm finds an optimal solution which maximizes the worst cluster slack. It is implemented as part of the BonnTools and improves timing of registers on state-of-the-art microprocessor designs by up to 7.8% of design cycle time. </div
Timing Closure in Chip Design
Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips
Sincronização em sistemas integrados a alta velocidade
Doutoramento em Engenharia ElectrotécnicaA distribui ção de um sinal relógio, com elevada precisão espacial (baixo
skew) e temporal (baixo jitter ), em sistemas sí ncronos de alta velocidade tem-se revelado uma tarefa cada vez mais demorada e complexa devido ao escalonamento da tecnologia. Com a diminuição das dimensões dos dispositivos
e a integração crescente de mais funcionalidades nos Circuitos Integrados (CIs), a precisão associada as transições do sinal de relógio tem sido cada vez mais afectada por varia ções de processo, tensão e temperatura.
Esta tese aborda o problema da incerteza de rel ogio em CIs de alta velocidade, com o objetivo de determinar os limites do paradigma de desenho sí ncrono.
Na prossecu ção deste objectivo principal, esta tese propõe quatro novos modelos de incerteza com âmbitos de aplicação diferentes. O primeiro modelo permite estimar a incerteza introduzida por um inversor est atico CMOS, com base em parâmetros simples e su cientemente gen éricos para que possa ser usado na previsão das limitações temporais de circuitos mais complexos, mesmo na fase inicial do projeto. O segundo modelo, permite
estimar a incerteza em repetidores com liga ções RC e assim otimizar o dimensionamento da rede de distribui ção de relógio, com baixo esfor ço computacional. O terceiro modelo permite estimar a acumula ção de incerteza em cascatas de repetidores. Uma vez que este modelo tem em considera ção a correla ção entre fontes de ruí do, e especialmente util para promover t ecnicas de distribui ção de rel ogio e de alimentação que possam minimizar a acumulação de incerteza. O quarto modelo permite estimar a incerteza temporal em sistemas com m ultiplos dom ínios de sincronismo.
Este modelo pode ser facilmente incorporado numa ferramenta autom atica
para determinar a melhor topologia para uma determinada aplicação ou para avaliar a tolerância do sistema ao ru ído de alimentação.
Finalmente, usando os modelos propostos, são discutidas as tendências da precisão de rel ogio. Conclui-se que os limites da precisão do rel ogio são, em ultima an alise, impostos por fontes de varia ção dinâmica que se preveem crescentes na actual l ogica de escalonamento dos dispositivos. Assim sendo,
esta tese defende a procura de solu ções em outros ní veis de abstração, que não apenas o ní vel f sico, que possam contribuir para o aumento de desempenho dos CIs e que tenham um menor impacto nos pressupostos do paradigma de desenho sí ncrono.Distributing a the clock simultaneously everywhere (low skew) and periodically
everywhere (low jitter) in high-performance Integrated Circuits (ICs)
has become an increasingly di cult and time-consuming task, due to technology
scaling. As transistor dimensions shrink and more functionality is
packed into an IC, clock precision becomes increasingly a ected by Process,
Voltage and Temperature (PVT) variations. This thesis addresses the
problem of clock uncertainty in high-performance ICs, in order to determine
the limits of the synchronous design paradigm.
In pursuit of this main goal, this thesis proposes four new uncertainty models,
with di erent underlying principles and scopes. The rst model targets
uncertainty in static CMOS inverters. The main advantage of this model
is that it depends only on parameters that can easily be obtained. Thus,
it can provide information on upcoming constraints very early in the design
stage. The second model addresses uncertainty in repeaters with RC interconnects,
allowing the designer to optimise the repeater's size and spacing,
for a given uncertainty budget, with low computational e ort. The third
model, can be used to predict jitter accumulation in cascaded repeaters, like
clock trees or delay lines. Because it takes into consideration correlations
among variability sources, it can also be useful to promote
oorplan-based
power and clock distribution design in order to minimise jitter accumulation.
A fourth model is proposed to analyse uncertainty in systems with multiple
synchronous domains. It can be easily incorporated in an automatic tool
to determine the best topology for a given application or to evaluate the
system's tolerance to power-supply noise.
Finally, using the proposed models, this thesis discusses clock precision
trends. Results show that limits in clock precision are ultimately imposed
by dynamic uncertainty, which is expected to continue increasing with technology
scaling. Therefore, it advocates the search for solutions at other
abstraction levels, and not only at the physical level, that may increase
system performance with a smaller impact on the assumptions behind the
synchronous design paradigm
A design flow for performance planning : new paradigms for iteration free synthesis
In conventional design, higher levels of synthesis produce a netlist, from which layout synthesis builds a mask specification for manufacturing. Timing anal ysis is built into a feedback loop to detect timing violations which are then used to update specifications to synthesis. Such iteration is undesirable, and for very high performance designs, infeasible. The problem is likely to become much worse with future generations of technology. To achieve a non-iterative design flow, early synthesis stages should use wire planning to distribute delays over the functional elements and interconnect, and layout synthesis should use its degrees of freedom to realize those delays
Recommended from our members
Direct sampling receivers for broadband communications
Today everything tends to be connected in the Internet of Things (IoT) universe, where a broad variety of communication standards and technologies are used for those connected devices. It is always a dream to design a Software-Defined Radio (SDR) supporting different standards solely based on the software configuration. As integrated-circuit (IC) manufacture and design advance, a partial of SDR can be realized. This thesis investigates one of the most important parts in a SDR: the analog design of a direct sampling (DS) receiver, which mainly consists of a broadband RF front end and a wideband ADC. Especially, a DS receiver shows a great flexibility and efficiency for the simultaneous reception of multiple channels comparing with the traditional parallelism of superheterodyne structure.
The research contributions of this work include (1) demonstration and comparative analysis of two new architectures of broadband RFPGAs: voltage-mode: RFPGA-V and current-mode: RFPGA-I. RFPGA-V and RFPGA-I utilize an innovative interpolation method and current steering approach, respectively, to achieve a fine gain step of 0.25-dB over 40-dB gain range for several GHz frequency range. Besides, with innovative design, no off-chip inductor is needed for the both RFPGAs. (2) The design of a 5-GS/s 10b time-interleaved SAR. The ADC power efficiency is significantly improved by many design techniques: the low-energy CDAC switching scheme, optimized input common-mode voltage for comparator, optimal reduced radix-2 capacitor ratio for low-power reference buffers and higher conversion speed, etc. The lane-to-lane mismatches in a time-interleave ADC are minimized by using optimal floor plan and then are calibrated digitally.
Three prototypes: the broadband RF front ends with RFPGA-V, the broadband RF front ends with RFPGA-I and a 5-GHz ADC, are fabricated to verify the proposed ideas in 28nm CMOS technology.Electrical and Computer Engineerin
Recommended from our members
Realization of Integrated Coherent LiDAR
LiDAR (Light Detection and Ranging) captures high-definition real-time 3D images of the surrounding environment through active sensing with infrared lasers. It has unique advantages that can compensate the fundamental limitations in camera-based 3D imaging via vision algorithms or RADARs, which makes it an important sensing modality to guarantee robust autonomy in self-driving cars. However, high price tag of existing commercial LiDAR modules based on mechanical beam scanners and intensity-based detection scheme makes them unusable in the context of mass produced consumer products.The focus of thesis is on the integrated coherent LiDAR with optical phased array-based solid-state beam steering, which has great potential to dramatically bring down the cost of a LiDAR module. It begins with an overview of LiDAR implementation options and system requirements in the context of autonomous vehicles, which leads us to conclude that beam-steering coherent FMCW LiDAR in optical C-band is indeed the best implementation strategy to realize low-cost automotive LiDARs. Motivated by this observation, a quantitative framework for evaluating FMCW LiDAR performance is also introduced to predict the design that satisfies car-grade performance requirements. Then the thesis presents the silicon implementation results from our single-chip optical phased array and integrated coherent LiDAR prototype. Our implementations leverage the 3D heterogeneous integration platform, where custom silicon photonics and nanoscale CMOS fabricated at a 300 mm wafer facility are combined at the wafer-scale to minimize the unit cost without I/O density issues. After discussing remaining challenges and possible ways to enhance the operating range and system reliability, this thesis finally addresses the problem of fundamental trade-off between phase noise and wavelength tuning in FMCW laser source, and present circuit- and algorithm-level techniques to enable FMCW measurements beyond inherent laser coherence range limit
- …