1,444 research outputs found
Layout regularity metric as a fast indicator of process variations
Integrated circuits design faces increasing challenge as we scale down due to the increase of the effect of sensitivity to process variations. Systematic variations induced by different steps in the lithography process affect both parametric and functional yields of the designs. These variations are known, themselves, to be affected by layout topologies. Design for Manufacturability (DFM) aims at defining techniques that mitigate variations and improve yield. Layout regularity is one of the trending techniques suggested by DFM to mitigate process variations effect. There are several solutions to create regular designs, like restricted design rules and regular fabrics. These regular solutions raised the need for a regularity metric. Metrics in literature are insufficient for different reasons; either because they are qualitative or computationally intensive. Furthermore, there is no study relating either lithography or electrical variations to layout regularity. In this work, layout regularity is studied in details and a new geometrical-based layout regularity metric is derived. This metric is verified against lithographic simulations and shows good correlation. Calculation of the metric takes only few minutes on 1mm x 1mm design, which is considered fast compared to the time taken by simulations. This makes it a good candidate for pre-processing the layout data and selecting certain areas of interest for lithographic simulations for faster throughput. The layout regularity metric is also compared against a model that measures electrical variations due to systematic lithographic variations. The validity of using the regularity metric to flag circuits that have high variability using the developed electrical variations model is shown. The regularity metric results compared to the electrical variability model results show matching percentage that can reach 80%, which means that this metric can be used as a fast indicator of designs more susceptible to lithography and hence electrical variations
Addressing Manufacturing Challenges in NoC-based ULSI Designs
Hernández Luz, C. (2012). Addressing Manufacturing Challenges in NoC-based ULSI Designs [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1669
AI/ML Algorithms and Applications in VLSI Design and Technology
An evident challenge ahead for the integrated circuit (IC) industry in the
nanometer regime is the investigation and development of methods that can
reduce the design complexity ensuing from growing process variations and
curtail the turnaround time of chip manufacturing. Conventional methodologies
employed for such tasks are largely manual; thus, time-consuming and
resource-intensive. In contrast, the unique learning strategies of artificial
intelligence (AI) provide numerous exciting automated approaches for handling
complex and data-intensive tasks in very-large-scale integration (VLSI) design
and testing. Employing AI and machine learning (ML) algorithms in VLSI design
and manufacturing reduces the time and effort for understanding and processing
the data within and across different abstraction levels via automated learning
algorithms. It, in turn, improves the IC yield and reduces the manufacturing
turnaround time. This paper thoroughly reviews the AI/ML automated approaches
introduced in the past towards VLSI design and manufacturing. Moreover, we
discuss the scope of AI/ML applications in the future at various abstraction
levels to revolutionize the field of VLSI design, aiming for high-speed, highly
intelligent, and efficient implementations
Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip
The sustained demand for faster, more powerful chips has been met by the
availability of chip manufacturing processes allowing for the integration of increasing
numbers of computation units onto a single die. The resulting outcome,
especially in the embedded domain, has often been called SYSTEM-ON-CHIP
(SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MP-SoC).
MPSoC design brings to the foreground a large number of challenges, one of
the most prominent of which is the design of the chip interconnection. With a
number of on-chip blocks presently ranging in the tens, and quickly approaching
the hundreds, the novel issue of how to best provide on-chip communication
resources is clearly felt.
NETWORKS-ON-CHIPS (NoCs) are the most comprehensive and scalable
answer to this design concern. By bringing large-scale networking concepts to
the on-chip domain, they guarantee a structured answer to present and future
communication requirements. The point-to-point connection and packet switching
paradigms they involve are also of great help in minimizing wiring overhead
and physical routing issues. However, as with any technology of recent inception,
NoC design is still an evolving discipline. Several main areas of interest
require deep investigation for NoCs to become viable solutions:
• The design of the NoC architecture needs to strike the best tradeoff among
performance, features and the tight area and power constraints of the onchip
domain.
• Simulation and verification infrastructure must be put in place to explore,
validate and optimize the NoC performance.
• NoCs offer a huge design space, thanks to their extreme customizability in
terms of topology and architectural parameters. Design tools are needed
to prune this space and pick the best solutions.
• Even more so given their global, distributed nature, it is essential to evaluate
the physical implementation of NoCs to evaluate their suitability for
next-generation designs and their area and power costs.
This dissertation performs a design space exploration of network-on-chip architectures,
in order to point-out the trade-offs associated with the design of
each individual network building blocks and with the design of network topology
overall. The design space exploration is preceded by a comparative analysis
of state-of-the-art interconnect fabrics with themselves and with early networkon-
chip prototypes. The ultimate objective is to point out the key advantages
that NoC realizations provide with respect to state-of-the-art communication
infrastructures and to point out the challenges that lie ahead in order to make
this new interconnect technology come true. Among these latter, technologyrelated
challenges are emerging that call for dedicated design techniques at all
levels of the design hierarchy. In particular, leakage power dissipation, containment
of process variations and of their effects. The achievement of the above
objectives was enabled by means of a NoC simulation environment for cycleaccurate
modelling and simulation and by means of a back-end facility for the
study of NoC physical implementation effects. Overall, all the results provided
by this work have been validated on actual silicon layout
Techniques of Energy-Efficient VLSI Chip Design for High-Performance Computing
How to implement quality computing with the limited power budget is the key factor to move very large scale integration (VLSI) chip design forward. This work introduces various techniques of low power VLSI design used for state of art computing. From the viewpoint of power supply, conventional in-chip voltage regulators based on analog blocks bring the large overhead of both power and area to computational chips. Motivated by this, a digital based switchable pin method to dynamically regulate power at low circuit cost has been proposed to make computing to be executed with a stable voltage supply. For one of the widely used and time consuming arithmetic units, multiplier, its operation in logarithmic domain shows an advantageous performance compared to that in binary domain considering computation latency, power and area. However, the introduced conversion error reduces the reliability of the following computation (e.g. multiplication and division.). In this work, a fast calibration method suppressing the conversion error and its VLSI implementation are proposed. The proposed logarithmic converter can be supplied by dc power to achieve fast conversion and clocked power to reduce the power dissipated during conversion. Going out of traditional computation methods and widely used static logic, neuron-like cell is also studied in this work. Using multiple input floating gate (MIFG) metal-oxide semiconductor field-effect transistor (MOSFET) based logic, a 32-bit, 16-operation arithmetic logic unit (ALU) with zipped decoding and a feedback loop is designed. The proposed ALU can reduce the switching power and has a strong driven-in capability due to coupling capacitors compared to static logic based ALU. Besides, recent neural computations bring serious challenges to digital VLSI implementation due to overload matrix multiplications and non-linear functions. An analog VLSI design which is compatible to external digital environment is proposed for the network of long short-term memory (LSTM). The entire analog based network computes much faster and has higher energy efficiency than the digital one
Manufacturability Aware Design.
The aim of this work is to provide solutions that optimize the tradeoffs among design, manufacturability, and cost of ownership posed by technology scaling and sub-wavelength lithography. These solutions may take the form of robust circuit designs, cost-effective resolution technologies, accurate modeling considering process variations, and design rules assessment.
We first establish a framework for assessing the impact of process variation on circuit performance, product value and return on investment on alternative processes. Key features include comprehensive modeling and different handling on die-to-die and within-die variation, accurate models of correlations of variation, realistic and quantified projection to future process nodes, and performance sensitivity analysis to improved control of individual device parameter and variation sources.
Then we describe a novel minimum cost of correction methodology which determines the level of correction of each layout feature such that the prescribed parametric yield is attained with minimum RET (Resolution Enhancement Technology) cost. This timing driven OPC (Optical Proximity Correction) insertion flow uses a mathematical programming based slack budgeting algorithm to determine OPC level for all polysilicon gate geometries. Designs adopting this methodology show up to 20% MEBES (Manufacturing Electron Beam Exposure System) data volume reduction and 39% OPC runtime improvement.
When the systematic correction residual errors become unavoidable, we analyze their impact on a state-of-art microprocessor's speedpath skew. A platform is created for diagnosing and improving OPC quality on gates with specific functionality such as critical gates or matching transistors. Significant changes in full-chip timing analysis indicate the necessity of a post-OPC performance verification design flow.
Finally, we quantify the performance, manufacturability and mask cost impact of globally applying several common restrictive design rules. Novel approaches such as locally adapting FDRs (flexible design rules) based on image parameters range, and DRC Plus (preferred design rule enforcement with 2D pattern matching) are also described.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/57676/2/jiey_1.pd
Power efficient, event driven data acquisition and processing using asynchronous techniques
PhD ThesisData acquisition systems used in remote environmental monitoring equipment and biological
sensor nodes rely on limited energy supply soured from either energy harvesters or battery to
perform their functions. Among the building blocks of these systems are power hungry Analogue
to Digital Converters and Digital Signal Processors which acquire and process samples
at predetermined rates regardless of the monitored signal’s behavior. In this work we investigate
power efficient event driven data acquisition and processing techniques by implementing
an asynchronous ADC and an event driven power gated Finite Impulse Response (FIR) filter.
We present an event driven single slope ADC capable of generating asynchronous digital samples
based on the input signal’s rate of change. It utilizes a rate of change detection circuit
known as the slope detector to determine at what point the input signal is to be sampled. After
a sample has been obtained it’s absolute voltage value is time encoded and passed on to a Time
to Digital Converter (TDC) as part of a pulse stream. The resulting digital samples generated
by the TDC are produced at a rate that exhibits the same rate of change profile as that of the
input signal. The ADC is realized in 0.35mm CMOS process, covers a silicon area of 340mm
by 218mm and consumes power based on the input signal’s frequency.
The samples from the ADC are asynchronous in nature and exhibit random time periods between
adjacent samples. In order to process such asynchronous samples we present a FIR filter that is
able to successfully operate on the samples and produce the desired result. The filter also poses
the ability to turn itself off in-between samples that have longer sample periods in effect saving
power in the process
Regular cell design approach considering lithography-induced process variations
The deployment delays for EUVL, forces IC design to continue using 193nm wavelength lithography with innovative and costly techniques in order to faithfully print sub-wavelength features and combat lithography induced process variations. The effect of the lithography gap in current and upcoming technologies is to cause severe distortions due to optical diffraction in the printed patterns and thus degrading manufacturing yield. Therefore, a paradigm shift in layout design is mandatory towards more regular litho-friendly cell designs in order to improve line pattern resolution. However, it is still unclear the amount of layout regularity that can be introduced and how to measure the benefits and weaknesses of regular layouts.
This dissertation is focused on searching the degree of layout regularity necessary to combat lithography variability and outperform the layout quality of a design. The main contributions that have been addressed to accomplish this objective are: (1) the definition of several layout design guidelines to mitigate lithography variability; (2) the proposal of a parametric yield estimation model to evaluate the lithography impact on layout design; (3) the development of a global Layout Quality Metric (LQM) including a Regularity Metric (RM) to capture the degree of layout regularity of a layout implementation and; (4) the creation of different layout architectures exploiting the benefits of layout regularity to outperform line-pattern resolution, referred as Adaptive Lithography Aware Regular Cell Designs (ALARCs).
The first part of this thesis provides several regular layout design guidelines derived from lithography simulations so that several important lithography related variation sources are minimized. Moreover, a design level methodology, referred as gate biasing, is proposed to overcome systematic layout dependent variations, across-field variations and the non-rectilinear gate effect (NRG) applied to regular fabrics by properly configuring the drawn transistor channel length.
The second part of this dissertation proposes a lithography yield estimation model to predict the amount of lithography distortion expected in a printed layout due to lithography hotspots with a reduced set of lithography simulations. An efficient lithography hotspot framework to identify the different layout pattern configurations, simplify them to ease the pattern analysis and classify them according to the lithography degradation predicted using lithography simulations is presented. The yield model is calibrated with delay measurements of a reduced set of identical test circuits implemented in a CMOS 40nm technology and thus actual silicon data is utilized to obtain a more realistic yield estimation.
The third part of this thesis presents a configurable Layout Quality Metric (LQM) that considering several layout aspects provides a global evaluation of a layout design with a single score. The LQM can be leveraged by assigning different weights to each evaluation metric or by modifying the parameters under analysis. The LQM is here configured following two different set of partial metrics. Note that the LQM provides a regularity metric (RM) in order to capture the degree of layout regularity applied in a layout design.
Lastly, this thesis presents different ALARC designs for a 40nm technology using different degrees of layout regularity and different area overheads. The quality of the gridded regular templates is demonstrated by automatically creating a library containing 266 cells including combinational and sequential cells and synthesizing several ITC'99 benchmark circuits. Note that the regular cell libraries only presents a 9\% area penalty compared to the 2D standard cell designs used for comparison and thus providing area competitive designs. The layout evaluation of benchmark circuits considering the LQM shows that regular layouts can outperform other 2D standard cell designs depending on the layout implementation.Los continuos retrasos en la implementación de la EUVL, fuerzan que el diseño de IC se realice mediante litografÃa de longitud de onda de 193 nm con innovadoras y costosas técnicas para poder combatir variaciones de proceso de litografÃa. La gran diferencia entre la longitud de onda y el tamaño de los patrones causa severas distorsiones debido a la difracción óptica en los patrones impresos y por lo tanto degradando el yield. En consecuencia, es necesario realizar un cambio en el diseño de layouts hacia diseños más regulares para poder mejorar la resolución de los patrones. Sin embargo, todavÃa no está claro el grado de regularidad que se debe introducir y como medir los beneficios y los perjuicios de los diseños regulares. El objetivo de esta tesis es buscar el grado de regularidad necesario para combatir las variaciones de litografÃa y mejorar la calidad del layout de un diseño. Las principales contribuciones para conseguirlo son: (1) la definición de diversas reglas de diseño de layout para mitigar las variaciones de litografÃa; (2) la propuesta de un modelo para estimar el yield paramétrico y asà evaluar el impacto de la litografÃa en el diseño de layout; (3) el diseño de una métrica para analizar la calidad de un layout (LQM) incluyendo una métrica para capturar el grado de regularidad de un diseño (RM) y; (4) la creación de diferentes tipos de layout explotando los beneficios de la regularidad, referidos como Adaptative Lithography Aware Regular Cell Designs (ALARCs). La primera parte de la tesis, propone las diversas reglas de diseño para layouts regulares derivadas de simulaciones de litografÃa de tal manera que las fuentes de variación de litografÃa son minimizadas. Además, se propone una metodologÃa de diseño para layouts regulares, referida como "gate biasing" para contrarrestar las variaciones sistemáticas dependientes del layout, las variaciones en la ventana de proceso del sistema litográfico y el efecto de puerta no rectilÃnea para configurar la longitud del canal del transistor correctamente. La segunda parte de la tesis, detalla el modelo de estimación del yield de litografÃa para predecir mediante un número reducido de simulaciones de litografÃa la cantidad de distorsión que se espera en un layout impreso debida a "hotspots". Se propone una eficiente metodologÃa que identifica los distintos patrones de un layout, los simplifica para facilitar el análisis de los patrones y los clasifica en relación a la degradación predecida mediante simulaciones de litografÃa. El modelo de yield se calibra utilizando medidas de tiempo de un número reducido de idénticos circuitos de test implementados en una tecnologÃa CMOS de 40nm y de esta manera, se utilizan datos de silicio para obtener una estimación realista del yield. La tercera parte de este trabajo, presenta una métrica para medir la calidad del layout (LQM), que considera diversos aspectos para dar una evaluación global de un diseño mediante un único valor. La LQM puede ajustarse mediante la asignación de diferentes pesos para cada métrica de evaluación o modificando los parámetros analizados. La LQM se configura mediante dos conjuntos de medidas diferentes. Además, ésta incluye una métrica de regularidad (RM) para capturar el grado de regularidad que se aplica en un diseño. Finalmente, esta disertación presenta los distintos diseños ALARC para una tecnologÃa de 40nm utilizando diversos grados de regularidad y diferentes impactos en área. La calidad de estos diseños se demuestra creando automáticamente una librerÃa de 266 celdas incluyendo celdas combinacionales y secuenciales y, sintetizando diversos circuitos ITC'99. Las librerÃas regulares solo presentan un 9% de impacto en área comparado con diseños de celdas estándar 2D y por tanto proponiendo diseños competitivos en área. La evaluación de los circuitos considerando la LQM muestra que los diseños regulares pueden mejorar otros diseños 2D dependiendo de la implementación del layout
POWER AND PERFORMANCE STUDIES OF THE EXPLICIT MULTI-THREADING (XMT) ARCHITECTURE
Power and thermal constraints gained critical importance in the design of microprocessors over the past decade. Chipmakers failed to keep power at bay while sustaining the performance growth of serial computers at the rate expected by consumers. As an alternative, they turned to fitting an increasing number of simpler cores on a single die. While this is a step forward for relaxing the constraints, the issue of power is far from resolved and it is joined by new challenges which we explain next.
As we move into the era of many-cores, processors consisting of 100s, even 1000s of cores, single-task parallelism is the natural path for building faster general-purpose computers. Alas, the introduction of parallelism to the mainstream general-purpose domain brings another long elusive problem to focus: ease of parallel programming. The result is the dual challenge where power efficiency and ease-of-programming are vital for the prevalence of up and coming many-core architectures.
The observations above led to the lead goal of this dissertation: a first order validation of the claim that even under power/thermal constraints, ease-of-programming and competitive performance need not be conflicting objectives for a massively-parallel general-purpose processor. As our platform, we choose the eXplicit Multi-Threading (XMT) many-core architecture for fine grained parallel programs developed at the University of Maryland. We hope that our findings will be a trailblazer for future commercial products.
XMT scales up to thousand or more lightweight cores and aims at improving single task execution time while making the task for the programmer as easy as possible. Performance advantages and ease-of-programming of XMT have been shown in a number of publications, including a study that we present in this dissertation. Feasibility of the hardware concept has been exhibited via FPGA and ASIC (per our partial involvement) prototypes.
Our contributions target the study of power and thermal envelopes of an envisioned 1024-core XMT chip (XMT1024) under programs that exist in popular parallel benchmark suites. First, we compare XMT against an area and power equivalent commercial high-end many-core GPU. We demonstrate that XMT can provide an average speedup of 8.8x in irregular parallel programs that are common and important in general purpose computing. Even under the worst-case power estimation assumptions for XMT, average speedup is only reduced by half. We further this study by experimentally evaluating the performance advantages of Dynamic Thermal Management (DTM), when applied to XMT1024. DTM techniques are frequently used in current single and multi-core processors, however until now their effects on single-tasked many-cores have not been examined in detail. It is our purpose to explore how existing techniques can be tailored for XMT to improve performance. Performance improvements up to 46% over a generic global management technique has been demonstrated. The insights we provide can guide designers of other similar many-core architectures.
A significant infrastructure contribution of this dissertation is a highly configurable cycle-accurate simulator, XMTSim. To our knowledge, XMTSim is currently the only publicly-available shared-memory many-core simulator with extensive capabilities for estimating power and temperature, as well as evaluating dynamic power and thermal management algorithms. As a major component of the XMT programming toolchain, it is not only used as the infrastructure in this work but also contributed to other publications and dissertations
- …