20 research outputs found
Parametric Yield of VLSI Systems under Variability: Analysis and Design Solutions
Variability has become one of the vital challenges that the
designers of integrated circuits encounter. variability becomes
increasingly important. Imperfect manufacturing process manifest
itself as variations in the design parameters. These variations
and those in the operating environment of VLSI circuits result in
unexpected changes in the timing, power, and reliability of the
circuits. With scaling transistor dimensions, process and
environmental variations become significantly important in the
modern VLSI design. A smaller feature size means that the physical
characteristics of a device are more prone to these
unaccounted-for changes. To achieve a robust design, the random
and systematic fluctuations in the manufacturing process and the
variations in the environmental parameters should be analyzed and
the impact on the parametric yield should be addressed.
This thesis studies the challenges and comprises solutions for
designing robust VLSI systems in the presence of variations.
Initially, to get some insight into the system design under
variability, the parametric yield is examined for a small circuit.
Understanding the impact of variations on the yield at the circuit
level is vital to accurately estimate and optimize the yield at
the system granularity. Motivated by the observations and results,
found at the circuit level, statistical analyses are performed,
and solutions are proposed, at the system level of abstraction, to
reduce the impact of the variations and increase the parametric
yield.
At the circuit level, the impact of the supply and threshold
voltage variations on the parametric yield is discussed. Here, a
design centering methodology is proposed to maximize the
parametric yield and optimize the power-performance trade-off
under variations. In addition, the scaling trend in the yield loss
is studied. Also, some considerations for design centering in the
current and future CMOS technologies are explored.
The investigation, at the circuit level, suggests that the
operating temperature significantly affects the parametric yield.
In addition, the yield is very sensitive to the magnitude of the
variations in supply and threshold voltage. Therefore, the spatial
variations in process and environmental variations make it
necessary to analyze the yield at a higher granularity. Here,
temperature and voltage variations are mapped across the chip to
accurately estimate the yield loss at the system level.
At the system level, initially the impact of process-induced
temperature variations on the power grid design is analyzed. Also,
an efficient verification method is provided that ensures the
robustness of the power grid in the presence of variations. Then,
a statistical analysis of the timing yield is conducted, by taking
into account both the process and environmental variations. By
considering the statistical profile of the temperature and supply
voltage, the process variations are mapped to the delay variations
across a die. This ensures an accurate estimation of the timing
yield. In addition, a method is proposed to accurately estimate
the power yield considering process-induced temperature and supply
voltage variations. This helps check the robustness of the
circuits early in the design process.
Lastly, design solutions are presented to reduce the power
consumption and increase the timing yield under the variations. In
the first solution, a guideline for floorplaning optimization in
the presence of temperature variations is offered. Non-uniformity
in the thermal profiles of integrated circuits is an issue that
impacts the parametric yield and threatens chip reliability.
Therefore, the correlation between the total power consumption and
the temperature variations across a chip is examined. As a result,
floorplanning guidelines are proposed that uses the correlation to
efficiently optimize the chip's total power and takes into account
the thermal uniformity.
The second design solution provides an optimization methodology
for assigning the power supply pads across the chip for maximizing
the timing yield. A mixed-integer nonlinear programming (MINLP)
optimization problem, subject to voltage drop and current
constraint, is efficiently solved to find the optimum number and
location of the pads
Design Techniques for Energy-Quality Scalable Digital Systems
Energy efficiency is one of the key design goals in modern computing. Increasingly complex tasks are being executed in mobile devices and Internet of Things end-nodes, which are expected to operate for long time intervals, in the orders of months or years, with the limited energy budgets provided by small form-factor batteries. Fortunately, many of such tasks are error resilient, meaning that they can toler- ate some relaxation in the accuracy, precision or reliability of internal operations, without a significant impact on the overall output quality. The error resilience of an application may derive from a number of factors. The processing of analog sensor inputs measuring quantities from the physical world may not always require maximum precision, as the amount of information that can be extracted is limited by the presence of external noise. Outputs destined for human consumption may also contain small or occasional errors, thanks to the limited capabilities of our vision and hearing systems. Finally, some computational patterns commonly found in domains such as statistics, machine learning and operational research, naturally tend to reduce or eliminate errors. Energy-Quality (EQ) scalable digital systems systematically trade off the quality of computations with energy efficiency, by relaxing the precision, the accuracy, or the reliability of internal software and hardware components in exchange for energy reductions. This design paradigm is believed to offer one of the most promising solutions to the impelling need for low-energy computing. Despite these high expectations, the current state-of-the-art in EQ scalable design suffers from important shortcomings. First, the great majority of techniques proposed in literature focus only on processing hardware and software components. Nonetheless, for many real devices, processing contributes only to a small portion of the total energy consumption, which is dominated by other components (e.g. I/O, memory or data transfers). Second, in order to fulfill its promises and become diffused in commercial devices, EQ scalable design needs to achieve industrial level maturity. This involves moving from purely academic research based on high-level models and theoretical assumptions to engineered flows compatible with existing industry standards. Third, the time-varying nature of error tolerance, both among different applications and within a single task, should become more central in the proposed design methods. This involves designing âdynamicâ systems in which the precision or reliability of operations (and consequently their energy consumption) can be dynamically tuned at runtime, rather than âstaticâ solutions, in which the output quality is fixed at design-time. This thesis introduces several new EQ scalable design techniques for digital systems that take the previous observations into account. Besides processing, the proposed methods apply the principles of EQ scalable design also to interconnects and peripherals, which are often relevant contributors to the total energy in sensor nodes and mobile systems respectively. Regardless of the target component, the presented techniques pay special attention to the accurate evaluation of benefits and overheads deriving from EQ scalability, using industrial-level models, and on the integration with existing standard tools and protocols. Moreover, all the works presented in this thesis allow the dynamic reconfiguration of output quality and energy consumption. More specifically, the contribution of this thesis is divided in three parts. In a first body of work, the design of EQ scalable modules for processing hardware data paths is considered. Three design flows are presented, targeting different technologies and exploiting different ways to achieve EQ scalability, i.e. timing-induced errors and precision reduction. These works are inspired by previous approaches from the literature, namely Reduced-Precision Redundancy and Dynamic Accuracy Scaling, which are re-thought to make them compatible with standard Electronic Design Automation (EDA) tools and flows, providing solutions to overcome their main limitations. The second part of the thesis investigates the application of EQ scalable design to serial interconnects, which are the de facto standard for data exchanges between processing hardware and sensors. In this context, two novel bus encodings are proposed, called Approximate Differential Encoding and Serial-T0, that exploit the statistical characteristics of data produced by sensors to reduce the energy consumption on the bus at the cost of controlled data approximations. The two techniques achieve different results for data of different origins, but share the common features of allowing runtime reconfiguration of the allowed error and being compatible with standard serial bus protocols. Finally, the last part of the manuscript is devoted to the application of EQ scalable design principles to displays, which are often among the most energy- hungry components in mobile systems. The two proposals in this context leverage the emissive nature of Organic Light-Emitting Diode (OLED) displays to save energy by altering the displayed image, thus inducing an output quality reduction that depends on the amount of such alteration. The first technique implements an image-adaptive form of brightness scaling, whose outputs are optimized in terms of balance between power consumption and similarity with the input. The second approach achieves concurrent power reduction and image enhancement, by means of an adaptive polynomial transformation. Both solutions focus on minimizing the overheads associated with a real-time implementation of the transformations in software or hardware, so that these do not offset the savings in the display. For each of these three topics, results show that the aforementioned goal of building EQ scalable systems compatible with existing best practices and mature for being integrated in commercial devices can be effectively achieved. Moreover, they also show that very simple and similar principles can be applied to design EQ scalable versions of different system components (processing, peripherals and I/O), and to equip these components with knobs for the runtime reconfiguration of the energy versus quality tradeoff
Optimization Tools for ConvNets on the Edge
L'abstract Ăš presente nell'allegato / the abstract is in the attachmen
Dependable Embedded Systems
This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from todayâs points of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level such circuit level or system level alone, the focus of this book is to deal with the different reliability challenges across different levels starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solution can be proposed to ef-fectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. Provides readers with latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; Describes cross-layer approaches that can leverage reliability through techniques that are pro-actively designed with respect to techniques at other layers; Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many core systems
Computational Sprinting: Exceeding Sustainable Power in Thermally Constrained Systems
Although process technology trends predict that transistor sizes will continue to shrink for a few more generations, voltage scaling has stalled and thus future chips are projected to be increasingly more power hungry than previous generations. Particularly in mobile devices which are severely cooling constrained, it is estimated that the peak operation of a future chip could generate heat ten times faster than than the device can sustainably vent.
However, many mobile applications do not demand sustained performance; rather they comprise short bursts of computation in response to sporadic user activity. To improve responsiveness for such applications, this dissertation proposes computational sprinting, in which a system greatly exceeds sustainable power margins (by up to 10Ă?) to provide up to a few seconds of high-performance computation when a user interacts with the device. Computational sprinting exploits the material property of thermal capacitance to temporarily store the excess heat generated when sprinting. After sprinting, the chip returns to sustainable power levels and dissipates the stored heat when the system is idle.
This dissertation: (i) broadly analyzes thermal, electrical, hardware, and software considerations to analyze the feasibility of engineering a system which can provide the responsiveness of a plat- form with 10Ă? higher sustainable power within today\u27s cooling constraints, (ii) leverages existing sources of thermal capacitance to demonstrate sprinting on a real system today, and (iii) identifies the energy-performance characteristics of sprinting operation to determine runtime sprint pacing policies
Cross-Layer Optimization for Power-Efficient and Robust Digital Circuits and Systems
With the increasing digital services demand, performance and power-efficiency
become vital requirements for digital circuits and systems. However, the
enabling CMOS technology scaling has been facing significant challenges of
device uncertainties, such as process, voltage, and temperature variations. To
ensure system reliability, worst-case corner assumptions are usually made in
each design level. However, the over-pessimistic worst-case margin leads to
unnecessary power waste and performance loss as high as 2.2x. Since
optimizations are traditionally confined to each specific level, those safe
margins can hardly be properly exploited.
To tackle the challenge, it is therefore advised in this Ph.D. thesis to
perform a cross-layer optimization for digital signal processing circuits and
systems, to achieve a global balance of power consumption and output quality.
To conclude, the traditional over-pessimistic worst-case approach leads to
huge power waste. In contrast, the adaptive voltage scaling approach saves
power (25% for the CORDIC application) by providing a just-needed supply
voltage. The power saving is maximized (46% for CORDIC) when a more aggressive
voltage over-scaling scheme is applied. These sparsely occurred circuit errors
produced by aggressive voltage over-scaling are mitigated by higher level error
resilient designs. For functions like FFT and CORDIC, smart error mitigation
schemes were proposed to enhance reliability (soft-errors and timing-errors,
respectively). Applications like Massive MIMO systems are robust against lower
level errors, thanks to the intrinsically redundant antennas. This property
makes it applicable to embrace digital hardware that trades quality for power
savings.Comment: 190 page
Miniaturized Transistors
What is the future of CMOS? Sustaining increased transistor densities along the path of Moore's Law has become increasingly challenging with limited power budgets, interconnect bandwidths, and fabrication capabilities. In the last decade alone, transistors have undergone significant design makeovers; from planar transistors of ten years ago, technological advancements have accelerated to today's FinFETs, which hardly resemble their bulky ancestors. FinFETs could potentially take us to the 5-nm node, but what comes after it? From gate-all-around devices to single electron transistors and two-dimensional semiconductors, a torrent of research is being carried out in order to design the next transistor generation, engineer the optimal materials, improve the fabrication technology, and properly model future devices. We invite insight from investigators and scientists in the field to showcase their work in this Special Issue with research papers, short communications, and review articles that focus on trends in micro- and nanotechnology from fundamental research to applications
Low power digital baseband core for wireless Micro-Neural-Interface using CMOS sub/near-threshold circuit
This thesis presents the work on designing and implementing a low power digital baseband core with custom-tailored protocol for wirelessly powered Micro-Neural-Interface (MNI) System-on-Chip (SoC) to be implanted within the skull to record cortical neural activities. The core, on the tag end of distributed sensors, is designed to control the operation of individual MNI and communicate and control MNI devices implanted across the brain using received downlink commands from external base station and store/dump targeted neural data uplink in an energy efficient manner. The application specific protocol defines three modes (Time Stamp Mode, Streaming Mode and Snippet Mode) to extract neural signals with on-chip signal conditioning and discrimination. In Time Stamp Mode, Streaming Mode and Snippet Mode, the core executes basic on-chip spike discrimination and compression, real-time monitoring and segment capturing of neural signals so single spike timing as well as inter-spike timing can be retrieved with high temporal and spatial resolution. To implement the core control logic using sub/near-threshold logic, a novel digital design methodology is proposed which considers INWE (Inverse-Narrow-Width-Effect), RSCE (Reverse-Short-Channel-Effect) and variation comprehensively to size the transistor width and length accordingly to achieve close-to-optimum digital circuits. Ultra-low-power cell library containing 67 cells including physical cells and decoupling capacitor cells using the optimum fingers is designed, laid-out, characterized, and abstracted. A robust on-chip sense-amp-less SRAM memory (8X32 size) for storing neural data is implemented using 8T topology and LVT fingers. The design is validated with silicon tapeout and measurement shows the digital baseband core works at 400mV and 1.28 MHz system clock with an average power consumption of 2.2 ÎŒW, resulting in highest reported communication power efficiency of 290Kbps/ÎŒW to date
Reliable Software for Unreliable Hardware - A Cross-Layer Approach
A novel cross-layer reliability analysis, modeling, and optimization approach is proposed in this thesis that leverages multiple layers in the system design abstraction (i.e. hardware, compiler, system software, and application program) to exploit the available reliability enhancing potential at each system layer and to exchange this information across multiple system layers
FDSOI Design using Automated Standard-Cell-Grained Body Biasing
With the introduction of FDSOI processes at competitive technology nodes, body biasing on an unprecedented scale was made possible. Body biasing influences one of the central transistor characteristics, the threshold voltage. By being able to heighten or lower threshold voltage by more than 100mV, the very physics of transistor switching can be manipulated at run time. Furthermore, as body biasing does not lead to different signal levels, it can be applied much more fine-grained than, e.g., DVFS. With the state of the art mainly focused on combinations of body biasing with DVFS, it has thus ignored granularities unfeasible for DVFS. This thesis fills this gap by proposing body bias domain partitioning techniques and for body bias domain partitionings thereby generated, algorithms that search for body bias assignments. Several different granularities ranging from entire cores to small groups of standard cells were examined using two principal approaches: Designer aided pre-partitioning based determination of body bias domains and a first-time, fully automatized, netlist based approach called domain candidate exploration. Both approaches operate along the lines of activation and timing of standard cell groups. These approaches were evaluated using the example of a Dynamically Reconfigurable Processor (DRP), a highly efficient category of reconfigurable architectures which consists of an array of processing elements and thus offers many opportunities for generalization towards many-core architectures. Finally, the proposed methods were validated by manufacturing a test-chip. Extensive simulation runs as well as the test-chip evaluation showed the validity of the proposed methods and indicated substantial improvements in energy efficiency compared to the state of the art. These improvements were accomplished by the fine-grained partitioning of the DRP design. This method allowed reducing dynamic power through supply voltage levels yielding higher clock frequencies using forward body biasing, while simultaneously reducing static power consumption in unused parts.Die EinfĂŒhrung von FDSOI Prozessen in gegenwĂ€rtigen ProzessgröĂen ermöglichte die Nutzung von Substratvorspannung in nie zuvor dagewesenem Umfang. Substratvorspannung beeinflusst unter anderem eine zentrale Eigenschaft von Transistoren, die Schwellspannung. Mittels Substratvorspannung kann diese um mehr als 100mV erhöht oder gesenkt werden, was es ermöglicht, die schiere Physik des Schaltvorgangs zu manipulieren. Da weiterhin hiervon der Signalpegel der digitalen Signale unberĂŒhrt bleibt, kann diese Technik auch in feineren GranularitĂ€ten angewendet werden, als z.B. Dynamische Spannungs- und Frequenz Anpassung (Engl. Dynamic Voltage and Frequency Scaling, Abk. DVFS). Da jedoch der Stand der Technik Substratvorspannung hauptsĂ€chlich in Kombinationen mit DVFS anwendet, werden feinere GranularitĂ€ten, welche fĂŒr DVFS nicht mehr wirtschaftlich realisierbar sind, nicht berĂŒcksichtigt. Die vorliegende Arbeit schlieĂt diese LĂŒcke, indem sie Partitionierungsalgorithmen zur Unterteilung eines Entwurfs in SubstratvorspannungsdomĂ€nen vorschlĂ€gt und fĂŒr diese hierdurch unterteilten DomĂ€nen entsprechende Substratvorspannungen berechnet. Hierzu wurden verschiedene GranularitĂ€ten berĂŒcksichtigt, von ganzen Prozessorkernen bis hin zu kleinen Gruppen von Standardzellen. Diese EntwĂŒrfe wurden dann mit zwei verschiedenen Herangehensweisen unterteilt: Chipdesigner unterstĂŒtzte, vorpartitionierungsbasierte Bestimmung von SubstratvorspannungsdomĂ€nen, sowie ein erstmals vollautomatisierter, Netzlisten basierter Ansatz, in dieser Arbeit DomĂ€nen Kandidaten Exploration genannt. Beide AnsĂ€tze funktionieren nach dem Prinzip der Aktivierung, d.h. zu welchem Zeitpunkt welcher Teil des Entwurfs aktiv ist, sowie der Signallaufzeit durch die entsprechenden Entwurfsteile. Diese AnsĂ€tze wurden anhand des Beispiels Dynamisch Rekonfigurierbarer Prozessoren (DRP) evaluiert. DRPs stellen eine Klasse hocheffizienter rekonfigurierbarer Architekturen dar, welche hauptsĂ€chlich aus einem Feld von Rechenelementen besteht und dadurch auch zahlreiche Möglichkeiten zur Verallgemeinerung hinsichtlich Many-Core Architekturen zulĂ€sst. SchlieĂlich wurden die vorgeschlagenen Methoden in einem Testchip validiert. Alle ermittelten Ergebnisse zeigen im Vergleich zum Stand der Technik drastische Verbesserungen der Energieeffizienz, welche durch die feingranulare Unterteilung in SubstratvorspannungsdomĂ€nen erzielt wurde. Hierdurch konnten durch die Anwendung von Substratvorspannung höhere Taktfrequenzen bei gleicher Versorgungsspannung erzielt werden, wĂ€hrend zeitgleich in zeitlich unkritischen oder ungenutzten Entwurfsteilen die statische Leistungsaufnahme minimiert wurde