878 research outputs found
Choose-Your-Own Adventure: A Lightweight, High-Performance Approach To Defect And Variation Mitigation In Reconfigurable Logic
For field-programmable gate arrays (FPGAs), fine-grained pre-computed alternative configurations, combined with simple test-based selection, produce limited per-chip specialization to counter yield loss, increased delay, and increased energy costs that come from fabrication defects and variation. This lightweight approach achieves much of the benefit of knowledge-based full specialization while reducing to practical, palatable levels the computational, testing, and load-time costs that obstruct the application of the knowledge-based approach. In practice this may more than double the power-limited computational capabilities of dies fabricated with 22nm technologies.
Contributions of this work:
• Choose-Your-own-Adventure (CYA), a novel, lightweight, scalable methodology to achieve defect and variation mitigation
• Implementation of CYA, including preparatory components (generation of diverse alternative paths) and FPGA load-time components
• Detailed performance characterization of CYA
– Comparison to conventional loading and dynamic frequency and voltage scaling (DFVS)
– Limit studies to characterize the quality of the CYA implementation and identify potential areas for further optimizatio
High-performance and Low-power Clock Network Synthesis in the Presence of Variation.
Semiconductor technology scaling requires continuous evolution of all aspects of physical
design of integrated circuits. Among the major design steps, clock-network synthesis
has been greatly affected by technology scaling, rendering existing methodologies inadequate.
Clock routing was previously sufficient for smaller ICs, but design difficulty and
structural complexity have greatly increased as interconnect delay and clock frequency increased
in the 1990s. Since a clock network directly influences IC performance and often
consumes a substantial portion of total power, both academia and industry developed synthesis
methodologies to achieve low skew, low power and robustness from PVT variations.
Nevertheless, clock network synthesis under tight constraints is currently the least automated
step in physical design and requires significant manual intervention, undermining
turn-around-time. The need for multi-objective optimization over a large parameter space
and the increasing impact of process variation make clock network synthesis particularly
challenging.
Our work identifies new objectives, constraints and concerns in the clock-network synthesis
for systems-on-chips and microprocessors. To address them, we generate novel
clock-network structures and propose changes in traditional physical-design flows. We
develop new modeling techniques and algorithms for clock power optimization subject
to tight skew constraints in the presence of process variations. In particular, we offer
SPICE-accurate optimizations of clock networks, coordinated to reduce nominal skew below
5 ps, satisfy slew constraints and trade-off skew, insertion delay and power, while
tolerating variations. To broaden the scope of clock-network-synthesis optimizations, we
propose new techniques and a methodology to reduce dynamic power consumption by
6.8%-11.6% for large IC designs with macro blocks by integrating clock network synthesis
within global placement. We also present a novel non-tree topology that is 2.3x more
power-efficient than mesh structures. We fuse several clock trees to create large-scale redundancy
in a clock network to bridge the gap between tree-like and mesh-like topologies.
Integrated optimization techniques for high-quality clock networks described in this dissertation
strong empirical results in experiments with recent industry-released benchmarks
in the presence of process variation. Our software implementations were recognized with
the first-place awards at the ISPD 2009 and ISPD 2010 Clock-Network Synthesis Contests
organized by IBM Research and Intel Research.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/89711/1/ejdjsy_1.pd
Variation and power issues in VLSI clock networks
Clock Distribution Network (CDN) is an important component of any synchronous logic circuit. The function of CDN is to deliver the clock signal to the clock
sinks. Clock skew is defined as the difference in the arrival time of the clock signal at
the clock sinks. Higher uncertainty in skew (due to PVT variations) degrades circuit
performance by decreasing the maximum possible delay between any two sequential
elements. Aggressive frequency scaling has also led to high power consumption especially in CDN. This dissertation addresses variation and power issues in the design of
current and potential future CDN. The research detailed in this work presents algorithmic techniques for the following problems: (1) Variation tolerance in useful skew
design, (2) Link insertion for buffered clock nets, (3) Methodology and algorithms for
rotary clocking and (4) Clock mesh optimization for skew-power trade off.
For clock trees this dissertation presents techniques to integrate the different
aspects of clock tree synthesis (skew scheduling, abstract topology and layout embedding) into one framework- tolerance to variations. This research addresses the issues
involved in inserting cross-links in a buffered clock tree and proposes design criteria
to avoid the risk of short-circuit current. Rotary clocking is a promising new clocking
scheme that consists of unterminated rings formed by differential transmission lines.
Rotary clocking achieves reduction in power dissipation clock skew. This dissertation
addresses the issues in adopting current CAD methodology to rotary clocks. Alternative methodology and corresponding algorithmic techniques are detailed. Clock
mesh is a popular form of CDN used in high performance systems. The problem
of simultaneous sizing and placement of mesh buffers in a clock mesh is addressed.
The algorithms presented remove the edges from the clock mesh to trade off skew
tolerance for low power.
For clock trees as well as link insertion, our experiments indicate significant reduction in clock skew due to variations. For clock mesh, experimental results indicate
18.5% reduction in power with 1.3% delay penalty on a average. In summary, this dissertation details methodologies/algorithms that address two critical issues- variation
and power dissipation in current and potential future CDN
Design methodologies for variation-aware integrated circuits
The scaling of VLSI technology has spurred a rapid growth in the semiconductor
industry. With the CMOS device dimension scaling to and beyond 90nm technology,
it is possible to achieve higher performance and to pack more complex functionalities
on a single chip. However, the scaling trend has introduced drastic variation of
process and design parameters, leading to severe variability of chip performance in
nanometer regime. Also, the manufacturing community projects CMOS will scale for
three to four more generations. Since the uncertainties due to variations are expected
to increase in each generation, it will significantly impact the performance of design
and consequently the yield.
Another challenging issue in the nanometer IC design is the high power consumption
due to the greater packing density, higher frequency of operation and excessive
leakage power. Moreover, the circuits are usually over-designed to compensate for
uncertainties due to variations. The over-designed circuits not only make timing closure
difficult but also cause excessive power consumption. For portable electronics,
excessive power consumption may reduce battery life; for non-portable systems it
may impose great difficulties in cooling and packaging.
The objective of my research has been to develop design methodologies to address
variations and power dissipation for reliable circuit operation. The proposed work
has been divided into three parts: the first part addresses the issues related with
power/ground noise induced by clock distribution network and proposes techniques to reduce power/ground noise considering the effects of process variations. The second
part proposes an elastic pipeline scheme for random circuits with feedback loops. The
proposed scheme provides a low-power solution that has the same variation tolerance
as the conventional approaches. The third section deals with discrete buffer and wire
sizing for link-based non-tree clock network, which is an energy efficient structure for
skew tolerance to variations.
For the power/ground noise problem, our approach could reduce the peak current
and the delay variations by 50% and 51% respectively. Compared to conventional
approach, the elastic timing scheme reduces power dissipation by 20% − 27%. The
sizing method achieves clock skew reduction of 45% with a small increase in power
dissipation
Clock tree synthesis for prescribed skew specifications
In ultra-deep submicron VLSI designs, clock network layout plays an increasingly important role in determining circuit performance including timing, power consumption, cost, power supply noise and tolerance to process variations. It is required that a clock layout algorithm can achieve any prescribed skews with the minimum wire length and acceptable slew rate. Traditional zero-skew clock routing methods are not adequate to address this demand, since they tend to yield excessive wire length for prescribed skew targets. The interactions among skew targets, sink location proximities and capacitive load balance are analyzed. Based on this analysis, a maximum delay-target ordering merging scheme is suggested to minimize wire and buffer area, which results in lesser cost, power consumption and vulnerability to process variations. During the clock routing, buffers are inserted simultaneously to facilitate a proper slew rate level and reduce wire snaking. The proposed algorithm is simple and fast for practical applications. Experimental results on benchmark circuits show that the algorithm can reduce the total wire and buffer capacitance by 60% over an extension of the existing zero-skew routing method
Circuit design and analysis for on-FPGA communication systems
On-chip communication system has emerged as a prominently important subject in Very-Large-
Scale-Integration (VLSI) design, as the trend of technology scaling favours logics more than interconnects.
Interconnects often dictates the system performance, and, therefore, research for new
methodologies and system architectures that deliver high-performance communication services
across the chip is mandatory. The interconnect challenge is exacerbated in Field-Programmable
Gate Array (FPGA), as a type of ASIC where the hardware can be programmed post-fabrication.
Communication across an FPGA will be deteriorating as a result of interconnect scaling. The programmable
fabrics, switches and the specific routing architecture also introduce additional latency
and bandwidth degradation further hindering intra-chip communication performance.
Past research efforts mainly focused on optimizing logic elements and functional units in FPGAs.
Communication with programmable interconnect received little attention and is inadequately understood.
This thesis is among the first to research on-chip communication systems that are built on
top of programmable fabrics and proposes methodologies to maximize the interconnect throughput
performance. There are three major contributions in this thesis: (i) an analysis of on-chip
interconnect fringing, which degrades the bandwidth of communication channels due to routing
congestions in reconfigurable architectures; (ii) a new analogue wave signalling scheme that significantly
improves the interconnect throughput by exploiting the fundamental electrical characteristics
of the reconfigurable interconnect structures. This new scheme can potentially mitigate
the interconnect scaling challenges. (iii) a novel Dynamic Programming (DP)-network to provide
adaptive routing in network-on-chip (NoC) systems. The DP-network architecture performs runtime
optimization for route planning and dynamic routing which, effectively utilizes the in-silicon
bandwidth. This thesis explores a new horizon in reconfigurable system design, in which new
methodologies and concepts are proposed to enhance the on-FPGA communication throughput
performance that is of vital importance in new technology processes
Floorplan-Aware High Performance NoC Design
Las actuales arquitecturas de m�ltiples n�cleos como los chip multiprocesadores (CMP) y soluciones multiprocesador para sistemas dentro del chip (MPSoCs) han adoptado a las redes dentro del chip (NoC) como elemento -ptimo para la inter-conexi-n de los diversos elementos de dichos sistemas. En este sentido, fabricantes de CMPs y MPSoCs han adoptado NoCs sencillas, generalmente con una topolog'a en malla o anillo, ya que son suficientes para satisfacer las necesidades de los sistemas actuales. Sin embargo a medida que los requerimientos del sistema -- baja latencia y alto rendimiento -- se hacen m�s exigentes, estas redes tan simples dejan de ser una soluci-n real. As', la comunidad investigadora ha propuesto y analizado NoCs m�s complejas. No obstante, estas soluciones son m�s dif'ciles de implementar -- especialmente los enlaces largos -- haciendo que este tipo de topolog'as complejas sean demasiado costosas o incluso inviables.
En esta tesis, presentamos una metodolog'a de dise-o que minimiza la p�rdida de prestaciones de la red debido a su implementaci-n real. Los principales problemas que se encuentran al implementar una NoC son los conmutadores y los enlaces largos. En esta tesis, el conmutador se ha hecho modular, es decir, formado como uni-n de m-dulos m�s peque-os. En nuestro caso, los m-dulos son id�nticos, donde cada m-dulo es capaz de arbitrar, conmutar, y almacenar los mensajes que le llegan. Posteriormente, flexibilizamos la colocaci-n de estos m-dulos en el chip, permitiendo que m-dulos de un mismo conmutador est�n distribuidos por el chip.
Esta metodolog'a de dise-o la hemos aplicado a diferentes escenarios. Primeramente, hemos introducido nuestro conmutador modular en NoCs con topolog'as conocidas como la malla 2D. Los resultados muestran como la modularidad y la distribuci-n del conmutador reducen la latencia y el consumo de potencia de la red.
En segundo lugar, hemos utilizado nuestra metodolog'a de dise-o para implementar un crossbar distribuidRoca Pérez, A. (2012). Floorplan-Aware High Performance NoC Design [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17844Palanci
On-board B-ISDN fast packet switching architectures. Phase 2: Development. Proof-of-concept architecture definition report
For the next-generation packet switched communications satellite system with onboard processing and spot-beam operation, a reliable onboard fast packet switch is essential to route packets from different uplink beams to different downlink beams. The rapid emergence of point-to-point services such as video distribution, and the large demand for video conference, distributed data processing, and network management makes the multicast function essential to a fast packet switch (FPS). The satellite's inherent broadcast features gives the satellite network an advantage over the terrestrial network in providing multicast services. This report evaluates alternate multicast FPS architectures for onboard baseband switching applications and selects a candidate for subsequent breadboard development. Architecture evaluation and selection will be based on the study performed in phase 1, 'Onboard B-ISDN Fast Packet Switching Architectures', and other switch architectures which have become commercially available as large scale integration (LSI) devices
The Scalability of Multicast Communication
Multicast is a communication method which operates on groups of applications. Having multiple instances of an application which are addressed collectively using a unique, multicast address, allows elegant solutions to some of the more intractable problems in distributed programming, such as providing fault tolerance. However, as multicast techniques are applied in areas such as distributed operating systems, where the operating system may span a large number of hosts, or on faster network architectures, where the problems of congestion reduce the effectiveness of the technique, then the scalability of multicast must be addressed if multicast is to gain a wider application. The main scalability issue was considered to be packet loss due to buffer overrun, the most common cause of this buffer overrun being the mismatch in packet arrival rate and packet consumption at the multicast originator, the so-called implosion problem. This issue affects positively acknowledged and transactional protocols. As these two techniques are the most common protocol designs, it was felt that an investigation into the problems of these types of protocol would be most effective. A model for implosion was developed which was simulated in order to investigate the parameters of implosion. A measure of this implosion was derived from the data, this index of implosion allowing the severity of implosion to be described as well as the location of the implosion in the model. This implosion index was derived by dividing the rate at which buffers were occupied by the rate at which packets were generated by the model. The value may then be used to predict the number of buffers required given the number of packets expected. A number of techniques were developed which may be used to offset implosion, either by artificially increasing the inter-packet gap, or by distributing replies so that no one host receives enough packets to cause an implosion. Of these alternatives, the latter offers the most promise, although requiring a large effort to maintain the resulting hierarchical structure in the presence of multiple failures
Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip
The sustained demand for faster, more powerful chips has been met by the
availability of chip manufacturing processes allowing for the integration of increasing
numbers of computation units onto a single die. The resulting outcome,
especially in the embedded domain, has often been called SYSTEM-ON-CHIP
(SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MP-SoC).
MPSoC design brings to the foreground a large number of challenges, one of
the most prominent of which is the design of the chip interconnection. With a
number of on-chip blocks presently ranging in the tens, and quickly approaching
the hundreds, the novel issue of how to best provide on-chip communication
resources is clearly felt.
NETWORKS-ON-CHIPS (NoCs) are the most comprehensive and scalable
answer to this design concern. By bringing large-scale networking concepts to
the on-chip domain, they guarantee a structured answer to present and future
communication requirements. The point-to-point connection and packet switching
paradigms they involve are also of great help in minimizing wiring overhead
and physical routing issues. However, as with any technology of recent inception,
NoC design is still an evolving discipline. Several main areas of interest
require deep investigation for NoCs to become viable solutions:
• The design of the NoC architecture needs to strike the best tradeoff among
performance, features and the tight area and power constraints of the onchip
domain.
• Simulation and verification infrastructure must be put in place to explore,
validate and optimize the NoC performance.
• NoCs offer a huge design space, thanks to their extreme customizability in
terms of topology and architectural parameters. Design tools are needed
to prune this space and pick the best solutions.
• Even more so given their global, distributed nature, it is essential to evaluate
the physical implementation of NoCs to evaluate their suitability for
next-generation designs and their area and power costs.
This dissertation performs a design space exploration of network-on-chip architectures,
in order to point-out the trade-offs associated with the design of
each individual network building blocks and with the design of network topology
overall. The design space exploration is preceded by a comparative analysis
of state-of-the-art interconnect fabrics with themselves and with early networkon-
chip prototypes. The ultimate objective is to point out the key advantages
that NoC realizations provide with respect to state-of-the-art communication
infrastructures and to point out the challenges that lie ahead in order to make
this new interconnect technology come true. Among these latter, technologyrelated
challenges are emerging that call for dedicated design techniques at all
levels of the design hierarchy. In particular, leakage power dissipation, containment
of process variations and of their effects. The achievement of the above
objectives was enabled by means of a NoC simulation environment for cycleaccurate
modelling and simulation and by means of a back-end facility for the
study of NoC physical implementation effects. Overall, all the results provided
by this work have been validated on actual silicon layout
- …