1,458 research outputs found
An intelligent component database for behavioral synthesis
This paper describes an intelligent component database system that delivers components to synthesis tools when given a set of attributes and constraints. Requirements of a component server are defined and an implementation is described. Our experiments demonstrate that such a component server can replace component libraries and component catalogs with hundreds of pages
Elasticity and Petri nets
Digital electronic systems typically use synchronous clocks and primarily assume fixed duration of their operations to simplify the design process. Time elastic systems can be constructed either by replacing the clock with communication handshakes (asynchronous version) or by augmenting the clock with a synchronous version of a handshake (synchronous version). Time elastic systems can tolerate static and dynamic changes in delays (asynchronous case) or latencies (synchronous case) of operations, which can be exploited for modularity, ease of reuse and a better power-delay trade-off. This paper describes methods for the modeling, performance analysis and optimization of elastic systems using Marked Graphs and their extensions capable of describing behavior with early evaluation. The paper uses synchronous elastic systems (also known as latency-tolerant systems) to illustrate the use of Petri nets; however, most of the methods can be applied to asynchronous elastic systems without changes, apart from the delay model associated with the events of the system.
Peer reviewed. Postprint (author's final draft)
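As a rough illustration of the performance analysis mentioned above, the sketch below computes the minimum cycle ratio (tokens divided by latency) of a small marked graph, a standard throughput bound for strongly connected marked graphs without early evaluation. The edge format, the function name `min_cycle_ratio`, and the example ring are assumptions made for illustration and are not taken from the paper; the brute-force cycle enumeration is only suitable for tiny graphs.

```python
from fractions import Fraction

def min_cycle_ratio(edges):
    """Return the minimum over simple cycles of (tokens / latency).

    edges: list of (src, dst, tokens, latency) tuples (hypothetical format).
    Assumes node names are comparable and every latency is positive.
    """
    nodes = sorted({n for e in edges for n in e[:2]})
    out = {n: [] for n in nodes}
    for src, dst, tokens, latency in edges:
        out[src].append((dst, tokens, latency))
    best = None

    def dfs(start, node, tokens, latency, visited):
        nonlocal best
        for dst, t, d in out[node]:
            if dst == start:
                # Closed a simple cycle back to its smallest node.
                ratio = Fraction(tokens + t, latency + d)
                if best is None or ratio < best:
                    best = ratio
            elif dst > start and dst not in visited:
                # Only visit nodes larger than start so each cycle is found once.
                dfs(start, dst, tokens + t, latency + d, visited | {dst})

    for start in nodes:
        dfs(start, start, 0, 0, {start})
    return best

# Hypothetical example: a three-stage ring plus a short feedback edge B -> A.
edges = [
    ("A", "B", 1, 1),
    ("B", "C", 1, 1),
    ("C", "A", 0, 1),
    ("B", "A", 0, 1),
]
bound = min_cycle_ratio(edges)
print(bound)
```

In this example the two-node loop A, B carries one token over a latency of two, so it limits the bound more tightly than the three-node ring (two tokens over a latency of three).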
Design methodologies for variation-aware integrated circuits
The scaling of VLSI technology has spurred a rapid growth in the semiconductor
industry. With CMOS device dimensions scaling to and beyond the 90nm technology node,
it is possible to achieve higher performance and to pack more complex functionalities
on a single chip. However, the scaling trend has introduced drastic variation of
process and design parameters, leading to severe variability of chip performance in
the nanometer regime. Also, the manufacturing community projects that CMOS will scale for
three to four more generations. Since the uncertainties due to variations are expected
to increase in each generation, they will significantly impact design performance
and consequently the yield.
Another challenging issue in nanometer IC design is the high power consumption
due to the greater packing density, higher frequency of operation and excessive
leakage power. Moreover, the circuits are usually over-designed to compensate for
uncertainties due to variations. The over-designed circuits not only make timing closure
difficult but also cause excessive power consumption. For portable electronics,
excessive power consumption may reduce battery life; for non-portable systems it
may impose great difficulties in cooling and packaging.
The objective of my research has been to develop design methodologies to address
variations and power dissipation for reliable circuit operation. The proposed work
has been divided into three parts. The first part addresses the issues related to
power/ground noise induced by the clock distribution network and proposes techniques to reduce power/ground noise considering the effects of process variations. The second
part proposes an elastic pipeline scheme for random circuits with feedback loops. The
proposed scheme provides a low-power solution that has the same variation tolerance
as the conventional approaches. The third part deals with discrete buffer and wire
sizing for link-based non-tree clock networks, which are an energy-efficient structure for
skew tolerance to variations.
For the power/ground noise problem, our approach could reduce the peak current
and the delay variations by 50% and 51% respectively. Compared to the conventional
approach, the elastic timing scheme reduces power dissipation by 20% to 27%. The
sizing method achieves a clock skew reduction of 45% with a small increase in power
dissipation
Physical design algorithms for asynchronous circuits
Asynchronous designs have been demonstrated to achieve both higher performance and lower power than their synchronous counterparts. They provide a very promising solution to the emerging challenges in advanced technology. However, due to the lack of proper EDA tool support, the design cycle for asynchronous circuits is much longer than that for synchronous circuits. Thus, even with many advantages, asynchronous circuits are still not mainstream in the industry. In this thesis, we provide several algorithms to resolve the emerging issues in the physical design of asynchronous circuits. Our proposed algorithms optimize asynchronous circuits using placement, gate sizing, repeater insertion and pipeline buffer insertion techniques. An incremental maximum cycle ratio algorithm is also proposed to speed up the timing analysis of asynchronous circuits
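For context on the timing analysis mentioned above, here is a minimal, non-incremental sketch of a maximum cycle ratio computation: a binary search on the ratio, with a Bellman-Ford style check for positive cycles. It is a textbook baseline, not the incremental algorithm proposed in the thesis. The edge format, function names, and example graph are assumptions for illustration; the sketch also assumes non-negative delays and at least one token on every cycle.

```python
def has_positive_cycle(nodes, edges, lam):
    """True if some cycle has sum(delay - lam * tokens) > 0."""
    dist = {n: 0.0 for n in nodes}  # every node acts as a zero-cost start
    for _ in range(len(nodes)):
        changed = False
        for src, dst, delay, tokens in edges:
            cand = dist[src] + delay - lam * tokens
            if cand > dist[dst] + 1e-12:
                dist[dst] = cand
                changed = True
        if not changed:
            return False  # longest-path values converged: no positive cycle
    return True  # still improving after len(nodes) passes: positive cycle

def max_cycle_ratio(nodes, edges, tol=1e-9):
    """Approximate the max over cycles of (total delay / total tokens).

    edges: list of (src, dst, delay, tokens) tuples (hypothetical format).
    """
    lo, hi = 0.0, float(sum(delay for _, _, delay, _ in edges))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if has_positive_cycle(nodes, edges, mid):
            lo = mid  # some cycle has a ratio above mid
        else:
            hi = mid  # no cycle has a ratio above mid
    return hi

# Hypothetical example: ring A -> B -> C -> A plus a feedback edge B -> A.
nodes = ["A", "B", "C"]
edges = [
    ("A", "B", 2, 1),
    ("B", "C", 3, 0),
    ("C", "A", 1, 1),
    ("B", "A", 4, 0),
]
cycle_time = max_cycle_ratio(nodes, edges)
print(round(cycle_time, 6))
```

In the example, the loop A, B has a total delay of 6 with a single token, so it dominates the three-node ring (delay 6 over two tokens). An incremental method would avoid rerunning this whole search after each small change to the circuit.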
Topology-Based Performance Analysis and Optimization of Latency-Insensitive Systems
Latency-insensitive protocols allow system-on-chip engineers to decouple the design of the computing cores from the design of the inter-core communication channels while following the synchronous design paradigm. In a latency-insensitive system (LIS) each core is encapsulated within a shell, a synthesized interface module that dynamically controls its operation. At each clock period, if new data has not arrived on an input channel or a stalling request has arrived on an output channel, the shell stalls the core and buffers other incoming valid data for future processing. The combination of finite buffers and backpressure from stalling can cause throughput degradation. Previous works addressed this problem by increasing buffer space to reduce the backpressure requests or inserting extra buffering to balance the channel latency around a LIS. We explore the theoretical complexity of these approaches and propose a heuristic algorithm for efficient queue sizing. We also practically characterize several LIS topologies and how the topology of a LIS can impact not only how much throughput degradation will occur, but also the difficulty of finding optimal queue sizing solutions
HIGH PERFORMANCE CLOCK DISTRIBUTION FOR HIGH-SPEED VLSI SYSTEMS
Tohoku University; 堀口 進 (Susumu Horiguchi)
Clock Distribution Network Optimization by Sequential Quadratic Programing
Clock mesh is widely used in microprocessor designs for achieving low clock
skew and high process variation tolerance. Clock mesh optimization is a very difficult
problem to solve because it has a highly connected structure and requires accurate
delay models which are computationally expensive.
Existing methods on clock network optimization are either restricted to clock
trees, which are easy to separate into smaller problems, or naive heuristics based
on crude delay models.
This research work proposes a clock mesh sizing algorithm that aims to minimize total
mesh wire area subject to clock skew constraints.
The algorithm performs a systematic solution search through rigorous Sequential Quadratic
Programming (SQP). The SQP is guided by an efficient adjoint sensitivity analysis
which has near-SPICE (Simulation Program with Integrated Circuit Emphasis) level
accuracy and faster-than-SPICE speed.
Experimental results on various benchmark circuits indicate that this algorithm
leads to substantial wire area reduction while maintaining low clock skew in the clock
mesh. The reduction in mesh area achieved is about 33%
Concurrent optimization strategies for high-performance VLSI circuits
In the next generation of VLSI circuits, concurrent optimizations will be essential to meet the performance challenges. In this dissertation, we present techniques for combining traditional timing optimization techniques to achieve superior performance.
The method of buffer insertion is used in timing optimization either to increase the driving power of a path in a circuit, or to isolate large capacitive loads that lie on noncritical or less critical paths. The procedure of transistor sizing selects the sizes of transistors within a circuit to achieve a given timing specification. Traditional design techniques perform these two optimizations as independent steps during synthesis, even though they are intimately linked and performing them in alternating steps is liable to lead to suboptimal solutions. The first part of this thesis presents a new approach for unifying transistor sizing with buffer insertion. Our algorithm achieves from 5% to 49% area reduction compared with the results of a standard transistor sizing algorithm.
The next part of the thesis deals with the problem of collapsing gates for technology mapping. Two new techniques are proposed. The first method, the odd-level transistor replacement (OTR) method, performs technology mapping without the restriction of a fixed library size, and maps a circuit to a virtual library of complex static CMOS gates. The second technique, the Static CMOS/PTL method, uses a mix of static CMOS and pass transistor logic (PTL) to realize the circuit, using the relation between PTL and binary decision diagrams. The methods are very efficient and can handle all ISCAS'85 benchmark circuits in minutes. On average, the OTR method gave 40% and the Static CMOS/PTL method gave 50% delay reductions over SIS, with substantial area savings.
Finally, we extend the technology mapping work to interleave it with placement in a single optimization. Conventional methods that perform these steps separately will not be adequate for next-generation circuits. Our approach presents an integrated solution to this problem, and shows an average of 28.19% and a maximum of 78.42% improvement in delay over a method that performs the two optimizations in separate steps
Variation and power issues in VLSI clock networks
The Clock Distribution Network (CDN) is an important component of any synchronous logic circuit. The function of the CDN is to deliver the clock signal to the clock
sinks. Clock skew is defined as the difference in the arrival time of the clock signal at
the clock sinks. Higher uncertainty in skew (due to PVT variations) degrades circuit
performance by decreasing the maximum possible delay between any two sequential
elements. Aggressive frequency scaling has also led to high power consumption, especially in the CDN. This dissertation addresses variation and power issues in the design of
current and potential future CDNs. The research detailed in this work presents algorithmic techniques for the following problems: (1) Variation tolerance in useful skew
design, (2) Link insertion for buffered clock nets, (3) Methodology and algorithms for
rotary clocking and (4) Clock mesh optimization for skew-power trade-off.
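The definition of clock skew above, and the way skew uncertainty reduces the delay available between sequential elements, can be made concrete with a small sketch. The arrival times, the function names, and the simplified setup-time budget (period minus setup time, clock-to-Q delay, and skew uncertainty) are illustrative assumptions and do not come from the dissertation.

```python
def global_skew(arrival_ps):
    """Largest difference in clock arrival time across all sinks (in ps)."""
    times = list(arrival_ps.values())
    return max(times) - min(times)

def max_logic_delay(period_ps, setup_ps, clk_to_q_ps, skew_uncertainty_ps):
    """Simplified setup budget: delay left for logic between two registers.

    Assumes the worst case, where the full skew uncertainty is subtracted
    from the clock period along with setup time and clock-to-Q delay.
    """
    return period_ps - setup_ps - clk_to_q_ps - skew_uncertainty_ps

# Hypothetical clock arrival times at three sinks, in picoseconds.
arrivals = {"ff1": 102, "ff2": 95, "ff3": 110}
skew = global_skew(arrivals)
budget = max_logic_delay(period_ps=1000, setup_ps=30, clk_to_q_ps=50,
                         skew_uncertainty_ps=skew)
print(skew, budget)
```

With these numbers, every extra picosecond of skew uncertainty removes one picosecond from the logic delay budget, which is why variation-tolerant clock networks matter for performance.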
For clock trees, this dissertation presents techniques to integrate the different
aspects of clock tree synthesis (skew scheduling, abstract topology and layout embedding) into one framework: tolerance to variations. This research addresses the issues
involved in inserting cross-links in a buffered clock tree and proposes design criteria
to avoid the risk of short-circuit current. Rotary clocking is a promising new clocking
scheme that consists of unterminated rings formed by differential transmission lines.
Rotary clocking achieves reduction in power dissipation and clock skew. This dissertation
addresses the issues in adapting current CAD methodology to rotary clocks. Alternative methodology and corresponding algorithmic techniques are detailed. Clock
mesh is a popular form of CDN used in high performance systems. The problem
of simultaneous sizing and placement of mesh buffers in a clock mesh is addressed.
The algorithms presented remove edges from the clock mesh to trade off skew
tolerance for low power.
For clock trees as well as link insertion, our experiments indicate significant reduction in clock skew due to variations. For clock mesh, experimental results indicate
an 18.5% reduction in power with a 1.3% delay penalty on average. In summary, this dissertation details methodologies and algorithms that address two critical issues, variation
and power dissipation, in current and potential future CDNs
Algorithmic techniques for nanometer VLSI design and manufacturing closure
As Very Large Scale Integration (VLSI) technology moves to the nanoscale
regime, design and manufacturing closure becomes very difficult to achieve due to
increasing chip and power density. Imperfections due to process, voltage and temperature variations aggravate the problem. Uncertainty in the electrical characteristics of
individual devices and wires may cause significant performance deviations or even functional failures. These impose tremendous challenges to the continuation of Moore's
law as well as the growth of the semiconductor industry.
Efforts are needed in both the deterministic design stage and the variation-aware design
stage. This research proposes various innovative algorithms to address both stages for
obtaining a design with high frequency, low power and high robustness. For deterministic optimizations, new buffer insertion and gate sizing techniques are proposed. For
variation-aware optimizations, new lithography-driven and post-silicon tuning-driven
design techniques are proposed.
For buffer insertion, a new slew buffering formulation is presented and is proved
to be NP-hard. Despite this, a highly efficient algorithm that runs more than 90x faster
than the best alternatives is proposed. The algorithm is also extended to handle
continuous buffer locations and blockages.
For gate sizing, a new algorithm is proposed to handle a discrete gate library, in
contrast to the unrealistic continuous gate library assumed by most existing algorithms. Our approach is a continuous-solution-guided dynamic programming approach, which
integrates the high solution quality of dynamic programming with the short runtime
of rounding a continuous solution.
For lithography-driven optimization, the problem of cell placement considering
manufacturability is studied. Three algorithms are proposed to handle cell flipping
and relocation. They are based on dynamic programming and graph theoretic approaches, and can provide different tradeoffs between variation reduction and wirelength
increase.
For post-silicon tuning-driven optimization, the problem of unified adaptivity
optimization on logical and clock signal tuning is studied, which enables us to significantly save resources. The new algorithm is based on a novel linear programming
formulation which is solved by an advanced robust linear programming technique.
The continuous solution is then discretized using binary-search-accelerated dynamic
programming, batch-based optimization, and Latin hypercube sampling-based fast
simulation