62,539 research outputs found
Performance and power optimization in VLSI physical design
As VLSI technology enters the nanoscale regime, a great amount of efforts have
been made to reduce interconnect delay. Among them, buffer insertion stands out
as an effective technique for timing optimization. A dramatic rise in on-chip buffer
density has been witnessed. For example, in two recent IBM ASIC designs, 25% gates
are buffers.
In this thesis, three buffer insertion algorithms are presented for the procedure
of performance and power optimization. The second chapter focuses on improving circuit performance under inductance effect. The new algorithm works under
the dynamic programming framework and runs in provably linear time for multiple
buffer types due to two novel techniques: restrictive cost bucketing and efficient delay
update. The experimental results demonstrate that our linear time algorithm consistently outperforms all known RLC buffering algorithms in terms of both solution
quality and runtime. That is, the new algorithm uses fewer buffers, runs in shorter
time and the buffered tree has better timing.
The third chapter presents a method to guarantee a high fidelity signal transmission in global bus. It proposes a new redundant via insertion technique to reduce
via variation and signal distortion in twisted differential line. In addition, a new
buffer insertion technique is proposed to synchronize the transmitted signals, thus
further improving the effectiveness of the twisted differential line. Experimental results demonstrate a 6GHz signal can be transmitted with high fidelity using the new
approaches. In contrast, only a 100MHz signal can be reliably transmitted using a
single-end bus with power/ground shielding. Compared to conventional twisted differential line structure, our new techniques can reduce the magnitude of noise by 45%
as witnessed in our simulation.
The fourth chapter proposes a buffer insertion and gate sizing algorithm for
million plus gates. The algorithm takes a combinational circuit as input instead of
individual nets and greatly reduces the buffer and gate cost of the entire circuit.
The algorithm has two main features: 1) A circuit partition technique based on the
criticality of the primary inputs, which provides the scalability for the algorithm, and
2) A linear programming formulation of non-linear delay versus cost tradeoff, which
formulates the simultaneous buffer insertion and gate sizing into linear programming
problem. Experimental results on ISCAS85 circuits show that even without the circuit
partition technique, the new algorithm achieves 17X speedup compared with path
based algorithm. In the meantime, the new algorithm saves 16.0% buffer cost, 4.9%
gate cost, 5.8% total cost and results in less circuit delay
Memory-Adjustable Navigation Piles with Applications to Sorting and Convex Hulls
We consider space-bounded computations on a random-access machine (RAM) where
the input is given on a read-only random-access medium, the output is to be
produced to a write-only sequential-access medium, and the available workspace
allows random reads and writes but is of limited capacity. The length of the
input is elements, the length of the output is limited by the computation,
and the capacity of the workspace is bits for some predetermined
parameter . We present a state-of-the-art priority queue---called an
adjustable navigation pile---for this restricted RAM model. Under some
reasonable assumptions, our priority queue supports and
in worst-case time and in worst-case time for any . We show how to use this
data structure to sort elements and to compute the convex hull of
points in the two-dimensional Euclidean space in
worst-case time for any . Following a known lower bound for the
space-time product of any branching program for finding unique elements, both
our sorting and convex-hull algorithms are optimal. The adjustable navigation
pile has turned out to be useful when designing other space-efficient
algorithms, and we expect that it will find its way to yet other applications.Comment: 21 page
RAM-Efficient External Memory Sorting
In recent years a large number of problems have been considered in external
memory models of computation, where the complexity measure is the number of
blocks of data that are moved between slow external memory and fast internal
memory (also called I/Os). In practice, however, internal memory time often
dominates the total running time once I/O-efficiency has been obtained. In this
paper we study algorithms for fundamental problems that are simultaneously
I/O-efficient and internal memory efficient in the RAM model of computation.Comment: To appear in Proceedings of ISAAC 2013, getting the Best Paper Awar
Strengthened Lazy Heaps: Surpassing the Lower Bounds for Binary Heaps
Let denote the number of elements currently in a data structure. An
in-place heap is stored in the first locations of an array, uses
extra space, and supports the operations: minimum, insert, and extract-min. We
introduce an in-place heap, for which minimum and insert take worst-case
time, and extract-min takes worst-case time and involves at most
element comparisons. The achieved bounds are optimal to within
additive constant terms for the number of element comparisons. In particular,
these bounds for both insert and extract-min -and the time bound for insert-
surpass the corresponding lower bounds known for binary heaps, though our data
structure is similar. In a binary heap, when viewed as a nearly complete binary
tree, every node other than the root obeys the heap property, i.e. the element
at a node is not smaller than that at its parent. To surpass the lower bound
for extract-min, we reinforce a stronger property at the bottom levels of the
heap that the element at any right child is not smaller than that at its left
sibling. To surpass the lower bound for insert, we buffer insertions and allow
nodes to violate heap order in relation to their parents
- …