26,944 research outputs found
Fast interconnect optimization
As the continuous trend of Very Large Scale Integration (VLSI) circuits technology
scaling and frequency increases, delay optimization techniques for interconnect
are increasingly important for achieving timing closure of high performance designs.
For the gigahertz microprocessor and multi-million gate ASIC designs it is crucial to
have fast algorithms in the design automation tools for many classical problems in
the field to shorten time to market of the VLSI chip. This research presents algorithmic
techniques and constructive models for two such problems: (1) Fast buffer
insertion for delay optimization, (2) Wire sizing for delay optimization and variation
minimization on non-tree networks.
For the buffer insertion problem, this dissertation proposes several innovative
speedup techniques for different problem formulations and the realistic requirement.
For the basic buffer insertion problem, an O(n log2 n) optimal algorithm that runs
much faster than the previous classical van GinnekenÂs O(n2) algorithm is proposed,
where n is the number of buffer positions. For modern design libraries that contain
hundreds of buffers, this research also proposes an optimal algorithm in O(bn2) time
for b buffer types, a significant improvement over the previous O(b2n2) algorithm
by Lillis, Cheng and Lin. For nets with small numbers of sinks and large numbers
of buffer positions, a simple O(mn) optimal algorithm is proposed, where m is the
number of sinks. For the buffer insertion with minimum cost problem, the problem is first proved to be NP-complete. Then several optimal and approximation techniques
are proposed to further speed up the buffer insertion algorithm with resource control
for big industrial designs.
For the wire sizing problem, we propose a systematic method to size the wires of
general non-tree RC networks. The new method can be used for delay optimization
and variation reduction
A Bulk-Parallel Priority Queue in External Memory with STXXL
We propose the design and an implementation of a bulk-parallel external
memory priority queue to take advantage of both shared-memory parallelism and
high external memory transfer speeds to parallel disks. To achieve higher
performance by decoupling item insertions and extractions, we offer two
parallelization interfaces: one using "bulk" sequences, the other by defining
"limit" items. In the design, we discuss how to parallelize insertions using
multiple heaps, and how to calculate a dynamic prediction sequence to prefetch
blocks and apply parallel multiway merge for extraction. Our experimental
results show that in the selected benchmarks the priority queue reaches 75% of
the full parallel I/O bandwidth of rotational disks and and 65% of SSDs, or the
speed of sorting in external memory when bounded by computation.Comment: extended version of SEA'15 conference pape
Even faster sorting of (not only) integers
In this paper we introduce RADULS2, the fastest parallel sorter based on
radix algorithm. It is optimized to process huge amounts of data making use of
modern multicore CPUs. The main novelties include: extremely optimized
algorithm for handling tiny arrays (up to about a hundred of records) that
could appear even billions times as subproblems to handle and improved
processing of larger subarrays with better use of non-temporal memory stores
Strengthened Lazy Heaps: Surpassing the Lower Bounds for Binary Heaps
Let denote the number of elements currently in a data structure. An
in-place heap is stored in the first locations of an array, uses
extra space, and supports the operations: minimum, insert, and extract-min. We
introduce an in-place heap, for which minimum and insert take worst-case
time, and extract-min takes worst-case time and involves at most
element comparisons. The achieved bounds are optimal to within
additive constant terms for the number of element comparisons. In particular,
these bounds for both insert and extract-min -and the time bound for insert-
surpass the corresponding lower bounds known for binary heaps, though our data
structure is similar. In a binary heap, when viewed as a nearly complete binary
tree, every node other than the root obeys the heap property, i.e. the element
at a node is not smaller than that at its parent. To surpass the lower bound
for extract-min, we reinforce a stronger property at the bottom levels of the
heap that the element at any right child is not smaller than that at its left
sibling. To surpass the lower bound for insert, we buffer insertions and allow
nodes to violate heap order in relation to their parents
RAM-Efficient External Memory Sorting
In recent years a large number of problems have been considered in external
memory models of computation, where the complexity measure is the number of
blocks of data that are moved between slow external memory and fast internal
memory (also called I/Os). In practice, however, internal memory time often
dominates the total running time once I/O-efficiency has been obtained. In this
paper we study algorithms for fundamental problems that are simultaneously
I/O-efficient and internal memory efficient in the RAM model of computation.Comment: To appear in Proceedings of ISAAC 2013, getting the Best Paper Awar
- …