Search CORE

26,944 research outputs found

Fast interconnect optimization

Author: Li Zhuo
Publication venue: Texas A&M University
Publication date: 12/04/2006
Field of study

As the continuous trend of Very Large Scale Integration (VLSI) circuits technology scaling and frequency increases, delay optimization techniques for interconnect are increasingly important for achieving timing closure of high performance designs. For the gigahertz microprocessor and multi-million gate ASIC designs it is crucial to have fast algorithms in the design automation tools for many classical problems in the field to shorten time to market of the VLSI chip. This research presents algorithmic techniques and constructive models for two such problems: (1) Fast buffer insertion for delay optimization, (2) Wire sizing for delay optimization and variation minimization on non-tree networks. For the buffer insertion problem, this dissertation proposes several innovative speedup techniques for different problem formulations and the realistic requirement. For the basic buffer insertion problem, an O(n log2 n) optimal algorithm that runs much faster than the previous classical van GinnekenÂs O(n2) algorithm is proposed, where n is the number of buffer positions. For modern design libraries that contain hundreds of buffers, this research also proposes an optimal algorithm in O(bn2) time for b buffer types, a significant improvement over the previous O(b2n2) algorithm by Lillis, Cheng and Lin. For nets with small numbers of sinks and large numbers of buffer positions, a simple O(mn) optimal algorithm is proposed, where m is the number of sinks. For the buffer insertion with minimum cost problem, the problem is first proved to be NP-complete. Then several optimal and approximation techniques are proposed to further speed up the buffer insertion algorithm with resource control for big industrial designs. For the wire sizing problem, we propose a systematic method to size the wires of general non-tree RC networks. The new method can be used for delay optimization and variation reduction

CiteSeerX

Texas A&M Repository

Bottom-Up Approach for High Speed SRAM Word-line Buffer Insertion Optimization

Author: Guthaus Matthew R
Wu Bin
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Crossref

eScholarship - University of California

A Bulk-Parallel Priority Queue in External Memory with STXXL

Author: GS Brodal
J Singler
JS Vitter
L Arge
MC Pinotti
N Deo
P Sanders
P Sanders
PJ Varman
R Dementiev
Publication venue
Publication date: 01/01/2015
Field of study

We propose the design and an implementation of a bulk-parallel external memory priority queue to take advantage of both shared-memory parallelism and high external memory transfer speeds to parallel disks. To achieve higher performance by decoupling item insertions and extractions, we offer two parallelization interfaces: one using "bulk" sequences, the other by defining "limit" items. In the design, we discuss how to parallelize insertions using multiple heaps, and how to calculate a dynamic prediction sequence to prefetch blocks and apply parallel multiway merge for extraction. Our experimental results show that in the selected benchmarks the priority queue reaches 75% of the full parallel I/O bandwidth of rotational disks and and 65% of SSDs, or the speed of sorting in external memory when bounded by computation.Comment: extended version of SEA'15 conference pape

arXiv.org e-Print Archive

Crossref

KITopen

Even faster sorting of (not only) integers

Author: C Hoare
D Knuth
D Musser
D Shell
J Shen
J Williams
M Codish
M Kokot
PM McIlroy
S Deorowicz
T Cormen
Publication venue
Publication date: 02/03/2017
Field of study

In this paper we introduce RADULS2, the fastest parallel sorter based on radix algorithm. It is optimized to process huge amounts of data making use of modern multicore CPUs. The main novelties include: extremely optimized algorithm for handling tiny arrays (up to about a hundred of records) that could appear even billions times as subproblems to handle and improved processing of larger subarrays with better use of non-temporal memory stores

arXiv.org e-Print Archive

Crossref

Strengthened Lazy Heaps: Surpassing the Lower Bounds for Binary Heaps

Author: Edelkamp Stefan
Elmasry Amr
Katajainen Jyrki
Publication venue
Publication date: 01/01/2014
Field of study

Let

n

denote the number of elements currently in a data structure. An in-place heap is stored in the first

n

locations of an array, uses

O(1)

extra space, and supports the operations: minimum, insert, and extract-min. We introduce an in-place heap, for which minimum and insert take

O(1)

worst-case time, and extract-min takes

O(\lg{} n)

worst-case time and involves at most

\lg{} n + O(1)

element comparisons. The achieved bounds are optimal to within additive constant terms for the number of element comparisons. In particular, these bounds for both insert and extract-min -and the time bound for insert- surpass the corresponding lower bounds known for binary heaps, though our data structure is similar. In a binary heap, when viewed as a nearly complete binary tree, every node other than the root obeys the heap property, i.e. the element at a node is not smaller than that at its parent. To surpass the lower bound for extract-min, we reinforce a stronger property at the bottom levels of the heap that the element at any right child is not smaller than that at its left sibling. To surpass the lower bound for insert, we buffer insertions and allow

O(\lg^2{} n)

nodes to violate heap order in relation to their parents

arXiv.org e-Print Archive

Copenhagen University Research Information System

RAM-Efficient External Memory Sorting

Author: A. Aggarwal
D. Comer
L. Arge
M. Thorup
R. Fadel
Publication venue
Publication date: 01/01/2013
Field of study

In recent years a large number of problems have been considered in external memory models of computation, where the complexity measure is the number of blocks of data that are moved between slow external memory and fast internal memory (also called I/Os). In practice, however, internal memory time often dominates the total running time once I/O-efficiency has been obtained. In this paper we study algorithms for fundamental problems that are simultaneously I/O-efficient and internal memory efficient in the RAM model of computation.Comment: To appear in Proceedings of ISAAC 2013, getting the Best Paper Awar

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System