High Performance Issues in Image Processing and Computer Vision
Typical image processing and computer vision tasks found in industrial, medical, and military applications require real-time solutions. These requirements have motivated the design of many parallel architectures and algorithms. Recently, a new architecture called the reconfigurable mesh has been proposed. This thesis addresses a number of problems in image processing and computer vision on reconfigurable meshes.
We first show that a number of low-level descriptors of a digitized image such as the perimeter, area, histogram and median row can be reduced to computing the sum of all the integers in a matrix, which in turn can be reduced to computing the prefix sums of a binary sequence and the prefix sums of an integer sequence. We then propose a new computational paradigm for reconfigurable meshes, that is, identifying an entity by a bus and performing computations on the bus to obtain properties of the entity. Using the new paradigm, we solve a number of mid-level vision tasks including the Hough transform and component labeling. Finally, a VLSI-optimal constant time algorithm for computing the convex hull of a set of planar points is presented based on a VLSI-optimal constant time sorting algorithm.
As by-products, we have developed two basic data-movement techniques, computing the prefix sums of a binary sequence and computing the prefix maxima of a sequence of real numbers, together with a VLSI-optimal constant-time sorting algorithm. These by-products are interesting in their own right. In addition, they can be exploited to obtain efficient algorithms for a number of computational problems.
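To make the reduction concrete, here is a small sequential sketch (ours, not the thesis' reconfigurable-mesh algorithms; the image and helper names are invented for illustration): the area and perimeter of an object in a binary image are sums of per-pixel values, and each such matrix sum is just the last entry of a prefix-sums computation over the flattened matrix.

```python
# Illustrative only: area and perimeter reduced to matrix sums, and a matrix
# sum obtained as the last prefix sum of the flattened entries (the quantity
# the reconfigurable-mesh algorithms compute in parallel).

def prefix_sums(xs):
    out, running = [], 0
    for x in xs:
        running += x
        out.append(running)
    return out

def area_and_perimeter(img):
    rows, cols = len(img), len(img[0])
    area_matrix = [[img[r][c] for c in range(cols)] for r in range(rows)]
    perim_matrix = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if img[r][c]:
                # One unit of boundary for each 4-neighbour outside the object.
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols) or not img[nr][nc]:
                        perim_matrix[r][c] += 1
    flatten = lambda m: [v for row in m for v in row]
    area = prefix_sums(flatten(area_matrix))[-1]
    perimeter = prefix_sums(flatten(perim_matrix))[-1]
    return area, perimeter

img = [[0, 1, 1],
       [0, 1, 0],
       [0, 0, 0]]
print(area_and_perimeter(img))  # (3, 8) for this L-shaped object
```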
Optimistic Parallelization of Floating-Point Accumulation
Floating-point arithmetic is notoriously non-associative due to the limited-precision representation, which demands that intermediate values be rounded to fit in the available precision. The resulting cyclic dependency in floating-point accumulation inhibits parallelization of the computation, including efficient use of pipelining. In practice, however, we observe that floating-point operations are "mostly" associative. This observation can be exploited to parallelize floating-point accumulation using a form of optimistic concurrency. In this scheme, we first compute an optimistic associative approximation to the sum and then relax the computation by iteratively propagating errors until the correct sum is obtained. We map this computation to a network of 16 statically-scheduled, pipelined, double-precision floating-point adders on the Virtex-4 LX160 (-12) device, where each floating-point adder runs at 296 MHz and has a pipeline depth of 10. On this 16-PE design, we demonstrate an average speedup of 6× with randomly generated data and 3-7× with summations extracted from Conjugate Gradient benchmarks.
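A minimal software sketch of the two ideas above, assuming nothing about the paper's FPGA pipeline (the function names and the iteration cap are our own): floating-point addition is order-dependent, and rounding errors captured during an optimistic pass can be folded back in until no error remains.

```python
def two_sum(a, b):
    # Error-free transformation (Knuth): s + e equals a + b exactly.
    s = a + b
    bv = s - a
    e = (a - (s - bv)) + (b - bv)
    return s, e

def refined_sum(xs, max_passes=10):
    # Optimistic pass over the data, then repeatedly re-accumulate the
    # captured rounding errors until none remain (or the pass cap is hit).
    pending, total = list(xs), 0.0
    for _ in range(max_passes):
        if not pending:
            break
        errors = []
        for x in pending:
            total, e = two_sum(total, x)
            if e != 0.0:
                errors.append(e)
        pending = errors
    return total

print((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3))  # different results: order matters
print(sum([1e16, 1.0, -1e16, 1.0]))          # 1.0: naive left-to-right loses an addend
print(refined_sum([1e16, 1.0, -1e16, 1.0]))  # 2.0: error propagation recovers it
```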
On additive properties of sets defined by the Thue-Morse word
In this paper we study some additive properties of subsets of the set $\mathbb{N}$ of positive integers: A subset $A$ of $\mathbb{N}$ is called {\it $k$-summable} (where $k \in \mathbb{N}$) if $A$ contains $\big\{\sum_{n\in F} x_n \mid \emptyset \neq F \subseteq \{1, 2, \ldots, k\}\big\}$ for some $k$-term sequence of natural numbers $\langle x_n \rangle_{n=1}^{k}$. We say $A \subseteq \mathbb{N}$ is finite FS-big if $A$ is $k$-summable for each positive integer $k$. We say $A \subseteq \mathbb{N}$ is infinite FS-big if for each positive integer $k$, $A$ contains $\{\sum_{n\in F} x_n \mid \emptyset \neq F \subseteq \mathbb{N} \mbox{ and } \# F \leq k\}$ for some infinite sequence of natural numbers $\langle x_n \rangle_{n=1}^{\infty}$. We say $A \subseteq \mathbb{N}$ is an IP-set if $A$ contains $\{\sum_{n\in F} x_n \mid \emptyset \neq F \subseteq \mathbb{N} \mbox{ and } \# F < \infty\}$ for some infinite sequence of natural numbers $\langle x_n \rangle_{n=1}^{\infty}$. By the Finite Sums Theorem [5], the collection of all IP-sets is partition regular, i.e., if $A$ is an IP-set then for any finite partition of $A$, one cell of the partition is an IP-set. Here we prove that the collection of all finite FS-big sets is also partition regular. Let $\mathbb{TM} = 011010011001011010\cdots$ denote the Thue-Morse word, fixed by the morphism $0 \mapsto 01$ and $1 \mapsto 10$. For each factor $u$ of $\mathbb{TM}$ we consider the set $\mathbb{TM}\big|_u \subseteq \mathbb{N}$ of all occurrences of $u$ in $\mathbb{TM}$. In this note we characterize the sets $\mathbb{TM}\big|_u$ in terms of the additive properties defined above. Using the Thue-Morse word we show that the collection of all infinite FS-big sets is not partition regular.
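For concreteness, a short sketch (the names are ours) of the objects involved: the Thue-Morse word generated by iterating the morphism $0 \mapsto 01$, $1 \mapsto 10$, and the occurrence set $\mathbb{TM}\big|_u$ of a factor $u$, taken here as 1-indexed starting positions.

```python
def thue_morse_prefix(iterations=6):
    # Iterate the morphism 0 -> 01, 1 -> 10 starting from "0".
    w = "0"
    for _ in range(iterations):
        w = "".join("01" if c == "0" else "10" for c in w)
    return w

def occurrences(word, u):
    # 1-indexed starting positions of the factor u in the given (finite) word.
    return sorted(i + 1 for i in range(len(word) - len(u) + 1)
                  if word[i:i + len(u)] == u)

tm = thue_morse_prefix()
print(tm[:18])                 # 011010011001011010, as in the abstract
print(occurrences(tm, "011"))  # positions where the factor 011 occurs in this prefix
```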
The Alternating Stock Size Problem and the Gasoline Puzzle
Given a set S of integers whose sum is zero, consider the problem of finding
a permutation of these integers such that: (i) all prefix sums of the ordering
are nonnegative, and (ii) the maximum value of a prefix sum is minimized.
Kellerer et al. referred to this problem as the "Stock Size Problem" and showed
that it can be approximated to within 3/2. They also showed that an
approximation ratio of 2 can be achieved via several simple algorithms.
We consider a related problem, which we call the "Alternating Stock Size
Problem", where the number of positive and negative integers in the input set S
are equal. The problem is the same as above, but we are additionally required
to alternate the positive and negative numbers in the output ordering. This
problem also has several simple 2-approximations. We show that it can be
approximated to within 1.79.
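As a concrete illustration of the two objectives (a brute-force check on a toy instance of our own; this is not one of the approximation algorithms mentioned above):

```python
from itertools import permutations

def stock_size(order):
    # Maximum prefix sum of an ordering; None if some prefix sum is negative.
    total, worst = 0, 0
    for x in order:
        total += x
        if total < 0:
            return None
        worst = max(worst, total)
    return worst

def alternates(order):
    # Consecutive elements must alternate in sign.
    return all((a > 0) != (b > 0) for a, b in zip(order, order[1:]))

items = (5, -3, 4, -4, 2, -4)  # zero-sum, equally many positives and negatives
opt = min(v for p in permutations(items)
          if (v := stock_size(p)) is not None)
alt_opt = min(v for p in permutations(items)
              if alternates(p) and (v := stock_size(p)) is not None)
print(opt, alt_opt)  # the alternating optimum is never smaller
```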
Then we show that this problem is closely related to an optimization version
of the gasoline puzzle due to Lov\'asz, in which we want to minimize the size
of the gas tank necessary to go around the track. We present a 2-approximation
for this problem, using a natural linear programming relaxation whose feasible
solutions are doubly stochastic matrices. Our novel rounding algorithm is based
on a transformation that yields another doubly stochastic matrix with special
properties, from which we can extract a suitable permutation.
Succinct Indexable Dictionaries with Applications to Encoding k-ary Trees, Prefix Sums and Multisets
We consider the {\it indexable dictionary} problem, which consists of storing a set $S \subseteq \{0, \ldots, m-1\}$ for some integer $m$, while supporting the operations of \Rank(x), which returns the number of elements in $S$ that are less than $x$ if $x \in S$, and $-1$ otherwise; and \Select(i), which returns the $i$-th smallest element in $S$. We give a data structure that supports both operations in $O(1)$ time on the RAM model and requires ${\cal B}(n,m) + o(n) + O(\lg \lg m)$ bits to store a set of size $n$, where ${\cal B}(n,m) = \lceil \lg {m \choose n} \rceil$ is the minimum number of bits required to store any $n$-element subset from a universe of size $m$. Previous dictionaries taking this space only supported (yes/no) membership queries in $O(1)$ time. In the cell probe model we can remove the $O(\lg \lg m)$ additive term in the space bound, answering a question raised by Fich and Miltersen, and Pagh.
We present extensions and applications of our indexable dictionary data structure, including:
An information-theoretically optimal representation of a $k$-ary cardinal tree that supports standard operations in constant time,
A representation of a multiset of size $n$ from $\{0, \ldots, m-1\}$ in ${\cal B}(n, m+n) + o(n)$ bits that supports (appropriate generalizations of) \Rank and \Select operations in constant time, and
A representation of a sequence of $n$ non-negative integers summing up to $m$ in ${\cal B}(n, m+n) + o(n)$ bits that supports prefix sum queries in constant time.
Comment: Final version of SODA 2002 paper; supersedes Leicester Tech report 2002/1
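To illustrate the prefix-sums application, here is a toy, deliberately non-succinct sketch (the encoding choice and names are ours): write the sequence in unary with a 1 closing each run, so that the position of the i-th 1 encodes the i-th prefix sum. The paper's structures answer select in constant time within the information-theoretic space bound; this sketch simply scans.

```python
def encode(seq):
    # Unary encoding: each value a contributes a zeros followed by a single 1.
    bits = []
    for a in seq:
        bits.extend([0] * a + [1])
    return bits

def select1(bits, i):
    # Position (1-indexed) of the i-th 1 in the bit vector.
    count = 0
    for pos, b in enumerate(bits, start=1):
        count += b
        if count == i:
            return pos
    raise ValueError("fewer than i ones")

def prefix_sum(bits, i):
    # a_1 + ... + a_i, recovered from the position of the i-th 1.
    return select1(bits, i) - i

seq = [3, 0, 5, 2]
bits = encode(seq)
print([prefix_sum(bits, i) for i in range(1, len(seq) + 1)])  # [3, 3, 8, 10]
```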
Canonical Trees, Compact Prefix-free Codes and Sums of Unit Fractions: A Probabilistic Analysis
For fixed $t \geq 2$, we consider the class of representations of $1$ as a sum of unit fractions whose denominators are powers of $t$, or equivalently the class of canonical compact $t$-ary Huffman codes, or equivalently rooted $t$-ary plane "canonical" trees. We study the probabilistic behaviour of the height (the limit distribution is shown to be normal), the number of distinct summands (normal distribution), the path length (normal distribution), the width (main term of the expectation and a concentration property) and the number of leaves at maximum distance from the root (discrete distribution).
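A small sketch (our parameters and naming) of the objects being counted: by the Kraft equality, a compact $t$-ary prefix-free code with codeword depths at most max_len corresponds to a representation of 1 as a sum of unit fractions $1/t^{\ell}$, and to a canonical $t$-ary tree in which, level by level, one only chooses how many of the available nodes become leaves.

```python
from functools import lru_cache

def count_representations(t, max_len):
    # Number of canonical compact t-ary trees of height <= max_len, i.e.
    # representations of 1 as sums of unit fractions with denominators t^l.
    @lru_cache(maxsize=None)
    def complete(slots, depth):
        # 'slots' nodes are available at this depth; each becomes a leaf
        # (contributing 1/t^depth) or an internal node with t children.
        if depth == max_len:
            return 1                      # everything left must be a leaf
        total = 0
        for leaves in range(slots + 1):
            internal = slots - leaves
            total += 1 if internal == 0 else complete(internal * t, depth + 1)
        return total
    return complete(1, 0)                 # start from the root

# t = 2, depth <= 2 gives the 4 representations:
# 1, 1/2+1/2, 1/2+1/4+1/4, 1/4+1/4+1/4+1/4
print(count_representations(2, 2))        # 4
```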
Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation
Given a static reference string R and a source string S, a relative compression of S with respect to R is an encoding of S as a sequence of references to substrings of R. Relative compression schemes are a classic
model of compression and have recently proved very successful for compressing
highly-repetitive massive data sets such as genomes and web-data. We initiate
the study of relative compression in a dynamic setting where the compressed
source string is subject to edit operations. The goal is to maintain the
compressed representation compactly, while supporting edits and allowing
efficient random access to the (uncompressed) source string. We present new
data structures that achieve optimal time for updates and queries while using
space linear in the size of the optimal relative compression, for nearly all
combinations of parameters. We also present solutions for restricted and
extended sets of updates. To achieve these results, we revisit the dynamic
partial sums problem and the substring concatenation problem. We present new
optimal or near optimal bounds for these problems. Plugging in our new results
we also immediately obtain new bounds for the string indexing for patterns with
wildcards problem and the dynamic text and static pattern matching problem.
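As a static baseline (not the paper's dynamic data structure; the greedy strategy and names are ours), a relative compression of S with respect to R can be produced by repeatedly taking the longest substring of R that matches the remainder of S:

```python
def relative_compress(R, S):
    # Greedy encoding of S as (position, length) references into R.
    refs, i = [], 0
    while i < len(S):
        best_pos, best_len = -1, 0
        for j in range(len(R)):
            l = 0
            while i + l < len(S) and j + l < len(R) and R[j + l] == S[i + l]:
                l += 1
            if l > best_len:
                best_pos, best_len = j, l
        if best_len == 0:
            raise ValueError("S contains a character not present in R")
        refs.append((best_pos, best_len))
        i += best_len
    return refs

def decompress(R, refs):
    return "".join(R[p:p + l] for p, l in refs)

R, S = "banana", "nabanaban"
refs = relative_compress(R, S)
print(refs, decompress(R, refs) == S)  # [(2, 2), (0, 4), (0, 3)] True
```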
Distributed Sparse Cut Approximation
We study the problem of computing a sparse cut in an undirected network graph G=(V,E). We measure the sparsity of a cut (S, V\S) by its conductance phi(S), i.e., by the ratio of the number of edges crossing the cut to the sum of the degrees on the smaller of the two sides. We present an efficient distributed algorithm to compute a cut of low conductance. Specifically, given two parameters b and phi, if there exists a cut of balance at least b and conductance at most phi, our algorithm outputs a cut of balance at least b/2 and conductance at most ~O(sqrt{phi}), where ~O(.) hides polylogarithmic factors in the number of nodes n. Our distributed algorithm works in the CONGEST model, i.e., it only requires sending messages of size at most O(log(n)) bits. The time complexity of the algorithm is ~O(D + 1/(b*phi)), where D is the diameter of G. This is a significant improvement over a result by Das Sarma et al. [ICDCN 2015], where it is shown that a cut of the same quality can be computed in time ~O(n + 1/(b*phi)). The improved running time is achieved in particular by devising and applying an efficient distributed algorithm for the all-prefix-sums problem in a distributed search tree. This algorithm, which is based on the classic parallel all-prefix-sums algorithm, might be of independent interest.
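For reference, one classic formulation of the parallel all-prefix-sums algorithm the abstract alludes to is Blelloch's work-efficient up-sweep/down-sweep exclusive scan, written here sequentially as a sketch (the name and the power-of-two restriction are our simplifications); each inner loop is a parallel step in the tree/PRAM setting, and the paper adapts the same idea to a distributed search tree.

```python
def blelloch_scan(a):
    # Work-efficient exclusive prefix sums; length must be a power of two.
    n = len(a)
    x = list(a)
    # Up-sweep (reduce) phase: build partial sums up the implicit tree.
    d = 1
    while d < n:
        for i in range(0, n, 2 * d):
            x[i + 2 * d - 1] += x[i + d - 1]
        d *= 2
    # Down-sweep phase: push prefixes back down the tree.
    x[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(0, n, 2 * d):
            t = x[i + d - 1]
            x[i + d - 1] = x[i + 2 * d - 1]
            x[i + 2 * d - 1] += t
        d //= 2
    return x

print(blelloch_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # [0, 3, 4, 11, 11, 15, 16, 22]
```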