
    High Performance Issues in Image Processing and Computer Vision

    Typical image processing and computer vision tasks found in industrial, medical, and military applications require real-time solutions. These requirements have motivated the design of many parallel architectures and algorithms. Recently, a new architecture called the reconfigurable mesh has been proposed. This thesis addresses a number of problems in image processing and computer vision on reconfigurable meshes. We first show that a number of low-level descriptors of a digitized image, such as the perimeter, area, histogram, and median row, can be reduced to computing the sum of all the integers in a matrix, which in turn can be reduced to computing the prefix sums of a binary sequence and the prefix sums of an integer sequence. We then propose a new computational paradigm for reconfigurable meshes: identifying an entity by a bus and performing computations on the bus to obtain properties of the entity. Using the new paradigm, we solve a number of mid-level vision tasks, including the Hough transform and component labeling. Finally, a VLSI-optimal constant-time algorithm for computing the convex hull of a set of planar points is presented, based on a VLSI-optimal constant-time sorting algorithm. As by-products, two basic data movement techniques (computing the prefix sums of a binary sequence and computing the prefix maxima of a sequence of real numbers) and a VLSI-optimal constant-time sorting algorithm have been developed. These by-products are interesting in their own right. In addition, they can be exploited to obtain efficient algorithms for a number of computational problems.
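    The reduction described above can be illustrated with a small sequential sketch: the area (and, analogously, the perimeter and histogram) of a binary image is a sum of matrix entries, which is exactly the last prefix sum of the flattened 0/1 sequence. The function names and the 0/1 pixel encoding below are illustrative assumptions, not the thesis's notation, and nothing here models the reconfigurable-mesh buses themselves.

        # Sequential illustration only: descriptors reduce to sums of 0/1 matrices,
        # and such a sum is the final entry of the prefix sums of the flattened sequence.
        from itertools import accumulate

        def area_via_prefix_sums(image):
            """Area (number of 1-pixels) = last prefix sum of the flattened binary sequence."""
            flat = [pixel for row in image for pixel in row]
            prefix = list(accumulate(flat))   # prefix sums of a binary sequence
            return prefix[-1] if prefix else 0

        def histogram_via_indicator_sums(image, levels):
            """Each histogram bin is again the sum of a 0/1 indicator matrix."""
            return [sum(1 for row in image for p in row if p == g) for g in range(levels)]

        if __name__ == "__main__":
            img = [[0, 1, 1],
                   [1, 0, 1],
                   [0, 0, 1]]
            print(area_via_prefix_sums(img))             # 5
            print(histogram_via_indicator_sums(img, 2))  # [4, 5]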

    Optimistic Parallelization of Floating-Point Accumulation

    Floating-point arithmetic is notoriously non-associative due to the limited-precision representation, which demands that intermediate values be rounded to fit in the available precision. The resulting cyclic dependency in floating-point accumulation inhibits parallelization of the computation, including efficient use of pipelining. In practice, however, we observe that floating-point operations are "mostly" associative. This observation can be exploited to parallelize floating-point accumulation using a form of optimistic concurrency. In this scheme, we first compute an optimistic associative approximation to the sum and then relax the computation by iteratively propagating errors until the correct sum is obtained. We map this computation to a network of 16 statically scheduled, pipelined, double-precision floating-point adders on the Virtex-4 LX160 (-12) device, where each floating-point adder runs at 296 MHz and has a pipeline depth of 10. On this 16-PE design, we demonstrate an average speedup of 6× with randomly generated data and 3-7× with summations extracted from Conjugate Gradient benchmarks.
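    A rough software analogue of this idea is sketched below; it is not the authors' FPGA pipeline, and it uses the standard TwoSum error-free transformation (an assumption on my part) rather than their hardware error-propagation network. A first pass produces an approximate sum, and the captured rounding errors are folded back in until they vanish.

        # Floating-point addition is not associative, so reordering changes the result;
        # the sketch recovers the exact sum by iteratively propagating rounding errors.
        def two_sum(a, b):
            """Knuth's TwoSum: returns (s, e) with s = fl(a + b) and a + b = s + e exactly."""
            s = a + b
            bb = s - a
            aa = s - bb
            return s, (a - aa) + (b - bb)

        def sum_with_error_propagation(values):
            """Approximate first, then relax: re-add captured errors until none remain.

            Converges for well-behaved finite inputs (no overflow or NaN)."""
            terms = list(values)
            while True:
                total, errors = 0.0, []
                for v in terms:
                    total, e = two_sum(total, v)
                    if e != 0.0:
                        errors.append(e)
                if not errors:
                    return total
                terms = [total] + errors

        if __name__ == "__main__":
            data = [1e16, 1.0, -1e16, 1.0]
            print(sum(data))                         # 1.0 -- one unit is lost to rounding
            print(sum_with_error_propagation(data))  # 2.0 -- exact sum recovered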

    On additive properties of sets defined by the Thue-Morse word

    In this paper we study some additive properties of subsets of the set $\mathbb{N}$ of positive integers: a subset $A$ of $\mathbb{N}$ is called $k$-summable (where $k \in \mathbb{N}$) if $A$ contains $\big\{\sum_{n\in F}x_n \mid \emptyset \neq F\subseteq \{1,2,\ldots,k\}\big\}$ for some $k$-term sequence of natural numbers $x_1<x_2<\cdots<x_k$. We say $A \subseteq \mathbb{N}$ is finite FS-big if $A$ is $k$-summable for each positive integer $k$. We say $A \subseteq \mathbb{N}$ is infinite FS-big if for each positive integer $k$, $A$ contains $\{\sum_{n\in F}x_n \mid \emptyset\neq F\subseteq \mathbb{N} \text{ and } \#F\leq k\}$ for some infinite sequence of natural numbers $x_1<x_2<\cdots$. We say $A\subseteq \mathbb{N}$ is an IP-set if $A$ contains $\{\sum_{n\in F}x_n \mid \emptyset\neq F\subseteq \mathbb{N} \text{ and } \#F<\infty\}$ for some infinite sequence of natural numbers $x_1<x_2<\cdots$. By the Finite Sums Theorem [5], the collection of all IP-sets is partition regular, i.e., if $A$ is an IP-set then for any finite partition of $A$, one cell of the partition is an IP-set. Here we prove that the collection of all finite FS-big sets is also partition regular. Let $\mathbf{T} = 011010011001011010\ldots$ denote the Thue-Morse word, fixed by the morphism $0\mapsto 01$ and $1\mapsto 10$. For each factor $u$ of $\mathbf{T}$ we consider the set $\mathbf{T}\big|_u\subseteq \mathbb{N}$ of all occurrences of $u$ in $\mathbf{T}$. In this note we characterize the sets $\mathbf{T}\big|_u$ in terms of the additive properties defined above. Using the Thue-Morse word we show that the collection of all infinite FS-big sets is not partition regular.
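    The objects above are easy to experiment with numerically. A small sketch (0-based positions and a finite prefix of the word are simplifying assumptions) generates the Thue-Morse word by iterating the morphism and lists the occurrence set of a factor u within that prefix:

        # Generate a prefix of the Thue-Morse word via the morphism 0 -> 01, 1 -> 10,
        # and compute the occurrence set of a factor u within that prefix.
        def thue_morse_prefix(iterations=8):
            word = "0"
            for _ in range(iterations):
                word = "".join("01" if c == "0" else "10" for c in word)
            return word

        def occurrences(word, factor):
            return [i for i in range(len(word) - len(factor) + 1) if word.startswith(factor, i)]

        if __name__ == "__main__":
            t = thue_morse_prefix()
            print(t[:18])                     # 011010011001011010
            print(occurrences(t, "010")[:8])  # first few positions where "010" occurs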

    The Alternating Stock Size Problem and the Gasoline Puzzle

    Given a set S of integers whose sum is zero, consider the problem of finding a permutation of these integers such that: (i) all prefix sums of the ordering are nonnegative, and (ii) the maximum value of a prefix sum is minimized. Kellerer et al. referred to this problem as the "Stock Size Problem" and showed that it can be approximated to within 3/2. They also showed that an approximation ratio of 2 can be achieved via several simple algorithms. We consider a related problem, which we call the "Alternating Stock Size Problem", where the numbers of positive and negative integers in the input set S are equal. The problem is the same as above, but we are additionally required to alternate the positive and negative numbers in the output ordering. This problem also has several simple 2-approximations. We show that it can be approximated to within 1.79. Then we show that this problem is closely related to an optimization version of the gasoline puzzle due to Lovász, in which we want to minimize the size of the gas tank necessary to go around the track. We present a 2-approximation for this problem, using a natural linear programming relaxation whose feasible solutions are doubly stochastic matrices. Our novel rounding algorithm is based on a transformation that yields another doubly stochastic matrix with special properties, from which we can extract a suitable permutation.
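    To make the objective concrete, a tiny brute-force sketch (feasible only for very small instances, and not one of the approximation algorithms discussed above) searches all orderings of a zero-sum set, keeps those with nonnegative prefix sums, and returns one minimizing the maximum prefix sum:

        # Brute force over permutations: feasibility = all prefix sums nonnegative,
        # objective = the peak prefix sum, to be minimized.
        from itertools import accumulate, permutations

        def best_stock_ordering(values):
            assert sum(values) == 0, "the stock size problem assumes a zero-sum input"
            best, best_peak = None, float("inf")
            for perm in permutations(values):
                prefixes = list(accumulate(perm))
                if min(prefixes) >= 0 and max(prefixes) < best_peak:
                    best, best_peak = perm, max(prefixes)
            return best, best_peak

        if __name__ == "__main__":
            print(best_stock_ordering([3, 2, -1, -4]))  # ((3, -1, 2, -4), 4)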

    Succinct Indexable Dictionaries with Applications to Encoding k-ary Trees, Prefix Sums and Multisets

    We consider the indexable dictionary problem, which consists of storing a set $S \subseteq \{0,\ldots,m-1\}$ for some integer $m$, while supporting the operations of $\mathrm{rank}(x)$, which returns the number of elements in $S$ that are less than $x$ if $x \in S$, and $-1$ otherwise; and $\mathrm{select}(i)$, which returns the $i$-th smallest element in $S$. We give a data structure that supports both operations in $O(1)$ time on the RAM model and requires $\mathcal{B}(n,m) + o(n) + O(\lg \lg m)$ bits to store a set of size $n$, where $\mathcal{B}(n,m) = \lceil \lg \binom{m}{n} \rceil$ is the minimum number of bits required to store any $n$-element subset from a universe of size $m$. Previous dictionaries taking this space only supported (yes/no) membership queries in $O(1)$ time. In the cell probe model we can remove the $O(\lg \lg m)$ additive term in the space bound, answering a question raised by Fich and Miltersen, and Pagh. We present extensions and applications of our indexable dictionary data structure, including: an information-theoretically optimal representation of a $k$-ary cardinal tree that supports standard operations in constant time; a representation of a multiset of size $n$ from $\{0,\ldots,m-1\}$ in $\mathcal{B}(n,m+n) + o(n)$ bits that supports (appropriate generalizations of) $\mathrm{rank}$ and $\mathrm{select}$ operations in constant time; and a representation of a sequence of $n$ non-negative integers summing up to $m$ in $\mathcal{B}(n,m+n) + o(n)$ bits that supports prefix sum queries in constant time. Comment: Final version of SODA 2002 paper; supersedes Leicester Tech report 2002/1.
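    The following plain, non-succinct reference implementation only pins down the rank/select semantics used above (select is taken to be 1-based here, an assumption); the paper's actual contribution is answering these queries in O(1) time within B(n,m) + o(n) + O(lg lg m) bits, which this sketch makes no attempt to match.

        # Reference semantics only: a sorted list answers rank/select in O(log n)/O(1)
        # time using Theta(n log m) bits, far from the succinct bounds in the paper.
        import bisect

        class IndexableDictionary:
            def __init__(self, elements):
                self.elems = sorted(set(elements))

            def rank(self, x):
                """Number of elements of S smaller than x if x is in S, else -1."""
                i = bisect.bisect_left(self.elems, x)
                return i if i < len(self.elems) and self.elems[i] == x else -1

            def select(self, i):
                """The i-th smallest element of S (1-based, by assumption)."""
                return self.elems[i - 1]

        if __name__ == "__main__":
            d = IndexableDictionary([3, 9, 14, 27])
            print(d.rank(14), d.rank(10), d.select(2))  # 2 -1 9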

    Canonical Trees, Compact Prefix-free Codes and Sums of Unit Fractions: A Probabilistic Analysis

    For fixed $t\ge 2$, we consider the class of representations of $1$ as a sum of unit fractions whose denominators are powers of $t$, or equivalently the class of canonical compact $t$-ary Huffman codes, or equivalently rooted $t$-ary plane "canonical" trees. We study the probabilistic behaviour of the height (the limit distribution is shown to be normal), the number of distinct summands (normal distribution), the path length (normal distribution), the width (main term of the expectation and a concentration property), and the number of leaves at maximum distance from the root (discrete distribution).
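    The correspondence in the first sentence can be made concrete with a tiny enumeration; the depth bound h below is an artificial cut-off added so the search is finite, and each tuple records how many summands 1/t^j a representation uses.

        # Enumerate representations of 1 as sums of unit fractions 1/t^j with j <= h;
        # each such multiset of unit fractions is the length profile of a compact
        # t-ary code satisfying Kraft's equality.
        from fractions import Fraction

        def representations(t, h, level=1, remaining=Fraction(1)):
            """Yield tuples (a_1, ..., a_h) with sum_j a_j / t**j == 1 and a_j >= 0."""
            if level > h:
                if remaining == 0:
                    yield ()
                return
            unit = Fraction(1, t ** level)
            for count in range(int(remaining / unit) + 1):
                for rest in representations(t, h, level + 1, remaining - count * unit):
                    yield (count,) + rest

        if __name__ == "__main__":
            for rep in representations(t=2, h=3):
                print(rep)   # e.g. (2, 0, 0) encodes 1 = 1/2 + 1/2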

    Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

    Given a static reference string $R$ and a source string $S$, a relative compression of $S$ with respect to $R$ is an encoding of $S$ as a sequence of references to substrings of $R$. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly-repetitive massive data sets such as genomes and web-data. We initiate the study of relative compression in a dynamic setting where the compressed source string $S$ is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near optimal bounds for these problems. Plugging in our new results, we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem.
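    As a minimal static illustration of the compression model only (the dynamic maintenance and fast random access that are the paper's contribution are not attempted), a greedy parse encodes S as (position, length) references into R:

        # Greedy relative compression of S against a static reference R, plus decoding.
        def relative_compress(R, S):
            refs, i = [], 0
            while i < len(S):
                best_pos, best_len = 0, 0
                for j in range(len(R)):                      # naive longest-match search
                    k = 0
                    while i + k < len(S) and j + k < len(R) and R[j + k] == S[i + k]:
                        k += 1
                    if k > best_len:
                        best_pos, best_len = j, k
                if best_len == 0:
                    raise ValueError("S contains a character that never occurs in R")
                refs.append((best_pos, best_len))
                i += best_len
            return refs

        def decompress(R, refs):
            return "".join(R[p:p + l] for (p, l) in refs)

        if __name__ == "__main__":
            R, S = "abracadabra", "cadabraabra"
            refs = relative_compress(R, S)
            print(refs)                          # [(4, 7), (0, 4)]
            print(decompress(R, refs) == S)      # True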

    Distributed Sparse Cut Approximation

    We study the problem of computing a sparse cut in an undirected network graph G=(V,E). We measure the sparsity of a cut (S, V\S) by its conductance phi(S), i.e., by the ratio between the number of edges crossing the cut and the sum of the degrees on the smaller of the two sides. We present an efficient distributed algorithm to compute a cut of low conductance. Specifically, given two parameters b and phi, if there exists a cut of balance at least b and conductance at most phi, our algorithm outputs a cut of balance at least b/2 and conductance at most ~O(sqrt{phi}), where ~O(.) hides polylogarithmic factors in the number of nodes n. Our distributed algorithm works in the CONGEST model, i.e., it only requires sending messages of size at most O(log(n)) bits. The time complexity of the algorithm is ~O(D + 1/b*phi), where D is the diameter of G. This is a significant improvement over a result by Das Sarma et al. [ICDCN 2015], where it is shown that a cut of the same quality can be computed in time ~O(n + 1/b*phi). The improved running time is in particular achieved by devising and applying an efficient distributed algorithm for the all-prefix-sums problem in a distributed search tree. This algorithm, which is based on the classic parallel all-prefix-sums algorithm, might be of independent interest.
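    The conductance measure used above is straightforward to compute directly for a given cut; the sketch below does exactly that on a small example graph (it is not the distributed algorithm of the paper).

        # phi(S) = (# edges crossing the cut) / min(vol(S), vol(V \ S)),
        # where vol(X) is the sum of the degrees of the vertices in X.
        def conductance(edges, S, V):
            S = set(S)
            crossing = sum(1 for (u, v) in edges if (u in S) != (v in S))
            deg = {v: 0 for v in V}
            for (u, v) in edges:
                deg[u] += 1
                deg[v] += 1
            vol_S = sum(deg[v] for v in S)
            vol_rest = sum(deg[v] for v in V if v not in S)
            return crossing / min(vol_S, vol_rest)

        if __name__ == "__main__":
            V = [0, 1, 2, 3, 4, 5]
            edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]  # two triangles + a bridge
            print(conductance(edges, S=[0, 1, 2], V=V))   # 1/7 ~ 0.143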