A lower bound for linear approximate compaction
The -approximate compaction problem is: given an input array of values, each either 0 or 1, place each value in an output array so that all the 1's are in the first array locations, where is the number of 1's in the input. is an accuracy parameter. This problem is of fundamental importance in parallel computation because of its applications to processor allocation and approximate counting. When is a constant, the problem is called Linear Approximate Compaction (LAC). On the CRCW PRAM model, there is an algorithm that solves approximate compaction in O((log log n)^3) time for , using processors. Our main result shows that this is close to the best possible. Specifically, we prove that LAC requires time using O(n) processors. We also give a tradeoff between and the processing time. For , and , the time required is
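To make the problem statement concrete, here is a minimal sequential sketch of a checker. The symbols in the abstract were lost in extraction, so the sketch assumes the usual formulation: with k ones in the input and an accuracy parameter (called `lam` here, a hypothetical name), all 1's must land in the first (1 + lam) * k output cells. It illustrates the problem only, not the PRAM algorithm or the lower-bound argument.

```python
import math


def compact_exact(inp):
    """Trivial sequential exact compaction: all 1's first, then 0's."""
    k = sum(inp)
    return [1] * k + [0] * (len(inp) - k)


def is_approx_compacted(inp, out, lam):
    """Check `out` against the ASSUMED bound of (1 + lam) * k leading cells.

    `lam` is a hypothetical name for the accuracy parameter; the bound itself
    is an assumption, since the abstract's formula did not survive extraction.
    """
    k = sum(inp)
    if sum(out) != k:  # every 1 of the input must appear in the output
        return False
    limit = math.ceil((1 + lam) * k)
    # No 1 may appear beyond the first `limit` cells.
    return all(v == 0 for v in out[limit:])
```

With `lam = 0` the check reduces to exact compaction; a larger `lam` tolerates gaps among the leading cells.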
On a compaction theorem of Ragde
Ragde demonstrated that in constant time a PRAM with processors can move at most items, stored in distinct cells of an array of size , to distinct cells in an array of size at most . We show that the exponent of 4 in the preceding sentence can be replaced by any constant greater than 2.
Fast deterministic processor allocation
Interval allocation has been suggested as a possible formalization for the PRAM of the (vaguely defined) processor allocation problem, which is of fundamental importance in parallel computing. The interval allocation problem is, given nonnegative integers , to allocate nonoverlapping subarrays of sizes from within a base array of cells. We show that interval allocation problems of size can be solved in time with optimal speedup on a deterministic CRCW PRAM. In addition to a general solution to the processor allocation problem, this implies an improved deterministic algorithm for the problem of approximate summation. For both interval allocation and approximate summation, the fastest previous deterministic algorithms have running times of . We also describe an application to the problem of computing the connected components of an undirected graph
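As an illustration of what a valid interval allocation looks like, the sketch below solves the problem sequentially with prefix sums. It is not the deterministic CRCW PRAM algorithm from the abstract; the function name `allocate_intervals` and the half-open `(start, end)` representation are assumptions made for this example, and the base array is assumed to hold exactly the sum of the requested sizes.

```python
from itertools import accumulate


def allocate_intervals(sizes):
    """Assign each request a half-open interval [start, end) in a base array.

    Sequential prefix-sum sketch (hypothetical helper, not the paper's
    parallel algorithm): request i gets the cells right after request i-1.
    """
    if any(s < 0 for s in sizes):
        raise ValueError("sizes must be nonnegative")
    ends = list(accumulate(sizes))               # running totals
    starts = [e - s for e, s in zip(ends, sizes)]
    return list(zip(starts, ends))


# Requests of sizes 3, 0, 2, 4 occupy 9 consecutive cells without overlap.
intervals = allocate_intervals([3, 0, 2, 4])
```

A request of size 0 receives an empty interval, so it never overlaps with its neighbours.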
Compiling for an Heterogeneous Vector Image Processor
We present a new compilation strategy, implemented at a small cost, to optimize image applications developed on top of a high-level image processing library for a heterogeneous processor with a vector image processing accelerator. The library provides the semantics of the image computations. The pipelined structure of the accelerator makes it possible to compute whole expressions with dozens of elementary image instructions, but is constrained in that intermediate image values cannot be extracted. We adapted standard compilation techniques to perform this task automatically. Our strategy is implemented in PIPS, a source-to-source compiler, which greatly reduces the development cost because standard phases are reused and parameterized for the target. Experiments were run on the hardware functional simulator. We compiled 1217 cases, from elementary tests to full applications. All are optimal except a few, which are mostly within a single accelerator call of optimality. Our contributions include: 1) a general low-cost compilation strategy for image processing applications, based on the semantics provided by library calls, which improves locality by an order of magnitude; 2) a specific heuristic to minimize execution time on the target vector accelerator; 3) numerous experiments that show the effectiveness of our strategy.
An Arbitrary CRCW PRAM Algorithm for Sorting Integers Into a LinkedList and Chaining on a Trie
Title from PDF of title page, viewed June 1, 2020. Thesis advisor: Yijie Han. Vita. Includes bibliographical references (pages 22-23). Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2020. The research work comprises two parts. Part one uses an Arbitrary CRCW PRAM algorithm for sorting integers into a linked list. There are various algorithms and techniques for sorting integers into a linked list. The Arbitrary CRCW PRAM model, being the weakest model, is able to sort n integers into a linked list in “constant time” using nlogm processors; with nt processors, the integers can be sorted in O(loglogm/logt) time by converting the Arbitrary CRCW PRAM model to the Priority CRCW PRAM model.
Part two is chaining on a trie. This research solves the problem of chaining on a trie with improved complexity. The algorithm takes “constant time” using n(logm+1) processors to chain the nodes on a trie for n input integers on the Arbitrary CRCW PRAM model. Introduction -- Sort integers into a linked list -- Chaining on a Trie -- Conclusio