8 research outputs found

    Parallel processing can be harmful: The unusual behavior of interpolation search

    Several articles have noted the usefulness of a retrieval algorithm called sequential interpolation search, and Yao and Yao have proven a lower bound of log log N − O(1), showing this algorithm is actually optimal up to an additive constant on unindexed files of size N generated by the uniform probability distribution. We generalize the latter to show that log log N − log log P − O(1) lower bounds the complexity of any retrieval algorithm with P parallel processors for searching an unindexed file of size N. This result is surprising because we also show how to obtain an upper bound that matches the lower bound up to an additive constant with a procedure that actually uses no parallel processing outside its last iteration (at which time our proposal turns on P processors in parallel). Our first theorem therefore states that parallel processing before the literally last iteration in the search of an unindexed ordered file has nearly no usefulness. Two further surprising facts are that the preceding result holds even when communication between the parallel processing units involves no delay, and that the parallel algorithms are actually inherently slower than their sequential counterparts when each invocation of the SIMD machine involves a communication step with any type of nonzero delay. The presentation in the first two chapters of this paper is quite informal, so that the reader can quickly grasp the underlying intuition.
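    For orientation, here is a minimal sketch of textbook sequential interpolation search on a sorted array of integers. The function name, the in-memory array model, and the integer-key assumption are ours; the paper's unindexed-file setting and its parallel variant are not reproduced here.

```python
def interpolation_search(arr, key):
    """Sequential interpolation search on a sorted list of integers.

    Each probe interpolates the key's expected position between the values at
    the current interval's endpoints; for keys drawn uniformly at random the
    expected number of probes is O(log log N).
    """
    lo, hi = 0, len(arr) - 1
    while lo <= hi and arr[lo] <= key <= arr[hi]:
        if arr[hi] == arr[lo]:
            # All remaining values are equal; one comparison decides.
            return lo if arr[lo] == key else -1
        # Estimate the position of `key` by linear interpolation.
        pos = lo + (hi - lo) * (key - arr[lo]) // (arr[hi] - arr[lo])
        if arr[pos] == key:
            return pos
        if arr[pos] < key:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1  # key not present
```

    For example, `interpolation_search([2, 7, 11, 19, 23, 31], 19)` returns index 3 after two probes.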

    Random input helps searching predecessors

    A data structure problem consists of the finite sets D of data, Q of queries, and A of query answers, associated with a function f: D × Q → A. The data structure of file X is "static" ("dynamic") if we "do not" ("do") require quick updates as X changes. An important goal is to compactly encode a file X ∈ D such that, for each query y ∈ Q, the function f(X, y) requires the minimum time to compute an answer in A. This goal is trivial if the size of D is large, since for each query y ∈ Q it was shown that f(X, y) requires O(1) time for the most important queries in the literature. Hence, this goal becomes interesting to study as a trade-off between the "storage space" and the "query time", both measured as functions of the file size n = |X|. The ideal solution would be to use linear O(n) = O(|X|) space while retaining a constant O(1) query time. However, if f(X, y) computes the static predecessor search (find the largest x ∈ X with x ≤ y), then Ajtai [Ajt88] proved a negative result: using just n^O(1) = |X|^O(1) data space, it is not possible to evaluate f(X, y) in O(1) time for all y ∈ Q. The proof exhibited a bad distribution of data D such that there exists a "difficult" query y∗ ∈ Q for which f(X, y∗) requires ω(1) time. Essentially, [Ajt88] is an existential result, resolving the worst-case scenario. But [Ajt88] left open the question: do we typically, that is, with high probability (w.h.p.), encounter such "difficult" queries y ∈ Q when assuming reasonable distributions with respect to (w.r.t.) queries and data? Below we make reasonable assumptions w.r.t. the distribution of the queries y ∈ Q, as well as w.r.t. the distribution of data X ∈ D. In two interesting scenarios studied in the literature, we resolve the typical (w.h.p.) query time.
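    For concreteness, a minimal sketch of the static predecessor query f(X, y) on a sorted array using Python's standard bisect module. This is only the textbook O(n)-space, O(log n)-time baseline, not the data structures or distributions analyzed in the abstract.

```python
import bisect

def predecessor(xs, y):
    """Static predecessor query f(X, y): largest x in X with x <= y.

    `xs` is the file X stored as a sorted list; binary search gives the
    textbook linear-space, logarithmic-time baseline for this query.
    """
    i = bisect.bisect_right(xs, y)        # number of elements <= y
    return xs[i - 1] if i > 0 else None   # None when y precedes all of X
```

    For example, `predecessor([2, 5, 11, 17], 10)` returns 5, and `predecessor([2, 5, 11, 17], 1)` returns None.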

    Suffix Arrays with a Twist

    The suffix array is a classic full-text index, combining effectiveness with simplicity. We discuss three approaches aiming to improve its efficiency even more: changes to the navigation, the data layout, and adding extra data. In short, we show that i) the way we search for the right interval boundary significantly impacts the overall search speed, ii) a B-tree data layout easily wins over the standard one, iii) the well-known idea of a lookup table for the prefixes of the suffixes can be refined using compression, and iv) caching prefixes of the suffixes in a helper array can pose another practical space-time trade-off.
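    To make point i) concrete, here is a minimal Python sketch of standard suffix-array pattern search with explicit left- and right-boundary binary searches. The naive construction and the function names are our own illustration, not the paper's proposals.

```python
def suffix_array(text):
    """Naive suffix array: starting positions of all suffixes, in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def sa_search(text, sa, pattern):
    """Return the half-open interval [left, right) of suffixes prefixed by `pattern`.

    Two symmetric binary searches; how the right boundary is located is the
    navigation detail referred to in point i) above.
    """
    # Left boundary: first suffix that is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:] < pattern:
            lo = mid + 1
        else:
            hi = mid
    left = lo
    # Right boundary: first suffix at or after `left` not starting with `pattern`.
    lo, hi = left, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(pattern)] == pattern:
            lo = mid + 1
        else:
            hi = mid
    return left, lo
```

    For example, with text "banana" and pattern "ana", the returned interval covers the suffixes starting at positions 1 and 3.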

    Efficient Algorithms for Similarity and Skyline Summary on Multidimensional Datasets.

    Efficient management of large multidimensional datasets has attracted much attention in the database research community. Such large multidimensional datasets are common, and efficient algorithms are needed for analyzing them in a variety of applications. In this thesis, we focus our study on two very common classes of analysis: similarity and skyline summarization. We first focus on similarity when one of the dimensions in the multidimensional dataset is temporal. We then develop algorithms for evaluating skyline summaries effectively for both temporal and low-cardinality attribute domain datasets, and propose different methods for improving the effectiveness of the skyline summary operation. This thesis begins by studying similarity measures for time-series datasets and efficient algorithms for time-series similarity evaluation. The first contribution of this thesis is a new algorithm which can be used to evaluate similarity methods whose matching criterion is bounded by a specified threshold value. The second contribution of this thesis is the development of a new time-interval skyline operator, which continuously computes the current skyline over a data stream. We present a new algorithm called LookOut for evaluating such queries efficiently, and empirically demonstrate the scalability of this algorithm. Current skyline evaluation techniques follow a common paradigm that eliminates data elements from skyline consideration by finding other elements in the dataset that dominate them. The performance of such techniques is heavily influenced by the underlying data distribution. The third contribution of this thesis is a novel technique called the Lattice Skyline Algorithm (LS) that is built around a new paradigm for skyline evaluation on datasets with attributes that are drawn from low-cardinality domains. The utility of the skyline as a data summarization technique is often diminished by the volume of points in the skyline. The final contribution of this thesis is a novel scheme which remedies the skyline volume problem by ranking the elements of the skyline based on their importance to the skyline summary. Collectively, the techniques described in this thesis present efficient methods for two common and computationally intensive analysis operations on large multidimensional datasets.
    Ph.D. Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/57643/2/mmorse_1.pd
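    To illustrate the dominance-based paradigm that the thesis's LookOut and LS algorithms improve upon (this sketch is not those algorithms), here is a minimal naive skyline computation in Python under a smaller-is-better convention; the function names are ours.

```python
def dominates(p, q):
    """p dominates q if p is at least as good in every dimension and strictly
    better in at least one (smaller-is-better convention)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Naive dominance-based skyline: keep a point only if no other point
    dominates it. Quadratic in the number of points and sensitive to the data
    distribution, which is the limitation lattice-based evaluation targets."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

    For example, `skyline([(1, 4), (2, 2), (3, 1), (3, 3)])` returns `[(1, 4), (2, 2), (3, 1)]`, since (3, 3) is dominated by (2, 2).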

    References, Appendices & All Parts Merged

    Includes: Appendix MA: Selected Mathematical Formulas; Appendix CA: Selected Physical Constants; References; EGP merged file (all parts, appendices, and references)
    https://commons.library.stonybrook.edu/egp/1007/thumbnail.jp