4,744 research outputs found

    Memory-Adjustable Navigation Piles with Applications to Sorting and Convex Hulls

    Get PDF
    We consider space-bounded computations on a random-access machine (RAM) where the input is given on a read-only random-access medium, the output is to be produced to a write-only sequential-access medium, and the available workspace allows random reads and writes but is of limited capacity. The length of the input is NN elements, the length of the output is limited by the computation, and the capacity of the workspace is O(S)O(S) bits for some predetermined parameter SS. We present a state-of-the-art priority queue---called an adjustable navigation pile---for this restricted RAM model. Under some reasonable assumptions, our priority queue supports minimum\mathit{minimum} and insert\mathit{insert} in O(1)O(1) worst-case time and extract\mathit{extract} in O(N/S+lgS)O(N/S + \lg{} S) worst-case time for any SlgNS \geq \lg{} N. We show how to use this data structure to sort NN elements and to compute the convex hull of NN points in the two-dimensional Euclidean space in O(N2/S+NlgS)O(N^2/S + N \lg{} S) worst-case time for any SlgNS \geq \lg{} N. Following a known lower bound for the space-time product of any branching program for finding unique elements, both our sorting and convex-hull algorithms are optimal. The adjustable navigation pile has turned out to be useful when designing other space-efficient algorithms, and we expect that it will find its way to yet other applications.Comment: 21 page

    Tight Lower Bound for Comparison-Based Quantile Summaries

    Get PDF
    Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items, drawn from a totally ordered universe. We study data structures, called quantile summaries, which keep track of all quantiles, up to an error of at most ε\varepsilon. That is, an ε\varepsilon-approximate quantile summary first processes a stream of items and then, given any quantile query 0ϕ10\le \phi\le 1, returns an item from the stream, which is a ϕ\phi'-quantile for some ϕ=ϕ±ε\phi' = \phi \pm \varepsilon. We focus on comparison-based quantile summaries that can only compare two items and are otherwise completely oblivious of the universe. The best such deterministic quantile summary to date, due to Greenwald and Khanna (SIGMOD '01), stores at most O(1εlogεN)O(\frac{1}{\varepsilon}\cdot \log \varepsilon N) items, where NN is the number of items in the stream. We prove that this space bound is optimal by showing a matching lower bound. Our result thus rules out the possibility of constructing a deterministic comparison-based quantile summary in space f(ε)o(logN)f(\varepsilon)\cdot o(\log N), for any function ff that does not depend on NN. As a corollary, we improve the lower bound for biased quantiles, which provide a stronger, relative-error guarantee of (1±ε)ϕ(1\pm \varepsilon)\cdot \phi, and for other related computational tasks.Comment: 20 pages, 2 figures, major revison of the construction (Sec. 3) and some other parts of the pape

    Sequential Quantiles via Hermite Series Density Estimation

    Full text link
    Sequential quantile estimation refers to incorporating observations into quantile estimates in an incremental fashion thus furnishing an online estimate of one or more quantiles at any given point in time. Sequential quantile estimation is also known as online quantile estimation. This area is relevant to the analysis of data streams and to the one-pass analysis of massive data sets. Applications include network traffic and latency analysis, real time fraud detection and high frequency trading. We introduce new techniques for online quantile estimation based on Hermite series estimators in the settings of static quantile estimation and dynamic quantile estimation. In the static quantile estimation setting we apply the existing Gauss-Hermite expansion in a novel manner. In particular, we exploit the fact that Gauss-Hermite coefficients can be updated in a sequential manner. To treat dynamic quantile estimation we introduce a novel expansion with an exponentially weighted estimator for the Gauss-Hermite coefficients which we term the Exponentially Weighted Gauss-Hermite (EWGH) expansion. These algorithms go beyond existing sequential quantile estimation algorithms in that they allow arbitrary quantiles (as opposed to pre-specified quantiles) to be estimated at any point in time. In doing so we provide a solution to online distribution function and online quantile function estimation on data streams. In particular we derive an analytical expression for the CDF and prove consistency results for the CDF under certain conditions. In addition we analyse the associated quantile estimator. Simulation studies and tests on real data reveal the Gauss-Hermite based algorithms to be competitive with a leading existing algorithm.Comment: 43 pages, 9 figures. Improved version incorporating referee comments, as appears in Electronic Journal of Statistic

    Comment: Monitoring Networked Applications With Incremental Quantile Estimation

    Full text link
    Our comments are in two parts. First, we make some observations regarding the methodology in Chambers et al. [arXiv:0708.0302]. Second, we briefly describe another interesting network monitoring problem that arises in the context of assessing quality of service, such as loss rates and delay distributions, in packet-switched networks.Comment: Published at http://dx.doi.org/10.1214/088342306000000600 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Selection from read-only memory with limited workspace

    Full text link
    Given an unordered array of NN elements drawn from a totally ordered set and an integer kk in the range from 11 to NN, in the classic selection problem the task is to find the kk-th smallest element in the array. We study the complexity of this problem in the space-restricted random-access model: The input array is stored on read-only memory, and the algorithm has access to a limited amount of workspace. We prove that the linear-time prune-and-search algorithm---presented in most textbooks on algorithms---can be modified to use Θ(N)\Theta(N) bits instead of Θ(N)\Theta(N) words of extra space. Prior to our work, the best known algorithm by Frederickson could perform the task with Θ(N)\Theta(N) bits of extra space in O(NlgN)O(N \lg^{*} N) time. Our result separates the space-restricted random-access model and the multi-pass streaming model, since we can surpass the Ω(NlgN)\Omega(N \lg^{*} N) lower bound known for the latter model. We also generalize our algorithm for the case when the size of the workspace is Θ(S)\Theta(S) bits, where lg3NSN\lg^3{N} \leq S \leq N. The running time of our generalized algorithm is O(Nlg(N/S)+N(lgN)/lgS)O(N \lg^{*}(N/S) + N (\lg N) / \lg{} S), slightly improving over the O(Nlg(N(lgN)/S)+N(lgN)/lgS)O(N \lg^{*}(N (\lg N)/S) + N (\lg N) / \lg{} S) bound of Frederickson's algorithm. To obtain the improvements mentioned above, we developed a new data structure, called the wavelet stack, that we use for repeated pruning. We expect the wavelet stack to be a useful tool in other applications as well.Comment: 16 pages, 1 figure, Preliminary version appeared in COCOON-201
    corecore