
    Analysis of An Approximate Median Selection Algorithm

    We present an analysis of an efficient algorithm for the approximate median selection problem that has been rediscovered many times and is easy to implement. The contribution of the article is a precise characterization of the accuracy of the algorithm. We present analytical results on the performance of the algorithm, as well as experimental illustrations of its precision.
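    The abstract does not spell out the algorithm. As an illustration only, the sketch below assumes one commonly rediscovered scheme: repeatedly replace each group of three elements by its median until a single value remains. The function name `approx_median` and the choice to drop leftover elements are assumptions of this sketch, not details taken from the paper.

```python
def approx_median(values):
    """Approximate the median by repeated median-of-three reduction.

    Assumption of this sketch: leftover elements that do not fill a
    complete triplet are dropped in each round.
    """
    items = list(values)
    if not items:
        raise ValueError("approx_median() requires a non-empty input")
    while len(items) >= 3:
        # Replace every complete triplet by its middle element.
        items = [
            sorted(items[i:i + 3])[1]
            for i in range(0, len(items) - 2, 3)
        ]
    # One or two elements remain; return the first one.
    return items[0]


if __name__ == "__main__":
    import random

    data = list(range(27))
    random.Random(0).shuffle(data)
    print(approx_median(data))
```

    For an input of 3^k distinct values, the returned element has at least 2^k - 1 elements on either side of it, which is why the result is only an approximate median.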

    Optimal Gossip Algorithms for Exact and Approximate Quantile Computations

    This paper gives drastically faster gossip algorithms to compute exact and approximate quantiles. Gossip algorithms, which allow each node to contact a uniformly random other node in each round, have been intensely studied and adopted in many applications due to their fast convergence and their robustness to failures. Kempe et al. [FOCS'03] gave gossip algorithms to compute important aggregate statistics if every node is given a value. In particular, they gave a beautiful $O(\log n + \log \frac{1}{\epsilon})$ round algorithm to $\epsilon$-approximate the sum of all values and an $O(\log^2 n)$ round algorithm to compute the exact $\phi$-quantile, i.e., the $\lceil \phi n \rceil$ smallest value. We give a quadratically faster and in fact optimal gossip algorithm for the exact $\phi$-quantile problem which runs in $O(\log n)$ rounds. We furthermore show that one can achieve an exponential speedup if one allows for an $\epsilon$-approximation. We give an $O(\log \log n + \log \frac{1}{\epsilon})$ round gossip algorithm which computes a value of rank between $\phi n$ and $(\phi+\epsilon)n$ at every node, for any $0 \leq \phi \leq 1$ and $0 < \epsilon < 1$. Our algorithms are extremely simple and very robust: they can be operated with the same running times even if every transmission fails with a, potentially different, constant probability. We also give a matching $\Omega(\log \log n + \log \frac{1}{\epsilon})$ lower bound which shows that our algorithm is optimal for all values of $\epsilon$.
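    To make the gossip model concrete, here is a minimal single-process simulation of a push-sum style averaging protocol in the spirit of the aggregate computation attributed to Kempe et al. above. It is not the quantile algorithm of this paper. The function name `push_sum_average`, the synchronous round structure, and the fixed seed are assumptions of this sketch.

```python
import random


def push_sum_average(values, rounds, seed=0):
    """Simulate push-sum gossip; return each node's estimate of the average.

    Every node keeps a (sum, weight) pair. In each round it keeps half
    of the pair and pushes the other half to a uniformly random other
    node. The ratio sum / weight at each node approaches the average.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    sums = [float(v) for v in values]
    weights = [1.0] * n
    for _ in range(rounds):
        new_sums = [0.0] * n
        new_weights = [0.0] * n
        for i in range(n):
            # Pick a uniformly random node other than i.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            half_sum, half_weight = sums[i] / 2, weights[i] / 2
            new_sums[i] += half_sum
            new_weights[i] += half_weight
            new_sums[j] += half_sum
            new_weights[j] += half_weight
        sums, weights = new_sums, new_weights
    return [s / w for s, w in zip(sums, weights)]


if __name__ == "__main__":
    estimates = push_sum_average(range(100), rounds=100)
    print(min(estimates), max(estimates))
```

    Total sum and total weight are conserved in every round, so all estimates are drawn toward the true average of the inputs.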

    Fast Deterministic Selection

    The Median of Medians (also known as BFPRT) algorithm, although a landmark theoretical achievement, is seldom used in practice because it and its variants are slower than simple approaches based on sampling. The main contribution of this paper is a fast linear-time deterministic selection algorithm, QuickselectAdaptive, based on a refined definition of MedianOfMedians. The algorithm's performance brings deterministic selection, along with its desirable properties of reproducible runs, predictable run times, and immunity to pathological inputs, within the range of practicality. We demonstrate results on independent and identically distributed random inputs and on normally-distributed inputs. Measurements show that QuickselectAdaptive is faster than state-of-the-art baselines. Comment: Pre-publication draft
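    For reference, the textbook Median of Medians selection that this abstract uses as its starting point can be sketched as follows. This is the classic groups-of-five version, not QuickselectAdaptive; the function name `median_of_medians_select` and the list-copying style (rather than in-place partitioning) are choices made for this sketch.

```python
def median_of_medians_select(items, k):
    """Return the k-th smallest element (0-indexed) deterministically."""
    items = list(items)
    if not 0 <= k < len(items):
        raise IndexError("k is out of range")
    while True:
        if len(items) <= 5:
            return sorted(items)[k]
        # Median of every group of at most five elements.
        medians = []
        for start in range(0, len(items), 5):
            group = sorted(items[start:start + 5])
            medians.append(group[len(group) // 2])
        # The pivot is the median of those medians, found recursively.
        pivot = median_of_medians_select(medians, len(medians) // 2)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows
        elif k < len(lows) + n_equal:
            return pivot
        else:
            k -= len(lows) + n_equal
            items = highs


if __name__ == "__main__":
    print(median_of_medians_select([9, 1, 8, 2, 7, 3, 6, 4, 5], 4))
```

    The pivot choice guarantees that a constant fraction of the elements is discarded in every iteration, which gives the linear worst-case bound at the cost of the extra median computations the abstract refers to.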

    SOCP relaxation bounds for the optimal subset selection problem applied to robust linear regression

    This paper deals with the problem of finding the globally optimal subset of h elements from a larger set of n elements in d space dimensions so as to minimize a quadratic criterion, with a special emphasis on applications to computing the Least Trimmed Squares Estimator (LTSE) for robust regression. The computation of the LTSE is a challenging subset selection problem involving a nonlinear program with continuous and binary variables, linked in a highly nonlinear fashion. The selection of a globally optimal subset using the branch and bound (BB) algorithm is limited to problems in very low dimension, typically d < 5, as the complexity of the problem increases exponentially with d. We introduce a bold pruning strategy in the BB algorithm that results in a significant reduction in computing time, at the price of a negligible loss of accuracy. The novelty of our algorithm is that the bounds at nodes of the BB tree come from pseudo-convexifications derived using a linearization technique with approximate bounds for the nonlinear terms. The approximate bounds are computed by solving an auxiliary semidefinite optimization problem. We show through a computational study that our algorithm performs well in a wide set of the most difficult instances of the LTSE problem. Comment: 12 pages, 3 figures, 2 tables
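    To illustrate why the LTSE is a subset selection problem, the sketch below enumerates every subset of h points and keeps the one whose ordinary least-squares line has the smallest residual sum of squares. This brute force is only feasible for tiny inputs and is not the branch and bound method of the paper. The helper names `fit_line` and `lts_brute_force`, and the restriction to a single predictor, are assumptions of this sketch.

```python
from itertools import combinations


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, residual_sum_of_squares).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in points)
    return intercept, slope, rss


def lts_brute_force(points, h):
    """Exhaustive least trimmed squares fit over all subsets of size h."""
    if not 1 <= h <= len(points):
        raise ValueError("h must be between 1 and the number of points")
    best = None
    for subset in combinations(points, h):
        candidate = fit_line(subset)
        if best is None or candidate[2] < best[2]:
            best = candidate
    return best


if __name__ == "__main__":
    inliers = [(x, 2 * x + 1) for x in range(6)]
    outliers = [(6, 100), (7, -50)]
    print(lts_brute_force(inliers + outliers, h=6))
```

    The number of subsets grows combinatorially with n, which is the cost that bounding and pruning strategies aim to avoid.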

    Linear-Space Data Structures for Range Mode Query in Arrays

    A mode of a multiset $S$ is an element $a \in S$ of maximum multiplicity; that is, $a$ occurs at least as frequently as any other element in $S$. Given a list $A[1:n]$ of $n$ items, we consider the problem of constructing a data structure that efficiently answers range mode queries on $A$. Each query consists of an input pair of indices $(i, j)$ for which a mode of $A[i:j]$ must be returned. We present an $O(n^{2-2\epsilon})$-space static data structure that supports range mode queries in $O(n^\epsilon)$ time in the worst case, for any fixed $\epsilon \in [0, 1/2]$. When $\epsilon = 1/2$, this corresponds to the first linear-space data structure to guarantee $O(\sqrt{n})$ query time. We then describe three additional linear-space data structures that provide $O(k)$, $O(m)$, and $O(|j-i|)$ query time, respectively, where $k$ denotes the number of distinct elements in $A$ and $m$ denotes the frequency of the mode of $A$. Finally, we examine generalizing our data structures to higher dimensions. Comment: 13 pages, 2 figures
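    A simplified block-decomposition sketch of a range mode structure is given below. It precomputes the mode of every span of whole blocks and verifies boundary elements with binary search over sorted position lists, so a query costs roughly $O(\sqrt{n} \log n)$ rather than the $O(\sqrt{n})$ bound of the paper. The class name `RangeMode`, the 0-indexed inclusive query convention, and the binary-search counting are assumptions of this sketch, not the authors' construction.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from math import isqrt


class RangeMode:
    def __init__(self, data):
        self.data = list(data)
        n = len(self.data)
        self.block = max(1, isqrt(n))
        # Sorted list of positions for every distinct value.
        self.positions = defaultdict(list)
        for idx, value in enumerate(self.data):
            self.positions[value].append(idx)
        nblocks = (n + self.block - 1) // self.block
        # span_mode[b][c] = (mode, frequency) of blocks b..c inclusive.
        self.span_mode = [[None] * nblocks for _ in range(nblocks)]
        for b in range(nblocks):
            counts = defaultdict(int)
            best, best_freq = None, 0
            for idx in range(b * self.block, n):
                value = self.data[idx]
                counts[value] += 1
                if counts[value] > best_freq:
                    best, best_freq = value, counts[value]
                if (idx + 1) % self.block == 0 or idx == n - 1:
                    self.span_mode[b][idx // self.block] = (best, best_freq)

    def _count(self, value, i, j):
        """Number of occurrences of value in data[i..j]."""
        pos = self.positions[value]
        return bisect_right(pos, j) - bisect_left(pos, i)

    def query(self, i, j):
        """Return (mode, frequency) for data[i..j], 0-indexed inclusive."""
        if not 0 <= i <= j < len(self.data):
            raise IndexError("invalid query range")
        first_full = (i + self.block - 1) // self.block
        last_full = (j + 1) // self.block - 1
        best, best_freq = None, 0
        if first_full <= last_full:
            best, best_freq = self.span_mode[first_full][last_full]
            candidates = list(range(i, first_full * self.block))
            candidates += range((last_full + 1) * self.block, j + 1)
        else:
            candidates = range(i, j + 1)
        # The answer is the span mode or a value seen outside the span.
        for idx in candidates:
            value = self.data[idx]
            freq = self._count(value, i, j)
            if freq > best_freq:
                best, best_freq = value, freq
        return best, best_freq


if __name__ == "__main__":
    rm = RangeMode([1, 2, 2, 3, 3, 3, 2, 1, 1, 1])
    print(rm.query(0, 9), rm.query(1, 6))
```

    The table of span modes has about $(n / \sqrt{n})^2 = n$ entries, so the sketch stays within linear space; ties between equally frequent values are broken arbitrarily.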