16 research outputs found

    Efficient Algorithms for the Closest Pair Problem and Applications

    The closest pair problem (CPP) is one of the most well-studied and fundamental problems in computing. Given a set of points in a metric space, the problem is to identify the closest pair of points. A closely related problem is the fixed radius nearest neighbors problem (FRNNP): given a set of points and a radius R, the problem is to identify, for every input point p, all the other input points that are within a distance of R from p. A naive deterministic algorithm can solve these problems in quadratic time. CPP and FRNNP play a vital role in computational biology, computational finance, share market analysis, weather prediction, entomology, electrocardiography, N-body simulations, molecular simulations, etc. As a result, any improvements made in solving CPP and FRNNP will have immediate implications for the solution of numerous problems in these domains. We live in an era of big data, and processing these data takes large amounts of time. Speeding up data processing algorithms is thus more essential now than ever before. In this paper we present algorithms for CPP and FRNNP that improve (in theory and/or practice) on the best-known algorithms reported in the literature for CPP and FRNNP. These algorithms also improve on the best-known algorithms for related applications, including time series motif mining and the two-locus problem in Genome Wide Association Studies (GWAS).
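
    To make the two problem statements concrete, here is a minimal sketch of the naive quadratic-time baseline that the abstract mentions, not the improved algorithms of the paper. The function names and the choice of Euclidean distance on coordinate tuples are assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def closest_pair_naive(points):
    """Return (distance, p, q) for the closest pair by checking all pairs."""
    best = None
    for p, q in combinations(points, 2):
        d = dist(p, q)  # Euclidean distance (an assumed choice of metric)
        if best is None or d < best[0]:
            best = (d, p, q)
    return best


def fixed_radius_neighbors_naive(points, radius):
    """Map each point index i to the indices j != i within `radius` of it."""
    neighbors = {i: [] for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= radius:
            neighbors[i].append(j)
            neighbors[j].append(i)
    return neighbors


points = [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0), (10.0, 10.0)]
print(closest_pair_naive(points))
print(fixed_radius_neighbors_naive(points, 4.5))
```

    Both functions examine every unordered pair once, which is where the quadratic running time comes from.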

    Approximation Algorithms for Confidence Bands for Time Series

    Confidence intervals are a standard technique for analyzing data. When applied to time series, confidence intervals are computed for each time point separately. Alternatively, we can compute confidence bands, where we are required to find the smallest area enveloping k time series, where k is a user parameter. Confidence bands can then be used to detect abnormal time series, not just individual observations within the time series. We will show that, despite this being an NP-hard problem, it is possible to find an optimal confidence band for some k. We do this by considering a different problem: discovering regularized bands, where we minimize the envelope area minus the number of included time series weighted by a parameter a. Unlike normal confidence bands, we can solve this problem exactly by using a minimum cut. By varying a we can obtain solutions for various k. If we have a constraint k for which we cannot find an appropriate a, we demonstrate a simple algorithm that yields an O(sqrt(n)) approximation guarantee by connecting the problem to a minimum k-union problem. This connection also implies that we cannot approximate the problem better than O(n^(1/4)) under some (mild) assumptions. Finally, we consider a variant where instead of minimizing the area we minimize the maximum width. Here, we demonstrate a simple 2-approximation algorithm and show that we cannot achieve a better approximation guarantee. Peer reviewed
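
    The objective can be illustrated with a small brute-force sketch. It assumes that the "area" of a band is the sum, over time points, of the gap between the largest and smallest value among the chosen series; that reading, the function names, and the sample data are assumptions for illustration. The exhaustive search is exponential and is not one of the algorithms described in the abstract.

```python
from itertools import combinations


def envelope_area(series):
    """Sum over time points of (max - min) across the given series."""
    return sum(max(column) - min(column) for column in zip(*series))


def best_band_brute_force(all_series, k):
    """Try every k-subset of series; feasible for tiny inputs only."""
    best = min(
        combinations(range(len(all_series)), k),
        key=lambda idx: envelope_area([all_series[i] for i in idx]),
    )
    return best, envelope_area([all_series[i] for i in best])


series = [
    [1.0, 2.0, 3.0],
    [1.5, 2.5, 3.5],
    [0.5, 2.0, 3.0],
    [9.0, 0.0, 9.0],  # an outlying series that a tight band should exclude
]
print(best_band_brute_force(series, 3))
```

    With k = 3 the search keeps the three similar series and leaves out the outlying one, which is the sense in which a band can flag an entire time series as abnormal.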

    Convex Hull of Points Lying on Lines in o(n log n) Time after Preprocessing

    Motivated by the desire to cope with data imprecision, we study methods for taking advantage of preliminary information about point sets in order to speed up the computation of certain structures associated with them. In particular, we study the following problem: given a set L of n lines in the plane, we wish to preprocess L such that later, upon receiving a set P of n points, each of which lies on a distinct line of L, we can construct the convex hull of P efficiently. We show that in quadratic time and space it is possible to construct a data structure on L that enables us to compute the convex hull of any such point set P in O(n alpha(n) log* n) expected time. If we further assume that the points are "oblivious" with respect to the data structure, the running time improves to O(n alpha(n)). The analysis applies almost verbatim when L is a set of line-segments, and yields similar asymptotic bounds. We present several extensions, including a trade-off between space and query time and an output-sensitive algorithm. We also study the "dual problem" where we show how to efficiently compute the (<= k)-level of n lines in the plane, each of which lies on a distinct point (given in advance). We complement our results by Omega(n log n) lower bounds under the algebraic computation tree model for several related problems, including sorting a set of points (according to, say, their x-order), each of which lies on a given line known in advance. Therefore, the convex hull problem under our setting is easier than sorting, contrary to the "standard" convex hull and sorting problems, in which the two problems require Theta(n log n) steps in the worst case (under the algebraic computation tree model).
    Comment: 26 pages, 5 figures, 1 appendix; a preliminary version appeared at SoCG 201
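
    For contrast with the preprocessing approach, the "standard" convex hull computation can be sketched with Andrew's monotone chain, a textbook method whose initial sort accounts for its O(n log n) cost. This is a swapped-in baseline for illustration only, not the data structure from the abstract; the function names are assumptions.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))  # the sort dominates the running time
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it does not make a strict left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]))
```

    After sorting, the two scans take linear time, so any speed-up below n log n has to avoid the sort, which is what knowing the lines in advance makes possible in the setting described above.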

    Net and Prune: A Linear Time Algorithm for Euclidean Distance Problems

    We provide a general framework for getting expected linear time constant factor approximations (and in many cases FPTAS's) to several well known problems in Computational Geometry, such as k-center clustering and farthest nearest neighbor. The new approach is robust to variations in the input problem, and yet it is simple, elegant and practical. In particular, many of these well studied problems, which fit easily into our framework, either previously had no linear time approximation algorithm or required rather involved algorithms and analysis. A short list of the problems we consider includes farthest nearest neighbor, k-center clustering, smallest disk enclosing k points, kth largest distance, kth smallest m-nearest neighbor distance, kth heaviest edge in the MST and other spanning forest type problems, problems involving upward closed set systems, and more. Finally, we show how to extend our framework such that the linear running time bound holds with high probability.

    NcorpiON : A O(N) software for N-body integration in collisional and fragmenting systems

    NcorpiON is an N-body software developed for the time-efficient integration of collisional and fragmenting systems of planetesimals or moonlets orbiting a central mass. It features a fragmentation model, based on crater scaling and ejecta models, able to realistically simulate a violent impact. The user of NcorpiON can choose between four different built-in modules to compute self-gravity and detect collisions. One of these makes use of a mesh-based algorithm to treat mutual interactions in O(N) time. Another module, much more efficient than the standard Barnes-Hut tree code, is an O(N) tree-based algorithm called FalcON. It relies on fast multipole expansion for gravity computation, and we adapted it to collision detection as well. Computation time is reduced by building the tree structure using a three-dimensional Hilbert curve. For the same precision in mutual gravity computation, NcorpiON is found to be up to 25 times faster than the well-known software REBOUND. NcorpiON is written entirely in the C language and only needs a C compiler to run. A Python add-on, which requires only basic Python libraries, produces animations of the simulations from the output files. The name NcorpiON, reminiscent of a scorpion, comes from the French N-corps, meaning N-body, and from the mathematical notation O(N), due to the running time of the software being almost linear in the total number N of moonlets. NcorpiON is designed for the study of accreting or fragmenting disks of planetesimals or moonlets. It detects collisions and computes mutual gravity faster than REBOUND, and unlike other N-body integrators, it can resolve a collision by fragmentation. The fast multipole expansions are implemented up to order six to allow for high precision in mutual gravity computation.
    Comment: 29 pages, 6 figures

    Polynomial-Sized Topological Approximations Using the Permutahedron
