
    Towards a Dynamic Data Structure for Efficient Bounded Line Range Search

    We present a data structure for efficient axis-aligned orthogonal range search on a set of n lines in a bounded plane. The algorithm requires O(log n + k) time in the worst case to find all lines intersecting an axis-aligned query rectangle R, where k is the number of lines in range. The data structure used by the algorithm requires O(n + λ) space, where λ is the number of intersection points among the lines. Insertion of a new rightmost line or deletion of a leftmost line requires O(n) time in the worst case. For a sparse arrangement of lines (i.e., for λ = O(n)), insertion of a rightmost line or deletion of a leftmost line requires O(√n) time, and O(log n + µ) expected time, where µ is the number of intersection points between the inserted or deleted line and the existing lines.
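For orientation, the query itself (report every line that meets an axis-aligned rectangle R) can be stated as a brute-force baseline. This is a minimal sketch, not the data structure from the abstract: it scans all n lines per query, so it costs O(n) rather than O(log n + k). The representation of a line as coefficients (a, b, c) of a*x + b*y = c and the names `line_hits_rect` and `lines_in_range` are assumptions made for illustration.

```python
from typing import List, Tuple

Line = Tuple[float, float, float]         # (a, b, c) meaning a*x + b*y = c (assumed encoding)
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def line_hits_rect(line: Line, rect: Rect) -> bool:
    """True if the infinite line touches or crosses the closed rectangle."""
    a, b, c = line
    xmin, ymin, xmax, ymax = rect
    # Evaluate the signed side of each corner; the line misses the rectangle
    # exactly when all four corners lie strictly on the same side.
    sides = [a * x + b * y - c for x in (xmin, xmax) for y in (ymin, ymax)]
    return min(sides) <= 0 <= max(sides)


def lines_in_range(lines: List[Line], rect: Rect) -> List[int]:
    """Brute-force range search: indices of all lines intersecting rect."""
    return [i for i, ln in enumerate(lines) if line_hits_rect(ln, rect)]


if __name__ == "__main__":
    # y = x, y = 5, and x = 0.5 against the unit square.
    demo = [(1.0, -1.0, 0.0), (0.0, 1.0, 5.0), (1.0, 0.0, 0.5)]
    print(lines_in_range(demo, (0.0, 0.0, 1.0, 1.0)))
```

A baseline like this is mainly useful as a correctness check against a faster structure.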

    Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing

    This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g., records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work, as no particular exploitation of the underlying structure is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, an online suffix tree can be constructed in worst-case time $O(\log n)$ per input symbol (as opposed to amortized $O(\log n)$ time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves $O(\log n)$ worst-case time per input symbol. Searching for a pattern of length $m$ in the resulting suffix tree takes $O(\min(m\log|\Sigma|, m + \log n) + tocc)$ time, where $tocc$ is the number of occurrences of the pattern. The paper also describes more applications and shows how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors, and order maintenance.

    Dynamic load balancing for the distributed mining of molecular structures

    In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute's HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
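The term "receiver-initiated" means that an idle worker asks for work, rather than a busy worker pushing it out. The sketch below illustrates only that general idea with an in-process simulation; it is not the algorithm from the paper, whose details the abstract does not give. The function name `rebalance`, the "ask the busiest peer" donor policy, and the "give away half" rule are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List


def rebalance(queues: List[Deque[int]]) -> List[Deque[int]]:
    """One balancing round: each idle worker requests work from the busiest peer."""
    for q in queues:
        if q:
            continue  # this worker still has tasks, so it sends no request
        # Assumed donor policy: pick the peer with the longest task queue.
        donor = max(range(len(queues)), key=lambda j: len(queues[j]))
        share = len(queues[donor]) // 2  # assumed rule: donor hands over half
        for _ in range(share):
            q.append(queues[donor].pop())  # tasks are taken from the donor's tail
    return queues


if __name__ == "__main__":
    workers = [deque(range(8)), deque(), deque()]
    rebalance(workers)
    print([len(w) for w in workers])
```

In a real distributed setting the request and the transfer would be messages between peers; the simulation collapses them into direct queue operations.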

    Faster Clustering via Preprocessing

    We examine the efficiency of clustering a set of points, when the encompassing metric space may be preprocessed in advance. In computational problems of this genre, there is a first stage of preprocessing, whose input is a collection of points $M$; the next stage receives as input a query set $Q\subset M$, and should report a clustering of $Q$ according to some objective, such as 1-median, in which case the answer is a point $a\in M$ minimizing $\sum_{q\in Q} d_M(a,q)$. We design fast algorithms that approximately solve such problems under standard clustering objectives like $p$-center and $p$-median, when the metric $M$ has low doubling dimension. By leveraging the preprocessing stage, our algorithms achieve query time that is near-linear in the query size $n=|Q|$, and is (almost) independent of the total number of points $m=|M|$. Comment: 24 pages
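The 1-median objective in this abstract can be made concrete with an exact brute-force evaluation. This is only a sketch of the objective, not the approximation algorithm the abstract refers to: it takes O(m·n) distance evaluations and uses no preprocessing. The function name `one_median` and the passing of the metric as a `dist` callable are assumptions made for illustration.

```python
from typing import Callable, Iterable, Sequence, TypeVar

P = TypeVar("P")


def one_median(M: Iterable[P], Q: Sequence[P], dist: Callable[[P, P], float]) -> P:
    """Return a point a in M minimizing the sum of dist(a, q) over q in Q."""
    return min(M, key=lambda a: sum(dist(a, q) for q in Q))


if __name__ == "__main__":
    # Points on the real line with the absolute-difference metric.
    M = [0, 1, 2, 10]
    Q = [0, 1, 2]
    print(one_median(M, Q, lambda a, b: abs(a - b)))
```

The contrast with the abstract's claim is the dependence on m: this scan touches every point of M, while the stated query time is almost independent of m.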

    Exact Distance Oracles for Planar Graphs

    We present new and improved data structures that answer exact node-to-node distance queries in planar graphs. Such data structures are also known as distance oracles. For any directed planar graph on n nodes with non-negative lengths we obtain the following:
    * Given a desired space allocation $S\in[n\lg\lg n, n^2]$, we show how to construct in $\tilde O(S)$ time a data structure of size $O(S)$ that answers distance queries in $\tilde O(n/\sqrt S)$ time per query. As a consequence, we obtain an improvement over the fastest algorithm for k-many distances in planar graphs whenever $k\in[\sqrt n, n)$.
    * We provide a linear-space exact distance oracle for planar graphs with query time $O(n^{1/2+\epsilon})$ for any constant $\epsilon>0$. This is the first such data structure with provable sublinear query time.
    * For edge lengths at least one, we provide an exact distance oracle of space $\tilde O(n)$ such that for any pair of nodes at distance D the query time is $\tilde O(\min\{D,\sqrt n\})$. Comparable query performance had been observed experimentally but has never been explained theoretically.
    Our data structures are based on the following new tool: given a non-self-crossing cycle C with $c = O(\sqrt n)$ nodes, we can preprocess G in $\tilde O(n)$ time to produce a data structure of size $O(n \lg\lg c)$ that can answer the following queries in $\tilde O(c)$ time: for a query node u, output the distance from u to all the nodes of C. This data structure builds on and extends a related data structure of Klein (SODA'05), which reports distances to the boundary of a face, rather than a cycle. The best distance oracles for planar graphs until the current work are due to Cabello (SODA'06), Djidjev (WG'96), and Fakcharoenphol and Rao (FOCS'01). For $\sigma\in(1,4/3)$ and space $S=n^\sigma$, we essentially improve the query time from $n^2/S$ to $\sqrt{n^2/S}$. Comment: To appear in the proceedings of the 23rd ACM-SIAM Symposium on Discrete Algorithms, SODA 201
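To make the closing comparison concrete, the two query bounds can be written as powers of n. The value of sigma used in the second line is an arbitrary choice from the stated interval, picked only for illustration, and polylogarithmic factors are ignored.

```latex
\[
S = n^{\sigma}
\;\Longrightarrow\;
\frac{n^2}{S} = n^{2-\sigma},
\qquad
\sqrt{\frac{n^2}{S}} = \frac{n}{\sqrt{S}} = n^{1-\sigma/2}.
\]
\[
\sigma = \tfrac{5}{4}:\qquad
n^{2-\sigma} = n^{3/4},
\qquad
n^{1-\sigma/2} = n^{3/8}.
\]
```

The second expression has the same form, $n/\sqrt S$, as the query bound stated for the general space allocation in the abstract.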