Towards a Dynamic Data Structure for Efficient Bounded Line Range Search

Author: Bradford G Nickerson
Thu Le
Thuy Thi
Publication date: 03/04/2020

Abstract We present a data structure for efficient axis-aligned orthogonal range search on a set of n lines in a bounded plane. The algorithm requires O(log n + k) time in the worst case to find all lines intersecting an axis-aligned query rectangle R, where k is the number of lines in range. The data structure used by the algorithm requires O(n + λ) space, where λ is the number of intersection points among the lines. Insertion of a new rightmost line or deletion of a leftmost line requires O(n) time in the worst case. For a sparse arrangement of lines (i.e., for λ = O(n)), insertion of a rightmost line or deletion of a leftmost line requires O(√n) time, and O(log n + µ) expected time, where µ is the number of intersection points between the inserted or deleted line and the existing lines.
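For orientation, the query answered by this structure can be stated as a brute-force O(n) scan. The sketch below is only a baseline for checking results, not the data structure from the paper; the `(slope, intercept)` line representation (which excludes vertical lines) and the name `lines_in_range` are assumptions made for illustration.

```python
def lines_in_range(lines, rect):
    """Return every line y = m*x + b that intersects an axis-aligned rectangle.

    lines: list of (m, b) pairs (hypothetical representation, non-vertical lines only).
    rect:  (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    """
    x1, y1, x2, y2 = rect
    hits = []
    for m, b in lines:
        # Over x in [x1, x2] the line sweeps the y-interval between these two values.
        ya, yb = m * x1 + b, m * x2 + b
        # The line meets the rectangle iff that interval overlaps [y1, y2].
        if min(ya, yb) <= y2 and max(ya, yb) >= y1:
            hits.append((m, b))
    return hits


if __name__ == "__main__":
    sample = [(1, 0), (0, 5), (-1, 10)]
    print(lines_in_range(sample, (0, 0, 2, 2)))
    print(lines_in_range(sample, (4, 4, 6, 6)))
```

A scan like this takes time linear in n for every query, which is the cost the O(log n + k) bound in the abstract improves on.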

CiteSeerX

Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing

Author: Amir Amihood
Franceschini Gianni
Grossi Roberto
Kopelowitz Tsvi
Lewenstein Moshe
Lewenstein Noa
Publication date: 03/06/2013

This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g., records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work, as no particular exploitation of the underlying structure of the keys is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, an online suffix tree can be constructed in worst-case time $O(\log n)$ per input symbol (as opposed to amortized $O(\log n)$ time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves $O(\log n)$ worst-case time per input symbol. Searching for a pattern of length $m$ in the resulting suffix tree takes $O(\min(m\log |\Sigma|, m + \log n) + tocc)$ time, where $tocc$ is the number of occurrences of the pattern. The paper also describes more applications and shows how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors, and order maintenance.
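To make the pattern-search query concrete, here is a minimal sketch that reports all occurrences of a pattern using a naively built, static suffix array and binary search. This is a swapped-in, simpler technique and not the online suffix tree construction described in the abstract; its search cost is roughly O(m log n) plus the number of occurrences, and the function name `find_occurrences` is an assumption for illustration.

```python
def find_occurrences(text, pattern):
    """Return the sorted start positions of pattern in text (illustrative sketch)."""
    # Naive suffix array: suffix start positions in lexicographic order of the suffixes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    m = len(pattern)

    def prefix(rank):
        # First m characters of the suffix at the given rank.
        start = sa[rank]
        return text[start:start + m]

    # Lower bound: first rank whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix ranked in [first, lo) starts with the pattern.
    return sorted(sa[first:lo])


if __name__ == "__main__":
    print(find_occurrences("banana", "ana"))
```

Note that each comparison in the binary search costs up to m character operations, which is exactly the kind of non-constant key comparison the abstract's technique is designed to handle.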

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza

Dynamic load balancing for the distributed mining of molecular structures

Author: Berthold M.R.
Di Fatta Giuseppe
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2006

In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.

Faster Clustering via Preprocessing

Author: Kopelowitz Tsvi
Krauthgamer Robert
Publication date: 01/01/2012

We examine the efficiency of clustering a set of points, when the encompassing metric space may be preprocessed in advance. In computational problems of this genre, there is a first stage of preprocessing, whose input is a collection of points $M$; the next stage receives as input a query set $Q\subset M$, and should report a clustering of $Q$ according to some objective, such as 1-median, in which case the answer is a point $a\in M$ minimizing $\sum_{q\in Q} d_M(a,q)$. We design fast algorithms that approximately solve such problems under standard clustering objectives like $p$-center and $p$-median, when the metric $M$ has low doubling dimension. By leveraging the preprocessing stage, our algorithms achieve query time that is near-linear in the query size $n=|Q|$, and is (almost) independent of the total number of points $m=|M|$.
Comment: 24 pages
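The 1-median objective mentioned above can be written down directly as an exhaustive search over $M$. The sketch below is a brute-force reference that costs O(|M| · |Q|) distance evaluations per query, which is the dependence on $m$ that the paper's preprocessing avoids; it is not the paper's algorithm. The use of Euclidean distance and the name `one_median` are assumptions for illustration.

```python
import math


def one_median(M, Q, dist=math.dist):
    """Return the point a in M minimizing the sum of dist(a, q) over q in Q.

    M, Q: lists of coordinate tuples, with Q a subset of M.
    dist: metric d_M; Euclidean distance is assumed here for illustration.
    """
    return min(M, key=lambda a: sum(dist(a, q) for q in Q))


if __name__ == "__main__":
    M = [(0, 0), (1, 0), (2, 0), (5, 0), (10, 0)]
    Q = [(1, 0), (2, 0), (10, 0)]
    # Costs per candidate: 13, 10, 9, 12, 17, so (2, 0) is the 1-median.
    print(one_median(M, Q))
```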

arXiv.org e-Print Archive

CiteSeerX

Exact Distance Oracles for Planar Graphs

Author: Mozes Shay
Sommer Christian
Publication date: 01/01/2010

We present new and improved data structures that answer exact node-to-node distance queries in planar graphs. Such data structures are also known as distance oracles. For any directed planar graph on n nodes with non-negative lengths we obtain the following:
* Given a desired space allocation $S\in[n\lg\lg n,n^2]$, we show how to construct in $\tilde O(S)$ time a data structure of size $O(S)$ that answers distance queries in $\tilde O(n/\sqrt S)$ time per query. As a consequence, we obtain an improvement over the fastest algorithm for k-many distances in planar graphs whenever $k\in[\sqrt n,n)$.
* We provide a linear-space exact distance oracle for planar graphs with query time $O(n^{1/2+\epsilon})$ for any constant $\epsilon>0$. This is the first such data structure with provable sublinear query time.
* For edge lengths at least one, we provide an exact distance oracle of space $\tilde O(n)$ such that for any pair of nodes at distance D the query time is $\tilde O(\min\{D,\sqrt n\})$. Comparable query performance had been observed experimentally but has never been explained theoretically.
Our data structures are based on the following new tool: given a non-self-crossing cycle C with $c = O(\sqrt n)$ nodes, we can preprocess G in $\tilde O(n)$ time to produce a data structure of size $O(n \lg\lg c)$ that can answer the following queries in $\tilde O(c)$ time: for a query node u, output the distance from u to all the nodes of C. This data structure builds on and extends a related data structure of Klein (SODA'05), which reports distances to the boundary of a face, rather than a cycle. The best distance oracles for planar graphs until the current work are due to Cabello (SODA'06), Djidjev (WG'96), and Fakcharoenphol and Rao (FOCS'01). For $\sigma\in(1,4/3)$ and space $S=n^\sigma$, we essentially improve the query time from $n^2/S$ to $\sqrt{n^2/S}$.
Comment: To appear in the proceedings of the 23rd ACM-SIAM Symposium on Discrete Algorithms, SODA 201
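As a short worked check of the space/query trade-off stated in this abstract, substituting $S = n^\sigma$ gives the following. Polylogarithmic factors hidden by $\tilde O$ are ignored, and the comparison is a reader's calculation from the stated bounds, not a formula taken from the paper.

```latex
\[
\text{previous: } \frac{n^2}{S} = n^{2-\sigma},
\qquad
\text{new: } \sqrt{\frac{n^2}{S}} = \frac{n}{\sqrt{S}} = n^{1-\sigma/2},
\qquad
\frac{n^{2-\sigma}}{n^{1-\sigma/2}} = n^{1-\sigma/2}.
\]
\[
\sigma \in \left(1, \tfrac{4}{3}\right)
\;\Longrightarrow\;
n^{1-\sigma/2} \in \left(n^{1/3},\, n^{1/2}\right).
\]
```

So in this space range the new query time is the square root of the previous one, and it matches the $\tilde O(n/\sqrt S)$ bound from the first item with $S = n^\sigma$.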

arXiv.org e-Print Archive

CiteSeerX