
    Near Optimal Parallel Algorithms for Dynamic DFS in Undirected Graphs

    A depth first search (DFS) tree is a fundamental data structure for solving graph problems. The classical algorithm [SiComp74] for building a DFS tree requires $O(m+n)$ time for a given graph $G$ having $n$ vertices and $m$ edges. Recently, Baswana et al. [SODA16] presented a simple algorithm for updating the DFS tree of an undirected graph after an edge/vertex update in $\tilde{O}(n)$ time. However, their algorithm is strictly sequential. We present an algorithm achieving similar bounds that can be adapted easily to the parallel environment. In the parallel model, a DFS tree can be computed from scratch using $m$ processors in expected $\tilde{O}(1)$ time [SiComp90] on an EREW PRAM, whereas the best deterministic algorithm takes $\tilde{O}(\sqrt{n})$ time [SiComp90,JAlg93] on a CRCW PRAM. Our algorithm can be used to develop optimal (up to polylog $n$ factors) deterministic algorithms for maintaining fully dynamic DFS and fault tolerant DFS of an undirected graph. 1- Parallel Fully Dynamic DFS: Given an arbitrary online sequence of vertex/edge updates, we can maintain a DFS tree of an undirected graph in $\tilde{O}(1)$ time per update using $m$ processors on an EREW PRAM. 2- Parallel Fault tolerant DFS: An undirected graph can be preprocessed to build a data structure of size $O(m)$ such that for a set of $k$ updates (where $k$ is constant) in the graph, the updated DFS tree can be computed in $\tilde{O}(1)$ time using $n$ processors on an EREW PRAM. Moreover, our fully dynamic DFS algorithm provides, in a seamless manner, nearly optimal (up to polylog $n$ factors) algorithms for maintaining a DFS tree in the semi-streaming model and a restricted distributed model. These are the first parallel, semi-streaming and distributed algorithms for maintaining a DFS tree in the dynamic setting. Comment: Accepted to appear in SPAA'17, 32 Pages, 5 Figure
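For orientation, the static baseline that the abstract above starts from can be sketched as follows. This is a minimal sequential sketch of the classical $O(m+n)$ DFS tree construction, not the dynamic or parallel algorithm of the paper; the function name `dfs_tree`, the adjacency-list representation, and the example graph are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def dfs_tree(adj: Dict[int, List[int]], root: int) -> Dict[int, Optional[int]]:
    """Return parent pointers of a DFS tree of the component containing root.

    adj is a hypothetical adjacency-list representation of an undirected
    graph. Every vertex and every adjacency entry is touched a constant
    number of times, which gives the O(m + n) bound.
    """
    parent: Dict[int, Optional[int]] = {root: None}
    # Each stack entry pairs a vertex with an iterator over its neighbours,
    # so the scan of a neighbour list resumes where it stopped.
    stack = [(root, iter(adj[root]))]
    while stack:
        v, neighbours = stack[-1]
        for w in neighbours:
            if w not in parent:          # first visit: (v, w) is a tree edge
                parent[w] = v
                stack.append((w, iter(adj[w])))
                break
        else:
            stack.pop()                  # all neighbours of v explored
    return parent


# Illustrative graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
parent = dfs_tree(adj, 0)
```

In the resulting tree the non-tree edge (0, 2) joins a vertex to one of its ancestors, which is the defining property of a DFS tree in an undirected graph.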

    Prospects and limitations of full-text index structures in genome analysis

    The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article provides a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, and their practical limitations are explained and compared.

    Fast and Tiny Structural Self-Indexes for XML

    XML document markup is highly repetitive and therefore well compressible using dictionary-based methods such as DAGs or grammars. In the context of selectivity estimation, grammar-compressed trees were used before as a synopsis for structural XPath queries. Here a fully-fledged index over such grammars is presented. The index allows arbitrary tree algorithms to be executed with a slow-down that is comparable to the space improvement. More interestingly, certain algorithms execute much faster over the index (because no decompression occurs). E.g., for structural XPath count queries, evaluating over the index is faster than previous XPath implementations, often by two orders of magnitude. The index also allows XML results (including texts) to be serialized faster than previous systems, by a factor of ca. 2-3. This is due to efficient copy handling of grammar repetitions, and because materialization is totally avoided. In order to compare with twig join implementations, we implemented a materializer which writes out pre-order numbers of result nodes, and show its competitiveness. Comment: 13 page
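The idea that a count query can run over the compressed form without decompression can be illustrated with the simpler of the two dictionary-based methods named above, DAG sharing. The sketch below is not the grammar-based index of the paper; the names `compress` and `count_label`, the tuple encoding of trees, and the toy document are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# A tree is encoded as (label, (child, child, ...)).
# A DAG node is encoded as (label, (child_id, child_id, ...)).
DagNode = Tuple[str, Tuple[int, ...]]


def compress(tree, table: Dict[DagNode, int], nodes: List[DagNode]) -> int:
    """Hash-cons a tree into a DAG; identical subtrees share one node id."""
    label, children = tree
    key = (label, tuple(compress(c, table, nodes) for c in children))
    if key not in table:
        table[key] = len(nodes)   # children always receive smaller ids
        nodes.append(key)
    return table[key]


def count_label(nodes: List[DagNode], root: int, label: str) -> int:
    """Count occurrences of label in the represented tree.

    Each DAG node is processed once, so the cost depends on the size of
    the compressed form rather than on the size of the original tree.
    """
    counts = [0] * len(nodes)
    for i, (lab, kids) in enumerate(nodes):
        counts[i] = (lab == label) + sum(counts[k] for k in kids)
    return counts[root]


# Toy document: a library holding three structurally identical books.
book = ("book", (("title", ()), ("author", ())))
doc = ("library", (book, book, book))

nodes: List[DagNode] = []
root = compress(doc, {}, nodes)
titles = count_label(nodes, root, "title")
```

Here the ten-node tree collapses to four DAG nodes, and the count is computed with one pass over those four nodes.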

    Time Versus Cost Tradeoffs for Deterministic Rendezvous in Networks

    Two mobile agents, starting from different nodes of a network at possibly different times, have to meet at the same node. This problem is known as rendezvous. Agents move in synchronous rounds. Each agent has a distinct integer label from the set $\{1,\dots,L\}$. Two main efficiency measures of rendezvous are its time (the number of rounds until the meeting) and its cost (the total number of edge traversals). We investigate tradeoffs between these two measures. A natural benchmark for both time and cost of rendezvous in a network is the number of edge traversals needed for visiting all nodes of the network, called the exploration time. Hence we express the time and cost of rendezvous as functions of an upper bound $E$ on the time of exploration (where $E$ and a corresponding exploration procedure are known to both agents) and of the size $L$ of the label space. We present two natural rendezvous algorithms. Algorithm Cheap has cost $O(E)$ (and, in fact, a version of this algorithm for the model where the agents start simultaneously has cost exactly $E$) and time $O(EL)$. Algorithm Fast has both time and cost $O(E\log L)$. Our main contributions are lower bounds showing that, perhaps surprisingly, these two algorithms capture the tradeoffs between time and cost of rendezvous almost tightly. We show that any deterministic rendezvous algorithm of cost asymptotically $E$ (i.e., of cost $E+o(E)$) must have time $\Omega(EL)$. On the other hand, we show that any deterministic rendezvous algorithm with time complexity $O(E\log L)$ must have cost $\Omega(E\log L)$.

    Hybrid model for vascular tree structures

    This paper proposes a new representation scheme for the cerebral blood vessels. This model provides information on the semantics of the vascular structure: the topological relationships between vessels and the labeling of vascular accidents such as aneurysms and stenoses. In addition, the model retains information about the inner surface geometry as well as the vascular map volume properties, i.e. the tissue density, the blood flow velocity and the vessel wall elasticity. The model can be constructed automatically in a pre-process from a set of segmented MRA images. Its memory requirements are optimized on the basis of the sparseness of the vascular structure. It allows fast queries and efficient traversals and navigations. The visualizations of the vessel surface can be performed at different levels of detail. The direct rendering of the volume is fast because the model provides a natural way to skip over empty data. The paper analyzes the memory requirements of the model along with the costs of the most important operations on it. Postprint (published version

    Geometry-Oblivious FMM for Compressing Dense SPD Matrices

    We present GOFMM (geometry-oblivious FMM), a novel method that creates a hierarchical low-rank approximation, "compression," of an arbitrary dense symmetric positive definite (SPD) matrix. For many applications, GOFMM enables an approximate matrix-vector multiplication in $N \log N$ or even $N$ time, where $N$ is the matrix size. Compression requires $N \log N$ storage and work. In general, our scheme belongs to the family of hierarchical matrix approximation methods. In particular, it generalizes the fast multipole method (FMM) to a purely algebraic setting by only requiring the ability to sample matrix entries. Neither geometric information (i.e., point coordinates) nor knowledge of how the matrix entries have been generated is required, thus the term "geometry-oblivious." Also, we introduce a shared-memory parallel scheme for hierarchical matrix computations that reduces synchronization barriers. We present results on the Intel Knights Landing and Haswell architectures, and on the NVIDIA Pascal architecture for a variety of matrices. Comment: 13 pages, accepted by SC'1

    Scaling Limits for Minimal and Random Spanning Trees in Two Dimensions

    A general formulation is presented for continuum scaling limits of stochastic spanning trees. A spanning tree is expressed in this limit through a consistent collection of subtrees, which includes a tree for every finite set of endpoints in $\mathbb{R}^d$. Tightness of the distribution, as $\delta \to 0$, is established for the following two-dimensional examples: the uniformly random spanning tree on $\delta \mathbb{Z}^2$, the minimal spanning tree on $\delta \mathbb{Z}^2$ (with random edge lengths), and the Euclidean minimal spanning tree on a Poisson process of points in $\mathbb{R}^2$ with density $\delta^{-2}$. In each case, sample trees are proven to have the following properties, with probability one with respect to any of the limiting measures: i) there is a single route to infinity (as was known for $\delta > 0$), ii) the tree branches are given by curves which are regular in the sense of Hölder continuity, iii) the branches are also rough, in the sense that their Hausdorff dimension exceeds one, iv) there is a random dense subset of $\mathbb{R}^2$, of dimension strictly between one and two, on the complement of which (and only there) the spanning subtrees are unique with continuous dependence on the endpoints, v) branching occurs at countably many points in $\mathbb{R}^2$, and vi) the branching numbers are uniformly bounded. The results include tightness for the loop erased random walk (LERW) in two dimensions. The proofs proceed through the derivation of scale-invariant power bounds on the probabilities of repeated crossings of annuli. Comment: Revised; 54 pages, 6 figures (LaTex
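One of the discrete objects whose scaling limit is studied above, the minimal spanning tree on a square lattice with random edge lengths, can be sampled with a short sketch. This only generates a finite sample on an $n \times n$ box (corresponding loosely to lattice spacing $\delta \approx 1/n$); it says nothing about the limit itself. The function name `grid_mst`, the uniform edge-length distribution, and the use of Kruskal's algorithm with union-find are assumptions made for illustration.

```python
import random
from typing import List, Tuple

Point = Tuple[int, int]


def grid_mst(n: int, seed: int = 0) -> List[Tuple[Point, Point]]:
    """Sample a minimal spanning tree of the n x n grid graph.

    Edge lengths are drawn independently and uniformly from [0, 1)
    (an assumed distribution); the tree is built with Kruskal's algorithm.
    """
    rng = random.Random(seed)
    edges = []
    for x in range(n):
        for y in range(n):
            if x + 1 < n:
                edges.append((rng.random(), (x, y), (x + 1, y)))
            if y + 1 < n:
                edges.append((rng.random(), (x, y), (x, y + 1)))
    edges.sort()  # process edges in order of increasing length

    # Union-find with path halving.
    parent = {(x, y): (x, y) for x in range(n) for y in range(n)}

    def find(v: Point) -> Point:
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for _, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:              # edge joins two components: keep it
            parent[ru] = rv
            tree.append((u, v))
    return tree


tree = grid_mst(8)
```

A spanning tree of the $n \times n$ grid always has $n^2 - 1$ edges, which gives a quick sanity check on the sample.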