205,378 research outputs found

    Prospects and limitations of full-text index structures in genome analysis

    Get PDF
    The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared

    A survey on algorithmic aspects of modular decomposition

    Full text link
    The modular decomposition is a technique that applies but is not restricted to graphs. The notion of module naturally appears in the proofs of many graph theoretical theorems. Computing the modular decomposition tree is an important preprocessing step to solve a large number of combinatorial optimization problems. Since the first polynomial time algorithm in the early 70's, the algorithmic of the modular decomposition has known an important development. This paper survey the ideas and techniques that arose from this line of research

    Parallel Sort-Based Matching for Data Distribution Management on Shared-Memory Multiprocessors

    Full text link
    In this paper we consider the problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles. This is a common problem that arises in many agent-based simulation studies, and is of central importance in the context of High Level Architecture (HLA), where it is at the core of the Data Distribution Management (DDM) service. Several realizations of the DDM service have been proposed; however, many of them are either inefficient or inherently sequential. These are serious limitations since multicore processors are now ubiquitous, and DDM algorithms -- being CPU-intensive -- could benefit from additional computing power. We propose a parallel version of the Sort-Based Matching algorithm for shared-memory multiprocessors. Sort-Based Matching is one of the most efficient serial algorithms for the DDM problem, but is quite difficult to parallelize due to data dependencies. We describe the algorithm and compute its asymptotic running time; we complete the analysis by assessing its performance and scalability through extensive experiments on two commodity multicore systems based on a dual socket Intel Xeon processor, and a single socket Intel Core i7 processor.Comment: Proceedings of the 21-th ACM/IEEE International Symposium on Distributed Simulation and Real Time Applications (DS-RT 2017). Best Paper Award @DS-RT 201

    A general method for common intervals

    Full text link
    Given an elementary chain of vertex set V, seen as a labelling of V by the set {1, ...,n=|V|}, and another discrete structure over VV, say a graph G, the problem of common intervals is to compute the induced subgraphs G[I], such that II is an interval of [1, n] and G[I] satisfies some property Pi (as for example Pi= "being connected"). This kind of problems comes from comparative genomic in bioinformatics, mainly when the graph GG is a chain or a tree (Heber and Stoye 2001, Heber and Savage 2005, Bergeron et al 2008). When the family of intervals is closed under intersection, we present here the combination of two approaches, namely the idea of potential beginning developed in Uno, Yagiura 2000 and Bui-Xuan et al 2005 and the notion of generator as defined in Bergeron et al 2008. This yields a very simple generic algorithm to compute all common intervals, which gives optimal algorithms in various applications. For example in the case where GG is a tree, our framework yields the first linear time algorithms for the two properties: "being connected" and "being a path". In the case where GG is a chain, the problem is known as: common intervals of two permutations (Uno and Yagiura 2000), our algorithm provides not only the set of all common intervals but also with some easy modifications a tree structure that represents this set

    Active data structures on GPGPUs

    Get PDF
    Active data structures support operations that may affect a large number of elements of an aggregate data structure. They are well suited for extremely fine grain parallel systems, including circuit parallelism. General purpose GPUs were designed to support regular graphics algorithms, but their intermediate level of granularity makes them potentially viable also for active data structures. We consider the characteristics of active data structures and discuss the feasibility of implementing them on GPGPUs. We describe the GPU implementations of two such data structures (ESF arrays and index intervals), assess their performance, and discuss the potential of active data structures as an unconventional programming model that can exploit the capabilities of emerging fine grain architectures such as GPUs

    Lightweight Lempel-Ziv Parsing

    Full text link
    We introduce a new approach to LZ77 factorization that uses O(n/d) words of working space and O(dn) time for any d >= 1 (for polylogarithmic alphabet sizes). We also describe carefully engineered implementations of alternative approaches to lightweight LZ77 factorization. Extensive experiments show that the new algorithm is superior in most cases, particularly at the lowest memory levels and for highly repetitive data. As a part of the algorithm, we describe new methods for computing matching statistics which may be of independent interest.Comment: 12 page

    A Framework for Algorithm Stability

    Get PDF
    We say that an algorithm is stable if small changes in the input result in small changes in the output. This kind of algorithm stability is particularly relevant when analyzing and visualizing time-varying data. Stability in general plays an important role in a wide variety of areas, such as numerical analysis, machine learning, and topology, but is poorly understood in the context of (combinatorial) algorithms. In this paper we present a framework for analyzing the stability of algorithms. We focus in particular on the tradeoff between the stability of an algorithm and the quality of the solution it computes. Our framework allows for three types of stability analysis with increasing degrees of complexity: event stability, topological stability, and Lipschitz stability. We demonstrate the use of our stability framework by applying it to kinetic Euclidean minimum spanning trees
    corecore