Prospects and limitations of full-text index structures in genome analysis
The combination of steady advances in sequencing technology, which produce large amounts of data, and innovative bioinformatics approaches designed to cope with this data flood has led to interesting new results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less well understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article provides a comprehensive, state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, their interrelationships, the trade-offs they impose, and their practical limitations are explained and compared.
A survey on algorithmic aspects of modular decomposition
Modular decomposition is a technique that applies to, but is not restricted
to, graphs. The notion of a module appears naturally in the proofs of many
graph-theoretical theorems. Computing the modular decomposition tree is an
important preprocessing step for solving a large number of combinatorial
optimization problems. Since the first polynomial-time algorithm in the early
1970s, the algorithmics of modular decomposition has developed considerably.
This paper surveys the ideas and techniques that arose from this line of
research.
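As a small illustration of the notion of module mentioned in this abstract: in an undirected graph, a vertex set M is a module when every vertex outside M is adjacent either to all of M or to none of it. The sketch below only checks that definition for one candidate set; it does not compute a decomposition tree and is not taken from the survey. The function name `is_module` and the adjacency-dictionary representation are assumptions made for the example.

```python
def is_module(adj, candidate):
    """Check the module property for one vertex set.

    adj: dict mapping each vertex to the set of its neighbours
         (undirected graph, hypothetical representation).
    candidate: iterable of vertices to test.
    """
    m = set(candidate)
    for v in adj:
        if v in m:
            continue
        # An outside vertex must see all of m or none of it.
        hits = len(adj[v] & m)
        if hits not in (0, len(m)):
            return False
    return True


# Path a - b - c: the two endpoints form a module, {a, b} does not.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(is_module(path, {"a", "c"}))
print(is_module(path, {"a", "b"}))
```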
Parallel Sort-Based Matching for Data Distribution Management on Shared-Memory Multiprocessors
In this paper we consider the problem of identifying intersections between
two sets of d-dimensional axis-parallel rectangles. This is a common problem
that arises in many agent-based simulation studies, and is of central
importance in the context of High Level Architecture (HLA), where it is at the
core of the Data Distribution Management (DDM) service. Several realizations of
the DDM service have been proposed; however, many of them are either
inefficient or inherently sequential. These are serious limitations since
multicore processors are now ubiquitous, and DDM algorithms -- being
CPU-intensive -- could benefit from additional computing power. We propose a
parallel version of the Sort-Based Matching algorithm for shared-memory
multiprocessors. Sort-Based Matching is one of the most efficient serial
algorithms for the DDM problem, but is quite difficult to parallelize due to
data dependencies. We describe the algorithm and compute its asymptotic running
time; we complete the analysis by assessing its performance and scalability
through extensive experiments on two commodity multicore systems based on a
dual socket Intel Xeon processor, and a single socket Intel Core i7 processor.
Comment: Proceedings of the 21st ACM/IEEE International Symposium on
Distributed Simulation and Real Time Applications (DS-RT 2017). Best Paper
Award @DS-RT 201
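The matching problem in this abstract can be illustrated in one dimension, where rectangles reduce to intervals. The sketch below is a serial endpoint sweep in the spirit of sort-based matching: sort all endpoints, then scan them while tracking which intervals are currently open. It is an assumption-laden simplification, not the parallel algorithm of the paper; the names `sort_based_matching_1d`, `subscriptions`, and `updates` are chosen for the example, and intervals are treated as closed.

```python
def sort_based_matching_1d(subscriptions, updates):
    """Return the set of (i, j) such that subscription i overlaps update j.

    Both arguments are lists of closed intervals (lo, hi).
    """
    events = []
    for i, (lo, hi) in enumerate(subscriptions):
        events.append((lo, 0, "S", i))  # 0 marks a start point
        events.append((hi, 1, "S", i))  # 1 marks an end point
    for j, (lo, hi) in enumerate(updates):
        events.append((lo, 0, "U", j))
        events.append((hi, 1, "U", j))
    # At equal coordinates, starts sort before ends, so touching
    # closed intervals are reported as overlapping.
    events.sort()

    active_s, active_u = set(), set()
    matches = set()
    for _, is_end, kind, idx in events:
        if is_end:
            (active_s if kind == "S" else active_u).discard(idx)
        elif kind == "S":
            matches.update((idx, j) for j in active_u)
            active_s.add(idx)
        else:
            matches.update((i, idx) for i in active_s)
            active_u.add(idx)
    return matches


subs = [(0, 5), (10, 12)]
upds = [(4, 11), (6, 7)]
print(sorted(sort_based_matching_1d(subs, upds)))  # [(0, 0), (1, 0)]
```

For d-dimensional axis-parallel rectangles, two rectangles overlap exactly when their projections overlap in every dimension, so one simple (not necessarily efficient) extension is to intersect the per-dimension match sets.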
A general method for common intervals
Given an elementary chain of vertex set V, seen as a labelling of V by the
set {1, ..., n=|V|}, and another discrete structure over V, say a graph G, the
problem of common intervals is to compute the induced subgraphs G[I] such that
I is an interval of [1, n] and G[I] satisfies some property Pi (for
example, Pi = "being connected"). This kind of problem comes from comparative
genomics in bioinformatics, mainly when the graph is a chain or a tree
(Heber and Stoye 2001, Heber and Savage 2005, Bergeron et al 2008).
When the family of intervals is closed under intersection, we present here
the combination of two approaches, namely the idea of potential beginning
developed in Uno, Yagiura 2000 and Bui-Xuan et al 2005 and the notion of
generator as defined in Bergeron et al 2008. This yields a very simple generic
algorithm to compute all common intervals, which gives optimal algorithms in
various applications. For example, in the case where G is a tree, our
framework yields the first linear time algorithms for the two properties
"being connected" and "being a path". In the case where G is a chain, the
problem is known as common intervals of two permutations (Uno and Yagiura
2000); our algorithm provides not only the set of all common intervals but also,
with some easy modifications, a tree structure that represents this set.
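For the special case of two permutations, the problem can be stated concretely. If one permutation is taken to be the identity, a range of positions i..j in the other is a common interval exactly when its values form a set of consecutive integers, that is, when max - min equals j - i. The sketch below enumerates these by brute force in quadratic time; it is only meant to make the definition tangible and is not the generic algorithm of the paper. The function name `common_intervals` is an assumption for the example.

```python
def common_intervals(perm):
    """List the position ranges (i, j), i < j, of perm whose values are
    consecutive integers.

    perm is a permutation of 0..n-1, read relative to the identity
    permutation. Brute force: O(n^2) time.
    """
    result = []
    n = len(perm)
    for i in range(n):
        lo = hi = perm[i]
        for j in range(i + 1, n):
            lo = min(lo, perm[j])
            hi = max(hi, perm[j])
            # Distinct values fill a range exactly when its width matches.
            if hi - lo == j - i:
                result.append((i, j))
    return result


print(common_intervals([1, 0, 2, 4, 3]))
# [(0, 1), (0, 2), (0, 4), (2, 4), (3, 4)]
```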
Active data structures on GPGPUs
Active data structures support operations that may affect a large number of elements of an aggregate data structure. They are well suited for extremely fine-grained parallel systems, including circuit parallelism. General-purpose GPUs were designed to support regular graphics algorithms, but their intermediate level of granularity makes them potentially viable also for active data structures. We consider the characteristics of active data structures and discuss the feasibility of implementing them on GPGPUs. We describe the GPU implementations of two such data structures (ESF arrays and index intervals), assess their performance, and discuss the potential of active data structures as an unconventional programming model that can exploit the capabilities of emerging fine-grained architectures such as GPUs.
Lightweight Lempel-Ziv Parsing
We introduce a new approach to LZ77 factorization that uses O(n/d) words of
working space and O(dn) time for any d >= 1 (for polylogarithmic alphabet
sizes). We also describe carefully engineered implementations of alternative
approaches to lightweight LZ77 factorization. Extensive experiments show that
the new algorithm is superior in most cases, particularly at the lowest memory
levels and for highly repetitive data. As a part of the algorithm, we describe
new methods for computing matching statistics which may be of independent
interest.
Comment: 12 page
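To make the object being computed concrete: an LZ77 factorization splits a text greedily from left to right, where each factor is either a character not seen before or the longest prefix of the remaining text that also occurs starting at an earlier position (the copy may overlap itself). The sketch below is a deliberately naive version with no index structure and none of the space-time trade-off described in the abstract; the names `lz77_factorize` and `lz77_decode` and the factor encoding are assumptions for the example.

```python
def lz77_factorize(text):
    """Greedy LZ77 factorization by brute-force search.

    Returns a list whose items are either a single new character or a
    pair (source_position, length) copying from earlier in the text.
    """
    factors = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):  # try every earlier starting position
            k = 0
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_src = k, j
        if best_len == 0:
            factors.append(text[i])
            i += 1
        else:
            factors.append((best_src, best_len))
            i += best_len
    return factors


def lz77_decode(factors):
    """Rebuild the text; copying one character at a time handles overlaps."""
    out = []
    for f in factors:
        if isinstance(f, str):
            out.append(f)
        else:
            src, length = f
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)


print(lz77_factorize("abababb"))  # ['a', 'b', (0, 4), (1, 1)]
```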
A Framework for Algorithm Stability
We say that an algorithm is stable if small changes in the input result in
small changes in the output. This kind of algorithm stability is particularly
relevant when analyzing and visualizing time-varying data. Stability in general
plays an important role in a wide variety of areas, such as numerical analysis,
machine learning, and topology, but is poorly understood in the context of
(combinatorial) algorithms. In this paper we present a framework for analyzing
the stability of algorithms. We focus in particular on the tradeoff between the
stability of an algorithm and the quality of the solution it computes. Our
framework allows for three types of stability analysis with increasing degrees
of complexity: event stability, topological stability, and Lipschitz stability.
We demonstrate the use of our stability framework by applying it to kinetic
Euclidean minimum spanning trees.