Computing alignment plots efficiently
Dot plots are a standard method for local comparison of biological sequences.
In a dot plot, a substring-to-substring distance is computed for all pairs of
fixed-size windows in the input strings. Commonly, the Hamming distance is used
since it can be computed in linear time. However, the Hamming distance is a
rather crude measure of string similarity, and using an alignment-based edit
distance can greatly improve the sensitivity of the dot plot method. In this
paper, we show how to compute alignment plots of the latter type efficiently.
Given two strings of lengths m and n and a window size w, the problem consists
of computing the edit distance between all pairs of substrings of length w, one
from each input string. The problem can be solved by repeated application of
the standard dynamic programming algorithm in time O(mnw^2). This paper gives
an improved data-parallel algorithm that uses vector operations working on
several values in parallel, as well as multiple processors.
We show experimental results from an implementation of this algorithm, which
uses Intel's MMX/SSE instructions for vector parallelism and MPI for
coarse-grained parallelism.
Comment: Presented at ParCo 200
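To make the problem statement concrete, here is a minimal Python sketch of the naive baseline mentioned above: it runs the standard edit-distance dynamic program on every pair of length-w windows, which is where the O(mnw^2) bound comes from. It is not the paper's data-parallel algorithm; the function names are illustrative, and unit costs for insertion, deletion, and substitution are an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit (Levenshtein) distance via the standard O(|a|*|b|) DP."""
    prev = list(range(len(b) + 1))  # DP row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(
                prev[j] + 1,                           # delete a[i-1]
                cur[j - 1] + 1,                        # insert b[j-1]
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # match or substitute
            )
        prev = cur
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions; assumes equal-length inputs."""
    return sum(x != y for x, y in zip(a, b))


def alignment_plot(x: str, y: str, w: int, dist=edit_distance):
    """plot[i][j] = dist(x[i:i+w], y[j:j+w]) for every pair of length-w windows.

    With dist=edit_distance this is the naive method: (m-w+1)(n-w+1) window
    pairs, each costing O(w^2), hence O(m n w^2) overall.
    """
    rows = len(x) - w + 1
    cols = len(y) - w + 1
    return [[dist(x[i:i + w], y[j:j + w]) for j in range(cols)] for i in range(rows)]
```

For example, the windows ACGT and CGTA differ at every position (Hamming distance 4) but are only two edits apart (one deletion and one insertion), which is the kind of shifted similarity that a Hamming-based dot plot misses.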
A methodology for determining amino-acid substitution matrices from set covers
We introduce a new methodology for the determination of amino-acid
substitution matrices for use in the alignment of proteins. The new methodology
is based on a pre-existing set cover on the set of residues and on the
undirected graph that describes residue exchangeability given the set cover.
Given fixed functional forms that specify how edge weights are obtained from the
set cover and how substitution-matrix elements are then obtained from weighted
distances on the graph, the performance of the resulting substitution matrix can
be evaluated against a known set of reference alignments for given gap costs. Finding
the appropriate functional forms and gap costs can then be formulated as an
optimization problem that seeks to maximize the performance of the substitution
matrix on the reference alignment set. We give computational results on the
BAliBASE suite using a genetic algorithm for optimization. Our results indicate
that it is possible to obtain substitution matrices whose performance is
comparable to or better than that of several other matrices, depending on the
particular scenario under consideration.
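The pipeline described above (set cover, then exchangeability graph, then weighted distances, then matrix entries) can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: the toy residue cover, the rule that two residues are adjacent when some set contains both, the edge weight 1/(number of shared sets), and the linear distance-to-score map are hypothetical stand-ins for the functional forms that the paper tunes with a genetic algorithm.

```python
import itertools
import math


def build_graph(residues, cover, edge_weight):
    """Edge u-v exists if some set in the cover contains both residues;
    its weight is edge_weight(number of sets they share)."""
    weights = {}
    for u, v in itertools.combinations(residues, 2):
        shared = sum(1 for s in cover if u in s and v in s)
        if shared:
            weights[u, v] = weights[v, u] = edge_weight(shared)
    return weights


def all_pairs_distances(residues, weights):
    """Weighted shortest-path distances (Floyd-Warshall); inf if disconnected."""
    d = {(u, v): 0.0 if u == v else weights.get((u, v), math.inf)
         for u in residues for v in residues}
    for k in residues:
        for i in residues:
            for j in residues:
                if d[i, k] + d[k, j] < d[i, j]:
                    d[i, j] = d[i, k] + d[k, j]
    return d


def substitution_matrix(residues, cover, edge_weight, score_from_distance):
    """Map every residue pair to a score derived from its graph distance."""
    weights = build_graph(residues, cover, edge_weight)
    d = all_pairs_distances(residues, weights)
    return {(u, v): score_from_distance(d[u, v]) for u in residues for v in residues}


# --- Hypothetical choices, for illustration only (not from the paper) ---
RESIDUES = "ILVFKRDE"
COVER = [set("ILV"), set("ILVF"), set("KR"), set("DE"), set("KRDE")]


def edge_weight(shared: int) -> float:
    return 1.0 / shared  # residues sharing more sets get a shorter edge


def score_from_distance(dist: float) -> int:
    return -4 if math.isinf(dist) else round(4 - 2 * dist)


matrix = substitution_matrix(RESIDUES, COVER, edge_weight, score_from_distance)
```

In the methodology described above, `edge_weight` and `score_from_distance` (together with the gap costs) are the pieces an optimizer would adjust to maximize performance on reference alignments such as BAliBASE; here they are simply fixed functions.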
Experimental demonstration of a directionally-unbiased linear-optical multiport
All existing optical quantum walk approaches are based on the use of
beamsplitters and multiple paths to explore the multitude of unitary
transformations of quantum amplitudes in a Hilbert space. The beamsplitter is
naturally a directionally biased device: the photon cannot travel in the reverse
direction. As a result, the optical hardware resources required for complex
quantum walk applications grow rapidly, since the number of options for the
walking particle grows with each step. Here we present the experimental
demonstration of a directionally-unbiased linear-optical multiport, which
allows the photon to reverse direction. An amplitude-controllable probability
distribution matrix for a unitary three-edge vertex is reconstructed with only
linear-optical devices. Such directionally-unbiased multiports allow direct
execution of quantum walks over a multitude of complex graphs and in tensor
networks. This approach would enable simulation of complex Hamiltonians of
physical systems and quantum walk applications in a more efficient and compact
setup, substantially reducing the required hardware resources.
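The notion of a probability distribution matrix for a unitary vertex can be illustrated numerically. If a unitary U describes how the amplitude entering one port leaves through each port, the transition probabilities are P[i, j] = |U[i, j]|^2, and unitarity forces every row and column of P to sum to 1. The NumPy sketch below uses two textbook 3×3 unitaries, the Grover coin and the three-port discrete Fourier transform, as assumed examples; they are not claimed to be the matrices reconstructed in the experiment, and the function names are hypothetical.

```python
import numpy as np


def grover_coin(n: int = 3) -> np.ndarray:
    """Grover diffusion coin (2/n)J - I, a standard symmetric n-port unitary."""
    return 2.0 / n * np.ones((n, n)) - np.eye(n)


def dft_tritter(n: int = 3) -> np.ndarray:
    """Balanced n-port unitary given by the discrete Fourier transform."""
    omega = np.exp(2j * np.pi / n)
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return omega ** (j * k) / np.sqrt(n)


def probability_matrix(u) -> np.ndarray:
    """P[i, j] = |U[i, j]|^2: probability that a photon entering port j exits port i."""
    u = np.asarray(u, dtype=complex)
    if not np.allclose(u.conj().T @ u, np.eye(len(u))):
        raise ValueError("matrix is not unitary")
    return np.abs(u) ** 2


P_grover = probability_matrix(grover_coin())
P_dft = probability_matrix(dft_tritter())
```

Nonzero diagonal entries of P correspond to a photon leaving through the same port it entered, the kind of reversal that a directionally biased beamsplitter does not provide. The two examples give quite different distributions: a back-reflection probability of 1/9 with 4/9 into each other port for the Grover coin, versus a uniform 1/3 for the Fourier unitary.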
Jabba: hybrid error correction for long sequencing reads
Background: Third-generation sequencing platforms produce longer reads with higher error rates than second-generation technologies. While the improved read length can provide useful information for downstream analysis, the high error rate challenges the underlying algorithms. Error correction methods in which accurate short reads are used to correct noisy long reads are an attractive way to generate high-quality long reads. Methods that align short reads to long reads do not make optimal use of the information contained in the second-generation data, and they suffer from long runtimes. Recently, a new hybrid error correction method has been proposed in which the second-generation data is first assembled into a de Bruijn graph, onto which the long reads are then aligned.
Results: In this context we present Jabba, a hybrid method that corrects long third-generation reads by mapping them onto a corrected de Bruijn graph constructed from second-generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, we present theoretical results on the possibilities and limitations of using MEMs in the context of third-generation reads.
Conclusion: Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have a very high identity. Many of the aligned reads are error-free. Additionally, Jabba corrects reads using very little CPU time. From this we conclude that pseudo alignment with MEMs is a fast and reliable method to map long, highly erroneous sequences onto a de Bruijn graph.
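Jabba uses maximal exact matches (MEMs) as seeds. As a minimal illustration of what a MEM is, the Python sketch below finds all MEMs of at least `min_len` characters between a read and a plain reference string by scanning every diagonal for maximal runs of matching characters: a run on a diagonal cannot be extended to the left or right, which is exactly the maximality condition. This is a deliberate simplification and not Jabba's method. It matches against a string rather than a de Bruijn graph, uses a naive O(nm) scan instead of an index structure, and omits the extend step; `find_mems` and its (read_pos, ref_pos, length) output format are hypothetical.

```python
def find_mems(read: str, ref: str, min_len: int):
    """Return (read_pos, ref_pos, length) for every maximal exact match of
    length >= min_len between read and ref, via a naive diagonal scan."""
    mems = []
    n, m = len(read), len(ref)
    for offset in range(-(n - 1), m):      # diagonal: ref_pos = read_pos + offset
        i = max(0, -offset)                # first read position on this diagonal
        run_start = None
        while i < n and i + offset < m:
            if read[i] == ref[i + offset]:
                if run_start is None:
                    run_start = i          # a new run of matches begins
            else:
                # Mismatch: the current run cannot be extended to the right.
                if run_start is not None and i - run_start >= min_len:
                    mems.append((run_start, run_start + offset, i - run_start))
                run_start = None
            i += 1
        # A run that reaches the end of either string is also maximal.
        if run_start is not None and i - run_start >= min_len:
            mems.append((run_start, run_start + offset, i - run_start))
    return mems
```

For instance, `find_mems("ACGTTT", "GGACGTAA", 3)` returns `[(0, 2, 4)]`: the shared substring ACGT starting at read position 0 and reference position 2, which cannot be extended in either direction.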
- …