SSW Library: An SIMD Smith-Waterman C/C++ Library for Use in Genomic Applications
Summary: The Smith-Waterman (SW) algorithm, which produces the optimal
pairwise alignment between two sequences, is frequently used as a key component
of fast heuristic read mapping and variation detection tools, but current
implementations are either designed as monolithic protein database searching
tools or are embedded into other tools. To facilitate easy integration of the
fast Single Instruction Multiple Data (SIMD) SW algorithm into third party
software, we wrote a C/C++ library, which extends Farrar's Striped SW (SSW) to
return alignment information in addition to the optimal SW score. Availability:
SSW is available both as a C/C++ software library and as a stand-alone
alignment tool wrapping the library's functionality at
https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library Contact:
[email protected] Comment: 3 pages, 2 figures
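As a rough illustration of what "alignment information in addition to the optimal SW score" means, the sketch below runs a plain scalar Smith-Waterman with traceback. It is not the SSW library's API and not Farrar's striped SIMD layout: the function name `smith_waterman`, the linear gap penalty, and the scoring values are assumptions chosen only to keep the example short.

```python
def smith_waterman(query, target, match=2, mismatch=-2, gap=-3):
    """Scalar Smith-Waterman with a linear gap penalty (illustrative only).

    Returns (score, aligned_query, aligned_target, query_start, target_start),
    where the start positions are 0-based offsets of the local alignment.
    """
    rows, cols = len(query) + 1, len(target) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best local score ending at (i, j)
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # align query[i-1] with target[j-1]
                          H[i - 1][j] + gap,      # gap in target
                          H[i][j - 1] + gap)      # gap in query
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    aligned_q, aligned_t = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if query[i - 1] == target[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            aligned_q.append(query[i - 1])
            aligned_t.append(target[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(target[j - 1])
            j -= 1

    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_t)), i, j


if __name__ == "__main__":
    print(smith_waterman("ACGT", "TTACGTTT"))
```

A score-only implementation needs just the running maximum; reporting where the alignment starts and ends, and which columns are gaps, requires keeping enough of the matrix (or rerunning on a reduced region) to trace back, which is the kind of extra output the abstract describes.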
Why not Merge the International Monetary Fund (IMF) with the International Bank for Reconstruction and Development (World Bank)
Motivation: Cellular Electron CryoTomography (CECT) is an emerging 3D imaging technique that visualizes subcellular organization of single cells at sub-molecular resolution and in near-native state. CECT captures large numbers of macromolecular complexes of highly diverse structures and abundances. However, the structural complexity and imaging limits complicate the systematic de novo structural recovery and recognition of these macromolecular complexes. Efficient and accurate reference-free subtomogram averaging and classification represent the most critical tasks for such analysis. Existing subtomogram alignment based methods are prone to the missing wedge effects and low signal-to-noise ratio (SNR). Moreover, existing maximum-likelihood based methods rely on integration operations, which are in principle computationally infeasible for accurate calculation. Results: Building on existing work, we propose an integrated method, Fast Alignment Maximum Likelihood method (FAML), which uses fast subtomogram alignment to sample sub-optimal rigid transformations. The transformations are then used to approximate integrals for the maximum-likelihood update of subtomogram averages through the expectation-maximization algorithm. Our tests on simulated and experimental subtomograms showed that, compared to our previously developed fast alignment method (FA), FAML is significantly more robust to noise and missing wedge effects with moderate increases in computation cost. In addition, FAML performs well with significantly fewer input subtomograms when the FA method fails. Therefore, FAML can serve as a key component for improved construction of initial structural models from macromolecules captured by CECT.
Parametric Alignment of Drosophila Genomes
The classic algorithms of Needleman--Wunsch and Smith--Waterman find a
maximum a posteriori probability alignment for a pair hidden Markov model
(PHMM). In order to process large genomes that have undergone complex genome
rearrangements, almost all existing whole genome alignment methods apply fast
heuristics to divide genomes into small pieces which are suitable for
Needleman--Wunsch alignment. In these alignment methods, it is standard
practice to fix the parameters and to produce a single alignment for subsequent
analysis by biologists.
Our main result is the construction of a whole genome parametric alignment of
Drosophila melanogaster and Drosophila pseudoobscura. Parametric alignment
resolves the issue of robustness to changes in parameters by finding all
optimal alignments for all possible parameters in a PHMM. Our alignment draws
on existing heuristics for dividing whole genomes into small pieces for
alignment, and it relies on advances we have made in computing convex polytopes
that allow us to parametrically align non-coding regions using biologically
realistic models. We demonstrate the utility of our parametric alignment for
biological inference by showing that cis-regulatory elements are more conserved
between Drosophila melanogaster and Drosophila pseudoobscura than previously
thought. We also show how whole genome parametric alignment can be used to
quantitatively assess the dependence of branch length estimates on alignment
parameters.
The alignment polytopes, software, and supplementary material can be
downloaded at http://bio.math.berkeley.edu/parametric/. Comment: 19 pages, 3 figures
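To make the idea of parameter dependence concrete, the sketch below runs Needleman-Wunsch while tracking an alignment summary (matches, mismatches, gaps) and sweeps a small grid of penalties to see which summaries are ever optimal. This is a brute-force illustration only, not the polytope-based method of the paper; the function names, the fixed match score of 1, and the linear gap penalty are assumptions made for the example.

```python
def needleman_wunsch_summary(a, b, mismatch, gap, match=1):
    """Global alignment of a and b.

    Score = match * (#matches) - mismatch * (#mismatches) - gap * (#gap columns).
    Returns (score, matches, mismatches, gaps) for one optimal alignment;
    ties in score are broken by ordinary tuple comparison.
    """
    # prev[j]: best (score, matches, mismatches, gaps) for a[:i-1] versus b[:j]
    prev = [(-gap * j, 0, 0, j) for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [(-gap * i, 0, 0, i)]
        for j in range(1, len(b) + 1):
            s, m, x, g = prev[j - 1]
            if a[i - 1] == b[j - 1]:
                diag = (s + match, m + 1, x, g)
            else:
                diag = (s - mismatch, m, x + 1, g)
            s, m, x, g = prev[j]
            up = (s - gap, m, x, g + 1)      # a[i-1] aligned to a gap
            s, m, x, g = cur[j - 1]
            left = (s - gap, m, x, g + 1)    # b[j-1] aligned to a gap
            cur.append(max(diag, up, left))
        prev = cur
    return prev[-1]


def optimal_summaries(a, b, mismatch_values, gap_values):
    """Collect the distinct (matches, mismatches, gaps) summaries that are
    optimal for at least one parameter pair on the grid."""
    return {
        needleman_wunsch_summary(a, b, m, g)[1:]
        for m in mismatch_values
        for g in gap_values
    }


if __name__ == "__main__":
    # With a cheap mismatch the substitution wins; with an expensive one, two gaps win.
    print(needleman_wunsch_summary("AC", "AG", mismatch=1, gap=1))
    print(needleman_wunsch_summary("AC", "AG", mismatch=3, gap=1))
    print(optimal_summaries("AC", "AG", [1, 3], [1]))
```

A grid sweep can only sample parameter space and may miss narrow regions; a parametric alignment instead characterizes every region of parameter space and its optimal alignments exactly, which is why the abstract relies on convex polytope computations.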
Covariance alignment: from maximum likelihood estimation to Gromov-Wasserstein
Feature alignment methods are used in many scientific disciplines for data
pooling, annotation, and comparison. As an instance of a permutation learning
problem, feature alignment presents significant statistical and computational
challenges. In this work, we propose the covariance alignment model to study
and compare various alignment methods and establish a minimax lower bound for
covariance alignment that has a non-standard dimension scaling because of the
presence of a nuisance parameter. This lower bound is in fact minimax optimal
and is achieved by a natural quasi MLE. However, this estimator involves a
search over all permutations which is computationally infeasible even when the
problem has moderate size. To overcome this limitation, we show that the
celebrated Gromov-Wasserstein algorithm from optimal transport, which is more
amenable to fast implementation even on large-scale problems, is also minimax
optimal. These results give the first statistical justification for the
deployment of the Gromov-Wasserstein algorithm in practice. Comment: 41 pages, 2 figures
An FGO-based Unified Initial Alignment Method of Strapdown Inertial Navigation System
The initial alignment process provides an accurate initial attitude for a
strapdown inertial navigation system (SINS). The conventional two-procedure
method usually includes coarse and fine alignment processes. Coarse alignment
converges fast because of its batch estimation characteristics, and the initial
attitude does not influence the results. But coarse alignment has low accuracy
because it does not consider the IMU's bias. Fine alignment is more accurate
because it applies a recursive Bayesian filter to estimate the IMU's bias, but
the attitude converges slowly because the initial value influences the
convergence speed of the recursive filter. Researchers have proposed unified
initial alignment to achieve initial alignment in one procedure, but existing
unified methods build on the recursive Bayesian filter and are still slow to
converge. In this paper, a unified method based on the batch estimator FGO
(factor graph optimization) is proposed, which converges as fast as coarse
alignment and is more accurate than the existing methods. We first redefine
the state and rederive the state dynamic model. Then, the optimal
attitude and the IMU's bias are estimated simultaneously through FGO. The fast
convergence and high accuracy of this method are verified by simulation and
physical experiments on a rotation SINS. Comment: 9 pages, Journal Paper
Accelerated Profile HMM Searches
Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.