
    SSW Library: An SIMD Smith-Waterman C/C++ Library for Use in Genomic Applications

    Summary: The Smith-Waterman (SW) algorithm, which produces the optimal pairwise alignment between two sequences, is frequently used as a key component of fast heuristic read mapping and variation detection tools, but current implementations are either designed as monolithic protein database searching tools or are embedded into other tools. To facilitate easy integration of the fast Single Instruction Multiple Data (SIMD) SW algorithm into third-party software, we wrote a C/C++ library, which extends Farrar's Striped SW (SSW) to return alignment information in addition to the optimal SW score. Availability: SSW is available both as a C/C++ software library and as a stand-alone alignment tool wrapping the library's functionality at https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library Contact: [email protected]. Comment: 3 pages, 2 figures
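To make "alignment information in addition to the optimal SW score" concrete, here is a minimal scalar sketch of Smith-Waterman local alignment with traceback. It is not the SSW library's API and uses no SIMD striping. The function name `smith_waterman`, the linear gap penalty, and the default scores are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-2, gap=-3):
    """Return (best_score, aligned_a, aligned_b) for one optimal local alignment.

    Illustrative scalar version with a linear gap penalty (an assumption;
    production tools typically use affine gaps and SIMD).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # DP matrix, first row/column stay 0
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                    # start a new local alignment
                          H[i - 1][j - 1] + s,  # match or mismatch
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGGG"))
```

The score alone needs only the matrix fill. Returning the aligned strings requires the traceback step, which is the kind of extra alignment information the abstract refers to.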

    Why not Merge the International Monetary Fund (IMF) with the International Bank for Reconstruction and Development (World Bank)

    Motivation: Cellular Electron CryoTomography (CECT) is an emerging 3D imaging technique that visualizes the subcellular organization of single cells at sub-molecular resolution and in a near-native state. CECT captures large numbers of macromolecular complexes of highly diverse structures and abundances. However, structural complexity and imaging limits complicate the systematic de novo structural recovery and recognition of these macromolecular complexes. Efficient and accurate reference-free subtomogram averaging and classification are the most critical tasks for such analysis. Existing methods based on subtomogram alignment are sensitive to missing wedge effects and low signal-to-noise ratio (SNR). Moreover, existing maximum-likelihood based methods rely on integration operations, which are in principle computationally infeasible to calculate accurately. Results: Building on existing work, we propose an integrated method, the Fast Alignment Maximum Likelihood method (FAML), which uses fast subtomogram alignment to sample sub-optimal rigid transformations. The transformations are then used to approximate the integrals for the maximum-likelihood update of subtomogram averages through the expectation-maximization algorithm. Our tests on simulated and experimental subtomograms showed that, compared to our previously developed fast alignment method (FA), FAML is significantly more robust to noise and missing wedge effects, with moderate increases in computation cost. In addition, FAML performs well with significantly fewer input subtomograms, in cases where the FA method fails. Therefore, FAML can serve as a key component for improved construction of initial structural models from macromolecules captured by CECT.

    Parametric Alignment of Drosophila Genomes

    The classic algorithms of Needleman-Wunsch and Smith-Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). In order to process large genomes that have undergone complex genome rearrangements, almost all existing whole genome alignment methods apply fast heuristics to divide genomes into small pieces which are suitable for Needleman-Wunsch alignment. In these alignment methods, it is standard practice to fix the parameters and to produce a single alignment for subsequent analysis by biologists. Our main result is the construction of a whole genome parametric alignment of Drosophila melanogaster and Drosophila pseudoobscura. Parametric alignment resolves the issue of robustness to changes in parameters by finding all optimal alignments for all possible parameters in a PHMM. Our alignment draws on existing heuristics for dividing whole genomes into small pieces for alignment, and it relies on advances we have made in computing convex polytopes that allow us to parametrically align non-coding regions using biologically realistic models. We demonstrate the utility of our parametric alignment for biological inference by showing that cis-regulatory elements are more conserved between Drosophila melanogaster and Drosophila pseudoobscura than previously thought. We also show how whole genome parametric alignment can be used to quantitatively assess the dependence of branch length estimates on alignment parameters. The alignment polytopes, software, and supplementary material can be downloaded at http://bio.math.berkeley.edu/parametric/. Comment: 19 pages, 3 figures
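The dependence of the optimal alignment on fixed parameters can be seen on a toy example. The sketch below runs a plain Needleman-Wunsch global alignment under two hand-picked parameter settings and reports how the optimal alignment changes. This is only a crude point-sampling illustration, not the polytope-based parametric method described in the abstract. The function names, the linear gap penalty, and the integer scores are assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment.

    Illustrative only: integer scores and a linear gap penalty are assumed.
    """
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # prefix of a aligned to gaps only
    for j in range(1, m + 1):
        F[0][j] = j * gap  # prefix of b aligned to gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Trace back from the bottom-right corner to the origin.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def summarize(aligned_a, aligned_b):
    """Count (matches, mismatches, gap columns) in a pairwise alignment."""
    matches = mismatches = gaps = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


if __name__ == "__main__":
    for params in ({"mismatch": -1, "gap": -2}, {"mismatch": -5, "gap": -1}):
        score, x, y = needleman_wunsch("ACGT", "AGGT", **params)
        print(params, score, x, y, summarize(x, y))
```

With a mild mismatch penalty the optimal alignment is ungapped with one mismatch. With a harsh mismatch penalty and cheap gaps, it switches to two gap columns and no mismatches. A parametric alignment characterizes all such regions of parameter space at once instead of sampling individual points.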

    Covariance alignment: from maximum likelihood estimation to Gromov-Wasserstein

    Feature alignment methods are used in many scientific disciplines for data pooling, annotation, and comparison. As an instance of a permutation learning problem, feature alignment presents significant statistical and computational challenges. In this work, we propose the covariance alignment model to study and compare various alignment methods and establish a minimax lower bound for covariance alignment that has a non-standard dimension scaling because of the presence of a nuisance parameter. This lower bound is in fact minimax optimal and is achieved by a natural quasi MLE. However, this estimator involves a search over all permutations, which is computationally infeasible even when the problem has moderate size. To overcome this limitation, we show that the celebrated Gromov-Wasserstein algorithm from optimal transport, which is more amenable to fast implementation even on large-scale problems, is also minimax optimal. These results give the first statistical justification for the deployment of the Gromov-Wasserstein algorithm in practice. Comment: 41 pages, 2 figures

    An FGO-based Unified Initial Alignment Method of Strapdown Inertial Navigation System

    The initial alignment process provides an accurate initial attitude for a strapdown inertial navigation system (SINS). The conventional two-procedure method usually consists of coarse and fine alignment. Coarse alignment converges quickly because it is a batch estimator, and its result does not depend on the initial attitude, but its accuracy is low because it does not account for the IMU's bias. Fine alignment is more accurate because it applies a recursive Bayesian filter to estimate the IMU's bias, but the attitude converges slowly because the initial value influences the convergence speed of the recursive filter. Researchers have proposed unified initial alignment to achieve initial alignment in one procedure, but existing unified methods build on the recursive Bayesian filter and are still slow to converge. In this paper, a unified method based on the batch estimator FGO (factor graph optimization) is proposed, which converges as fast as coarse alignment and is more accurate than the existing methods. We first redefine the state and rederive the state dynamic model. Then, the optimal attitude and the IMU's bias are estimated simultaneously through FGO. The fast convergence and high accuracy of this method are verified by simulation and physical experiments on a rotation SINS. Comment: 9 pages, Journal Paper

    Accelerated Profile HMM Searches

    Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of the significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.