
    MUSCLE: a multiple sequence alignment method with reduced time and space complexity

    BACKGROUND: In a previous paper, we introduced MUSCLE, a new program for creating multiple alignments of protein sequences, giving a brief summary of the algorithm and showing MUSCLE to achieve the highest scores reported to date on four alignment accuracy benchmarks. Here we present a more complete discussion of the algorithm, describing several previously unpublished techniques that improve biological accuracy and/or computational complexity. We introduce a new option, MUSCLE-fast, designed for high-throughput applications. We also describe a new protocol for evaluating objective functions that align two profiles. RESULTS: We compare the speed and accuracy of MUSCLE with CLUSTALW, Progressive POA and the MAFFT script FFTNS1, the fastest previously published program known to the author. Accuracy is measured using four benchmarks: BAliBASE, PREFAB, SABmark and SMART. We test three variants that offer highest accuracy (MUSCLE with default settings), highest speed (MUSCLE-fast), and a carefully chosen compromise between the two (MUSCLE-prog). We find MUSCLE-fast to be the fastest algorithm on all test sets, achieving average alignment accuracy similar to CLUSTALW in times that are typically two to three orders of magnitude less. MUSCLE-fast is able to align 1,000 sequences of average length 282 in 21 seconds on a current desktop computer. CONCLUSIONS: MUSCLE offers a range of options that provide improved speed and/or alignment accuracy compared with currently available programs. MUSCLE is freely available at

    Exact parallel alignment of megabase genomic sequences with tunable work distribution

    Sequence alignment is a basic operation in bioinformatics that is performed thousands of times on a daily basis. The exact methods for pairwise alignment have quadratic time complexity. For this reason, heuristic methods such as BLAST are widely used. To obtain exact results faster, parallel strategies have been proposed, but most of them fail to align huge biological sequences. This happens because not only must the quadratic time be considered, but the space must also be reduced. In this paper, we evaluate the performance of Z-align, a parallel exact strategy that runs in user-restricted memory space. We also propose and evaluate a tunable work distribution mechanism. The results obtained on two clusters show that two sequences of size 24MBP (Mega Base Pairs) and 23MBP, respectively, were successfully aligned with Z-align. In addition, when aligning two 3MBP sequences, a speedup of 34.35 was achieved with 64 processors. The evaluation of our work distribution mechanism shows that execution times can be noticeably reduced when appropriate parameters are chosen. Finally, when comparing Z-align with BLAST, it is clear that, in many cases, Z-align is able to produce alignments with a higher score.
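The quadratic-time, reduced-space trade-off mentioned in this abstract can be illustrated with a minimal sketch of the classic Needleman-Wunsch global alignment score, keeping only two rows of the dynamic programming matrix. This is not Z-align and is not parallel; the function name and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for illustration only.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    Time is O(len(a) * len(b)); memory is O(len(b)) because only the
    previous and current rows of the DP matrix are kept.
    Scoring values are illustrative assumptions, not taken from the paper.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[len(b)]
```

This version returns only the score. Recovering the alignment itself in linear space needs an additional technique such as Hirschberg's divide-and-conquer, which is omitted here.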

    Memory-efficient Multiple Sequence Alignment Using Dynamic Programming and Divide-and-Conquer

    ABSTRACT: Multiple sequence alignment is one of the fundamental problems in bioinformatics because it is the first step in analyzing the phylogenetic trees of organisms, predicting the secondary and tertiary structure of proteins and RNA, and so on. Numerous methods and approaches have been published over the last 30 years, but no tool yet solves the multiple sequence alignment problem efficiently and optimally. Dynamic programming has been proven to handle pairwise sequence alignment effectively and efficiently for both global and local alignment. However, when extended to multiple sequence alignment, dynamic programming requires very large resources. Therefore, in this final project, the divide-and-conquer method is used to reduce the memory that dynamic programming needs for multiple sequence alignment. By applying divide-and-conquer, the space complexity of multiple sequence alignment is reduced from $O(n_m^k)$, where $n_m$ is the maximum length of the initial sequences and $k$ is the number of sequences to be aligned, to $O(\acute{n}_m^k)$, where $\acute{n}_m$ is the maximum allowed sequence length. However, because of divide-and-conquer, the resulting alignment is no longer optimal (it is approximate). To bring the alignment back to the optimum, iterative refinement is used. The space complexity then becomes $O(L^k)$, where $L$ is the limit used. Keywords: bioinformatics, multiple sequence alignment, global sequence alignment, memory efficiency, optimization, exact method, dynamic programming, divide-and-conquer, iterative refinement

    A Search for Energy Minimized Sequences of Proteins

    In this paper, we present numerical evidence that supports the notion of minimization in the sequence space of proteins for a target conformation. We use the conformations of real proteins in the Protein Data Bank (PDB) and present computationally efficient methods to identify the sequences with minimum energy. We use an edge-weighted connectivity graph to rank the residue sites with a reduced amino acid alphabet and then use continuous optimization to obtain the energy-minimizing sequences. Our methods enable the computation of a lower bound as well as a tight upper bound for the energy of a given conformation. We validate our results by using three different inter-residue energy matrices for five proteins from the PDB, and by comparing our energy-minimizing sequences with 80 million diverse sequences that are generated based on different considerations in each case. When we submitted some of our chosen energy-minimizing sequences to the Basic Local Alignment Search Tool (BLAST), we obtained some sequences from the non-redundant protein sequence database that are similar to ours, with an E-value of the order of $10^{-7}$. In summary, we conclude that proteins show a trend towards minimizing energy in the sequence space but do not seem to adopt the global energy-minimizing sequence. The reason for this could be either that the existing energy matrices are not able to accurately represent the inter-residue interactions in the context of the protein environment, or that Nature does not push the optimization in the sequence space once the protein is able to perform its function.

    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology. Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
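The idea of searching over covering hyperspheres before searching the data can be sketched as a two-stage range search. The following is a minimal illustration under stated assumptions, not the implementation behind Ammolite, MICA, or esFragBag: the greedy covering, the function names `build_clusters` and `range_search`, and the use of Euclidean distance are all hypothetical choices. The coarse stage relies only on the triangle inequality, so for exact range queries in a metric space it returns the same hits as a brute-force scan.

```python
import math


def build_clusters(points, cluster_radius, dist=math.dist):
    """Greedy cover: each point joins the first center within
    cluster_radius, otherwise it becomes a new center."""
    clusters = []  # list of (center, members) pairs
    for p in points:
        for center, members in clusters:
            if dist(center, p) <= cluster_radius:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def range_search(query, radius, clusters, cluster_radius, dist=math.dist):
    """Return every stored point within `radius` of `query`."""
    hits = []
    for center, members in clusters:
        # Coarse stage: by the triangle inequality, a cluster whose center is
        # farther than radius + cluster_radius cannot contain a hit.
        if dist(query, center) <= radius + cluster_radius:
            # Fine stage: check only the members of surviving clusters.
            hits.extend(m for m in members if dist(query, m) <= radius)
    return hits
```

In this sketch the coarse stage costs one distance computation per cluster, so the benefit depends on the data being covered by far fewer clusters than points.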

    Planar PØP: feature-less pose estimation with applications in UAV localization

    We present a featureless pose estimation method that, in contrast to current Perspective-n-Point (PnP) approaches, does not require n point correspondences to obtain the camera pose, allowing for pose estimation from natural shapes that do not necessarily have distinguishing features such as corners or intersecting edges. Instead of using n correspondences (e.g. extracted with a feature detector), we use the raw polygonal representation of the observed shape and directly estimate the pose in the pose-space of the camera. Compared with a general PnP method, this method requires neither n point correspondences nor a priori knowledge of the object model (except the scale), which is registered with a picture taken from a known robot pose. Moreover, we achieve higher precision because all the information of the shape contour is used to minimize the area between the projected and the observed shape contours. To emphasize the non-use of n point correspondences between the projected template and the observed contour shape, we call the method Planar PØP. The method is demonstrated both in simulation and in a real application consisting of UAV localization, where comparisons with a precise ground truth are provided. Peer Reviewed. Postprint (author's final draft)

    Extension-twist coupled laminates for aero-elastic compliant blade design

    A definitive list of laminate configurations with extension-twisting (and shearing-bending) coupling is derived for up to 21 plies of identical thickness. The list comprises individual stacking sequences containing standard angle-ply and cross-ply sub-sequences, combinations which are contrary to the previously assumed form for this class of laminate. The list also contains dimensionless parameters from which the extensional, coupling and bending stiffness terms are readily calculated for any fiber/matrix system. Lamination parameters are shown graphically to illustrate the extent of the design space with up to 21 plies. A special sub-group from this class of coupled laminate is identified that can be manufactured flat under a standard elevated temperature curing process; this sub-group possesses hygro-thermally curvature-stable behavior. Finally, bounds on the compression buckling strength are assessed using a closed form solution for all the laminate groups presented.