Search CORE


MUSCLE: a multiple sequence alignment method with reduced time and space complexity

Author: Edgar Robert C
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: In a previous paper, we introduced MUSCLE, a new program for creating multiple alignments of protein sequences, giving a brief summary of the algorithm and showing MUSCLE to achieve the highest scores reported to date on four alignment accuracy benchmarks. Here we present a more complete discussion of the algorithm, describing several previously unpublished techniques that improve biological accuracy and/or computational complexity. We introduce a new option, MUSCLE-fast, designed for high-throughput applications. We also describe a new protocol for evaluating objective functions that align two profiles. RESULTS: We compare the speed and accuracy of MUSCLE with CLUSTALW, Progressive POA and the MAFFT script FFTNS1, the fastest previously published program known to the author. Accuracy is measured using four benchmarks: BAliBASE, PREFAB, SABmark and SMART. We test three variants that offer highest accuracy (MUSCLE with default settings), highest speed (MUSCLE-fast), and a carefully chosen compromise between the two (MUSCLE-prog). We find MUSCLE-fast to be the fastest algorithm on all test sets, achieving average alignment accuracy similar to CLUSTALW in times that are typically two to three orders of magnitude less. MUSCLE-fast is able to align 1,000 sequences of average length 282 in 21 seconds on a current desktop computer. CONCLUSIONS: MUSCLE offers a range of options that provide improved speed and/or alignment accuracy compared with currently available programs. MUSCLE is freely available at

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
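The MUSCLE abstract above emphasizes speed on large inputs. MUSCLE's first (draft) stage estimates pairwise distances from shared k-mers instead of computing full alignments. The sketch below is a simplified, hypothetical version of that idea in Python: it counts overlapping k-mers over the plain residue alphabet (MUSCLE itself may use a compressed alphabet and other parameters). The names `kmer_counts` and `kmer_distance` are illustrative and are not part of MUSCLE.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x: str, y: str, k: int = 3) -> float:
    """Alignment-free distance 1 - F between two sequences.

    F is the number of shared k-mers (each counted at most as often as it
    occurs in both sequences) divided by the number of k-mers in the
    shorter sequence, so F = 1 for identical sequences.
    """
    if min(len(x), len(y)) < k:
        raise ValueError("both sequences must be at least k residues long")
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    shared = sum(min(n, cy[t]) for t, n in cx.items())
    return 1.0 - shared / (min(len(x), len(y)) - k + 1)


if __name__ == "__main__":
    # One substitution (A -> S) destroys 3 of the 8 shared 3-mers.
    print(kmer_distance("MKVLAAGIVG", "MKVLSAGIVG"))  # prints 0.375
```

Because the two sequences are never aligned, this measure takes roughly linear time in sequence length. That property makes k-mer distances a good fit for a fast first pass before progressive alignment.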

Exact parallel alignment of megabase genomic sequences with tunable work distribution

Author: Batista Rodolfo Bezerra
Boukerche Azzedine
Melo Alba Cristina Magalhães Alves de
Scarel Felipe Brandt
Souza Lavir Antonio Bahia Carvalho de
Publication venue: World Scientific Pub Co Pte Ltd
Publication date: 01/01/2012

Sequence alignment is a basic operation in bioinformatics that is performed thousands of times on a daily basis. Exact methods for pairwise alignment have quadratic time complexity, so heuristic methods such as BLAST are widely used. To obtain exact results faster, parallel strategies have been proposed, but most of them fail to align huge biological sequences. The reason is that the quadratic time is not the only cost to address: the memory requirement must also be reduced. In this paper, we evaluate the performance of Z-align, a parallel exact strategy that runs in user-restricted memory space. We also propose and evaluate a tunable work distribution mechanism. The results obtained on two clusters show that two sequences of 24 MBP (megabase pairs) and 23 MBP were successfully aligned with Z-align. In addition, a speedup of 34.35 was achieved on 64 processors when aligning two 3 MBP sequences. The evaluation of our work distribution mechanism shows that execution times can be noticeably reduced when appropriate parameters are chosen. Finally, a comparison of Z-align with BLAST shows that, in many cases, Z-align produces alignments with higher scores.

Repositório Institucional da Universidade de Brasília

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

RCAAP - Repositório Científico de Acesso Aberto de Portugal
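The Z-align abstract above rests on the point that exact alignment must save memory as well as time. The following is not Z-align. It is a minimal, single-threaded Smith-Waterman score computation in Python that assumes a linear gap penalty and illustrative scoring values, and it keeps only two rows of the dynamic-programming matrix. It shows why computing the score alone needs only O(n) memory while the time remains quadratic. Recovering the alignment itself in restricted memory, and distributing the work across processors, require further techniques that this sketch does not cover.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score of a and b (Smith-Waterman, linear gaps).

    Time is O(len(a) * len(b)), but only two rows of the DP matrix are
    kept, so memory is O(len(b)) instead of O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0,                  # start a new local alignment
                         prev[j - 1] + s,    # align a[i-1] with b[j-1]
                         prev[j] + gap,      # a[i-1] against a gap
                         cur[j - 1] + gap)   # b[j-1] against a gap
            best = max(best, cur[j])
        prev = cur  # discard the older row: this is the memory saving
    return best


if __name__ == "__main__":
    # The shared core "ACGT" scores 4 * 2 = 8; extending it only adds mismatches.
    print(smith_waterman_score("TTACGTAA", "GGACGTCC"))  # prints 8
```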

Memory-efficient Multiple Sequence Alignment Using Dynamic Programming and Divide-and-Conquer

Author: Ferri Renaldo
Publication venue: Universitas Telkom
Publication date: 01/01/2011

ABSTRACT: Multiple sequence alignment is one of the fundamental problems in bioinformatics. It is the first step in analyzing the phylogenetic tree of organisms, predicting the secondary and tertiary structure of proteins and RNA, and similar tasks. Numerous methods and approaches have been published over more than 30 years, but no tool can yet solve the multiple sequence alignment problem efficiently. Dynamic programming has been shown to handle pairwise sequence alignment effectively and efficiently, for both global and local alignment. However, when extended to multiple sequence alignment, dynamic programming requires very large resources. This final project therefore applies a divide-and-conquer method to reduce the memory that dynamic programming uses for multiple sequence alignment. Divide-and-conquer reduces the space complexity of multiple sequence alignment from $O(n_m^k)$, where $n_m$ is the maximum length of the initial sequences and $k$ is the number of sequences being aligned, to $O((n'_m)^k)$, where $n'_m$ is the maximum allowed sequence length. However, because of the divide-and-conquer step, the resulting alignment is no longer optimal (it is approximate). Iterative refinement is applied to bring the alignment back to optimal. The space complexity then becomes $O(L^k)$, where $L$ is the limit used.
Keywords: bioinformatics, multiple sequence alignment, global sequence alignment, memory efficiency, optimization, exact method, dynamic programming, divide-and-conquer, iterative refinement

Open Library
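A quick count makes the space-complexity argument in the abstract above concrete. For k sequences, the dynamic-programming table has one cell for each combination of prefix lengths, so its size is the product of (length + 1) over all sequences. The Python sketch below uses hypothetical function names. It compares that figure with the peak table size if, as the abstract suggests, each sequence is first cut into pieces no longer than a chosen limit. The choice of cut points and the later iterative refinement are not modeled.

```python
from math import prod  # Python 3.8+


def full_dp_cells(lengths):
    """Cells in the k-dimensional DP table for sequences of these lengths."""
    return prod(n + 1 for n in lengths)


def chunked_dp_cells(lengths, limit):
    """Peak cells if every sequence is cut into pieces of at most `limit`
    residues and the pieces are aligned one block at a time."""
    return prod(min(n, limit) + 1 for n in lengths)


if __name__ == "__main__":
    seqs = [300, 300, 300]             # three sequences of 300 residues (k = 3)
    print(full_dp_cells(seqs))         # 301**3 = 27270901 cells
    print(chunked_dp_cells(seqs, 50))  # 51**3 = 132651 cells per block
```

The exponent is the important part. Because the table grows as (length)^k, halving the allowed length shrinks each block by roughly a factor of 2^k.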

A Search for Energy Minimized Sequences of Proteins

Author: A Kolinski
A Luthra
AG Street
AN Jha
Anupam Nath Jha
B Gillespie
B Kuhlman
Banahalli Ratna
BI Dahiyat
C Lee
CA Floudas
CB Anfinsen
DA Hinds
E Farinas
G Dantas
G. K. Ananthasuresh
GL Butterfoss
H Kono
HK Fung
HM Berman
HW Hellinga
J Desmet
JD Bloom
JG Saven
JL Klepeis
JR Desjarlais
K Svanberg
K Yue
KV Brinda
LL Looger
MS Venkatarajan
N Pokala
P Koehl
S Miyazawa
S Rakshit
Saraswathi Vishveshwara
SF Altschul
SK Koh
Publication venue: Public Library of Science
Publication date: 01/01/2009

In this paper, we present numerical evidence that supports the notion of minimization in the sequence space of proteins for a target conformation. We use the conformations of real proteins in the Protein Data Bank (PDB) and present computationally efficient methods to identify the sequences with minimum energy. We use an edge-weighted connectivity graph to rank the residue sites with a reduced amino acid alphabet, and then use continuous optimization to obtain the energy-minimizing sequences. Our methods enable the computation of a lower bound as well as a tight upper bound for the energy of a given conformation. We validate our results by using three different inter-residue energy matrices for five proteins from the PDB, and by comparing our energy-minimizing sequences with 80 million diverse sequences that are generated based on different considerations in each case. When we submitted some of our chosen energy-minimizing sequences to the Basic Local Alignment Search Tool (BLAST), we obtained some sequences from the non-redundant protein sequence database that are similar to ours with an E-value on the order of $10^{-7}$. In summary, we conclude that proteins show a trend towards minimizing energy in the sequence space but do not seem to adopt the global energy-minimizing sequence. There are two possible reasons. Existing energy matrices may not accurately represent inter-residue interactions in the context of the protein environment. Alternatively, Nature may not push the optimization in sequence space any further once the protein is able to perform its function.

CiteSeerX

Public Library of Science (PLOS)

Elsevier - Publisher Connector

Crossref

Directory of Open Access Journals

PubMed Central

Open Access Repository of IISc Research Publications
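The objective in the abstract above is the energy of a sequence placed on a fixed conformation, and it can be illustrated with a pairwise contact-energy model. The Python sketch below assumes a hypothetical two-letter reduced alphabet (H and P), a made-up energy matrix, and a toy contact map, and it finds the minimum by exhaustive enumeration. The paper instead ranks sites with an edge-weighted graph and applies continuous optimization, which is what makes realistic protein lengths tractable. Brute force grows as (alphabet size)^n.

```python
from itertools import product

# Hypothetical reduced alphabet and contact-energy matrix (lower is more
# favorable); real studies use published inter-residue energy matrices.
ENERGY = {
    ("H", "H"): -0.2, ("H", "P"): -1.0,
    ("P", "H"): -1.0, ("P", "P"): 0.0,
}


def sequence_energy(seq, contacts, energy=ENERGY):
    """Sum the pairwise energies over the residue pairs (i, j) in contact."""
    return sum(energy[(seq[i], seq[j])] for i, j in contacts)


def minimum_energy_sequence(n, contacts, alphabet="HP", energy=ENERGY):
    """Score all len(alphabet)**n sequences and return the lowest-energy one
    (the first one found, in product order, if there are ties)."""
    best = min(product(alphabet, repeat=n),
               key=lambda s: sequence_energy(s, contacts, energy))
    return "".join(best), sequence_energy(best, contacts, energy)


if __name__ == "__main__":
    contacts = [(0, 2), (1, 3), (0, 3)]  # contact map of a toy 4-residue fold
    print(minimum_energy_sequence(4, contacts))  # prints ('HHPP', -3.0)
```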

Entropy-scaling search of massive biological data

Author: Berger Bonnie
Daniels Noah M.
Danko David Christian
Yu Y. William
Publication venue: Elsevier BV
Publication date: 01/06/2015

Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres) if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology.
Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo

arXiv.org e-Print Archive

Elsevier - Publisher Connector

DSpace@MIT

PubMed Central
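The core of the framework above is a two-stage search over a covering of the data by balls of radius rc. The Python sketch below is a generic, hypothetical illustration, not Ammolite, MICA or esFragBag, and it assumes a true metric (Euclidean distance via `math.dist`, Python 3.8+). It first clusters the points greedily. It then answers a range query of radius r in two steps: keep only the clusters whose center lies within r + rc of the query, then scan the members of those clusters. By the triangle inequality, no other cluster can contain a hit.

```python
import math
import random


def build_clusters(points, rc, dist=math.dist):
    """Greedy covering: each point joins the first center within rc,
    otherwise it becomes a new center. Returns [(center, members)]."""
    clusters = []
    for p in points:
        for center, members in clusters:
            if dist(p, center) <= rc:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def range_search(query, r, clusters, rc, dist=math.dist):
    """All points within r of query.

    Coarse step: a point p with dist(query, p) <= r lies within rc of its
    center c, so dist(query, c) <= r + rc; other clusters are skipped.
    Fine step: scan only the members of the surviving clusters.
    """
    hits = []
    for center, members in clusters:
        if dist(query, center) <= r + rc:
            hits.extend(p for p in members if dist(query, p) <= r)
    return hits


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(2000)]
    clusters = build_clusters(pts, rc=0.05)
    q = (0.5, 0.5)
    fast = range_search(q, 0.1, clusters, rc=0.05)
    brute = [p for p in pts if math.dist(q, p) <= 0.1]
    print(len(clusters), "clusters; same hits as brute force:",
          sorted(fast) == sorted(brute))
```

The speedup comes from the coarse step. When the data have low fractal dimension, only a few clusters survive it, so the fine scan touches a small fraction of the points.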

Planar PØP: feature-less pose estimation with applications in UAV localization

Author: Amor Martínez Adrián
Herrero Cotarelo Fernando
Ruiz Alberto
Sanfeliu Cortés Alberto
Santamaria Navarro Àngel
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017

We present a featureless pose estimation method that, in contrast to current Perspective-n-Point (PnP) approaches, does not require n point correspondences to obtain the camera pose. This allows pose estimation from natural shapes that do not necessarily have distinctive features such as corners or intersecting edges. Instead of using n correspondences (e.g., extracted with a feature detector), we use the raw polygonal representation of the observed shape and directly estimate the pose in the pose space of the camera. Compared with a general PnP method, this method requires neither n point correspondences nor a priori knowledge of the object model (except its scale), which is registered with a picture taken from a known robot pose. Moreover, we achieve higher precision because all the information in the shape contour is used to minimize the area between the projected and the observed shape contours. To emphasize that no n point correspondences between the projected template and the observed contour are used, we call the method Planar PØP. The method is demonstrated both in simulation and in a real application consisting of UAV localization, where comparisons with a precise ground truth are provided. Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC
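The central idea in the Planar PØP abstract is to score a candidate pose by the area between the projected and the observed contours. The Python code below is a hypothetical illustration of such a cost function, not the authors' implementation. It projects a planar template with a simple pinhole model (rotation `R`, translation `t`, focal length `f`, principal point at the origin). It then approximates the area covered by exactly one of the two polygons by sampling grid-cell centers. A pose estimator would minimize this cost over the pose parameters; that search is not shown.

```python
def project(points3d, R, t, f):
    """Pinhole projection of 3-D points: x_cam = R X + t, (u, v) = f (x/z, y/z)."""
    out = []
    for X in points3d:
        xc = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
        out.append((f * xc[0] / xc[2], f * xc[1] / xc[2]))
    return out


def inside(x, y, poly):
    """Even-odd (ray casting) point-in-polygon test."""
    result = False
    j = len(poly) - 1
    for i in range(len(poly)):
        (xi, yi), (xj, yj) = poly[i], poly[j]
        if (yi > y) != (yj > y):  # edge straddles the horizontal ray
            if x < xi + (y - yi) * (xj - xi) / (yj - yi):
                result = not result
        j = i
    return result


def xor_area(poly_a, poly_b, bounds, step):
    """Approximate the area covered by exactly one polygon by testing the
    center of each grid cell inside bounds = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = bounds
    nx, ny = round((x1 - x0) / step), round((y1 - y0) / step)
    count = 0
    for i in range(nx):
        x = x0 + (i + 0.5) * step
        for j in range(ny):
            y = y0 + (j + 0.5) * step
            if inside(x, y, poly_a) != inside(x, y, poly_b):
                count += 1
    return count * step * step


def pose_cost(template3d, R, t, f, observed2d, bounds, step=0.01):
    """Contour-area mismatch between the projected template and the observation."""
    return xor_area(project(template3d, R, t, f), observed2d, bounds, step)


if __name__ == "__main__":
    I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]  # planar template, z = 0
    observed = [(0, 0), (1, 0), (1, 1), (0, 1)]             # hypothetical observed contour
    box = (-0.5, -0.5, 2.0, 1.5)
    print(pose_cost(square, I, (0, 0, 2), 2.0, observed, box))    # true pose
    print(pose_cost(square, I, (0.5, 0, 2), 2.0, observed, box))  # shifted pose
```

Unlike a reprojection error over matched points, this cost needs no correspondences. It compares only regions, which is the property the abstract emphasizes.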

Extension-twist coupled laminates for aero-elastic compliant blade design

Author: York C.B.
Publication venue: American Institute of Aeronautics and Astronautics (AIAA)
Publication date: 01/01/2012

A definitive list of laminate configurations with extension-twisting (and shearing-bending) coupling is derived for up to 21 plies of identical thickness. The list comprises individual stacking sequences containing standard angle-ply and cross-ply sub-sequences, combinations that are contrary to the previously assumed form for this class of laminate. The list also contains dimensionless parameters from which the extensional, coupling and bending stiffness terms are readily calculated for any fiber/matrix system. Lamination parameters are shown graphically to illustrate the extent of the design space with up to 21 plies. A special sub-group of this class of coupled laminate is identified that can be manufactured flat under a standard elevated-temperature curing process; this sub-group possesses hygro-thermally curvature-stable behavior. Finally, bounds on the compression buckling strength are assessed using a closed-form solution for all the laminate groups presented.

Enlighten
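In classical lamination theory, the stiffness terms in the laminate abstract are the extensional (A), coupling (B) and bending (D) matrices. The Python sketch below computes them directly from the ply angles. It assumes identical plies, the standard textbook transformed-stiffness formulas, and illustrative material values, and it does not reproduce the paper's dimensionless parameters. Nonzero B16 and B26 are the terms that couple extension to twisting (and, because B is symmetric, shearing to bending).

```python
import math


def reduced_stiffness(E1, E2, nu12, G12):
    """On-axis plane-stress stiffnesses (Q11, Q22, Q12, Q66) of one ply."""
    nu21 = nu12 * E2 / E1
    d = 1.0 - nu12 * nu21
    return E1 / d, E2 / d, nu12 * E2 / d, G12


def qbar(Q, theta_deg):
    """Transformed 3x3 ply stiffness (index order 1, 2, 6) at angle theta."""
    Q11, Q22, Q12, Q66 = Q
    c, s = math.cos(math.radians(theta_deg)), math.sin(math.radians(theta_deg))
    q11 = Q11*c**4 + 2*(Q12 + 2*Q66)*s**2*c**2 + Q22*s**4
    q22 = Q11*s**4 + 2*(Q12 + 2*Q66)*s**2*c**2 + Q22*c**4
    q12 = (Q11 + Q22 - 4*Q66)*s**2*c**2 + Q12*(s**4 + c**4)
    q66 = (Q11 + Q22 - 2*Q12 - 2*Q66)*s**2*c**2 + Q66*(s**4 + c**4)
    q16 = (Q11 - Q12 - 2*Q66)*s*c**3 + (Q12 - Q22 + 2*Q66)*s**3*c
    q26 = (Q11 - Q12 - 2*Q66)*s**3*c + (Q12 - Q22 + 2*Q66)*s*c**3
    return [[q11, q12, q16], [q12, q22, q26], [q16, q26, q66]]


def abd(angles, ply_t, Q):
    """A, B, D matrices for plies listed from the bottom surface up,
    each of thickness ply_t, about the laminate mid-plane."""
    h = len(angles) * ply_t
    A = [[0.0] * 3 for _ in range(3)]
    B = [[0.0] * 3 for _ in range(3)]
    D = [[0.0] * 3 for _ in range(3)]
    for k, theta in enumerate(angles):
        z0 = -h / 2 + k * ply_t  # bottom of ply k
        z1 = z0 + ply_t          # top of ply k
        q = qbar(Q, theta)
        for i in range(3):
            for j in range(3):
                A[i][j] += q[i][j] * (z1 - z0)
                B[i][j] += q[i][j] * (z1**2 - z0**2) / 2
                D[i][j] += q[i][j] * (z1**3 - z0**3) / 3
    return A, B, D


if __name__ == "__main__":
    # Illustrative carbon/epoxy-like values in Pa; ply thickness in m.
    Q = reduced_stiffness(E1=140e9, E2=10e9, nu12=0.3, G12=5e9)
    for stack in ([45, -45], [45, -45, -45, 45]):
        _, B, _ = abd(stack, 0.125e-3, Q)
        print(stack, "B16 =", round(B[0][2], 3), "B11 =", round(B[0][0], 3))
```

For the antisymmetric [45/-45] stack, B16 and B26 come out nonzero while B11 is zero, which is extension-twist coupling. The symmetric [45/-45/-45/45] stack has B equal to zero to rounding.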