3,009 research outputs found

Determining a substitution matrix for the alignment of disordered proteins

Author: Kim Dong
Publication venue: RIT Scholar Works
Publication date: 24/04/2013

As research on disordered proteins progresses and more disordered protein sequences are discovered, an optimal substitution matrix for aligning these sequences is needed. The substitution matrices in current use, PAM and BLOSUM, are ideal for aligning general protein sequences, but they have been found inadequate for aligning disordered protein sequences specifically. Using genetic algorithms, a substitution matrix improved for the alignment of disordered proteins was obtained. The matrix determined by the genetic algorithm performed two times better than BLOSUM62 and PAM250.
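To make the idea of evolving a substitution matrix concrete, here is a toy sketch of a genetic algorithm over a two-letter alphabet. It is not the thesis's method: the alphabet, the score range of -4 to 4, the fitness function (related pairs should outscore unrelated pairs), and all function names are assumptions chosen for illustration only.

```python
import random

def pair_key(a, b):
    """Order-independent key, so the matrix stays symmetric."""
    return "".join(sorted(a + b))

def random_matrix(rng):
    # Toy alphabet {A, B}: three independent entries (assumption).
    return {key: rng.randint(-4, 4) for key in ("AA", "AB", "BB")}

def score(matrix, s, t):
    """Ungapped alignment score of two equal-length strings."""
    return sum(matrix[pair_key(a, b)] for a, b in zip(s, t))

def fitness(matrix, related, unrelated):
    # Hypothetical fitness: reward high scores on related pairs
    # and low scores on unrelated pairs.
    good = sum(score(matrix, s, t) for s, t in related)
    bad = sum(score(matrix, s, t) for s, t in unrelated)
    return good - bad

def mutate(matrix, rng):
    """Copy the matrix and nudge one entry by +/-1, clamped to [-4, 4]."""
    child = dict(matrix)
    key = rng.choice(sorted(child))
    child[key] = max(-4, min(4, child[key] + rng.choice((-1, 1))))
    return child

def evolve(related, unrelated, generations=50, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_matrix(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half, refill with mutated copies of survivors.
        population.sort(key=lambda m: fitness(m, related, unrelated), reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    return max(population, key=lambda m: fitness(m, related, unrelated))
```

Calling `evolve([("AABB", "AABB")], [("AAAA", "BBBB")])` returns a dictionary with the keys `AA`, `AB` and `BB`. A realistic version would use the 20-letter amino acid alphabet and a fitness based on benchmark alignments of disordered proteins.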

RIT Scholar Works

Revisiting gap locations in amino acid sequence alignments and a proposal for a method to improve them by introducing solvent accessibility

Author: Go Mitiko
Hijikata Atsushi
Noguti Tosiyuki
Yura Kei
Publication venue: Wiley Subscription Services, Inc., A Wiley Company

In comparative modeling, the quality of amino acid sequence alignment still constitutes a major bottleneck in the generation of high quality models of protein three-dimensional (3D) structures. Substantial efforts have been made to improve alignment quality by revising the substitution matrix, introducing multiple sequences, replacing dynamic programming with hidden Markov models, and incorporating 3D structure information. Improvements in the gap penalty have not been a major focus, however, following the development of the affine gap penalty and of the secondary structure dependent gap penalty. We revisited the correlation between protein 3D structure and gap location in a large protein 3D structure data set, and found that the frequency of gap locations approximated to an exponential function of the solvent accessibility of the inserted residues. The nonlinearity of the gap frequency as a function of accessibility corresponded well to the relationship between residue mutation pattern and residue accessibility. By introducing this relationship into the gap penalty calculation for pairwise alignment between template and target amino acid sequences, we were able to obtain a sequence alignment much closer to the structural alignment. The quality of the alignments was substantially improved on a pair of sequences with identity in the “twilight zone” between 20 and 40%. The relocation of gaps by our new method made a significant improvement in comparative modeling, exemplified here by the Bacillus subtilis yitF protein. The method was implemented in a computer program, ALAdeGAP (ALignment with Accessibility dependent GAp Penalty), which is available at http://cib.cf.ocha.ac.jp/target_protein/. Proteins 2011; © 2011 Wiley-Liss, Inc
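The relationship described in this abstract, with gap frequency rising roughly exponentially with solvent accessibility, can be sketched as a gap-opening penalty that falls as the residue becomes more exposed. This is only an illustration under stated assumptions: the function name, the constants `base_open`, `k`, `scale` and `floor`, and the conversion from frequency to penalty are hypothetical and are not taken from ALAdeGAP.

```python
def gap_open_penalty(accessibility, base_open=10.0, k=2.0, scale=1.0, floor=1.0):
    """Hypothetical gap-opening penalty next to a residue.

    accessibility: relative solvent accessibility in [0, 1].
    Assumes gap frequency is proportional to exp(k * accessibility),
    and subtracts the log of that relative frequency from a base penalty.
    """
    if not 0.0 <= accessibility <= 1.0:
        raise ValueError("accessibility must be in [0, 1]")
    log_rel_freq = k * accessibility  # log(exp(k * a)), relative to a buried residue
    return max(floor, base_open - scale * log_rel_freq)
```

With these assumed defaults, a fully buried residue keeps the base penalty and an exposed one is cheaper to open a gap beside, which pushes gaps toward the protein surface in a dynamic-programming alignment.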

Crossref

PubMed Central

MUSCLE: a multiple sequence alignment method with reduced time and space complexity

Author: Edgar Robert C
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: In a previous paper, we introduced MUSCLE, a new program for creating multiple alignments of protein sequences, giving a brief summary of the algorithm and showing MUSCLE to achieve the highest scores reported to date on four alignment accuracy benchmarks. Here we present a more complete discussion of the algorithm, describing several previously unpublished techniques that improve biological accuracy and / or computational complexity. We introduce a new option, MUSCLE-fast, designed for high-throughput applications. We also describe a new protocol for evaluating objective functions that align two profiles. RESULTS: We compare the speed and accuracy of MUSCLE with CLUSTALW, Progressive POA and the MAFFT script FFTNS1, the fastest previously published program known to the author. Accuracy is measured using four benchmarks: BAliBASE, PREFAB, SABmark and SMART. We test three variants that offer highest accuracy (MUSCLE with default settings), highest speed (MUSCLE-fast), and a carefully chosen compromise between the two (MUSCLE-prog). We find MUSCLE-fast to be the fastest algorithm on all test sets, achieving average alignment accuracy similar to CLUSTALW in times that are typically two to three orders of magnitude less. MUSCLE-fast is able to align 1,000 sequences of average length 282 in 21 seconds on a current desktop computer. CONCLUSIONS: MUSCLE offers a range of options that provide improved speed and / or alignment accuracy compared with currently available programs. MUSCLE is freely available at

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Optimal Sequence Alignment and Its Relationship with Phylogeny

Author: Atoosa Ghahremani
Mahmood A. Mahdavi
Publication venue: 'IntechOpen'
Publication date: 02/11/2011

IntechOpen

A High-Throughput DNA Sequence Aligner for Microbial Ecology Studies

Author: John Quackenbush
Patrick D. Schloss
Publication venue: Public Library of Science
Publication date: 01/01/2009

As the scope of microbial surveys expands with the parallel growth in sequencing capacity, a significant bottleneck in data analysis is the generation of a biologically meaningful multiple sequence alignment. The most commonly used aligners vary in alignment quality and speed, tend to depend on a specific reference alignment, or lack a complete description of the underlying algorithm. The purpose of this study was to create and validate an aligner that quickly generates a high quality alignment and has the flexibility to use any reference alignment. Using the simple nearest alignment space termination algorithm, the resulting aligner operates in linear time, requires a small memory footprint, and generates a high quality alignment. In addition, the alignments generated for variable regions were of as high a quality as the alignment of full-length sequences. As implemented, the method was able to align 18 full-length 16S rRNA gene sequences and 58 V2 region sequences per second to the 50,000-column SILVA reference alignment. Most importantly, the resulting alignments were of a quality equal to SILVA-generated alignments. The aligner described in this study will enable scientists to rapidly generate robust multiple sequence alignments that are implicitly based upon the predicted secondary structure of the 16S rRNA molecule. Furthermore, because the implementation is not connected to a specific database, it is easy to generalize the method to reference alignments for any DNA sequence.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations

Author: Federico Abascal
Rafael Zardoya
Maximilian J. Telford
Publication venue: Oxford University Press
Publication date: 01/05/2010

We present TranslatorX, a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. Many comparisons between biological sequences (nucleic acids and proteins) involve the construction of multiple alignments. Alignments represent a statement regarding the homology between individual nucleotides or amino acids within homologous genes. As protein-coding DNA sequences evolve as triplets of nucleotides (codons) and it is known that sequence similarity degrades more rapidly at the DNA than at the amino acid level, alignments are generally more accurate when based on amino acids than on their corresponding nucleotides. TranslatorX novelties include: (i) use of all documented genetic codes and the possibility of assigning different genetic codes for each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion to clean nucleotide alignments with GBlocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments. The TranslatorX server is freely available at http://translatorx.co.uk
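The core idea, aligning nucleotides by following an amino acid alignment, can be sketched as a back-translation step: each aligned residue is replaced by its original codon and each protein gap by three nucleotide gaps. This is a minimal sketch and not TranslatorX's code; the function name and its arguments are assumptions.

```python
def back_translate_alignment(aligned_protein, nucleotide_seq, gap="-"):
    """Thread an ungapped coding sequence onto its aligned protein sequence.

    aligned_protein: one row of a protein alignment, gaps included.
    nucleotide_seq: the ungapped coding sequence, three bases per residue.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(nucleotide_seq) != 3 * n_residues:
        raise ValueError("nucleotide length must be 3 x number of residues")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)  # one protein gap spans a whole codon
        else:
            codons.append(nucleotide_seq[pos:pos + 3])
            pos += 3
    return "".join(codons)

print(back_translate_alignment("M-K", "ATGAAA"))  # prints ATG---AAA
```

Applying the function to every row of the protein alignment yields a nucleotide alignment whose gaps never split a codon.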

Crossref

PubMed Central

UCL Discovery

Protein Structure Prediction

Author: Tuček Jaroslav
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2009

This work describes the three-dimensional structure of protein molecules and the biological databases used to store information about this structure or its hierarchical classification. Current methods of computational structure prediction are reviewed, with an emphasis on comparative modeling. This method is also implemented in a proof-of-concept program, and the implementation is then analysed.

Digital library of Brno University of Technology

National Repository of Grey Literature

Aligning Sequences by Minimum Description Length

Author: Conery John S
Publication venue: BioMed Central
Publication date: 01/01/2007

This paper presents a new information theoretic framework for aligning sequences in bioinformatics. A transmitter compresses a set of sequences by constructing a regular expression that describes the regions of similarity in the sequences. To retrieve the original set of sequences, a receiver generates all strings that match the expression. An alignment algorithm uses minimum description length to encode and explore alternative expressions; the expression with the shortest encoding provides the best overall alignment. When two substrings contain letters that are similar according to a substitution matrix, a code length function based on conditional probabilities defined by the matrix will encode the substrings with fewer bits. In one experiment, alignments produced with this new method were found to be comparable to alignments from [inline graphic: 1687-4153-2007-72936-i1.gif]. A second experiment measured the accuracy of the new method on pairwise alignments of sequences from the BAliBASE alignment benchmark.
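The code-length argument can be illustrated with a small calculation: encoding one substring given the other costs the sum of -log2 P(b | a) over aligned letters, which is cheaper than encoding it independently when the letters are similar. The sketch below is an assumption-laden toy, not the paper's code length function: the four-letter alphabet, the match probability of 0.7, the uniform background, and the function names are all hypothetical.

```python
import math

def conditional_bits(x, y, cond_prob):
    """Bits needed to encode y letter by letter when x is already known."""
    if len(x) != len(y):
        raise ValueError("substrings must have equal length")
    return sum(-math.log2(cond_prob[a][b]) for a, b in zip(x, y))

def independent_bits(y, background):
    """Bits needed to encode y with background letter frequencies only."""
    return sum(-math.log2(background[b]) for b in y)

def toy_model(alphabet="ACGT", p_match=0.7):
    """Hypothetical probabilities standing in for a real substitution matrix."""
    p_other = (1.0 - p_match) / (len(alphabet) - 1)
    cond = {a: {b: (p_match if a == b else p_other) for b in alphabet}
            for a in alphabet}
    background = {b: 1.0 / len(alphabet) for b in alphabet}
    return cond, background
```

Under this toy model, aligning `ACGT` with itself costs fewer bits than sending the second copy independently, while aligning it with `CATG` costs more, so a minimum description length search would leave such a pair unaligned.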

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Structure alignment based on coding of local geometric measures

Author: Chang Peter L
Dewey T Gregory
Rinne Andrew W
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: A structure alignment method based on a local geometric property is presented and its performance is tested in pairwise and multiple structure alignments. In this approach, the writhing number, a quantity originating from integral formulas of Vassiliev knot invariants, is used as a local geometric measure. This measure is used in a sliding window to calculate the local writhe down the length of the protein chain. By encoding the distribution of writhing numbers across all the structures in the protein databank (PDB), protein geometries are represented in a 20-letter alphabet. This encoding transforms the structure alignment problem into a sequence alignment problem and allows the well-established algorithms of sequence alignment to be employed. Such geometric alignments offer distinct advantages over structural alignments in Cartesian coordinates, as they better handle structural subtleties associated with slight twists and bends that distort one structure relative to another. RESULTS: Programs for pairwise local alignment (TLOCAL) and multiple alignment (TCLUSTALW) were readily adapted from existing code for Smith-Waterman pairwise alignment and for multiple sequence alignment using CLUSTALW. The alignment algorithms employed a blocked scoring matrix (TBLOSUM) generated using the frequency of changes in the geometric alphabet of a block of protein structures. TLOCAL was tested on a set of 10 difficult proteins and found to give high quality alignments that compare favorably to those generated by existing pairwise alignment programs. A set of protein comparisons involving hinged structures was also analyzed, and TLOCAL was seen to compare favorably to other alignment methods. TCLUSTALW was tested on a family of protein kinases and revealed conserved regions similar to those previously identified by a hand alignment.
CONCLUSION: These results show that encoding the writhing number as a geometric measure allows high quality structure alignments to be generated using standard algorithms of sequence alignment. This approach provides computationally efficient algorithms that allow fast database searching and multiple structure alignment. Because the geometric measure can employ different window sizes, the method allows the exploration of alignments on different, well-defined length scales.
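The encoding step, turning a continuous local measure into a 20-letter alphabet so that sequence aligners can be reused, might look like the sketch below. It assumes the per-window writhe values are already computed and uses equal-frequency (quantile) binning; the binning scheme, the reuse of amino acid letters, and the function names are assumptions, not details given in the abstract.

```python
from bisect import bisect_right

# 20 symbols; reusing the amino acid letters is an assumption made so that
# standard protein alignment tools would accept the encoded strings.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def quantile_edges(values, n_bins=len(ALPHABET)):
    """Bin edges that split a reference distribution into equal-frequency bins."""
    if not values:
        raise ValueError("need at least one reference value")
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]

def encode(values, edges, alphabet=ALPHABET):
    """Map each local measure to the letter of the bin it falls into."""
    return "".join(alphabet[bisect_right(edges, v)] for v in values)

# Usage with made-up numbers standing in for local writhe values.
reference = list(range(100))
edges = quantile_edges(reference)
print(encode([0, 5, 99], edges))  # prints ACY
```

Once structures are encoded this way, a substitution matrix over the geometric letters can be estimated from aligned blocks in the same spirit as BLOSUM, which is the role TBLOSUM plays in the abstract.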

Springer - Publisher Connector

PubMed Central