
    Optimizing substitution matrix choice and gap parameters for sequence alignment

    Background: While substitution matrices can readily be computed from reference alignments, it is challenging to compute optimal or approximately optimal gap penalties. It is also not well understood which substitution matrices are the most effective when the goal is alignment accuracy rather than homolog recognition. Here a new parameter optimization procedure, POP, is described and applied to the problems of optimizing gap penalties and selecting substitution matrices for pairwise global protein alignment. Results: POP is compared to a recent method due to Kim and Kececioglu and found to achieve 0.2% to 1.3% higher accuracy on pairwise benchmarks extracted from BALIBASE. The VTML matrix series is shown to be the most accurate on several global pairwise alignment benchmarks, with VTML200 giving the best or close to the best performance in all tests. BLOSUM matrices are found to be slightly inferior, even with the marginal improvements in the bug-fixed RBLOSUM series. The PAM series is significantly worse, typically giving accuracies 2% lower than VTML. Integer rounding is found to cause slight degradations in accuracy. No evidence is found that selecting a matrix based on sequence divergence improves accuracy, suggesting that the use of this heuristic in CLUSTALW may be ineffective. Using VTML200 is found to improve the accuracy of CLUSTALW by 8% on BALIBASE and 5% on PREFAB. Conclusion: The hypothesis that more accurate alignments of distantly related sequences can be achieved using low-identity matrices is shown to be false for commonly used matrix types. Source code and test data are freely available from the author's web site at http://www.drive5.com/pop.
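
    As a rough illustration of what the gap parameters control, the sketch below computes the optimal pairwise global alignment score with affine gap penalties using Gotoh's three-state recurrence. It is a minimal sketch and not POP itself: `toy_score` is a hypothetical match/mismatch stand-in for a real substitution matrix such as VTML200, and the penalty values in the example are arbitrary. A procedure like POP would search over `gap_open`, `gap_extend` and the matrix to maximize accuracy against reference alignments, which is not shown here.

```python
NEG_INF = float("-inf")


def global_affine_score(a, b, score, gap_open, gap_extend):
    """Optimal global alignment score of a and b with affine gap penalties (Gotoh).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):  # leading gap in b
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):  # leading gap in a
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap (from M or the other gap state) costs gap_open;
            # continuing an existing gap costs gap_extend.
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


def toy_score(x, y):
    # Hypothetical stand-in for a real substitution matrix (e.g. VTML200).
    return 2 if x == y else -1


if __name__ == "__main__":
    # One length-2 gap (cost 3 + 1) plus two matches (+2 each).
    print(global_affine_score("ACGT", "AT", toy_score, gap_open=3, gap_extend=1))  # 0.0
```

    Because opening a gap costs more than extending one, the recurrence prefers a single long gap over several short ones; tuning that trade-off is exactly the gap-penalty problem the abstract describes.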

    Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

    Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics, used to infer related residues among biological sequences. Alignment accuracy is therefore crucial to a vast range of analyses, often in ways that are difficult to assess within those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies (based on simulation, consistency, protein structure, and phylogeny) and discuss their respective advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should choose a benchmark according to the context of application, with a keen awareness of the assumptions underlying each benchmarking strategy. Comment: Review
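
    Structure-based benchmarks such as BAliBASE commonly report a sum-of-pairs (SP) style score: the fraction of residue pairs aligned in the reference that the test alignment also aligns. The sketch below is a simplified version under stated assumptions: alignments are dicts of gapped rows with `-` as the gap character, every column counts (real benchmarks often restrict scoring to core blocks), and `aligned_pairs` and `sp_score` are hypothetical helper names.

```python
from itertools import combinations


def aligned_pairs(msa):
    """Residue pairs placed in the same column by a gapped MSA.

    msa maps sequence name -> gapped row ('-' marks a gap). Each pair is
    (name1, index1, name2, index2) with ungapped residue indices and name1 < name2.
    """
    names = sorted(msa)
    lengths = {len(row) for row in msa.values()}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same, non-zero count")
    next_index = {name: 0 for name in names}
    pairs = set()
    for col in range(lengths.pop()):
        present = []
        for name in names:
            if msa[name][col] != "-":
                present.append((name, next_index[name]))
                next_index[name] += 1
        # combinations() keeps the sorted name order, so pairs are canonical.
        for (n1, i1), (n2, i2) in combinations(present, 2):
            pairs.add((n1, i1, n2, i2))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference alignment contains no aligned residue pairs")
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)


if __name__ == "__main__":
    reference = {"a": "AC-GT", "b": "ACAGT"}
    test = {"a": "ACG-T", "b": "ACAGT"}
    print(sp_score(test, reference))  # 0.75: 3 of the 4 reference pairs recovered
```

    The score inherits every assumption built into the reference, which is the central risk the review raises for structure-based benchmarks.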

    A user-friendly web portal for T-Coffee on supercomputers

    Background: Parallel T-Coffee (PTC) was the first parallel implementation of the T-Coffee multiple sequence alignment tool. It is based on MPI and RMA mechanisms, and its purpose is to reduce the execution time of large-scale sequence alignments. It can be run on distributed-memory clusters, allowing users to align data sets of hundreds of proteins within a reasonable time. However, most potential users of this tool are not familiar with grids or supercomputers. Results: In this paper we show how PTC can be easily deployed and controlled on a supercomputer architecture through a web portal developed with Rapid. Rapid is a tool for efficiently generating standardized portlets for a wide range of applications, and the approach described here is generic enough to be applied to other applications or to deploy PTC on different HPC environments. Conclusions: The PTC portal allows users to upload large numbers of sequences that cannot be aligned on a single machine because of memory and execution-time constraints, and to align them with the parallel version of T-Coffee. The web portal provides a user-friendly solution.

    Reef fishes at all trophic levels respond positively to effective marine protected areas

    Marine Protected Areas (MPAs) offer a unique opportunity to test the assumption that fishing pressure affects some trophic groups more than others. Removal of larger predators through fishing is often suggested to have positive flow-on effects for some lower trophic groups, in which case protection from fishing should suppress those lower trophic groups as predator populations recover. We tested this by assessing differences in the trophic structure of reef fish communities associated with 79 MPAs and open-access sites worldwide, using a standardised quantitative dataset on reef fish community structure. The biomass of all major trophic groups (higher carnivores, benthic carnivores, planktivores and herbivores) was significantly greater (by 40% to 200%) in effective no-take MPAs than in fished open-access areas. This effect was most pronounced for individuals in large size classes, and no size class of any trophic group showed the depressed biomass in MPAs that higher predator abundance would predict. Thus, greater biomass in effective MPAs implies that exploitation on shallow rocky and coral reefs negatively affects the biomass of all fish trophic groups and size classes. These direct effects of fishing on trophic structure appear stronger than any top-down effects on lower trophic levels that would be imposed by intact predator populations. We propose that exploitation affects fish assemblages at all trophic levels, and that local ecosystem function is generally modified by fishing.

    Osteoarticular Infection in Three Young Thoroughbred Horses Caused by a Novel Gram Negative Cocco-Bacillus

    © 2020 Bernard J. Hudson et al. We describe three cases of osteoarticular infection (OAI) in young thoroughbred horses in which the causative organism was identified by MALDI-TOF as a Kingella species. The pattern of OAI resembled that reported for Kingella infection in humans. Analysis by 16S rRNA PCR enabled construction of a phylogenetic tree that placed the isolates closer to Simonsiella and Alysiella species than to Kingella species. However, average nucleotide identity (ANI) comparison between the new isolate and Kingella kingae and Alysiella crassa indicated a low probability that the new isolate belonged to either of these species. This preliminary analysis suggests that the isolated organism is a previously unrecognised species.

    Improving the Alignment Quality of Consistency Based Aligners with an Evaluation Function Using Synonymous Protein Words

    Most sequence alignment tools can successfully align protein sequences that share high levels of sequence identity. Their accuracy relative to structural alignments, however, decreases rapidly for distantly related sequences (<20% identity). In this identity range, alignments optimized to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their ability to reproduce structure-based alignments using only sequence information. Currently available methods differ essentially in how they measure the similarity between aligned residues: using substitution matrices, Fourier transforms, sophisticated profile-profile functions, or consistency-based approaches, more recently
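
    To make "consistency-based" concrete, the sketch below applies a simplified T-Coffee-style library extension: the weight of a residue pair is reinforced when a residue in a third sequence is aligned to both. This is a generic illustration of the consistency idea, not the synonymous-protein-words evaluation function this paper proposes; residues are assumed to be `(sequence_name, position)` tuples, the library weights are hypothetical, and `extend_library` is an illustrative name.

```python
from collections import defaultdict


def extend_library(primary):
    """Simplified T-Coffee-style consistency extension of a pairwise library.

    primary maps (x, y) -> weight, where x and y are residues written as
    (sequence_name, position) from different sequences; each unordered pair
    is assumed to appear at most once. The extended weight of (x, y) is its
    direct weight plus, for every residue z in a third sequence that is
    linked to both, min(weight(x, z), weight(z, y)).
    """
    graph = defaultdict(dict)  # symmetric adjacency: graph[x][y] = weight
    for (x, y), w in primary.items():
        graph[x][y] = w
        graph[y][x] = w
    extended = {}
    residues = sorted(graph)
    for x in residues:
        for y in residues:
            if x[0] >= y[0]:  # keep one orientation; skip same-sequence pairs
                continue
            w = graph[x].get(y, 0)
            for z, w_xz in graph[x].items():
                # z must come from a third sequence and also be linked to y.
                if z[0] not in (x[0], y[0]) and y in graph[z]:
                    w += min(w_xz, graph[z][y])
            if w:
                extended[(x, y)] = w
    return extended


if __name__ == "__main__":
    library = {
        (("A", 0), ("B", 0)): 10,
        (("A", 0), ("C", 0)): 8,
        (("B", 0), ("C", 0)): 6,
        (("A", 1), ("B", 1)): 5,  # no support from sequence C
    }
    for pair, weight in sorted(extend_library(library).items()):
        print(pair, weight)
```

    Pairs supported through a third sequence gain weight (A0/B0 rises from 10 to 16), while the unsupported pair A1/B1 keeps its direct weight of 5; a progressive aligner would then use these extended weights in place of raw substitution scores.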

    Grammar-based distance in progressive multiple sequence alignment

    Background: We propose a multiple sequence alignment (MSA) algorithm and compare its alignment quality and execution time with those of existing algorithms. The proposed progressive alignment algorithm uses a grammar-based distance metric to determine the order in which biological sequences are pairwise aligned. Progressive alignment proceeds by pairwise aligning each new sequence with the ensemble of previously aligned sequences. Results: The performance of the proposed algorithm is validated by comparison with the popular progressive multiple alignment approaches ClustalW and T-Coffee, and with the more recently developed algorithms MAFFT, MUSCLE, Kalign, and PSAlign, using the BAliBASE 3.0 database of amino acid alignment files and a set of longer sequences generated by the Rose software. The proposed algorithm built multiple alignments comparable to those of other programs, with significant improvements in running time. The results are especially striking for large datasets. Conclusion: We introduce a computationally efficient progressive alignment algorithm using a grammar-based sequence distance that is particularly useful for aligning large datasets.
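
    The abstract does not define the grammar used, so the sketch below swaps in a related, well-known idea as an assumption: an alignment-free distance built from Lempel-Ziv (1976) phrase counts, where a sequence that adds few new phrases when appended to another is considered close to it. `lz76_complexity` and `lz_distance` are hypothetical names, and this is not the authors' metric; it only shows how a compression-style distance can rank pairs without aligning them first.

```python
def lz76_complexity(s):
    """Phrase count of the Lempel-Ziv (1976) exhaustive-history parsing of s."""
    n, i, phrases = len(s), 0, 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text; the
        # copy source may overlap the phrase itself, hence s[:i + length - 1].
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1  # the phrase ends with one novel symbol (or at the end of s)
        i += length
    return phrases


def lz_distance(s, q):
    """Normalized LZ-complexity distance (stand-in metric; s and q non-empty)."""
    c_sq, c_qs = lz76_complexity(s + q), lz76_complexity(q + s)
    return max(c_sq - lz76_complexity(s), c_qs - lz76_complexity(q)) / max(c_sq, c_qs)


if __name__ == "__main__":
    x, y = "ACDEFGHIKL", "MNPQRSTVWY"
    print(lz_distance(x, x))  # small: the second copy of x is a single phrase
    print(lz_distance(x, y))  # 0.5: y shares nothing with x
```

    In a progressive aligner, a matrix of such pairwise distances would feed the guide-tree construction that fixes the alignment order; since each distance needs only a few linear-ish parses rather than a dynamic-programming alignment, this is where the running-time savings on large datasets would come from.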

    Evolutionary distances in the twilight zone -- a rational kernel approach

    Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score that models substitutions and indels and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets. Comment: to appear in PLoS ONE

    Shallow water marine sediment bacterial community shifts along a natural CO2 gradient in the Mediterranean Sea off Vulcano, Italy.

    The effects of increasing atmospheric CO2 on ocean ecosystems are a major environmental concern, as rapid shoaling of the carbonate saturation horizon is exposing vast areas of marine sediments to corrosive waters worldwide. Natural CO2 gradients off Vulcano, Italy, have revealed profound ecosystem changes along rocky shore habitats as carbonate saturation levels decrease, but the sedimentary habitat has not yet been investigated. Here, we sampled the upper 2 cm of volcanic sand in three zones: ambient (median pCO2 419 μatm, minimum Ωarag 3.77), moderately CO2-enriched (median pCO2 592 μatm, minimum Ωarag 2.96), and highly CO2-enriched (median pCO2 1611 μatm, minimum Ωarag 0.35). We tested the hypothesis that increasing levels of seawater pCO2 would cause significant shifts in sediment bacterial community composition, as shown recently in epilithic biofilms at the study site. In this study, 454 pyrosequencing of the V1 to V3 region of the 16S rRNA gene revealed a shift in community composition with increasing pCO2. The relative abundances of most of the dominant genera were unaffected by the pCO2 gradient, although there were significant differences for some 5% of the genera present (viz. Georgenia, Lutibacter, Photobacterium, Acinetobacter, and Paenibacillus), and Shannon diversity was greatest in sediments subject to long-term acidification (>100 years). Overall, this supports the view that globally increased ocean pCO2 will be associated with changes in sediment bacterial community composition but that most of these organisms are resilient. However, further work is required to assess whether these results apply to other types of coastal sediments and whether the changes in relative abundance of bacterial taxa that we observed can significantly alter the biogeochemical functions of marine sediments.