PyCogent: a toolkit for making sense from sequence
The COmparative GENomic Toolkit, a framework for probabilistic analyses of biological sequences, devising workflows and generating publication-quality graphics, has been implemented in Python.
Studying trends of non-coding RNA function and evolution
RNA is unusual among biological molecules in that it both carries information and catalyzes chemical reactions. It is therefore believed that RNA predated protein and DNA as a catalyst and information carrier in an "RNA World". A greater understanding of the evolutionary and functional features of non-coding RNA is not only fundamental to elucidating the evolutionary mechanisms that give rise to RNA function, perhaps offering insight into the origin of life in an RNA World, but is also necessary for the advancement of RNA biotechnology and RNA-based therapeutics. Recent advances in high-throughput sequencing have made it possible to study the function of non-coding RNAs at unprecedented depth, producing millions to billions of sequences from a single experiment. This poses new challenges, because traditional biochemical and computational techniques do not scale to the volume of data each experiment produces. In this work, I present new computational tools and methods, and their applications to the study of non-coding RNA evolution. I have assembled a gold-standard set of non-coding RNA alignments that have been manually curated and aligned to their known 3D structures. These manual alignments address the need for RNA alignments with structural annotation, which current automated alignment algorithms do not provide. Next, I apply alignments to the study of tRNA evolution. tRNAs, an integral part of the modern translation machinery, are believed to be poor phylogenetic markers. Using UniFrac to cluster genomes based on the collection of tRNAs they contain, I show that the resulting tRNA trees are similar to trees constructed from rRNA of the same organisms, and are thus congruent with universal phylogeny.
Finally, I describe a technique for simultaneously measuring the dissociation constants (KD) of thousands of amino-acid-binding RNA aptamers in a pool obtained by in vitro selection, improving on the traditional, laborious process of determining KD one sequence at a time.
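The abstract does not describe the assay itself, so the following is only a sketch of the per-sequence fitting step that any pooled KD measurement needs. It assumes that, for every aptamer in the pool, the fraction bound has already been estimated at several free ligand concentrations (for example from sequencing counts), and that binding follows a simple single-site model, f = [L] / (KD + [L]). The function names and the input layout are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def estimate_kd(points: List[Tuple[float, float]]) -> Optional[float]:
    """Estimate KD for one sequence from (ligand_conc, fraction_bound) pairs.

    Assumes single-site binding, f = L / (KD + L), which rearranges to
    KD = L * (1 - f) / f for each measurement. The per-point estimates are
    combined with a geometric mean (a least-squares fit in log space).
    """
    logs = []
    for conc, frac in points:
        # Points with f == 0 or f == 1 give no finite KD estimate; skip them.
        if conc > 0 and 0.0 < frac < 1.0:
            logs.append(math.log(conc * (1.0 - frac) / frac))
    if not logs:
        return None
    return math.exp(sum(logs) / len(logs))


def estimate_pool(pool: Dict[str, List[Tuple[float, float]]]) -> Dict[str, Optional[float]]:
    """Apply estimate_kd to every sequence in a pool (hypothetical input layout)."""
    return {seq: estimate_kd(points) for seq, points in pool.items()}


# Usage: a simulated aptamer with KD = 2.0 (same units as the ligand, e.g. mM).
concs = [0.5, 1.0, 2.0, 4.0]
pool = {"AGGCGU": [(c, c / (2.0 + c)) for c in concs]}
print(estimate_pool(pool))
```

Because each sequence is fitted independently, the same function applies unchanged whether the pool holds ten sequences or thousands; a real analysis would also need to propagate counting noise, which this sketch ignores.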
Boulder ALignment Editor (ALE): a web-based RNA alignment tool.
Summary: The explosion of interest in non-coding RNAs, together with improvements in RNA X-ray crystallography, has led to a rapid increase in RNA structures at atomic resolution, from 847 in 2005 to 1900 in 2010. The success of whole-genome sequencing has led to explosive growth in the number of unaligned homologous sequences. Consequently, there is a compelling and urgent need for user-friendly tools for producing structure-informed RNA alignments. Most alignment software considers the primary sequence alone; some specialized alignment software can also include Watson-Crick base pairs, but none adequately addresses the needs introduced by the rapid influx of both sequence and structural data. We have therefore developed the Boulder ALignment Editor (ALE), a web-based RNA alignment editor designed for editing and assessing alignments using structural information. Features of BoulderALE include annotation and evaluation of an alignment based on the isostericity of Watson-Crick and non-Watson-Crick base pairs, and horizontal and vertical collapsing of the alignment while it remains editable.
Availability: http://www.microbio.me/boulderale
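To make structure-aware alignment assessment concrete, the sketch below takes an alignment plus a list of base-paired column pairs and tallies, across all sequences, how many form Watson-Crick pairs, G·U wobble pairs, other (non-Watson-Crick) combinations, or gaps. It is a deliberately simplified stand-in: it does not implement BoulderALE's isostericity-based evaluation, which distinguishes many more pair geometries, and the function names and input format (0-based column indices) are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(a: str, b: str) -> str:
    """Label one base pair; T is read as U so DNA-style input also works."""
    pair = (a.upper().replace("T", "U"), b.upper().replace("T", "U"))
    if pair in WATSON_CRICK:
        return "WC"
    if pair in WOBBLE:
        return "GU"
    if "-" in pair or "." in pair:
        return "gap"
    return "nonWC"


def annotate_pairs(alignment: List[str],
                   pairs: List[Tuple[int, int]]) -> Dict[Tuple[int, int], Counter]:
    """For each annotated (i, j) column pair, count pair classes over all sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return {
        (i, j): Counter(classify_pair(seq[i], seq[j]) for seq in alignment)
        for i, j in pairs
    }


# Usage: three aligned sequences and three proposed helix pairs.
aln = ["GGCAUCC", "GACAUUC", "GGCA-CA"]
print(annotate_pairs(aln, [(1, 5), (0, 6), (2, 4)]))
```

A column pair dominated by "nonWC" or "gap" counts is a natural place to inspect the alignment by hand, which is the kind of review an editor such as BoulderALE is meant to support.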
MotifCluster: an interactive online tool for clustering and visualizing sequences using shared motifs.
MotifCluster finds related motifs in a set of sequences and clusters the sequences into families using the motifs they contain. MotifCluster, at http://bmf.colorado.edu/motifcluster, lets users test whether proteins are related, cluster sequences by shared conserved motifs, and visualize motifs mapped onto trees, sequences and three-dimensional structures. We demonstrate MotifCluster's accuracy using gold-standard protein superfamilies; with the recommended settings, families were assigned to the correct superfamilies with 0.17% false-positive and no false-negative assignments.
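The idea of grouping sequences into families by the motifs they share can be sketched as single-linkage clustering: link any two sequences that share at least `min_shared` motifs, then take connected components with a union-find. This assumes motif discovery has already produced a motif set per sequence; MotifCluster's actual clustering criterion may differ, and `cluster_by_shared_motifs` and `min_shared` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, List, Set


def cluster_by_shared_motifs(motifs: Dict[str, Set[str]], min_shared: int = 1) -> List[List[str]]:
    """Group sequences whose motif sets overlap by at least min_shared motifs.

    Single linkage: families are the connected components of the
    "shares enough motifs" graph, found with a union-find structure.
    """
    parent = {name: name for name in motifs}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair of sequences (O(n^2), fine for a sketch).
    for a, b in combinations(motifs, 2):
        if len(motifs[a] & motifs[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in motifs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Usage: p1 and p3 share no motif but are joined through p2.
example = {"p1": {"m1", "m2"}, "p2": {"m2", "m3"}, "p3": {"m3"}, "p4": {"m9"}}
print(cluster_by_shared_motifs(example))  # [['p1', 'p2', 'p3'], ['p4']]
```

Raising `min_shared` makes families stricter, which trades false-positive merges against splitting genuinely related sequences.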
Simple, recurring RNA binding sites for L-arginine
Seven new arginine-binding motifs have been selected from a heterogeneous RNA pool containing 17-, 25-, and 50-mer randomized tracts, yielding 131 independently derived binding sites that are multiply isolated. The shortest (17-mer) random region is sufficient to build varied arginine binding sites using five different conserved motifs (motifs 1a, 1b, 1c, 2, and 4). Dissociation constants are in the fractional millimolar to millimolar range. Binding sites are specific for the amino acid side chain and discriminate moderately between the L- and D-stereoisomers of arginine, suggesting a molecular focus on the side-chain guanidinium. An arginine coding triplet (codon/anticodon) is highly conserved within the largest family of Arg sites (72% of all sequences), as has also been found in the minimal, most prevalent RNA binding sites for Ile, His, and Trp.
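As a small illustration of what checking for "an arginine coding triplet (codon/anticodon)" involves, the sketch below scans a binding-site sequence for the six Arg codons of the standard genetic code and for their reverse complements, i.e. the corresponding anticodons read 5' to 3'. It is not the analysis used in the study; `find_arg_triplets` and the output format are hypothetical.

```python
from typing import List, Tuple

# Arginine codons in the standard genetic code (RNA alphabet).
ARG_CODONS = {"CGU", "CGC", "CGA", "CGG", "AGA", "AGG"}
_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(rna))


# Anticodons, written 5'->3', are the reverse complements of the codons.
ARG_ANTICODONS = {reverse_complement(c) for c in ARG_CODONS}


def find_arg_triplets(seq: str) -> List[Tuple[int, str, str]]:
    """Return (0-based position, triplet, 'codon' or 'anticodon') for each Arg triplet."""
    seq = seq.upper().replace("T", "U")
    hits = []
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if triplet in ARG_CODONS:
            hits.append((i, triplet, "codon"))
        if triplet in ARG_ANTICODONS:
            hits.append((i, triplet, "anticodon"))
    return hits


print(find_arg_triplets("AACGU"))  # [(1, 'ACG', 'anticodon'), (2, 'CGU', 'codon')]
```

Counting such hits across all isolates of a motif family, and comparing against shuffled sequences, would be one way to ask whether a triplet is conserved more often than chance.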
PSI-BLAST: Position-Specific Iterated Basic Local Alignment Search Tool
picking non-redundant sequences from larg
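The entry above is truncated, so the procedure it describes is not known here. One common way to pick non-redundant sequences is a greedy identity filter: visit sequences from most to least complete and keep each one only if it is below an identity threshold to everything already kept. The sketch assumes the sequences are already aligned; `identity`, `pick_nonredundant`, and the 0.9 default threshold are hypothetical choices.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither row has a gap.

    Assumes a and b are rows of the same alignment (equal length).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def pick_nonredundant(aligned: List[str], max_identity: float = 0.9) -> List[int]:
    """Greedy filter returning indices of the kept (representative) sequences.

    Sequences with more ungapped residues are visited first, so the most
    complete member of each redundant group is the one retained.
    """
    order = sorted(range(len(aligned)), key=lambda k: -len(aligned[k].replace("-", "")))
    kept: List[int] = []
    for k in order:
        if all(identity(aligned[k], aligned[r]) < max_identity for r in kept):
            kept.append(k)
    return sorted(kept)


aln = ["ACGUACGU", "ACGUACGA", "UUUUCCCC", "ACGU----"]
print(pick_nonredundant(aln, max_identity=0.8))  # [0, 2]
```

The greedy pass needs only one comparison per kept representative, which is why this style of filter is popular for large sequence sets, although the result depends on the visiting order.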
Stable tRNA-based phylogenies using only 76 nucleotides
tRNAs are among the most ancient, highly conserved sequences on Earth, but are often thought to be poor phylogenetic markers because they are short, often subject to horizontal gene transfer, and easily change specificity. Here we use UniFrac, an algorithm now commonly used in microbial ecology, to cluster 175 genomes spanning all three domains of life based on the phylogenetic relationships among their complete tRNA pools. We find that the overall pattern of similarities and differences in the tRNA pools recaptures universal phylogeny to a remarkable extent, and that the resulting tree is similar to the distribution of bootstrapped rRNA trees from the same genomes. In contrast, trees derived from tRNAs of identical specificity or from individual isoacceptors were generally of lower quality. However, some tRNA isoacceptors were very good predictors of the overall pattern of organismal evolution. These results show that UniFrac can extract meaningful biological patterns even from phylogenies with high levels of statistical inaccuracy and horizontal gene transfer, and that, overall, the pattern of tRNA evolution tracks universal phylogeny and provides a background against which we can test hypotheses about the evolution of individual isoacceptors.
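The core UniFrac calculation can be sketched in a few lines. In this setting each "sample" would be one genome's tRNA pool, that is, the set of tips its tRNAs occupy on a tree built from all tRNA sequences. The sketch assumes a rooted tree given as a mapping from each non-root node to (parent, branch length) and computes the unweighted (presence/absence) variant; whether the study used the weighted or unweighted form is not stated here, and the function name is hypothetical.

```python
from typing import Dict, Set, Tuple


def unweighted_unifrac(tree: Dict[str, Tuple[str, float]], a: Set[str], b: Set[str]) -> float:
    """Unweighted UniFrac distance between two sets of tips.

    tree maps every non-root node to (parent, branch_length). The distance is
    the branch length leading only to tips in a or only to tips in b, divided
    by the total branch length leading to tips in either set.
    """
    # For each branch (identified by its child node), record which samples lie below it.
    below: Dict[str, Set[str]] = {node: set() for node in tree}
    for label, tips in (("a", a), ("b", b)):
        for tip in tips:
            node = tip
            while node in tree:  # walk upward until the root is reached
                below[node].add(label)
                node = tree[node][0]
    unique = total = 0.0
    for node, (_, length) in tree.items():
        if below[node]:
            total += length
            if len(below[node]) == 1:
                unique += length
    return unique / total if total else 0.0


# Usage: root R has child X (length 1) and tip t3 (length 2); X has tips t1, t2.
tree = {"X": ("R", 1.0), "t1": ("X", 1.0), "t2": ("X", 1.0), "t3": ("R", 2.0)}
print(unweighted_unifrac(tree, {"t1"}, {"t2"}))  # 2/3: only branch X is shared
```

Clustering genomes would then mean computing this distance for every pair of tRNA pools and applying a hierarchical method such as UPGMA to the resulting distance matrix.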
- …