A computational survey of candidate exonic splicing enhancer motifs in the model plant Arabidopsis thaliana
Algorithmic approaches to splice site prediction have relied mainly on the consensus patterns found at the boundaries between protein coding and non-coding regions. However, exonic splicing enhancers have been shown to enhance the utilization of nearby splice sites. We have developed a new computational technique to identify significantly conserved motifs involved in splice site regulation. First, 84 putative exonic splicing enhancer hexamers were identified in Arabidopsis thaliana. Then a Gibbs sampling program called ELPH was used to locate conserved motifs represented by these hexamers in exonic regions near splice sites in confirmed genes. Oligomers containing 35 of these motifs have been shown experimentally to induce significant inclusion of A. thaliana exons. Second, integration of our regulatory motifs into two different splice site recognition programs significantly improved the ability of the software to correctly predict splice sites in a large database of confirmed genes. We have released GeneSplicerESE, the improved splice site recognition code, as open source software. Our results show that the use of the ESE motifs consistently improves splice site prediction accuracy. https://doi.org/10.1186/1471-2105-8-15
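As a rough illustration of the first step described above (finding hexamers that are over-represented in exonic windows near splice sites), the sketch below counts overlapping hexamers and compares their frequency against a background set. It is not ELPH or GeneSplicerESE: the function names, the pseudocount, and the simple frequency-ratio score are assumptions made for this example, and the ratio stands in for the Gibbs sampling and significance testing used in the paper.

```python
from collections import Counter


def count_hexamers(sequences, k=6):
    """Count every overlapping k-mer (hexamer by default) across the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def enriched_hexamers(near_site, background, min_ratio=2.0, pseudocount=1.0):
    """Return hexamers whose smoothed frequency near splice sites is at least
    min_ratio times their smoothed frequency in the background sequences.

    Hypothetical scoring: a plain frequency ratio, not a statistical test.
    """
    fg = count_hexamers(near_site)
    bg = count_hexamers(background)
    n_possible = 4 ** 6  # number of distinct DNA hexamers
    fg_total = sum(fg.values()) + pseudocount * n_possible
    bg_total = sum(bg.values()) + pseudocount * n_possible
    result = {}
    for kmer, n in fg.items():
        fg_freq = (n + pseudocount) / fg_total
        bg_freq = (bg[kmer] + pseudocount) / bg_total
        ratio = fg_freq / bg_freq
        if ratio >= min_ratio:
            result[kmer] = ratio
    return result
```

In practice the foreground would be exonic windows flanking confirmed splice sites and the background would be sequence drawn from elsewhere; both choices are left to the caller here.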
Global comparative analysis of ESTs from the southern cattle tick, Rhipicephalus (Boophilus) microplus
Background: The southern cattle tick, Rhipicephalus (Boophilus) microplus, is an economically important parasite of cattle and can transmit several pathogenic microorganisms to its cattle host during the feeding process. Understanding the biology and genomics of R. microplus is critical to developing novel methods for controlling these ticks.
Results: We present a global comparative genomic analysis of a gene index of R. microplus comprising 13,643 unique transcripts assembled from 42,512 expressed sequence tags (ESTs), a significant fraction of the complement of R. microplus genes. The source material for these ESTs consisted of polyA RNA from various tissues, lifestages, and strains of R. microplus, including larvae exposed to heat, cold, host odor, and acaricide. Functional annotation using RPS-Blast analysis identified conserved protein domains in the conceptually translated gene index and assigned GO terms to those database transcripts which had informative BlastX hits. Blast Score Ratio and SimiTri analysis compared the conceptual transcriptome of the R. microplus database to other eukaryotic proteomes and EST databases, including those from 3 ticks. The most abundant protein domains in BmiGI were also analyzed by SimiTri methodology.
Conclusion: These results indicate that a large fraction of BmiGI entries have no homologs in other sequenced genomes. Analysis with the PartiGene annotation pipeline showed 64% of the members of BmiGI could not be assigned GO annotation; thus, minimal information is available about a significant fraction of the tick genome. This highlights the important insights in tick biology which are likely to result from a tick genome sequencing project. Global comparative analysis identified some tick genes with unexpected phylogenetic relationships, which detailed analysis attributed to gene losses in some members of the animal kingdom. Some tick genes were identified which had close orthologues to mammalian genes. Members of this group would likely be poor choices as targets for development of novel tick control technology.
OperonDB: a comprehensive database of predicted operons in microbial genomes
The fast pace of bacterial genome sequencing and the resulting dependence on highly automated annotation methods has driven the development of many genome-wide analysis tools. OperonDB, first released in 2001, is a database containing the results of a computational algorithm for locating operon structures in microbial genomes. OperonDB has grown from 34 genomes in its initial release to more than 500 genomes today. In addition to increasing the size of the database, we have re-designed our operon finding algorithm and improved its accuracy. The new database is updated regularly as additional genomes become available in public archives. OperonDB can be accessed at: http://operondb.cbcb.umd.ed
Digging into acceptor splice site prediction: an iterative feature selection approach
Feature selection techniques are often used to reduce data dimensionality, increase classification performance, and gain insight into the processes that generated the data. In this paper, we describe an iterative procedure of feature selection and feature construction steps, improving the classification of acceptor splice sites, an important subtask of gene prediction.
We show that acceptor prediction can benefit from feature selection, and describe how feature selection techniques can be used to gain new insights in the classification of acceptor sites. This is illustrated by the identification of a new, biologically motivated feature: the AG-scanning feature.
The results described in this paper contribute both to the domain of gene prediction and to research in feature selection techniques, describing a new wrapper-based feature weighting method that aids in knowledge discovery when dealing with complex datasets.
The Douglas-Fir Genome Sequence Reveals Specialization of the Photosynthetic Apparatus in Pinaceae.
A reference genome sequence for Pseudotsuga menziesii var. menziesii (Mirb.) Franco (Coastal Douglas-fir) is reported, thus providing a reference sequence for a third genus of the family Pinaceae. The contiguity and quality of the genome assembly far exceed those of other conifer reference genome sequences (contig N50 = 44,136 bp and scaffold N50 = 340,704 bp). Incremental improvements in sequencing and assembly technologies are in part responsible for the higher quality reference genome, but it may also be due to a slightly lower exact repeat content in Douglas-fir vs. pine and spruce. Comparative genome annotation with angiosperm species reveals gene-family expansion and contraction in Douglas-fir and other conifers which may account for some of the major morphological and physiological differences between the two major plant groups. Notable differences in the size of the NDH-complex gene family and genes underlying the functional basis of shade tolerance/intolerance were observed. This reference genome sequence not only provides an important resource for Douglas-fir breeders and geneticists but also sheds additional light on the evolutionary processes that have led to the divergence of modern angiosperms from the more ancient gymnosperms.
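For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch computes it under the usual definition: the largest length L such that contigs (or scaffolds) of length at least L together cover at least half of the total assembly length. The function name and the toy input are illustrative and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example: the three longest contigs (10 + 9 + 8 = 27) cover half of 54.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```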
Upgrades to StellaBase facilitate medical and genetic studies on the starlet sea anemone, Nematostella vectensis
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The definitive version was published in Nucleic Acids Research 36 (2008): D607-D611, doi:10.1093/nar/gkm941.
The starlet sea anemone, Nematostella vectensis, is a basal metazoan organism that has recently emerged as an important model system in developmental biology and evolutionary genomics. StellaBase, the Nematostella Genomics Database (http://stellabase.org), was developed in 2005 as a resource to support the Nematostella research community. Recently, it has become apparent that Nematostella may be a particularly useful system for studying (i) microevolutionary variation in natural populations, and (ii) the functional evolution of human disease genes. We have developed two new databases that will foster such studies: StellaBase Disease (http://stellabase.org/disease) is a relational database that houses 155 904 invertebrate homologous isoforms of human disease genes from four leading genomic model systems (fly, worm, yeast and Nematostella), including 14 874 predicted genes from the sea anemone itself. StellaBase SNP (http://stellabase.org/SNP) is a relational database that describes the location and underlying type of mutation for 20 063 single nucleotide polymorphisms.
This work was supported by NSF grant FP-91656101-0 to J.C.S. and J.R.F. and EPA Grant F5E11155 to A.R.M. and J.R.F. and by a Postdoctoral Scholar Program at the Woods Hole Oceanographic Institution, with funding provided by The Beacon Institute for Rivers and Estuaries, and the J. Seward Johnson Fund to A.M.R
Detection of lineage-specific evolutionary changes among primate species
Background: Comparison of the human genome with other primates offers the opportunity to detect evolutionary events that created the diverse phenotypes among the primate species. Because the primate genomes are highly similar to one another, methods developed for analysis of more divergent species do not always detect signs of evolutionary selection.
Results: We have developed a new method, called DivE, specifically designed to find regions that have evolved either more or less rapidly than expected, for any clade within a set of very closely related species. Unlike some previous methods, DivE does not rely on rates of synonymous and nonsynonymous substitution, which enables it to detect evolutionary events in noncoding regions. We demonstrate using simulated data that DivE compares favorably to alternative methods, and we then apply DivE to the ENCODE regions in 14 primate species. We identify thousands of regions in these primates, ranging from 50 to >10000 bp in length, that appear to have experienced either constrained or accelerated rates of evolution. In particular, we detected 4942 regions that have potentially undergone positive selection in one or more primate species. Most of these regions occur outside of protein-coding genes, although we identified 20 proteins that have experienced positive selection.
Conclusions: DivE provides an easy-to-use method to predict both positive and negative selection in noncoding DNA that is particularly well-suited to detecting lineage-specific selection in large genomes.
PEACE: Parallel Environment for Assembly and Clustering of Gene Expression
We present PEACE, a stand-alone tool for high-throughput ab initio clustering of transcript fragment sequences produced by Next Generation or Sanger Sequencing technologies. It is freely available from www.peace-tools.org. Installed and managed through a downloadable user-friendly graphical user interface (GUI), PEACE can process large data sets of transcript fragments of length 50 bases or greater, grouping the fragments by gene associations with a sensitivity comparable to leading clustering tools. Once the fragments are clustered, the user can employ the GUI's analysis functions to collect statistics and to single out specific clusters for more comprehensive study or assembly. Using a novel minimum spanning tree-based clustering method, PEACE is the equal of leading tools in the literature, with an interface making it accessible to any user. It produces results of quality virtually identical to those of the WCD tool when applied to Sanger sequences, significantly improved results over WCD and TGICL when applied to the products of Next Generation Sequencing Technology, and significantly improved results over Cap3 in both cases. In short, PEACE provides an intuitive GUI and a feature-rich, parallel clustering engine that proves to be a valuable addition to the leading cDNA clustering tools.
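The general idea of minimum spanning tree-based clustering mentioned above can be sketched as follows: treat each fragment as a node, weight each pair by a sequence distance, and split the minimum spanning tree at edges heavier than a threshold. This is only an illustration, not PEACE's implementation. The k-mer Jaccard distance, the threshold value, and all function names are assumptions chosen for the example. The code uses Kruskal's algorithm and stops at the threshold, which yields the same components as building the full tree and then removing its heavy edges.

```python
from itertools import combinations


def kmer_set(seq, k=4):
    """Set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=4):
    """Hypothetical distance: 1 minus the Jaccard similarity of k-mer sets."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def mst_clusters(seqs, threshold=0.7, k=4):
    """Cluster sequences by cutting minimum-spanning-tree edges above threshold."""
    n = len(seqs)
    edges = sorted(
        (jaccard_distance(seqs[i], seqs[j], k), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Kruskal: accept the lightest edges first; heavier-than-threshold edges
    # are exactly the ones that would be cut from the full spanning tree.
    for dist, i, j in edges:
        if dist > threshold:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    clusters = {}
    for idx in range(n):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())
```

Computing all pairwise distances is quadratic in the number of fragments, so this sketch only suits small inputs; a production tool would need filtering heuristics and parallelism of the kind the abstract alludes to.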