
    High-throughput gene discovery in the rat

    The rat is an important animal model for human diseases and is widely used in physiology. In this article we present a new strategy for gene discovery based on the production of ESTs from serially subtracted and normalized cDNA libraries, and we describe its application for the development of a comprehensive nonredundant collection of rat ESTs. Our new strategy appears to yield substantially more EST clusters per EST sequenced than do previous approaches that did not use serial subtraction. However, multiple rounds of library subtraction resulted in high frequencies of otherwise rare internally primed cDNAs, defining the limits of this powerful approach. To date, we have generated >200,000 3′ ESTs from >100 cDNA libraries representing a wide range of tissues and developmental stages of the laboratory rat. Most importantly, we have contributed to ∼50,000 rat UniGene clusters. We have identified, arrayed, and derived 5′ ESTs from >30,000 unique rat cDNA clones. Complete information, including radiation hybrid mapping data, is also maintained locally at http://genome.uiowa.edu/clcg.html. All of the sequences described in this article have been submitted to the dbEST division of the NCBI.

    Computational studies with ESTs: assembly, SNP detection, and applications in alternative splicing

    EST sequences are important in functional genomics studies. To better use available EST resources, clustering and assembling are crucial techniques. For EST sequences with deep coverage, no current assembly program can handle them well. We describe a deep assembly program named DA. The program keeps the number of differences in each contig alignment under control by making corrections to differences that are likely due to sequencing errors. Experimental results on the 115 clusters from the UniGene database show that DA can handle data sets of deep coverage efficiently. A comparison of the DA consensus sequences with the finished human and mouse genomes indicates that the consensus sequences are of acceptable quality. EST sequences can be used in SNP discovery. We describe a computational method for finding common SNPs with allele frequencies in single-pass sequences of deep coverage. The method enhances a widely used program named PolyBayes in several aspects. We present results from our method and PolyBayes on eighteen data sets of human expressed sequence tags (ESTs) with deep coverage. The results indicate that our method used almost all single-pass sequences in computation of the allele frequencies of SNPs. EST sequences can also be used to study alternative splicing (AS), which is the most common post-transcriptional event in metazoans. We first developed a pipeline to identify AS forms by comparing alignments between expressed sequences and genomic sequences. Then we studied the relationship between AS and gene duplication. We observed that duplicate genes have fewer AS forms than single-copy genes; we also found that the loss of alternative splicing in duplicate genes may occur shortly after the gene duplication. Further analysis of the alternative splicing distribution in human duplicate pairs showed the asymmetric evolution of alternative splicing after gene duplications. We also compared AS among six species. Detailed and categorized studies showed significant differences in both AS rates and splice forms per gene among the studied species. The difference in AS rate between rice and Arabidopsis is significant enough to lead to a difference in protein diversity between those two species.
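
    As a rough illustration of the SNP step described in this abstract, the sketch below counts base calls in one column of an EST alignment and reports allele frequencies, treating alleles seen fewer than a minimum number of times as likely sequencing errors. It is not the authors' method and not PolyBayes; the function names `allele_frequencies` and `is_candidate_snp`, the thresholds `min_count` and `min_minor_freq`, and the toy column are all assumptions made for illustration.

```python
from collections import Counter

VALID_BASES = set("ACGT")  # gaps ('-') and ambiguous calls ('N') are ignored


def allele_frequencies(column, min_count=2):
    """Return {base: frequency} for alleles seen at least min_count times.

    column: iterable of single-character base calls taken from one
    column of a multiple alignment of single-pass reads.
    min_count: hypothetical threshold; rarer alleles are assumed to be
    sequencing errors and are dropped before frequencies are computed.
    """
    counts = Counter(b.upper() for b in column if b.upper() in VALID_BASES)
    kept = {base: n for base, n in counts.items() if n >= min_count}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {base: n / total for base, n in kept.items()}


def is_candidate_snp(column, min_count=2, min_minor_freq=0.1):
    """Flag a column as a candidate SNP if two or more alleles survive
    filtering and the second most frequent allele reaches min_minor_freq."""
    freqs = allele_frequencies(column, min_count)
    if len(freqs) < 2:
        return False
    second_most_frequent = sorted(freqs.values())[-2]
    return second_most_frequent >= min_minor_freq
```

    For example, calling `allele_frequencies("AAAAAAGGGT-N")` keeps only A and G, because the single T falls below `min_count`. Deep coverage matters here: with only a handful of reads per column, a count-based filter like this cannot separate a real minor allele from an error.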

    Transcriptome profiling of a toxic dinoflagellate reveals a gene-rich protist and a potential impact on gene expression due to bacterial presence

    © The Authors, 2010. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in PLoS One 5 (2010): e9688, doi:10.1371/journal.pone.0009688. Dinoflagellates are unicellular, often photosynthetic protists that play a major role in the dynamics of the Earth's oceans and climate. Sequencing of dinoflagellate nuclear DNA is thwarted by their massive genome sizes that are often several times that in humans. However, modern transcriptomic methods offer promising approaches to tackle this challenging system. Here, we used massively parallel signature sequencing (MPSS) to understand global transcriptional regulation patterns in Alexandrium tamarense cultures that were grown under four different conditions. We generated more than 40,000 unique short expression signatures gathered from the four conditions. Of these, about 11,000 signatures did not display detectable differential expression patterns. At a p-value < 1E-10, 1,124 signatures were differentially expressed in the three treatments, xenic, nitrogen-limited, and phosphorus-limited, compared to the nutrient-replete control, with the presence of bacteria explaining the largest set of these differentially expressed signatures. Among microbial eukaryotes, dinoflagellates contain the largest number of genes in their nuclear genomes. These genes occur in complex families, many of which have evolved via recent gene duplication events. Our expression data suggest that about 73% of the Alexandrium transcriptome shows no significant change in gene expression under the experimental conditions used here and may comprise a “core” component for this species. We report a fundamental shift in expression patterns in response to the presence of bacteria, highlighting the impact of biotic interaction on gene expression in dinoflagellates. This work was primarily funded by a collaborative grant from the National Institutes of Health (R01 ES 013679-01A2) awarded to DB, DMA, and M. Bento Soares. Funding support for DMA and DLE was also provided from the Woods Hole Center for Oceans and Human Health from the NSF/NIEHS Centers for Oceans and Human Health program, NIEHS (P50 ES 012742) and (NSF OCE-043072). Additional support came from the National Science Foundation (EF-0732440) in a grant awarded to F. Gerald Plumley, DB, JDH, and DMA. AM was supported by an Institutional NRSA (T 32 GM98629).

    Organization and evolution of information within eukaryotic genomes.


    Computational analysis of proteomes from parasitic nematodes


    The development and application of informatics-based systems for the analysis of the human transcriptome

    Philosophiae Doctor - PhD. Despite the fact that the sequence of the human genome is now complete, it has become clear that the elucidation of the transcriptome is more complicated than previously expected. There is mounting evidence for unexpected and previously underestimated phenomena such as alternative splicing in the transcriptome. As a result, the identification of novel transcripts arising from the genome continues. Furthermore, as the volume of transcript data grows it is becoming increasingly difficult to integrate expression information which is from different sources, is stored in disparate locations, and is described using differing terminologies. Determining the function of translated transcripts also remains a complex task. Information about the expression profile – the location and timing of transcript expression – provides evidence that can be used in understanding the role of the expressed transcript in the organ or tissue under study, or in developmental pathways or disease phenotype observed. In this dissertation I present novel computational approaches with direct biological applications to two distinct but increasingly important areas of gene expression research. The first addresses detection and characterisation of alternatively spliced transcripts. The second is the construction of a hierarchical controlled vocabulary for gene expression data and the annotation of expression libraries with controlled terms from the hierarchies. In the final chapter the biological questions that can be approached, and the discoveries that can be made using these systems are illustrated with a view to demonstrating how the application of informatics can both enable and accelerate biological insight into the human transcriptome. South Afric

    Complete reannotation of the Arabidopsis genome: methods, tools, protocols and the final release

    BACKGROUND: Since the initial publication of its complete genome sequence, Arabidopsis thaliana has become more important than ever as a model for plant research. However, the initial genome annotation was submitted by multiple centers using inconsistent methods, making the data difficult to use for many applications. RESULTS: Over the course of three years, TIGR has completed its effort to standardize the structural and functional annotation of the Arabidopsis genome. Using both manual and automated methods, Arabidopsis gene structures were refined and gene products were renamed and assigned to Gene Ontology categories. We present an overview of the methods employed, tools developed, and protocols followed, summarizing the contents of each data release with special emphasis on our final annotation release (version 5). CONCLUSION: Over the entire period, several thousand new genes and pseudogenes were added to the annotation. Approximately one third of the originally annotated gene models were significantly refined yielding improved gene structure annotations, and every protein-coding gene was manually inspected and classified using Gene Ontology terms

    Spinning Gland Transcriptomics from Two Main Clades of Spiders (Order: Araneae) - Insights on Their Molecular, Anatomical and Behavioral Evolution

    Characterized by distinctive evolutionary adaptations, spiders provide a comprehensive system for evolutionary and developmental studies of anatomical organs, including silk and venom production. Here we performed cDNA sequencing using massively parallel sequencers (454 GS-FLX Titanium) to generate ∼80,000 reads from the spinning gland of Actinopus spp. (infraorder: Mygalomorphae) and Gasteracantha cancriformis (infraorder: Araneomorphae, Orbiculariae clade). Actinopus spp. retains primitive characteristics in web usage and presents a single undifferentiated spinning gland, while the Orbiculariae spiders have seven differentiated spinning glands and complex patterns of web usage. MIRA, Celera Assembler and CAP3 software were used to cluster NGS reads for each spider. CAP3 unigenes passed through a pipeline for automatic annotation, classification by biological function, and comparative transcriptomics. Genes related to spider silks were manually curated and analyzed. Although a single spidroin gene family was found in Actinopus spp., a vast repertoire of specialized spider silk proteins was encountered in Orbiculariae. Astacin-like metalloproteases (meprin subfamily) were shown to be some of the most sampled unigenes and duplicated gene families in G. cancriformis since its evolutionary split from mygalomorphs. Our results confirm that the evolution of the molecular repertoire of silk proteins was accompanied by the (i) anatomical differentiation of spinning glands and (ii) behavioral complexification in the web usage. Finally, a phylogenetic tree was constructed to cluster most of the known spidroins in gene clades. This is the first large-scale, multi-organism transcriptome for spider spinning glands and a first step into a broad understanding of spider web systems biology and evolution.

    The molecular underpinnings of neuronal cell identity in the stomatogastric ganglion of Cancer borealis

    Throughout the life of an organism, the nervous system must be able to balance changing in response to environmental stimuli with the need to produce reliable, repeatable activity patterns to create stereotyped behaviors. Understanding the mechanisms responsible for this regulation requires a wealth of knowledge about the neural system, ranging from network connectivity and cell type identification to intrinsic neuronal excitability and transcriptomic expression. To make strides in this area, we have employed the well-described stomatogastric nervous system of the Jonah crab Cancer borealis to examine the molecular underpinnings and regulation of neuron cell identity. Several crustacean circuits, including the stomatogastric nervous system and the cardiac ganglion, continue to provide important new insights into circuit dynamics and modulation (Diehl, White, Stein, & Nusbaum, 2013; Marder, 2012; Marder & Bucher, 2007; Williams et al., 2013), but this work has been partially hampered by the lack of extensive molecular sequence knowledge in crustaceans. Here we generated de novo transcriptome assembly from central nervous system tissue for C. borealis producing 42,766 contigs, focusing on an initial identification, curation, and comparison of genes that will have the most profound impact on our understanding of circuit function in these species. This included genes for 34 distinct ion channel types, 17 biogenic amine and 5 GABA receptors, 28 major transmitter receptor subtypes including glutamate and acetylcholine receptors, and 6 gap junction proteins -- the Innexins. ... With this reference transcriptome and annotated sequences in hand, we sought to determine the strengths and limitations of using the neuronal molecular profile to classify them into cell types. ... Since the resulting activity of a neuron is the product of the expression of ion channel genes, we sought to further probe the expression profile of neurons across a range of cell types to understand how these patterns of mRNA abundance relate to the properties of individual cell types. ... Finally, we sought to better understand the molecular underpinnings of how these correlated patterns of mRNA expression are generated and maintained. Includes bibliographical reference