10 research outputs found

    Methods and strategies for gene structure curation in WormBase

    Get PDF
    The Caenorhabditis elegans genome sequence was published over a decade ago; this was the first published genome of a multi-cellular organism and now the WormBase project has had a decade of experience in curating this genome's sequence and gene structures. In one of its roles as a central repository for nematode biology, WormBase continues to refine the gene structure annotations using sequence similarity and other computational methods, as well as information from the literature- and community-submitted annotations. We describe the various methods of gene structure curation that have been tried by WormBase and the problems associated with each of them. We also describe the current strategy for gene structure curation, and introduce the WormBase ā€˜curation toolā€™, which integrates different data sources in order to identify new and correct gene structures

    WormBase 2012: more genomes, more data, new website

    Get PDF
    Since its release in 2000, WormBase (http://www.wormbase.org) has grown from a small resource focusing on a single species and serving a dedicated research community, to one now spanning 15 species essential to the broader biomedical and agricultural research fields. To enhance the rate of curation, we have automated the identification of key data in the scientific literature and use similar methodology for data extraction. To ease access to the data, we are collaborating with journals to link entities in research publications to their report pages at WormBase. To facilitate discovery, we have added new views of the data, integrated large-scale datasets and expanded descriptions of models for human disease. Finally, we have introduced a dramatic overhaul of the WormBase website for public beta testing. Designed to balance complexity and usability, the new site is species-agnostic, highly customizable, and interactive. Casual users and developers alike will be able to leverage the public RESTful application programming interface (API) to generate custom data mining solutions and extensions to the site. We report on the growth of our database and on our work in keeping pace with the growing demand for data, efforts to anticipate the requirements of users and new collaborations with the larger science community

    WormBase - Annotating many nematode genomes

    Get PDF
    WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBaseā€™s role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE

    WormBase 2014: new views of curated biology

    Get PDF
    WormBase (http://www.wormbase.org/) is a highly curated resource dedicated to supporting research using the model organism Caenorhabditis elegans. With an electronic history predating the World Wide Web, WormBase contains information ranging from the sequence and phenotype of individual alleles to genome-wide studies generated using next-generation sequencing technologies. In recent years, we have expanded the contents to include data on additional nematodes of agricultural and medical significance, bringing the knowledge of C. elegans to bear on these systems and providing support for underserved research communities. Manual curation of the primary literature remains a central focus of the WormBase project, providing users with reliable, up-to-date and highly cross-linked information. In this update, we describe efforts to organize the original atomized and highly contextualized curated data into integrated syntheses of discrete biological topics. Next, we discuss our experiences coping with the vast increase in available genome sequences made possible through next-generation sequencing platforms. Finally, we describe some of the features and tools of the new WormBase Web site that help users better find and explore data of interest

    Considerations for creating and annotating the budding yeast Genome Map at SGD: a progress report

    Get PDF
    The Saccharomyces Genome Database (SGD) is compiling and annotating a comprehensive catalogue of functional sequence elements identified in the budding yeast genome. Recent advances in deep sequencing technologies have enabled for example, global analyses of transcription profiling and assembly of maps of transcription factor occupancy and higher order chromatin organization, at nucleotide level resolution. With this growing influx of published genome-scale data, come new challenges for their storage, display, analysis and integration. Here, we describe SGD's progress in the creation of a consolidated resource for genome sequence elements in the budding yeast, the considerations taken in its design and the lessons learned thus far. The data within this collection can be accessed at http://browse.yeastgenome.org and downloaded from http://downloads.yeastgenome.org

    Overview of gene structure in C. elegans

    Full text link

    WormBase - Annotating many nematode genomes

    Get PDF
    WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBaseā€™s role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE

    Computational Analysis of the Transcriptome Using Long-Read RNA Sequencing

    Get PDF
    Reconstructing the transcriptome from RNA sequencing reads is a challenging problem, especially when no high quality reference genome is available. Current transcriptome annotations have largely relied on short read lengths intrinsic to most widely used high-throughput cDNA sequencing technologies. For example, in the annotation of the Caenorhabditis elegans transcriptome, more than half of the transcript isoforms lack full-length support and instead rely on inference from short reads that do not span the full length of the isoform. Short read sequencing technologies, though accurate, cannot reliably reconstruct full-length transcripts due to the highly complex nature of the transcriptome with large gene families, widespread alternative splicing, and highly variable expression and coverage per transcript. We applied nanopore-based direct RNA sequencing to characterize the developmental polyadenylated transcriptome of C. elegans. Using this approach we provide support for 23,865 splice isoforms across 14,611 genes, without the need for computational reconstruction of gene models. In addition, we have developed an open source de novo transcriptome assembly method, CONDUIT, which uses single molecule long read RNA sequencing to generate scaffolded splice graphs independent of a reference genome. It then pseudomaps short-read RNA sequencing reads to isoforms extracted from the scaffolded splice graph, polishes these splice graphs using both short and long read data, and outputs consensus isoforms extracted from these splice graphs. We show that CONDUIT produces highly accurate consensus isoforms, completely independent of a reference genome in several model systems and in a novel pathogenic yeast system
    corecore