Methods and strategies for gene structure curation in WormBase

Author: A. S. Rogers
Allen
Altschul
Brent
Celniker
Cherry
Coghlan
Emanuelsson
G. W. Williams
J. Spieth
Kersey
Lamesch
Lewis
Nagy
P. A. Davis
P. Ozersky
Rice
Schrimpf
Stein
T. Bieri
The C. elegans Sequencing Consortium
Publication venue: Oxford University Press

The Caenorhabditis elegans genome sequence was published over a decade ago; this was the first published genome of a multi-cellular organism and now the WormBase project has had a decade of experience in curating this genome's sequence and gene structures. In one of its roles as a central repository for nematode biology, WormBase continues to refine the gene structure annotations using sequence similarity and other computational methods, as well as information from the literature- and community-submitted annotations. We describe the various methods of gene structure curation that have been tried by WormBase and the problems associated with each of them. We also describe the current strategy for gene structure curation, and introduce the WormBase ‘curation tool’, which integrates different data sources in order to identify new and correct gene structures.

Crossref

PubMed Central

WormBase 2012: more genomes, more data, new website

Author: Chan Juancarlos
Chen Wen J.
Fang Ruihua
Ganesan Uma
Grove Christian
Kadam Snehalata
Kishore Ranjana
Lee Raymond
Li Yuling
Muller Hans-Michael
Nakamura Cecilia
Raciti Daniela
Rangarajan Arun
Schindelman Gary
Schwarz Erich M.
Sternberg Paul W.
Van Auken Kimberly
Wang Daniel
Wang Xiaodong
Yook Karen
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2012

Since its release in 2000, WormBase (http://www.wormbase.org) has grown from a small resource focusing on a single species and serving a dedicated research community, to one now spanning 15 species essential to the broader biomedical and agricultural research fields. To enhance the rate of curation, we have automated the identification of key data in the scientific literature and use similar methodology for data extraction. To ease access to the data, we are collaborating with journals to link entities in research publications to their report pages at WormBase. To facilitate discovery, we have added new views of the data, integrated large-scale datasets and expanded descriptions of models for human disease. Finally, we have introduced a dramatic overhaul of the WormBase website for public beta testing. Designed to balance complexity and usability, the new site is species-agnostic, highly customizable, and interactive. Casual users and developers alike will be able to leverage the public RESTful application programming interface (API) to generate custom data mining solutions and extensions to the site. We report on the growth of our database and on our work in keeping pace with the growing demand for data, efforts to anticipate the requirements of users and new collaborations with the larger science community.

Caltech Authors

WormBase - Annotating many nematode genomes

Author: Davis Paul
Durbin Richard
Howe Kevin
Kersey Paul
Paulini Michael
Sternberg Paul W.
Tuli Mary Ann
Williams Gary
Yook Karen
Publication venue: Informa UK Limited
Publication date: 01/01/2012

WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBase’s role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE.

PubMed Central

Caltech Authors

WormBase 2014: new views of curated biology

Author: Chan Juancarlos
Chen Wen J.
Done James
Grove Christian
Harris Todd W.
Kishore Ranjana
Lee Raymond
Li Yuling
Müller Hans-Michael
Nakamura Cecilia
Raciti Daniela
Schindelman Gary
Sternberg Paul W.
Van Auken Kimberly
Wang Daniel
Wang Xiaodong
Yook Karen
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2014

WormBase (http://www.wormbase.org/) is a highly curated resource dedicated to supporting research using the model organism Caenorhabditis elegans. With an electronic history predating the World Wide Web, WormBase contains information ranging from the sequence and phenotype of individual alleles to genome-wide studies generated using next-generation sequencing technologies. In recent years, we have expanded the contents to include data on additional nematodes of agricultural and medical significance, bringing the knowledge of C. elegans to bear on these systems and providing support for underserved research communities. Manual curation of the primary literature remains a central focus of the WormBase project, providing users with reliable, up-to-date and highly cross-linked information. In this update, we describe efforts to organize the original atomized and highly contextualized curated data into integrated syntheses of discrete biological topics. Next, we discuss our experiences coping with the vast increase in available genome sequences made possible through next-generation sequencing platforms. Finally, we describe some of the features and tools of the new WormBase Web site that help users better find and explore data of interest.

Caltech Authors

Considerations for creating and annotating the budding yeast Genome Map at SGD: a progress report

Author: Ashburner
Brazma
Celniker
Chen
Christie
E. T. Chan
Eilbeck
Guillemette
Harbison
Hesselberth
Hinrichs
J. M. Cherry
Kaplan
Kellis
Lee
Levin
Liu
Macisaac
Müller
Ozsolak
Pokholok
Ren
Shalon
Stein
Steinmetz
The ENCODE Project Consortium
Venters
Xu
Xu
Zhang
Publication venue: Oxford University Press

The Saccharomyces Genome Database (SGD) is compiling and annotating a comprehensive catalogue of functional sequence elements identified in the budding yeast genome. Recent advances in deep sequencing technologies have enabled, for example, global analyses of transcription profiling and assembly of maps of transcription factor occupancy and higher-order chromatin organization at nucleotide-level resolution. With this growing influx of published genome-scale data come new challenges for their storage, display, analysis and integration. Here, we describe SGD's progress in the creation of a consolidated resource for genome sequence elements in the budding yeast, the considerations taken in its design and the lessons learned thus far. The data within this collection can be accessed at http://browse.yeastgenome.org and downloaded from http://downloads.yeastgenome.org.

Crossref

PubMed Central

Overview of gene structure in C. elegans

Author: John Spieth
Publication venue: WormBook

Crossref

WormBase - Annotating many nematode genomes

Author: Davis Paul
Durbin Richard
Howe Kevin
Kersey Paul
Paulini Michael
Sternberg Paul W.
Tuli Mary Ann
Williams Gary
Yook Karen
Publication venue: Informa UK Limited
Publication date: 04/04/2012

Computational Analysis of the Transcriptome Using Long-Read RNA Sequencing

Author: Roach Nathan Patrick
Publication venue: The Busan Gyeongnam Mathematical Society
Publication date: 08/01/2021

Reconstructing the transcriptome from RNA sequencing reads is a challenging problem, especially when no high-quality reference genome is available. Current transcriptome annotations have largely relied on short read lengths intrinsic to most widely used high-throughput cDNA sequencing technologies. For example, in the annotation of the Caenorhabditis elegans transcriptome, more than half of the transcript isoforms lack full-length support and instead rely on inference from short reads that do not span the full length of the isoform. Short-read sequencing technologies, though accurate, cannot reliably reconstruct full-length transcripts due to the highly complex nature of the transcriptome, with large gene families, widespread alternative splicing, and highly variable expression and coverage per transcript. We applied nanopore-based direct RNA sequencing to characterize the developmental polyadenylated transcriptome of C. elegans. Using this approach, we provide support for 23,865 splice isoforms across 14,611 genes, without the need for computational reconstruction of gene models. In addition, we have developed an open-source de novo transcriptome assembly method, CONDUIT, which uses single-molecule long-read RNA sequencing to generate scaffolded splice graphs independent of a reference genome. It then pseudomaps short-read RNA sequencing reads to isoforms extracted from the scaffolded splice graph, polishes these splice graphs using both short- and long-read data, and outputs consensus isoforms extracted from these splice graphs. We show that CONDUIT produces highly accurate consensus isoforms, completely independent of a reference genome, in several model systems and in a novel pathogenic yeast system.

JScholarship