The Consensus Coding Sequence (CCDS) Project: Identifying a Common Protein-Coding Gene Set for the Human and Mouse Genomes

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.

Funding: National Human Genome Research Institute (U.S.) (Grant number 1U54HG004555-01); Wellcome Trust (London, England) (Grant number WT062023); Wellcome Trust (London, England) (Grant number WT077198)

DSpace@MIT

PubMed Central

King's Research Portal

The Ribosomal Database Project

Author: Bonnie L. Maidak
Carl R. Woese
Gary J. Olsen
James Blandy
Karl Fogel
Michael J. McCaughey
Niels Larsen
Ross Overbeek
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/1994

Crossref

Curation at the NCBI: Genomes, Genes, & Sequence Standards

Author: Bhanu Rajput
Bonnie Maidak
Catherine Farrell
David Webb
Donna Maglott
Garth Brown
Janet Weber
Jennifer Hart
Kim D. Pruitt
Lillian Riddick
Melissa Landrum
Michael Murphy
Terence Murphy
Wendy Wu
Publication date: 26/05/2009

The National Center for Biotechnology Information (NCBI) provides curation support for many genomes, and disseminates information in several resources including Entrez Gene, reference sequences (RefSeq), the Consensus CDS (CCDS) database, and the Genome Reference Consortium (GRC). These projects are supported by several collaborations to provide: 1) support to the international consortium maintaining the assemblies for human and mouse (GRC); 2) sequence standards for chromosomes, genes, transcripts and proteins (RefSeq); 3) reports of integrated information including nomenclature, publications, phenotypes and diseases, sequences, ontologies, interactions (Gene); and 4) identification of proteins that are consistently annotated on the human and mouse reference genomes, and consistently updated by collaborating members (CCDS).

NCBI curation of any one data type (e.g., a gene) is closely integrated with evaluation of the genome assembly, and determining annotation by way of RefSeq transcript and protein sequences. Database and work-flow infrastructure is designed to support reporting and tracking issues with the assembly, gene, or evidence data to collaborating groups, and to support collaborative review and discussions of issues that arise. Curation depends on publicly available information to represent the gene extent, alternatively spliced transcripts, and protein isoforms. Scientific consults occur regularly and wet-bench validation needs are supported by some of the collaborations. Curation of genome annotation results in improved data presentation at the three major genome browser sites (Ensembl, NCBI, UCSC) and has resulted in efforts to define common curation guidelines to maximize consistency and minimize conflicts.

The presentation focuses on curation of the human genome, genes, and RefSeq sequence standards.

Crossref

Nature Precedings

Separation of complex mixtures by parallel development thin-layer chromatography

Author: Alexander P. W.
Bonnie L. Maidak
David Nurok
Gerhardt G.
Gonnord M. F.
Hodges K. C.
Jupille T. H.
Mottier M.
Nurok D.
Ronald E. Tecklenburg
Soczewinski E.
Tecklenburg R. E.
Thoma J. A.
Touchstone J. C.
Vanderslice J. T.
Zar J. H.
Publication venue: 'American Chemical Society (ACS)'

Crossref

The completion of the Mammalian Gene Collection

Author: Astashyn Alex
Bonner Tom I.
Brown Garth
Buetow Kenneth H.
Collins Francis S.
Derge Jeffrey G.
Farrell Catherine
Feingold Elise A.
Gerhard Daniela S.
Good Peter J.
Hart Jennifer
Jang Wonhee
Landrum Melissa
Lewis Jeanne
Maidak Bonnie L.
Mandich Allison
Misquitta Leonie
Murphy Michael
Murphy Terence
Phan Lon
Rajput Bhanu
Rasooly Rebekah
Riddick Lillian
Robinson Cristen
Schaefer Carl F.
Shenmen Carolyn M.
Shoaf Debonny
Temple Gary
Wagner Lukas
Ward Ming
Yankie Linda
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/12/2009

Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.

University of Queensland eSpace