55 research outputs found

    NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

    Get PDF
    NCBI's reference sequence (RefSeq) database () is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. The database includes 3774 organisms spanning prokaryotes, eukaryotes and viruses, and has records for 2 879 860 proteins (RefSeq release 19). RefSeq records integrate information from multiple sources, when additional data are available from those sources and therefore represent a current description of the sequence and its features. Annotations include coding regions, conserved domains, tRNAs, sequence tagged sites (STS), variation, references, gene and protein product names, and database cross-references. Sequence is reviewed and features are added using a combined approach of collaboration and other input from the scientific community, prediction, propagation from GenBank and curation by NCBI staff. The format of all RefSeq records is validated, and an increasing number of tests are being applied to evaluate the quality of sequence and annotation, especially in the context of complete genomic sequence

    The chicken gene nomenclature committee report

    Get PDF
    Comparative genomics is an essential component of the post-genomic era. The chicken genome is the first avian genome to be sequenced and it will serve as a model for other avian species. Moreover, due to its unique evolutionary niche, the chicken genome can be used to understand evolution of functional elements and gene regulation in mammalian species. However comparative biology both within avian species and within amniotes is hampered due to the difficulty of recognising functional orthologs. This problem is compounded as different databases and sequence repositories proliferate and the names they assign to functional elements proliferate along with them. Currently, genes can be published under more than one name and one name sometimes refers to unrelated genes. Standardized gene nomenclature is necessary to facilitate communication between scientists and genomic resources. Moreover, it is important that this nomenclature be based on existing nomenclature efforts where possible to truly facilitate studies between different species. We report here the formation of the Chicken Gene Nomenclature Committee (CGNC), an international and centralized effort to provide standardized nomenclature for chicken genes. The CGNC works in conjunction with public resources such as NCBI and Ensembl and in consultation with existing nomenclature committees for human and mouse. The CGNC will develop standardized nomenclature in consultation with the research community and relies on the support of the research community to ensure that the nomenclature facilitates comparative and genomic studies

    Locus Reference Genomic sequences: an improved basis for describing human DNA variants

    Get PDF
    As our knowledge of the complexity of gene architecture grows, and we increase our understanding of the subtleties of gene expression, the process of accurately describing disease-causing gene variants has become increasingly problematic. In part, this is due to current reference DNA sequence formats that do not fully meet present needs. Here we present the Locus Reference Genomic (LRG) sequence format, which has been designed for the specific purpose of gene variant reporting. The format builds on the successful National Center for Biotechnology Information (NCBI) RefSeqGene project and provides a single-file record containing a uniquely stable reference DNA sequence along with all relevant transcript and protein sequences essential to the description of gene variants. In principle, LRGs can be created for any organism, not just human. In addition, we recognize the need to respect legacy numbering systems for exons and amino acids and the LRG format takes account of these. We hope that widespread adoption of LRGs - which will be created and maintained by the NCBI and the European Bioinformatics Institute (EBI) - along with consistent use of the Human Genome Variation Society (HGVS)-approved variant nomenclature will reduce errors in the reporting of variants in the literature and improve communication about variants affecting human health. Further information can be found on the LRG web site: http://www.lrg-sequence.org

    Database resources of the National Center for Biotechnology Information

    Get PDF
    In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data retrieval systems and computational resources for the analysis of data in GenBank and other biological data made available through NCBI's website. NCBI resources include Entrez, Entrez Programming Utilities, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD) and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of the resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov

    ClinGen — The Clinical Genome Resource

    Get PDF
    On autopsy, a patient is found to have hypertrophic cardiomyopathy. The patient’s family pursues genetic testing that shows a “likely pathogenic” variant for the condition on the basis of a study in an original research publication. Given the dominant inheritance of the condition and the risk of sudden cardiac death, other family members are tested for the genetic variant to determine their risk. Several family members test negative and are told that they are not at risk for hypertrophic cardiomyopathy and sudden cardiac death, and those who test positive are told that they need to be regularly monitored for cardiomyopathy on echocardiography. Five years later, during a routine clinic visit of one of the genotype-positive family members, the cardiologist queries a database for current knowledge on the genetic variant and discovers that the variant is now interpreted as “likely benign” by another laboratory that uses more recently derived population-frequency data. A newly available testing panel for additional genes that are implicated in hypertrophic cardiomyopathy is initiated on an affected family member, and a different variant is found that is determined to be pathogenic. Family members are retested, and one member who previously tested negative is now found to be positive for this new variant. An immediate clinical workup detects evidence of cardiomyopathy, and an intracardiac defibrillator is implanted to reduce the risk of sudden cardiac death

    The Consensus Coding Sequence (Ccds) Project: Identifying a Common Protein-Coding Gene Set for the Human and Mouse Genomes

    Get PDF
    Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.National Human Genome Research Institute (U.S.) (Grant number 1U54HG004555-01)Wellcome Trust (London, England) (Grant number WT062023)Wellcome Trust (London, England) (Grant number WT077198

    Database resources of the National Center for Biotechnology Information

    Get PDF
    In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, Reference Sequence, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Peptidome, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov

    Pharmacogenetic allele nomenclature: International workgroup recommendations for test result reporting

    Get PDF
    This manuscript provides nomenclature recommendations developed by an international workgroup to increase transparency and standardization of pharmacogenetic (PGx) result reporting. Presently, sequence variants identified by PGx tests are described using different nomenclature systems. In addition, PGx analysis may detect different sets of variants for each gene, which can affect interpretation of results. This practice has caused confusion and may thereby impede the adoption of clinical PGx testing. Standardization is critical to move PGx forward
    corecore