17 research outputs found

    Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

    The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. Nucleic Acids Res 2018 Jan 4; 46(D1):D221-D228

    The Consensus Coding Sequence (CCDS) Project: Identifying a Common Protein-Coding Gene Set for the Human and Mouse Genomes

    Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates manual review of protein annotations that are inconsistent between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically annotated, protein-coding regions. National Human Genome Research Institute (U.S.) (Grant number 1U54HG004555-01); Wellcome Trust (London, England) (Grant number WT062023); Wellcome Trust (London, England) (Grant number WT077198

    Characterization of recombinant plasmids carrying Drosophila transfer RNA genes

    The purpose of this study was to characterize recombinant plasmids carrying Drosophila melanogaster tRNA genes. The two groups of recombinant plasmids studied were those that carried tRNA₄Val genes and those with tRNA₄,₇Ser genes. pDt92 and pDt120, both tRNA₄Val gene-carrying plasmids, were characterized initially to determine the number of inserts they contained and the size of the inserts. For plasmids containing multiple inserts, the insert that carried the tRNA₄Val gene was also determined. These characteristics were studied by HindIII digestion of the plasmid DNA, agarose gel electrophoresis, Southern transfer onto nitrocellulose filters and hybridization to [¹²⁵I] tRNA₄Val. It was found that pDt92 and pDt120 each contained two inserts, of sizes 0.5kb and 1.7kb, and 2.0kb and 5.*fkb respectively, with the 0.5kb and 2.0kb fragments carrying the tRNA₄Val genes. pDt92 and pDt120 were then recloned so as to contain only the fragments that carried the tRNA₄Val genes, namely the 0.5kb and 2.0kb fragments respectively. pDt92RC and pDt120RC, plus three other tRNA₄,₇Ser gene-containing plasmids, pDt16, pDt17RC and pDt27RC, were further characterized by in situ hybridization to study the organization of these tRNA genes on the Drosophila genome. Four of these plasmids, all except pDt17RC, hybridized to only one site on the Drosophila chromosomes. Both pDt92RC and pDt120RC hybridized to the 90BC site on the right arm of the third chromosome; pDt16 and pDt27RC hybridized to the 12DE site on the first, or X, chromosome. pDt17RC, on the other hand, hybridized predominantly to the 12DE site and to a lesser extent to the 2}E (2L), 56D (2R), 62D (3L) and 64D (3L) sites. These in situ hybridization results, when studied together with those reported by Dunn et al. (1979b), show that genes for a single species of tRNA are located at more than one site on the Drosophila genome. Science, Faculty of; Microbiology and Immunology, Department of; Graduat

    The DNA sequence and transcriptional analyses of Drosophila melanogaster transfer RNA valine genes

    The nucleotide sequence of the single Drosophila melanogaster tRNA gene contained in the recombinant plasmid pDt120R was determined by the Maxam and Gilbert method. This plasmid hybridizes to the 90BC site on the Drosophila polytene chromosomes, a minor site of tRNA₄Val hybridization. The nucleotide sequence of the tRNA₄Val gene present in pDt120R differs at four positions from the sequence expected from that of tRNA₄Val. The four differences occur at nucleotides 16, 29, 41 and 57 in the coding region. Comparison of the DNA sequence of pDt120R to that of the plasmid pDt92R, which also hybridizes to the 90BC site, indicates that the Drosophila fragments contained in these two plasmids are either alleles or repeats. The implications of these findings are discussed. An in vitro transcription system was developed from a Drosophila Schneider II cell line. This homologous cell-free extract supports specific and accurate transcription of various Drosophila tRNAVal genes. The major product of transcription is a tRNA precursor, which is processed to a tRNA-sized species. Transfer RNA valine genes originating from different sites on the Drosophila chromosomes are transcribed at different rates. Comparison of the sequences in the internal promoter regions of the various genes indicates that the few differences within the coding regions may not be responsible for the observed difference in the rates of transcription. This conclusion is substantiated by studies with hybrid genes constructed during the course of this work. Preliminary evidence indicates that the tRNAVal gene that is transcribed at the highest rate may be preceded in its 5'-flanking region by a positively modulating sequence. The precursor RNAs directed by various tRNAVal genes are also processed at different rates. Transcription and processing experiments with hybrid genes suggest that nucleotide changes within the coding region, which do not affect the rate of transcription, influence the rate of processing. Time course and competition experiments demonstrate that at least two kinetic steps are required for the formation of a stable transcription complex. Studies with an in vitro constructed mutant missing nucleotides 51-61 in the tRNA coding region suggest that this deleted region (which is highly conserved in eukaryotic tRNAs) may be involved in the primary interaction required for tRNA gene transcription. Science, Faculty of; Microbiology and Immunology, Department of; Graduat

    Detailed dynamic rheological studies of multiwall carbon nanotube-reinforced acrylonitrile butadiene styrene composite

    Dynamic rheological properties of multiwalled carbon nanotube (MWCNT)-reinforced acrylonitrile butadiene styrene (ABS) composites prepared with a micro twin-screw extruder with a back-flow channel (used for proper dispersion) are reported. Scanning electron microscopic and high-resolution transmission electron microscopic studies showed that the nanotubes were uniformly dispersed in the ABS polymer matrix. The MWCNTs form a network throughout the polymer matrix and thus promote the reinforcement. The rheological studies showed that (for 3 wt% of MWCNTs loading) the material undergoes a viscous-to-elastic transition. At a higher MWCNT concentration, a nematic gel-like phase is observed in which both the storage and loss moduli (G' and G'') are nearly independent of frequency. The van Gurp-Palmen plot has been used to determine the viscoelastic properties. The dynamic intersection frequency has been used to correlate the rheological properties with different wt% of MWCNTs loading in ABS. Dynamic rheological measurements revealed viscous-like (G'' > G') behaviour at a lower MWCNTs loading ( 3 wt%)

    Curation at the NCBI: Genomes, Genes, & Sequence Standards

    The National Center for Biotechnology Information (NCBI) provides curation support for many genomes, and disseminates information in several resources including Entrez Gene, reference sequences (RefSeq), the Consensus CDS (CCDS) database, and the Genome Reference Consortium (GRC). These projects are supported by several collaborations to provide: 1) support to the international consortium maintaining the assemblies for human and mouse (GRC); 2) sequence standards for chromosomes, genes, transcripts and proteins (RefSeq); 3) reports of integrated information including nomenclature, publications, phenotypes and diseases, sequences, ontologies, interactions (Gene); and 4) identification of proteins that are consistently annotated on the human and mouse reference genomes, and consistently updated by collaborating members (CCDS).

NCBI curation of any one data type (e.g., a gene) is closely integrated with evaluation of the genome assembly, and determining annotation by way of RefSeq transcript and protein sequences. Database and work-flow infrastructure is designed to support reporting and tracking issues with the assembly, gene, or evidence data to collaborating groups, and to support collaborative review and discussions of issues that arise. Curation depends on publicly available information to represent the gene extent, alternatively spliced transcripts, and protein isoforms. Scientific consults occur regularly and wet-bench validation needs are supported by some of the collaborations. Curation of genome annotation results in improved data presentation at the three major genome browser sites (Ensembl, NCBI, UCSC) and has resulted in efforts to define common curation guidelines to maximize consistency and minimize conflicts.

The presentation focuses on curation of the human genome, genes, and RefSeq sequence standards.