333 research outputs found

NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy

Author: Church
D. R. Maglott
Dalgleish
G. R. Brown
K. D. Pruitt
Petersen
Prakash
Pruitt
Sherry
T. Tatusova
Publication venue: Oxford University Press

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of genomic, transcript and protein sequence records. These records are selected and curated from public sequence archives and represent a significant reduction in redundancy compared to the volume of data archived by the International Nucleotide Sequence Database Collaboration. The database includes over 16 000 organisms, 2.4 × 10⁶ genomic records, 13 × 10⁶ proteins and 2 × 10⁶ RNA records spanning prokaryotes, eukaryotes and viruses (RefSeq release 49, September 2011). The RefSeq database is maintained by a combined approach of automated analyses, collaboration and manual curation to generate an up-to-date representation of the sequence, its features, names and cross-links to related sources of information. We report here on recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline. More information about the resource is available online (see http://www.ncbi.nlm.nih.gov/RefSeq/).

Crossref

PubMed Central

eGenomics: Cataloguing our complete genome collection III

Author: Cochrane G.
Field D.
Garrity G.
Glöckner F.
Gray T.
Kottmann R.
Lister A.
Selengut J.
Sterk P.
Tateno Y.
Tatusova T.
Thomson N.
Vaughan R.
Publication date: 30/04/2007

This meeting report summarizes the proceedings of the “eGenomics: Cataloguing our Complete Genome Collection III” workshop held September 11–13, 2006, at the National Institute for Environmental eScience (NIEeS), Cambridge, United Kingdom. This 3rd workshop of the Genomic Standards Consortium was divided into two parts. The first half of the three-day workshop was dedicated to reviewing the genomic diversity of our current and future genome and metagenome collection, and exploring linkages to a series of existing projects through formal presentations. The second half was dedicated to strategic discussions. Outcomes of the workshop include a revised “Minimum Information about a Genome Sequence” (MIGS) specification (v1.1), consensus on a variety of features to be added to the Genome Catalogue (GCat), agreement by several researchers to adopt MIGS for imminent genome publications, and an agreement by the EBI and NCBI to input their genome collections into GCat for the purpose of quantifying the amount of optional data already available (e.g., for geographic location coordinates) and working towards a single, global list of all public genomes and metagenomes.

MPG.PuRe

NCBI Reference Sequences: current status, policy and new initiatives

Author: Altschul
Altschul
D. R. Maglott
Eddy
Griffiths-Jones
Gulley
K. D. Pruitt
Lowe
Maquat
Schuler
T. Tatusova
W. Klimke
Publication venue: Oxford University Press

NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 × 10⁶ proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.

Crossref

PubMed Central

Database resources of the National Center for Biotechnology Information

Author: A. Souvorov
Altschul
Altschul
Blumenfeld
D. A. Benson
D. J. Lipman
D. L. Wheeler
D. Landsman
D. M. Church
D. R. Maglott
E. Sequeira
E. Yaschenko
Ermolaeva
Fung
G. D. Schuler
G. Starchenko
Geer
Ghedin
J. Ostell
K. Canese
K. D. Pruitt
K. Sirotkin
L. Wagner
L. Y. Geer
M. DiCuccio
M. Feolo
M. Shumway
Ma
Manolio
Needleman
O. Khovayko
R. Edgar
R. L. Tatusov
S. Federhen
S. H. Bryant
S. T. Sherry
Schuler
Schuler
Sewell
Sherry
T. A. Tatusova
T. Barrett
T. L. Madden
Tatusov
Tatusova
Tatusova
V. Chetvernin
V. Miller
W. Helmberg
Wang
Y. Kapustin
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2006

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace and Assembly Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Viral Genotyping Tools, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at

Crossref

PubMed Central

Database resources of the National Center for Biotechnology Information

Author: A. Souvorov
Altschul
Altschul
Blumenfeld
Brazma
D. A. Benson
D. J. Lipman
D. Landsman
D. M. Church
D. R. Maglott
E. Sequeira
E. W. Sayers
E. Yaschenko
Ermolaeva
Finn
G. D. Schuler
G. Starchenko
Geer
Geschwind
Gibrat
Heintz
Helmberg
I. Mizrachi
J. Ostell
J. Ye
K. Canese
K. D. Pruitt
K. Sirotkin
Kapustin
L. Wagner
L. Y. Geer
Lenffer
Letunic
M. DiCuccio
M. Feolo
M. Shumway
Madej
Manolio
Needleman
R. Edgar
S. Federhen
S. H. Bryant
S. T. Sherry
Schuler
Schuler
Sewell
Sherry
Sprague
T. A. Tatusova
T. Barrett
T. L. Madden
Tatusov
Tatusova
Tatusova
V. Chetvernin
V. Miller
W. Helmberg
Wang
Y. Kapustin
Ye
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2008

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

CiteSeerX

Crossref

PubMed Central

BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata

Author: Church
E. Gribov
E. Yaschenko
I. Karsch-Mizrachi
J. Ostell
Jung
K. Clark
K. D. Pruitt
M. Kimelman
Mailman
R. Gevorgyan
Rasko
S. Resenchuk
Schuler
Sherry
T. Barrett
T. Tatusova
V. Gorelenkov
Yilmaz
Publication venue: Oxford University Press

As the volume and complexity of data sets archived at NCBI grow rapidly, so does the need to gather and organize the associated metadata. Although metadata has been collected for some archival databases, there was previously no centralized approach at NCBI for collecting this information and using it across databases. The BioProject database was recently established to facilitate organization and classification of project data submitted to NCBI, EBI and DDBJ databases. It captures descriptive information about research projects that result in high volume submissions to archival databases, ties together related data across multiple archives and serves as a central portal by which to inform users of data availability. Concomitantly, the BioSample database is being developed to capture descriptive information about the biological samples investigated in projects. BioProject and BioSample records link to corresponding data stored in archival repositories. Submissions are supported by a web-based Submission Portal that guides users through a series of forms for input of rich metadata describing their projects and samples. Together, these databases offer improved ways for users to query, locate, integrate and interpret the masses of data held in NCBI's archival repositories. The BioProject and BioSample databases are available at http://www.ncbi.nlm.nih.gov/bioproject and http://www.ncbi.nlm.nih.gov/biosample, respectively.

Crossref

PubMed Central

Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana

Author: Antonio Baltazar A.
Aono Hideo
Apweiler Rolf
Barrero Roberto A.
Bruskiewich Richard
Bureau Thomas
Burr Benjamin
Burr Frances
Costa de Oliveira Antonio
Fujii Yasuyuki
Fuks Galina
Gojobori Takashi
Habara Takuya
Haberer Georg
Han Bin
Harada Erimi
Higo Kenichi
Hilton Phillip B.
Hiraki Aiko T.
Hirochika Hirohiko
Hoen Douglas
Hokari Hiroki
Hosokawa Satomi
Hsing Yue
Ikawa Hiroshi
Ikeo Kazuho
Imanishi Tadashi
Ito Yukiyo
Itoh Takeshi
Jaiswal Pankaj
Kanno Masako
Kawahara Yosihiro
Kawamura Toshiyuki
Kawashima Hiroaki
Khurana Jitendra P.
Kikuchi Shoshi
Komatsu Setsuko
Koyanagi Kanako O.
Kubooka Hiromi
Liberherr Damien
Lin Yao-Cheng
Lonsdale David
Matsumoto Takashi
Matsuya Akihiro
McCombie W. Richard
Messing Joachim
Miyao Akio
Mulder Nicola
Nagamura Yoshiaki
Nam Jongmin
Namiki Nobukazu
Numa Hisataka
Nurimoto Shin
O'Donovan Claire
Ohyanagi Hajimi
Okido Toshihisa
OOta Satoshi
Osato Naoki
Palmer Lance E.
Quetier Francis
Raghuvanshi Surabh
Saichi Naomi
Sakai Hiroaki
Sakai Yasumichi
Sakata Katsumi
Sakurai Tetsuya
Saski Takuji
Sato Fumihiko
Sato Yoshiharu
Schoof Heiko
Seki Motoaki
Shibata Katsumi
Shibata Michie
Shimizu Yuji
Shinozaki Kazuo
Shinso Yuji
Singh Nagendra K.
Smith-White Brian
Takeda Jun-ichi
Tanaka Tsuyoshi
Tanino Motohiko
Tatusova Tatiana
Thongjuea Supat
Todokoro Fusano
Tsugane Mika
Tyagi Akhilesh K.
Vanavichit Apichart
Wang Aihui
Wing Rod A.
Yamaguchi Kaori
Yamamoto Mayu
Yamamoto Naoyuki
Yamasaki Chisato
Yu Yeisoo
Zhang Hao
Zhao Qiang
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 01/01/2007

We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is ~32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene.

Crossref

PubMed Central

Queensland University of Technology ePrints Archive

Caltech Authors

University of Queensland eSpace

Managing ethnic conflict: the menu of institutional engineering

Author: A Carattoli
A Mira
AM Altenhoff
B Contreras-Moreira
B Sachman-Ruiz
C Camacho
DM Kristensen
EL Sonnhammer
EV Koonin
H Tettelin
H Willenbrock
I Pagani
K Forslund
L Li
L Poirel
L Snipen
P Nordmann
P Vinuesa
RA Welch
RC Edgar
RC Moellering Jr
RD Finn
RL Tatusov
RS Kaas
S Guindon
SR Eddy
T Sekizuka
T Tatusova
TJ Johnson
WF Fricke
YI Wolf
Publication venue: Wissenschaftliche Einrichtungen. GIGA - German Institute of Global and Area Studies
Publication date: 01/01/2011

The debate on institutional engineering offers options to manage ethnic and other conflicts. This contribution systematically assesses the logic of these institutional designs and the empirical evidence on their functioning. Generally, institutions can work on ethnic conflict by either accommodating (“consociationalists”) or denying (“integrationists”) ethnicity in politics. Looking at individual and combined institutions (e.g. state structure, electoral system, forms of government), the literature review finds that most designs are theoretically ambivalent and that empirical evidence on their effectiveness is mostly inconclusive. The following questions remain open: a) Is politicized ethnicity really a conflict risk? b) What impact does the whole “menu” (not just single institutions) have? and c) How are effects conditioned by the exact nature of conflict risks?

Crossref

eDoc.VifaPol

Digital.CSIC

Molecular diversity of phospholipase D in angiosperms

Author: A Bateman
AE Nalefski
AGI
AM Laxalt
BD Whitaker
C Burge
C Wang
CP Ponting
DJ Hanahan
F Cvrckova
GD Schuler
GF Barry
J Felsenstein
J Rizo
J Schultz
J Ueki
JC Gardiner
JD Thompson
JH Dyer
K Pappan
KC Almquist
KD Chapman
L Fan
L Zheng
M den Hartog
M Liscovitch
MN Hodgkin
RM Ewing
S Morioka
SF Altschul
T Katagiri
T Munnik
TA Tatusova
VA Sciorra
W Frank
W Lein
W Qin
X Wang
Publication venue: BioMed Central
Publication date: 01/01/2002

BACKGROUND: The phospholipase D (PLD) family has been identified in plants by recent molecular studies, fostered by the emerging importance of plant PLDs in stress physiology and signal transduction. However, the presence of multiple isoforms limits the power of conventional biochemical and pharmacological approaches, and calls for a wider application of genetic methodology. RESULTS: Taking advantage of sequence data available in public databases, we attempted to provide a prerequisite for such an approach. We made a complete inventory of the Arabidopsis thaliana PLD family, which was found to comprise 12 distinct genes. The current nomenclature of Arabidopsis PLDs was refined and expanded to include five newly described genes. To assess the degree of plant PLD diversity beyond Arabidopsis we explored data from rice (including the genome draft by Monsanto) as well as cDNA and EST sequences from several other plants. Our analysis revealed two major PLD subfamilies in plants. The first, designated C2-PLD, is characterised by the presence of the C2 domain and comprises previously known plant PLDs as well as new isoforms with possibly unusual features: catalytically inactive or independent of Ca²⁺. The second subfamily (denoted PXPH-PLD) is novel in plants but is related to animal and fungal enzymes possessing the PX and PH domains. CONCLUSIONS: The evolutionary dynamics, and inter-specific diversity, of plant PLDs inferred from our phylogenetic analysis, call for more plant species to be employed in PLD research. This will enable us to obtain generally valid conclusions.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Whole-Genome Sequencing of Sake Yeast Saccharomyces cerevisiae Kyokai no. 7

Author: A. Hosoyama
A. Nakazato
A. Nishimura
A. Ohta
Aa
Akada
Azumi
Badger
Borneman
Carreto
D. Watanabe
Day
de Smidt
Diogo
Dunn
Ewing
Ewing
Fay
Florea
Fogel
Goffeau
Goto
H. Horikawa
H. Kitagaki
H. Mizoguchi
H. Shimoi
H. Tsuboi
Hereford
I. Yashiro
Jeong
K. Iwashita
K. Kitamoto
K. Yoda
Katou
Katou
Kellis
Kim
Kitagaki
Kostriken
Kruckeberg
Kurtz
Legras
Liti
M. Namise
M. Sato
Meiron
N. Fujita
N. Kitamoto
O. Kobayashi
Ozcan
Perez
Platt
R. Akada
Rogers
S. Harashima
S. Kajiwara
S. Kuhara
S. Shibasaki
S. Tanimoto
Schacherer
Schacherer
Sharon
Shimoi
Steger
T. Akao
T. Inoue
T. Ishikawa
T. Masubuchi
T. Oba
T. Ogata
Tamura
Tatusova
Wieland
Wu
Wu
Y. Ando
Y. Inoue
Y. Nakao
Y. Takatsume
Yamada
Publication venue: Oxford University Press
Publication date: 01/12/2011

The term ‘sake yeast’ is generally used to indicate the Saccharomyces cerevisiae strains that possess characteristics distinct from others including the laboratory strain S288C and are well suited to sake brewing. Here, we report the draft whole-genome shotgun sequence of a commonly used diploid sake yeast strain, Kyokai no. 7 (K7). The assembled sequence of K7 was nearly identical to that of S288C, except for several subtelomeric polymorphisms and two large inversions in K7. A survey of heterozygous bases between the homologous chromosomes revealed the presence of mosaic-like uneven distribution of heterozygosity in K7. The distribution patterns appeared to have resulted from repeated losses of heterozygosity in the ancestral lineage of K7. Analysis of genes revealed the presence of both K7-acquired and K7-lost genes, in addition to numerous others with segmentations and terminal discrepancies in comparison with those of S288C. The distribution of Ty elements also largely differed in the two strains. Interestingly, two regions in chromosomes I and VII of S288C have apparently been replaced by Ty elements in K7. Sequence comparisons suggest that these gene conversions were caused by cDNA-mediated recombination of Ty elements. The present study advances our understanding of the functional and evolutionary genomics of the sake yeast.

Crossref

Yamaguchi University Navigator for Open access Collection and Archives

PubMed Central