41 research outputs found

Virus variation resources at the National Center for Biotechnology Information: dengue virus

Author: Bao Yiming
Kiryutin Boris
Resch Wolfgang
Rozanov Michael
Tatusova Tatiana A
Zaslavsky Leonid
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: There is an increasing number of complete and incomplete virus genome sequences available in public databases. This large body of sequence data harbors information about epidemiology, phylogeny, and virulence. Several specialized databases, such as the NCBI Influenza Virus Resource or the Los Alamos HIV database, offer sophisticated query interfaces along with integrated exploratory data analysis tools for individual virus species to facilitate extracting this information. Thus far, there has not been a comprehensive database for dengue virus, a significant public health threat. Results: We have created an integrated web resource for dengue virus. The technology developed for the NCBI Influenza Virus Resource has been extended to process non-segmented dengue virus genomes. In order to allow efficient processing of the dengue genome, which is large in comparison with individual influenza segments, we developed an offline pre-alignment procedure which generates a multiple sequence alignment of all dengue sequences. The pre-calculated alignment is then used to rapidly create alignments of sequence subsets in response to user queries. This improvement in technology will also facilitate the incorporation of additional virus species in the future. The set of virus-specific databases at NCBI, which will be referred to as Virus Variation Resources (VVR), allows users to build complex queries against virus-specific databases and then apply exploratory data analysis tools to the results. The metadata is automatically collected where possible, and extended with data extracted from the literature. Conclusion: The NCBI Dengue Virus Resource integrates dengue sequence information with relevant metadata (sample collection time and location, disease severity, serotype, sequenced genome region) and facilitates retrieval and preliminary analysis of dengue sequences using integrated web analysis and visualization tools.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The relationship of protein conservation and sequence length

Author: Koonin Eugene V
Lipman David J
Panchenko Anna R
Souvorov Alexander
Tatusova Tatiana A
Publication venue: BioMed Central
Publication date: 01/01/2002

BACKGROUND: In general, the length of a protein sequence is determined by its function and the wide variance in the lengths of an organism's proteins reflects the diversity of specific functional roles for these proteins. However, additional evolutionary forces that affect the length of a protein may be revealed by studying the length distributions of proteins evolving under weaker functional constraints. RESULTS: We performed sequence comparisons to distinguish highly conserved and poorly conserved proteins from the bacterium Escherichia coli, the archaeon Archaeoglobus fulgidus, and the eukaryotes Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens. For all organisms studied, the conserved and nonconserved proteins have strikingly different length distributions. The conserved proteins are, on average, longer than the poorly conserved ones, and the length distributions for the poorly conserved proteins have a relatively narrow peak, in contrast to the conserved proteins whose lengths spread over a wider range of values. For the two prokaryotes studied, the poorly conserved proteins approximate the minimal length distribution expected for a diverse range of structural folds. CONCLUSIONS: There is a relationship between protein conservation and sequence length. For all the organisms studied, there seems to be a significant evolutionary trend favoring shorter proteins in the absence of other, more specific functional constraints

CiteSeerX

Directory of Open Access Journals

PubMed Central

Visualization of large influenza virus sequence datasets using adaptively aggregated trees with sampling-based subscale representation

Author: AM MacEachren
AS Fauci
D Beermann
D Bryant
E Ghedin
F Chevenet
G Mather
J Baron
J Felsenstein
J Kramer
J-DPC Fekete
JB Plotkin
JF Dufayard
JPP Lamping
L Zaslavsky
Leonid Zaslavsky
N Amenta
PS Levy
S Weiss
SKND Card
Tatiana A Tatusova
U Rost
Y Bao
YI Wolf
Yiming Bao
Publication venue: Springer Science and Business Media LLC

Crossref

Genomic Standards Consortium projects

Author: Amaral-Zettler Linda A.
Caporaso J. Gregory
Cochrane Guy R.
Davies Neil
Dawyndt Peter
De Smet Wim
Field Dawn
Garrity George M.
Gilbert Jack A.
Glockner Frank Oliver
Hirschman Lynette
James Cole R.
Karsch-Mizrachi Ilene
Klenk Hans-Peter
Knight Rob
Kottmann Renzo
Kyrpides Nikos C.
Meyer Folker
Morrison Norman
Robbins Robert J.
San Gil Inigo
Sansone Susanna-Assunta
Schriml Lynn M.
Sterk Peter
Tatusova Tatiana
Ussery David W.
White Owen
Wooley John
Yilmaz Pelin
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

© The Author(s), 2014. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Standards in Genomic Sciences 9 (2014): 599-601, doi:10.4056/sigs.5559680. The Genomic Standards Consortium (GSC) is an open-membership community working towards the development, implementation and harmonization of standards in the field of genomics. The mission of the GSC is to improve digital descriptions of genomes, metagenomes and gene marker sequences. The GSC started in late 2005 with the defined task of establishing what is now termed the “Minimum Information about any Sequence” (MIxS) standard [1,2]. As an outgrowth of the activities surrounding the creation and implementation of the MIxS standard there are now 18 projects within the GSC [3]. These efforts cover an ever widening range of standardization activities. Given the growth of projects and to promote transparency, participation and adoption the GSC has developed a “GSC Project Description Template”. A complete set of GSC Project Descriptions and the template are available on the GSC website. The GSC has an open policy of participation and continues to welcome new efforts. Any projects that facilitate the standard descriptions and exchange of data are potential candidates for inclusion under the GSC umbrella. Areas that expand the scope of the GSC are encouraged. Through these collective activities we hope to help foster the growth of the ‘bioinformatics standards’ community. For more information on the GSC and its range of projects, please see http://gensc.org/

Woods Hole Open Access Server

Springer - Publisher Connector

Ghent University Academic Bibliography

PubMed Central

eScholarship - University of California

MPG.PuRe

NERC Open Research Archive

Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana

Author: Antonio Baltazar A.
Aono Hideo
Apweiler Rolf
Barrero Roberto A.
Bruskiewich Richard
Bureau Thomas
Burr Benjamin
Burr Frances
Costa de Oliveira Antonio
Fujii Yasuyuki
Fuks Galina
Gojobori Takashi
Habara Takuya
Haberer Georg
Han Bin
Harada Erimi
Higo Kenichi
Hilton Phillip B.
Hiraki Aiko T.
Hirochika Hirohiko
Hoen Douglas
Hokari Hiroki
Hosokawa Satomi
Hsing Yue
Ikawa Hiroshi
Ikeo Kazuho
Imanishi Tadashi
Ito Yukiyo
Itoh Takeshi
Jaiswal Pankaj
Kanno Masako
Kawahara Yosihiro
Kawamura Toshiyuki
Kawashima Hiroaki
Khurana Jitendra P.
Kikuchi Shoshi
Komatsu Setsuko
Koyanagi Kanako O.
Kubooka Hiromi
Liberherr Damien
Lin Yao-Cheng
Lonsdale David
Matsumoto Takashi
Matsuya Akihiro
McCombie W. Richard
Messing Joachim
Miyao Akio
Mulder Nicola
Nagamura Yoshiaki
Nam Jongmin
Namiki Nobukazu
Numa Hisataka
Nurimoto Shin
O'Donovan Claire
Ohyanagi Hajimi
Okido Toshihisa
OOta Satoshi
Osato Naoki
Palmer Lance E.
Quetier Francis
Raghuvanshi Surabh
Saichi Naomi
Sakai Hiroaki
Sakai Yasumichi
Sakata Katsumi
Sakurai Tetsuya
Saski Takuji
Sato Fumihiko
Sato Yoshiharu
Schoof Heiko
Seki Motoaki
Shibata Katsumi
Shibata Michie
Shimizu Yuji
Shinozaki Kazuo
Shinso Yuji
Singh Nagendra K.
Smith-White Brian
Takeda Jun-ichi
Tanaka Tsuyoshi
Tanino Motohiko
Tatusova Tatiana
Thongjuea Supat
Todokoro Fusano
Tsugane Mika
Tyagi Akhilesh K.
Vanavichit Apichart
Wang Aihui
Wing Rod A.
Yamaguchi Kaori
Yamamoto Mayu
Yamamoto Naoyuki
Yamasaki Chisato
Yu Yeisoo
Zhang Hao
Zhao Qiang
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 01/01/2007

We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is ~32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene

Crossref

PubMed Central

Queensland University of Technology ePrints Archive

Caltech Authors

University of Queensland eSpace

Database resources of the National Center for Biotechnology Information

Author: Barrett Tanya
Benson Dennis A.
Bryant Stephen H.
Canese Kathi
Church Deanna M.
DiCuccio Michael
Edgar Ron
Federhen Scott
Helmberg Wolfgang
Kenton David L.
Khovayko Oleg
Lipman David J.
Madden Thomas L.
Maglott Donna R.
Ostell James
Pontius Joan U.
Pruitt Kim D.
Schriml Lynn M.
Schuler Gregory D.
Sequeira Edwin
Sherry Steven T.
Sirotkin Karl
Starchenko Grigory
Suzek Tugba O.
Tatusov Roman
Tatusova Tatiana A.
Wagner Lukas
Wheeler David L.
Yaschenko Eugene
Publication venue: Oxford University Press
Publication date: 17/12/2004

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data retrieval systems and computational resources for the analysis of data in GenBank and other biological data made available through NCBI's website. NCBI resources include Entrez, Entrez Programming Utilities, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD) and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of the resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov

Crossref

PubMed Central

First draft genome assembly of the Argane tree (Argania spinosa)

Background: The Argane tree (Argania spinosa L. Skeels) is an endemic tree of southwestern Morocco that plays an important socioeconomic and ecologic role for a dense human population in an arid zone. Several studies confirmed the importance of this species as a food and feed source and as a resource for both pharmaceutical and cosmetic compounds. Unfortunately, the argane tree ecosystem is facing significant threats from environmental changes (global warming, over-population) and over-exploitation. Limited research has been conducted, however, on argane tree genetics and genomics, which hinders its conservation and genetic improvement. Methods: Here, we present a draft genome assembly of A. spinosa. A reliable reference genome of A. spinosa was created using a hybrid de novo assembly approach combining short and long sequencing reads. Results: In total, 144 Gb Illumina HiSeq reads and 7.2 Gb PacBio reads were produced and assembled. The final draft genome comprises 75 327 scaffolds totaling 671 Mb with an N50 of 49 916 kb. The draft assembly is close to the genome size estimated by k-mers distribution and covers 89% of complete and 4.3% of partial Arabidopsis orthologous groups in BUSCO. Conclusion: The A. spinosa genome will be useful for assessing biodiversity leading to efficient conservation of this endangered endemic tree. Furthermore, the genome may enable genome-assisted cultivar breeding, and provide a better understanding of important metabolic pathways and their underlying genes for both cosmetic and pharmacological purposes.

Ghent University Academic Bibliography

UPSpace at the University of Pretoria

Towards BioDBcore: a community-defined information specification for biological databases

The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.

RERO DOC Digital Library

Database resources of the National Center for Biotechnology Information

Author: Alexandre Souvorov
Altschul
Altschul
Amberger
Anna Panchenko
Aron Marchler-Bauer
Barrett
Benson
Berman
Blumenfeld
Brazma
Crosby
David J. Lipman
David Landsman
Deanna M. Church
Dennis A. Benson
Donna R. Maglott
Douglas Slotta
Edwin Sequeira
Eppig
Eric W. Sayers
Eugene Yaschenko
Evan Bolton
Finn
Fu
Geer
Geschwind
Ghedin
Gibrat
Gong
Gregory D. Schuler
Grigory Starchenko
Haft
Heintz
Helmberg
Hong
Ilene Mizrachi
James Ostell
Ji
Jian Ye
Kanehisa
Kanehisa
Kanehisa
Kapustin
Karl Sirotkin
Kathi Canese
Keseler
Kim D. Pruitt
Klimke
Knutsen
Lenffer
Letunic
Lewis Y. Geer
Lukas Wagner
Ma
Madej
Maglott
Manolio
Marchler-Bauer
Martin Shumway
Michael DiCuccio
Michael Feolo
Mitelman
Needleman
Pagon
Papadopoulos
Pruitt
Schuler
Schuler
Scott Federhen
Sequeira
Sewell
Sherry
Shumway
Sprague
Stephen H. Bryant
Stephen T. Sherry
Tanya Barrett
Tatiana A. Tatusova
Tatusov
Tatusova
Thomas L. Madden
Tom Madej
Vadim Miller
Vyacheslav Chetvernin
W. John Wilbur
Waggoner
Wang
Wang
Wang
Whetzel
Wolfgang Helmberg
Yanli Wang
Ye
Yuri Kapustin
Zhang
Zhiyong Lu
Publication venue: Oxford University Press
Publication date: 01/01/2010

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, Reference Sequence, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Peptidome, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov

CiteSeerX

Crossref

PubMed Central