10 research outputs found

Expansion of the BioCyc collection of pathway/genome databases to 160 genomes

Author: Ahrén Dag
Darzentas Nikos
Goldovsky Leon
Kaipa Pallavi
Karp Peter D.
Kunin Victor
López-Bigas Núria
Moore-Kochlacs Caroline
Ouzounis Christos A.
Tsoka Sophia
Publication venue: Oxford University Press
Publication date: 01/01/2005

The BioCyc database collection is a set of 160 pathway/genome databases (PGDBs) for most eukaryotic and prokaryotic species whose genomes have been completely sequenced to date. Each PGDB in the BioCyc collection describes the genome and predicted metabolic network of a single organism, inferred from the MetaCyc database, which is a reference source on metabolic pathways from multiple organisms. In addition, each bacterial PGDB includes predicted operons for the corresponding species. The BioCyc collection provides a unique resource for computational systems biology, namely global and comparative analyses of genomes and metabolic networks, and a supplement to the BioCyc resource of curated PGDBs. The Omics viewer available through the BioCyc website allows scientists to visualize combinations of gene expression, proteomics and metabolomics data on the metabolic maps of these organisms. This paper discusses the computational methodology by which the BioCyc collection has been expanded, and presents an aggregate analysis of the collection that includes the range of number of pathways present in these organisms, and the most frequently observed pathways. We seek scientists to adopt and curate individual PGDBs within the BioCyc collection. Only by harnessing the expertise of many scientists can we hope to produce biological databases that accurately reflect the depth and breadth of knowledge that the biomedical research community is producing.

CiteSeerX

Lund University Publications

PubMed Central

King's Research Portal

Matching curated genome databases: a non trivial task

Author: Barba Matthieu
Descorps-Declère Stéphane
Labedan Bernard
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Curated databases of completely sequenced genomes have been designed independently at the NCBI (RefSeq) and EBI (Genome Reviews) to cope with non-standard annotation found in the version of the sequenced genome that has been published by the databanks GenBank/EMBL/DDBJ. These curation attempts were expected to review the annotations and to improve their pertinence when using them to annotate newly released genome sequences by homology to previously annotated genomes. However, we observed that such an uncoordinated effort has two unwanted consequences. First, it is not trivial to map the protein identifiers of the same sequence in both databases. Second, the two reannotated versions of the same genome differ at the level of their structural annotation.
Results: Here, we propose CorBank, a program devised to cross-reference protein identifiers whatever the level of identity found between their matching sequences. Approximately 98% of the 1,983,258 amino acid sequences match, allowing instantaneous retrieval of their respective cross-references. CorBank further allows detection of any differences between the independently curated versions of the same genome. We found that the RefSeq and Genome Reviews versions match perfectly for only 50 of the 641 complete genomes we analyzed. In all other cases there are differences at the level of the coding sequence (CDS), and/or in the total number of CDS in the respective versions of the same genome. CorBank is freely accessible at http://www.corbank.u-psud.fr. The CorBank site also contains updated, exhaustive results obtained by comparing the RefSeq and Genome Reviews versions of each genome. Accordingly, this web site allows easy search of cross-references between RefSeq, Genome Reviews, and UniProt, for either a single CDS or a whole replicon.
Conclusion: CorBank is very efficient at rapidly detecting the numerous differences between the RefSeq and Genome Reviews versions of the same curated genome. Although such differences are acceptable as reflecting different views, we suggest that curators of both genome databases could help reduce further divergence by agreeing on a minimal dialogue and attempting to publish the point of view of the other database whenever it is technically possible.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

How to Kill Copyright: A Brute-Force Approach to Content Creation

Author: Sigmon Kirk
Publication venue: Scholarship@Cornell Law: A Digital Repository
Publication date: 01/05/2013

bepress Legal Repository

Scholarship @ Cornell Law

Data mining and data integration in biology

Author: Ólason Páll Ísólfur
Publication date: 01/03/2008

Online Research Database In Technology

Human protein-protein interaction prediction

Author: McDowall Mark
Publication date: 01/01/2011

University of Dundee Online Publications

MagicMatch - cross-referencing sequence identifiers across databases

Author: Enright A J
Goldovsky L
Kunin V
Ouzounis C A
Smith M
Publication venue: Oxford University Press (OUP)
Publication date: 15/08/2005

King's Research Portal

MagicMatch--cross-referencing sequence identifiers across databases

Author: A. J. Enright
C. A. Ouzounis
L. Goldovsky
M. Smith
V. Kunin
Publication venue: Oxford University Press (OUP)

Crossref

Functional classification of protein domain superfamilies for protein function annotation

Author: Das S
Publication venue: UCL (University College London)
Publication date: 28/10/2016

Proteins are made up of domains that are generally considered to be independent evolutionary and structural units having distinct functional properties. It is now well established that analysis of domains in proteins provides an effective approach to understand protein function using a 'domain grammar'. Towards this end, evolutionarily-related protein domains have been classified into homologous superfamilies in CATH and SCOP databases. An ideal functional sub-classification of the domain superfamilies into 'functional families' can not only help in function annotation of uncharacterised sequences but also provide a useful framework for understanding the diversity and evolution of function at the domain level. This work describes the development of a new protocol (FunFHMMer) for identifying functional families in CATH superfamilies that makes use of sequence patterns only and hence, is unaffected by the incompleteness of function annotations, annotation biases or misannotations existing in the databases. The resulting family classification was validated using known functional information and was found to generate more functionally coherent families than other domain-based protein resources. A protein function prediction pipeline was developed exploiting the functional annotations provided by the domain families which was validated by a database rollback benchmark set of proteins and an independent assessment by CAFA 2. The functional classification was found to capture the functional diversity of superfamilies well in terms of sequence, structure and the protein-context. This aided studies on evolution of protein domain function both at the superfamily level and in specific proteins of interest. The conserved positions in the functional family alignments were found to be enriched in catalytic site residues and ligand-binding site residues which led to the development of a functional site prediction tool.
Lastly, the function prediction tools were assessed for annotation of moonlighting functions of proteins and a classification of moonlighting proteins was proposed based on their structure-function relationships.

UCL Discovery

Estudio comparativo de la regulación transcripcional en procesos de biodegradación

Author: Carbajosa Antona Guillermo
Publication date: 01/01/2009

Unpublished doctoral thesis. Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defense: 16-02-200

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo