83 research outputs found

UniProtKB amid the turmoil of plant proteomics research

Author: Michel Schneider
Sylvain Poux
the UniProt Consortium
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2012

The UniProt KnowledgeBase (UniProtKB) provides a single, centralized, authoritative resource for protein sequences and functional information. The majority of its records are based on automatic translation of coding sequences (CDS) provided by submitters at the time of initial deposition to the nucleotide sequence databases (INSDC). This article will give a general overview of the current situation, with some specific illustrations extracted from our annotation of Arabidopsis and rice proteomes. More and more frequently, only the raw sequence of a complete genome is deposited to the nucleotide sequence databases and the gene model predictions and annotations are kept in separate, specialized model organism databases (MODs). In order to be able to provide the complete proteome of model organisms, UniProtKB had to implement pipelines for import of protein sequences from Ensembl and EnsemblGenomes. A single genome can be the target of several unrelated sequencing projects and the final assembly and gene model predictions may diverge quite significantly. In addition, several cultivars of the same species are often sequenced (sequencing of 1001 Arabidopsis cultivars is currently under way) and the resulting proteomes are far from identical. Therefore, one challenge for UniProtKB is to store and organize these data in a convenient way and to clearly define reference proteomes that should be made available to users. Manual annotation is one of the landmarks of the Swiss-Prot section of UniProtKB. Besides adding functional annotation, curators check, and often correct, gene model predictions. For plants, this task is limited to Arabidopsis thaliana and Oryza sativa subsp. japonica. Proteomics data providing experimental evidence confirming the existence of proteins or identifying sequence features such as post-translational modifications are also imported into UniProtKB records and the knowledgebase is cross-referenced to numerous proteomics resources.

Frontiers - Publisher Connector

PubMed Central

Genetic Variations and Diseases in UniProtKB/Swiss-Prot: The Ins and Outs of Expert Manual Curation.

Author: Alan Bridge
Anne Estreicher
Arnaud Gos
Ioannis Xenarios
Jerven Bolleman
Lionel Breuza
Lydie Bougueleret
Maria Livia Famiglietti
Nicole Redaschi
Sylvain Poux
Sébastien Géhant
Publication venue: 'Wiley'
Publication date: 01/01/2014

During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype

Crossref

Serveur académique lausannois

PubMed Central

HAMAP in 2015: updates to the protein family classification and annotation system

Author: Auchincloss Andrea H.
Baratin Delphine
Bougueleret Lydie
Bridge Alan
Coudert Elisabeth
Cuche Béatrice A.
de Castro Edouard
Keller Guillaume
Pedruzzi Ivo
Poux Sylvain
Redaschi Nicole
Rivoire Catherine
Xenarios Ioannis
Publication date: 02/08/2017

HAMAP (High-quality Automated and Manual Annotation of Proteins, available at http://hamap.expasy.org/) is a system for the automatic classification and annotation of protein sequences. HAMAP provides annotation of the same quality and detail as UniProtKB/Swiss-Prot, using manually curated profiles for protein sequence family classification and expert curated rules for functional annotation of family members. HAMAP data and tools are made available through our website and as part of the UniRule pipeline of UniProt, providing annotation for millions of unreviewed sequences of UniProtKB/TrEMBL. Here we report on the growth of HAMAP and updates to the HAMAP system since our last report in the NAR Database Issue of 2013. We continue to augment HAMAP with new family profiles and annotation rules as new protein families are characterized and annotated in UniProtKB/Swiss-Prot; the latest version of HAMAP (as of 3 September 2014) contains 1983 family classification profiles and 1998 annotation rules (up from 1780 and 1720). We demonstrate how the complex logic of HAMAP rules allows for precise annotation of individual functional variants within large homologous protein families. We also describe improvements to our web-based tool HAMAP-Scan which simplify the classification and annotation of sequences, and the incorporation of an improved sequence-profile search algorithm.

RERO DOC Digital Library

HAMAP in 2013, new developments in the protein family classification and annotation system

Author: Auchincloss Andrea H.
Baratin Delphine
Bougueleret Lydie
Bridge Alan
Coudert Elisabeth
Cuche Béatrice A.
de Castro Edouard
Keller Guillaume
Pedruzzi Ivo
Poux Sylvain
Redaschi Nicole
Rivoire Catherine
Xenarios Ioannis
Publication date: 02/08/2017

HAMAP (High-quality Automated and Manual Annotation of Proteins, available at http://hamap.expasy.org/) is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated annotation rules that specify annotations that apply to family members. HAMAP was originally developed to support the manual curation of UniProtKB/Swiss-Prot records describing microbial proteins. Here we describe new developments in HAMAP, including the extension of HAMAP to eukaryotic proteins, the use of HAMAP in the automated annotation of UniProtKB/TrEMBL, providing high-quality annotation for millions of protein sequences, and the future integration of HAMAP into a unified system for UniProtKB annotation, UniRule. HAMAP is continuously updated by expert curators with new family profiles and annotation rules as new protein families are characterized. The collection of HAMAP family classification profiles and annotation rules can be browsed and viewed on the HAMAP website, which also provides an interface to scan user sequences against HAMAP profiles.

RERO DOC Digital Library

Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns.

Author: Attrill Helen
Carbon Seth
Engel Stacia R
Feuermann Marc
Gaudet Pascale
Harris Midori A
Hill David P
Lock Antonia
Lovering Ruth C
Mungall Christopher J
Poux Sylvain
Rutherford Kim M
Van Auken Kimberly
Wood Valerie
Publication venue: The Mouseion at the JAXlibrary
Publication date: 01/09/2020

Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes probably reflects errors in literature curation, ontology structure or automated annotation pipelines. We have developed an annotation quality control workflow that uses rules based on mutually exclusive processes to detect annotation errors, based on and validated by case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes which are unlikely to be correctly co-annotated to the same gene products (e.g. amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 52 700 automatically propagated annotations across all taxa

The Jackson Laboratory: The Mouseion at the JAXlibrary
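
The Term Matrix abstract above describes quality control rules built from pairs of biological processes that should not be co-annotated to the same gene product. The following is a minimal sketch of that idea, not the authors' implementation: the rule set, the gene names, and the function `find_violations` are hypothetical, and only the example pair (amino acid metabolism and cytokinesis) comes from the abstract. The sketch also ignores the ontology hierarchy, which a real check would presumably need to take into account.

```python
from itertools import combinations

# Hypothetical rule set: pairs of GO biological processes assumed to be
# mutually exclusive. The single pair below is the example from the abstract.
EXCLUSION_RULES = {
    frozenset({"amino acid metabolism", "cytokinesis"}),
}


def find_violations(annotations, rules=EXCLUSION_RULES):
    """Return (gene, term_a, term_b) for every co-annotation that breaks a rule."""
    violations = []
    for gene, terms in sorted(annotations.items()):
        # Compare every unordered pair of distinct terms annotated to this gene.
        for term_a, term_b in combinations(sorted(set(terms)), 2):
            if frozenset({term_a, term_b}) in rules:
                violations.append((gene, term_a, term_b))
    return violations


# Toy annotation data, invented purely for illustration.
annotations = {
    "geneA": ["amino acid metabolism", "cytokinesis"],
    "geneB": ["cytokinesis", "mitotic spindle assembly"],
}

violations = find_violations(annotations)
for gene, term_a, term_b in violations:
    print(f"{gene}: suspicious co-annotation of '{term_a}' and '{term_b}'")
```

Storing each rule as a `frozenset` makes the pair order-independent, so a rule matches regardless of which term was annotated first. Flagged pairs are candidates for manual review, not confirmed errors.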

Annotation of gene product function from high-throughput studies using the Gene Ontology.

Author: Attrill Helen
Berardini Tanya Z
Chibucos Marcus C
Drabkin Harold
Engel Stacia R
Fey Petra
Garmiri Penelope
Gaudet Pascale
Gene Ontology Consortium
Georghiou George
Harris Midori A
Huntley Rachael P
Lovering Ruth C
Poux Sylvain
Reiser Leonore
Sawford Tony
Tauber Rebecca
Toro Sabrina
Van Auken Kimberly M
Wood Valerie
Publication venue: Database (Oxford)
Publication date: 01/01/2019

High-throughput studies constitute an essential and valued source of information for researchers. However, high-throughput experimental workflows are often complex, with multiple data sets that may contain large numbers of false positives. The representation of high-throughput data in the Gene Ontology (GO) therefore presents a challenging annotation problem, when the overarching goal of GO curation is to provide the most precise view of a gene's role in biology. To address this, representatives from annotation teams within the GO Consortium reviewed high-throughput data annotation practices. We present an annotation framework for high-throughput studies that will facilitate good standards in GO curation and, through the use of new high-throughput evidence codes, increase the visibility of these annotations to the research community

The Jackson Laboratory: The Mouseion at the JAXlibrary

Caltech Authors

Apollo (Cambridge)

Annotation of gene product function from high-throughput studies using the Gene Ontology

Author: Attrill Helen
Berardini Tanya Z.
Chibucos Marcus C.
Drabkin Harold
Engel Stacia R.
Fey Petra
Garmiri Penelope
Gaudet Pascale
Georghiou George
Harris Midori A.
Huntley Rachael P.
Lovering Ruth C.
Poux Sylvain
Reiser Leonore
Sawford Tony
Tauber Rebecca
Toro Sabrina
Van Auken Kimberly M.
Wood Valerie
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/02/2019

An integrated ontology resource to explore and study host-virus relationships.

Author: A Castelló
A Lee
A Wack
AF Fusaro
AJ Hume
Alan Bridge
BT Seet
C Lee
C Vogt
C Wei
C-G Duan
Chantal Hulo
CZ Song
D Horst
DE Levy
E Dixit
Edouard de Castro
EW Birch
G Singh
GA Parker
I Akhrymuk
Ioannis Xenarios
J Shi
Jane Lomax
JB Johnston
JE Oh
K Brennan
KM Rose
KN Bishop
L Balvay
Lydie Bougueleret
M Aranda
M-C Geoffroy
ME Penfold
Michelle L. Baker
MJ De Veer
MR Thompson
MU Gack
P Jugovic
PAM Gobeil
Patrick Masson
Philippe Le Mercier
R Nascimento
RD Everett
Rebecca Foulger
S Benhenda
S De Breyne
Sylvain Poux
T Csorba
W Fu
X Dong
X-D Li
Y Yang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

Our growing knowledge of viruses reveals how these pathogens manage to evade innate host defenses. A global scheme emerges in which many viruses usurp key cellular defense mechanisms and often inhibit the same components of antiviral signaling. To accurately describe these processes, we have generated a comprehensive dictionary for eukaryotic host-virus interactions. This controlled vocabulary has been detailed in 57 ViralZone resource web pages which contain a global description of all molecular processes. In order to annotate viral gene products with this vocabulary, an ontology has been built in a hierarchy of UniProt Knowledgebase (UniProtKB) keyword terms and corresponding Gene Ontology (GO) terms have been developed in parallel. The result is 65 UniProtKB keywords related to 57 GO terms, which have been used in 14,390 manual annotations and 908,723 automatic annotations, and propagated to an estimated 922,941 GO annotations. ViralZone pages, UniProtKB keywords and GO terms provide complementary tools to users, and the three resources have been linked to each other through host-virus vocabulary.

CiteSeerX

Crossref

Serveur académique lausannois

Directory of Open Access Journals

PubMed Central

The UniProt-GO Annotation database in 2011

Author: Alam-Faruque Yasmin
Apweiler Rolf
Argoud-Puy Ghislaine
Auchincloss Andrea
Axelsen Kristian
Bely Benoit
Blatter Marie-Claude
Bougueleret Lydie
Boutet Emmanuel
Braconi-Quintaje Silvia
Breuza Lionel
Bridge Alan
Browne Paul
Coudert Elizabeth
Cusin Isabelle
Dimmer Emily C.
Duek-Roggli Paula
Eberhardt Ruth
Estreicher Anne
Famiglietti Livia
Ferro-Rojas Serenella
Feuermann Marc
Gardner Michael
Gos Arnaud
Gruaz-Gumowski Nadine
Hinz Ursula
Hulo Chantal
Huntley Rachael P.
James Janet
Jimenez Silvia
Jungo Florence
Keller Guillaume
Laiho Kati
Legge Duncan
Lemercier Phillippe
Lieberherr Damien
Magrane Michele
Martin Maria J.
Masson Patrick
Moinat Madelaine
Mun Chan Wei
O'Donovan Claire
Pedruzzi Ivo
Pichler Klemens
Poggioli Diego
Poux Sylvain
Rivoire Catherine
Roechert Bernd
Sawford Tony
Schneider Michael
Sehra Harminder
Stutz Andre
Sundaram Shyamala
Tognolli Michael
Xenarios Ioannis
Publication date: 02/08/2017

The GO annotation dataset provided by the UniProt Consortium (GOA: http://www.ebi.ac.uk/GOA) is a comprehensive set of evidence-based associations between terms from the Gene Ontology resource and UniProtKB proteins. Currently supplying over 100 million annotations to 11 million proteins in more than 360 000 taxa, this resource has increased 2-fold over the last 2 years and has benefited from a wealth of checks to improve annotation correctness and consistency as well as now supplying a greater information content enabled by GO Consortium annotation format developments. Detailed, manual GO annotations obtained from the curation of peer-reviewed papers are directly contributed by all UniProt curators and supplemented with manual and electronic annotations from 36 model organism and domain-focused scientific resources. The inclusion of high-quality, automatic annotation predictions ensures the UniProt GO annotation dataset supplies functional information to a wide range of proteins, including those from poorly characterized, non-model organism species. UniProt GO annotations are freely available in a range of formats accessible by both file downloads and web-based views. In addition, the introduction of a new, normalized file format in 2010 has made for easier handling of the complete UniProt-GOA data set.

RERO DOC Digital Library