1,093 research outputs found

    UniProtKB amid the turmoil of plant proteomics research

    The UniProt KnowledgeBase (UniProtKB) provides a single, centralized, authoritative resource for protein sequences and functional information. The majority of its records are based on automatic translation of coding sequences (CDS) provided by submitters at the time of initial deposition to the nucleotide sequence databases (INSDC). This article will give a general overview of the current situation, with some specific illustrations extracted from our annotation of Arabidopsis and rice proteomes. More and more frequently, only the raw sequence of a complete genome is deposited to the nucleotide sequence databases and the gene model predictions and annotations are kept in separate, specialized model organism databases (MODs). In order to be able to provide the complete proteome of model organisms, UniProtKB had to implement pipelines for import of protein sequences from Ensembl and EnsemblGenomes. A single genome can be the target of several unrelated sequencing projects and the final assembly and gene model predictions may diverge quite significantly. In addition, several cultivars of the same species are often sequenced (the sequencing of 1001 Arabidopsis cultivars is currently under way) and the resulting proteomes are far from identical. Therefore, one challenge for UniProtKB is to store and organize these data in a convenient way and to clearly define the reference proteomes that should be made available to users. Manual annotation is one of the landmarks of the Swiss-Prot section of UniProtKB. Besides adding functional annotation, curators check, and often correct, gene model predictions. For plants, this task is limited to Arabidopsis thaliana and Oryza sativa subsp. japonica. Proteomics data providing experimental evidence that confirms the existence of proteins or identifies sequence features such as post-translational modifications are also imported into UniProtKB records, and the knowledgebase is cross-referenced to numerous proteomics resources.

    Manual Curation of Vertebrate Proteins in the UniProt Knowledgebase.

    The UniProt Knowledgebase (UniProtKB) aims to provide the scientific community with a comprehensive, consistent and authoritative resource for protein sequence and functional information. Given the importance of human and vertebrate model data in biomedical research, a major focus is the high-quality manual curation of human proteins and their vertebrate orthologues. Manual curation involves (1) the extraction of experimental results from scientific literature to enrich protein records with a wide range of information including function, structure, interactions and subcellular location, (2) the manual verification of each sequence and clarification of discrepancies between sequence reports, and (3) the assessment of the output of a range of analysis programmes to ensure that sequence features are correctly reported. Manual curation also facilitates the standardization of experimental data, a step necessary for development of methods that enable the semi-automated transfer of manual annotation to uncharacterised or related proteins. Consequently, manual curation of vertebrate proteins plays a vital role in providing users with a complete overview of available data while ensuring its accuracy, reliability and accessibility. UniProtKB/Swiss-Prot currently contains the complete manually reviewed human proteome, comprising approximately 20,300 proteins, and an additional 61,000 reviewed entries from model vertebrates such as mouse, rat, apes, cow, chicken, zebrafish and Xenopus. Ongoing efforts continue to improve the quality of vertebrate sequences in collaboration with HAVANA, Ensembl, HGNC and RefSeq, to include new functional information as it becomes available, and to extend the coverage of curated proteins in vertebrate species. All data are freely available from http://www.uniprot.org.

    Experimental data from flesh quality assessment and shelf life monitoring of high pressure processed European sea bass (Dicentrarchus labrax) fillets

    Fresh fish are highly perishable food products and their short shelf-life limits their commercial exploitation and leads to waste, which has a negative impact on aquaculture sustainability. New non-thermal food processing methods, such as high pressure (HP) processing, prolong shelf-life while assuring high food quality. The effect of HP processing (600 MPa, 25 °C, 5 min) on European sea bass (Dicentrarchus labrax) fillet quality and shelf life was investigated. The data presented comprise microbiome and proteome profiles of control and HP-processed sea bass fillets from 1 to 67 days of isothermal storage at 2 °C. Bacterial diversity was analysed by Illumina high-throughput sequencing of the 16S rRNA gene in pooled DNAs from control or HP-processed fillets after 1, 11 or 67 days and the raw reads were deposited in the NCBI-SRA database with accession number PRJNA517618. Yeast and fungal diversity was analysed by high-throughput sequencing of the internal transcribed spacer (ITS) region for control and HP-processed fillets at the end of storage (11 or 67 days, respectively) and the reads have the SRA accession number PRJNA517779. Quantitative label-free proteomics profiles were analysed by SWATH-MS (Sequential Windowed data independent Acquisition of the Total High-resolution-Mass Spectra) in myofibrillar or sarcoplasmic enriched protein extracts pooled for control or HP-processed fillets after 1, 11 and 67 days of storage. Proteome data were deposited in the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD012737.
These data support the findings reported in the associated manuscript "High pressure processing of European sea bass (Dicentrarchus labrax) fillets and tools for flesh quality and shelf life monitoring", Tsironi et al., 2019, JFE 262:83-91, doi.org/10.1016/j.jfoodeng.2019.05.010. Funding: FCT (Foundation of Science and Technology) COFASP/0002/2015; Portuguese Foundation for Science and Technology UID/Multi/04326/2019, POCI-01-0145-FEDER007440, UID/NEU/04539/2019.

    Gene3D: Extensive prediction of globular domains in proteins

    Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of globular domain annotations for millions of available protein sequences. Gene3D has previously featured in the Database issue of NAR and here we report a significant update to the Gene3D database. The current release, Gene3D v16, has significantly expanded its domain coverage over the previous version and now contains over 95 million domain assignments. We also report a new method for dealing with complex domain architectures that exist in Gene3D, arising from discontinuous domains. Amongst other updates, we have added visualization tools for exploring domain annotations in the context of other sequence features and in gene families. We also provide web-pages to visualize other domain families that co-occur with a given query domain family

    GONUTS: the Gene Ontology Normal Usage Tracking System

    The Gene Ontology Normal Usage Tracking System (GONUTS) is a community-based browser and usage guide for Gene Ontology (GO) terms and a community system for general GO annotation of proteins. GONUTS uses wiki technology to allow registered users to share and edit notes on the use of each term in GO, and to contribute annotations for specific genes of interest. By providing a site for generation of third-party documentation at the granularity of individual terms, GONUTS complements the official documentation of the Gene Ontology Consortium. To provide examples for community users, GONUTS displays the complete GO annotations from seven model organisms: Saccharomyces cerevisiae, Dictyostelium discoideum, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus and Arabidopsis thaliana. To support community annotation, GONUTS allows automated creation of gene pages for gene products in UniProt. GONUTS will improve the consistency of annotation efforts across genome projects, and should be useful in training new annotators and consumers in the production of GO annotations and the use of GO terms. GONUTS can be accessed at http://gowiki.tamu.edu. The source code for generating the content of GONUTS is available upon request

    BioMart Central Portal—unified access to biological data

    BioMart Central Portal (www.biomart.org) offers a one-stop shop solution to access a wide array of biological databases. These include major biomolecular sequence, pathway and annotation databases such as Ensembl, UniProt, Reactome, HGNC, WormBase and PRIDE; for a complete list, visit http://www.biomart.org/biomart/martview. Moreover, the web server features seamless data federation, making it possible to query across these data sources in a user-friendly and unified way. The web server not only provides access through a web interface (MartView), it also supports programmatic access through a Perl API as well as RESTful and SOAP-oriented web services. The website is free and open to all users and there is no login requirement.

    Mechanism and Catalytic Site Atlas (M-CSA): a database of enzyme reaction mechanisms and active sites.

    M-CSA (Mechanism and Catalytic Site Atlas) is a database of enzyme active sites and reaction mechanisms that can be accessed at www.ebi.ac.uk/thornton-srv/m-csa. Our objectives with M-CSA are to provide an open data resource for the community to browse known enzyme reaction mechanisms and catalytic sites, and to use the dataset to understand enzyme function and evolution. M-CSA results from the merging of two existing databases, MACiE (Mechanism, Annotation and Classification in Enzymes), a database of enzyme mechanisms, and CSA (Catalytic Site Atlas), a database of catalytic sites of enzymes. We are releasing M-CSA as a new website and underlying database architecture. At the moment, M-CSA contains 961 entries, 423 of these with detailed mechanism information, and 538 with information on the catalytic site residues only. In total, these cover 81% (195/241) of third level EC numbers with a PDB structure, and 30% (840/2793) of fourth level EC numbers with a PDB structure, out of 6028 in total. By searching for close homologues, we are able to extend M-CSA coverage of PDB and UniProtKB to 51 993 structures and to over five million sequences, respectively, of which about 40% and 30% have a conserved active site
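
The entry counts and coverage percentages quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only a sanity check on the published figures; the helper name `percent` and the variable names are illustrative and are not taken from M-CSA.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)

# Entry counts as stated in the abstract.
entries_with_mechanism = 423
entries_sites_only = 538
total_entries = entries_with_mechanism + entries_sites_only

# Coverage of EC numbers that have a PDB structure, as stated in the abstract.
third_level_coverage = percent(195, 241)
fourth_level_coverage = percent(840, 2793)

print(total_entries, third_level_coverage, fourth_level_coverage)
```

The sum reproduces the 961 entries reported, and the two ratios round to the 81% and 30% given in the abstract.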

    REBASE—a database for DNA restriction and modification: enzymes, genes and genomes

    REBASE is a comprehensive database of information about restriction enzymes, DNA methyltransferases and related proteins involved in the biological process of restriction–modification (R–M). It contains fully referenced information about recognition and cleavage sites, isoschizomers, neoschizomers, commercial availability, methylation sensitivity, crystal and sequence data. Experimentally characterized homing endonucleases are also included. The fastest growing segment of REBASE contains the putative R–M systems found in the sequence databases. Comprehensive descriptions of the R–M content of all fully sequenced genomes are available including summary schematics. The contents of REBASE may be browsed from the web (http://rebase.neb.com) and selected compilations can be downloaded by ftp (ftp.neb.com). Additionally, monthly updates can be requested via email
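
As a rough illustration of what a recognition and cleavage site record describes, the sketch below scans a DNA string for the EcoRI recognition sequence GAATTC, which is cut after the first G. The function `find_cut_positions` and the example sequence are hypothetical and are not part of REBASE or its data formats; only the top strand is scanned.

```python
def find_cut_positions(sequence, site, cut_offset):
    """Return 0-based cut positions for every occurrence of `site`.

    `cut_offset` is the number of bases from the start of the site
    to the cleavage point on the scanned strand.
    """
    sequence = sequence.upper()
    site = site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        # Step forward by one so overlapping sites are also found.
        start = sequence.find(site, start + 1)
    return positions

# EcoRI recognises GAATTC and cuts between G and A (G^AATTC).
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1

example_sequence = "ttGAATTCaaccGAATTCg"  # made-up sequence for illustration
cuts = find_cut_positions(example_sequence, ECORI_SITE, ECORI_CUT_OFFSET)
print(cuts)
```

Because GAATTC is palindromic, scanning one strand is enough here; a non-palindromic site would also need a search with its reverse complement.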