
    Protein Ontology: A controlled structured network of protein entities

    The Protein Ontology (PRO; http://proconsortium.org) formally defines protein entities and explicitly represents their major forms and interrelations. Protein entities represented in PRO corresponding to single amino acid chains are categorized by level of specificity into family, gene, sequence and modification metaclasses, and there is a separate metaclass for protein complexes. All metaclasses also have organism-specific derivatives. PRO complements established sequence databases such as UniProtKB, and interoperates with other biomedical and biological ontologies such as the Gene Ontology (GO). PRO relates to UniProtKB in that PRO’s organism-specific classes of proteins encoded by a specific gene correspond to entities documented in UniProtKB entries. PRO relates to the GO in that PRO’s representations of organism-specific protein complexes are subclasses of the organism-agnostic protein complex terms in the GO Cellular Component Ontology. The past few years have seen growth and changes to the PRO, as well as new points of access to the data and new applications of PRO in immunology and proteomics. Here we describe some of these developments.

    An effective biomedical document classification scheme in support of biocuration: addressing class imbalance.

    Published literature is an important source of knowledge supporting biomedical research. Given the large and increasing number of publications, automated document classification plays an important role in biomedical research. Effective biomedical document classifiers are especially needed for bio-databases, in which the information stems from many thousands of biomedical publications that curators must read in detail and annotate. In addition, biomedical document classification often amounts to identifying a small subset of relevant publications within a much larger collection of available documents. As such, addressing class imbalance is essential to a practical classifier. We present here an effective classification scheme for automatically identifying papers among a large pool of biomedical publications that contain information relevant to a specific topic, which the curators are interested in annotating. The proposed scheme is based on a meta-classification framework using cluster-based under-sampling combined with named-entity recognition and statistical feature selection strategies. We examined the performance of our method over a large imbalanced data set that was originally manually curated by the Jackson Laboratory's Gene Expression Database (GXD). The set consists of more than 90 000 PubMed abstracts, of which about 13 000 documents are labeled as relevant to GXD while the others are not relevant. Our results, 0.72 precision, 0.80 recall and 0.75 f-measure, demonstrate that our proposed classification scheme effectively categorizes such a large data set in the face of data imbalance.

    Recent advances in biocuration: Meeting report from the Fifth International Biocuration Conference

    The 5th International Biocuration Conference brought together over 300 scientists to exchange ideas about their work, as well as discuss issues relevant to the International Society for Biocuration’s (ISB) mission. Recurring themes this year included the creation and promotion of gold standards, the need for more ontologies, and more formal interactions with journals. The conference is an essential part of the ISB's goal to support exchanges among members of the biocuration community. Next year's conference will be held in Cambridge, UK, from 7 to 10 April 2013. In the meantime, the ISB website provides information about the society's activities (http://biocurator.org), as well as related events of interest.

    TGF-beta signaling proteins and the Protein Ontology

    BACKGROUND: The Protein Ontology (PRO) is designed as a formal and principled Open Biomedical Ontologies (OBO) Foundry ontology for proteins. The components of PRO extend from a classification of proteins on the basis of evolutionary relationships at the homeomorphic level to the representation of the multiple protein forms of a gene, including those resulting from alternative splicing, cleavage and/or post-translational modifications. Focusing specifically on the TGF-beta signaling proteins, we describe the building, curation, usage and dissemination of PRO. RESULTS: PRO is manually curated on the basis of PrePRO, an automatically generated file with content derived from standard protein data sources. Manual curation ensures that the treatment of the protein classes and the internal and external relationships conform to the PRO framework. The current release of PRO is based upon experimental data from mouse and human proteins wherein equivalent protein forms are represented by single terms. In addition to the PRO ontology, the annotation of PRO terms is released as a separate PRO association file, which contains, for each given PRO term, an annotation from the experimentally characterized sub-types as well as the corresponding database identifiers and sequence coordinates. The annotations are added in the form of relationship to other ontologies. Whenever possible, equivalent forms in other species are listed to facilitate cross-species comparison. Splice and allelic variants, gene fusion products and modified protein forms are all represented as entities in the ontology. Therefore, PRO provides for the representation of protein entities and a resource for describing the associated data. This makes PRO useful both for proteomics studies where isoforms and modified forms must be differentiated, and for studies of biological pathways, where representations need to take account of the different ways in which the cascade of events may depend on specific protein modifications. CONCLUSION: PRO provides a framework for the formal representation of protein classes and protein forms in the OBO Foundry. It is designed to enable data retrieval and integration and machine reasoning at the molecular level of proteins, thereby facilitating cross-species comparisons, pathway analysis, disease modeling and the generation of new hypotheses.

    The brain is hypothermic in patients with mitochondrial diseases

    We sought to study brain temperature in patients with mitochondrial diseases in different functional states compared with healthy participants. Brain temperature and mitochondrial function were monitored in the visual cortex and the centrum semiovale at rest and during and after visual stimulation in seven individuals with mitochondrial diseases (n=5 with mitochondrial DNA mutations and n=2 with nuclear DNA mutations) and in 14 age- and sex-matched healthy control participants using a combined approach of visual stimulation, proton magnetic resonance spectroscopy (MRS), and phosphorus MRS. Brain temperature in control participants exhibited small changes during visual stimulation and a consistent increase, together with an increase in high-energy phosphate content, after visual stimulation. Brain temperature was persistently lower in individuals with mitochondrial diseases than in healthy participants at rest, during activation, and during recovery, without significant changes from one state to another and with a decrease in the high-energy phosphate content. The lowest brain temperature was observed in the patient with the most deranged mitochondrial function. In patients with mitochondrial diseases, the brain is hypothermic because of malfunctioning oxidative phosphorylation. Neuronal activity is reduced at rest, during physiologic brain stimulation, and after stimulation. © 2014 ISCBFM

    Protein Ontology: Enhancing and scaling up the representation of protein entities

    The Protein Ontology (PRO; http://purl.obolibrary.org/obo/pr) formally defines and describes taxon-specific and taxon-neutral protein-related entities in three major areas: proteins related by evolution; proteins produced from a given gene; and protein-containing complexes. PRO thus serves as a tool for referencing protein entities at any level of specificity. To enhance this ability, and to facilitate the comparison of such entities described in different resources, we developed a standardized representation of proteoforms using UniProtKB as a sequence reference and PSI-MOD as a post-translational modification reference. We illustrate its use in facilitating an alignment between PRO and Reactome protein entities. We also address issues of scalability, describing our first steps into the use of text mining to identify protein-related entities, the large-scale import of proteoform information from expert curated resources, and our ability to dynamically generate PRO terms. Web views for individual terms are now more informative about closely-related terms, including for example an interactive multiple sequence alignment. Finally, we describe recent improvement in semantic utility, with PRO now represented in OWL and as a SPARQL endpoint. These developments will further support the anticipated growth of PRO and facilitate discoverability of and allow aggregation of data relating to protein entities.

    The Protein Ontology: a structured representation of protein forms and complexes

    The Protein Ontology (PRO) provides a formal, logically-based classification of specific protein classes including structured representations of protein isoforms, variants and modified forms. Initially focused on proteins found in human, mouse and Escherichia coli, PRO now includes representations of protein complexes. The PRO Consortium works in concert with the developers of other biomedical ontologies and protein knowledge bases to provide the ability to formally organize and integrate representations of precise protein forms so as to enhance accessibility to results of protein research. PRO (http://pir.georgetown.edu/pro) is part of the Open Biomedical Ontology Foundry.

    Profiling of ubiquitination pathway genes in peripheral cells from patients with frontotemporal dementia due to C9ORF72 and GRN mutations

    We analysed the expression levels of 84 key genes involved in the regulated degradation of cellular protein by the ubiquitin-proteasome system in peripheral cells from patients with frontotemporal dementia (FTD) due to C9ORF72 and GRN mutations, as compared with sporadic FTD and age-matched controls. A SABiosciences PCR array was used to investigate the transcription profile in a discovery population consisting of six patients each in C9ORF72, GRN, sporadic FTD and age-matched control groups. A generalized down-regulation of gene expression compared with controls was observed in C9ORF72 expansion carriers and sporadic FTD patients. In particular, in both groups, four genes, UBE2I, UBE2Q1, UBE2E1 and UBE2N, were down-regulated at a statistically significant (p < 0.05) level. All of them encode members of the E2 ubiquitin-conjugating enzyme family. In GRN mutation carriers, no statistically significant deregulation of ubiquitination pathway genes was observed, except for the UBE2Z gene, whose product displays E2 ubiquitin-conjugating enzyme activity and which was found to be significantly up-regulated (p = 0.006). These preliminary results suggest that the proteasomal degradation pathway plays a role in the pathogenesis of FTD associated with TDP-43 pathology, although different proteins are altered in carriers of GRN mutations as compared with carriers of the C9ORF72 expansion.

    Text mining for the biocuration workflow

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed considerable variability. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.