86 research outputs found

    Framework for a Protein Ontology

    Biomedical ontologies are emerging as critical tools in genomic and proteomic research, where complex data in disparate resources need to be integrated. A number of ontologies describe properties that can be attributed to proteins. For example, protein functions are described by the Gene Ontology (GO) and human diseases by SNOMED CT or ICD10. There is, however, a gap in the current set of ontologies – one that describes the protein entities themselves and their relationships. We have designed the PRotein Ontology (PRO) to facilitate protein annotation and to guide new experiments. The components of PRO extend from the classification of proteins on the basis of evolutionary relationships to the representation of the multiple protein forms of a gene (products generated by genetic variation, alternative splicing, proteolytic cleavage, and other post-translational modifications). PRO will allow the specification of relationships between PRO, GO and other ontologies in the OBO Foundry. Here we describe the initial development of PRO, illustrated using human and mouse proteins involved in the transforming growth factor-beta and bone morphogenetic protein signaling pathways

    Provenance, propagation and quality of biological annotation

    PhD thesis. Biological databases have become an integral part of the life sciences, being used to store, organise and share ever-increasing quantities and types of data. Biological databases are typically centred around raw data, with each entry assigned to a single piece of biological data, such as a DNA sequence. Although the raw data are essential, a reader can obtain little information from them alone. Therefore, many databases aim to supplement their entries with annotation, allowing the current knowledge about the underlying data to be conveyed to a reader. Although annotations come in many different forms, most databases provide some form of free text annotation. Given that annotations can form the foundations of future work, it is important that a user is able to evaluate the quality and correctness of an annotation. However, this is rarely straightforward. The amount of annotation, and the way in which it is curated, varies between databases. For example, the production of an annotation in some databases is entirely automated, without any manual intervention. Further, sections of annotations may be reused, being propagated between entries and, potentially, external databases. This provenance and curation information is not always apparent to a user. The work described within this thesis explores issues relating to biological annotation quality. While the most valuable annotation is often contained within free text, its lack of structure makes it hard to assess. Initially, this work describes a generic approach that allows textual annotations to be quantitatively measured. This approach is based upon the application of Zipf's Law to words within textual annotation, resulting in a single value, α. The relationship between the α value and Zipf's principle of least effort provides an indication as to the annotation's quality, whilst also allowing annotations to be quantitatively compared. Secondly, the thesis focuses on determining annotation provenance and tracking any subsequent propagation. This is achieved through the development of a visualisation framework, which exploits the reuse of sentences within annotations. Utilising this framework, a number of propagation patterns were identified, which on analysis appear to indicate low quality and erroneous annotation. Together, these approaches increase our understanding of the textual characteristics of biological annotation, and suggest that this understanding can be used to increase the overall quality of these resources.
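    As a rough illustration of how a Zipf-style measure can reduce a piece of free text to a single number, the sketch below ranks words by frequency and estimates the slope of the rank-frequency curve on a log-log scale. It is not the thesis's method: the function name `zipf_exponent`, the tokenisation rule, and the use of an ordinary least-squares fit are all assumptions made for this example, and the thesis may well use a different estimator.

```python
import math
import re
from collections import Counter


def zipf_exponent(text: str) -> float:
    """Estimate a Zipf-style exponent for the words in `text`.

    Hypothetical helper for illustration only: fits
    log(frequency) = c - exponent * log(rank) by ordinary least squares.
    """
    # Assumed tokenisation: lowercase runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Word frequencies sorted from most to least common (rank 1 first).
    counts = sorted(Counter(words).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct words to fit a slope")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for Zipf-like text; report its magnitude.
    return -sxy / sxx


if __name__ == "__main__":
    annotation = (
        "Binds to the receptor and activates the kinase. "
        "The kinase phosphorylates the substrate in the nucleus."
    )
    print(f"estimated exponent: {zipf_exponent(annotation):.3f}")
```

    Computing the same value for two annotations gives one simple way to compare them quantitatively, although a least-squares fit on very short texts is noisy and should be read with caution.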

    Exploring the potential of public proteomics data

    In a global effort for scientific transparency, it has become feasible and good practice to share experimental data supporting novel findings. Consequently, the amount of publicly available MS-based proteomics data has grown substantially in recent years. With some notable exceptions, however, this extensive material has largely been left untouched. The time has now come for the proteomics community to utilize this potential gold mine for new discoveries, and uncover its untapped potential. In this review, we provide a brief history of the sharing of proteomics data, show ways in which publicly available proteomics data are already being (re-)used, and outline potential future opportunities based on four different usage types: use, reuse, reprocess, and repurpose. We thus aim to assist the proteomics community in stepping up to the challenge, and to make the most of the rapidly increasing amount of public proteomics data.


    Consequences of refining biological networks through detailed pathway information: From genes to proteoforms

    Doctoral thesis. Biological networks can be used to model molecular processes, understand disease progression, and find new treatment strategies. This thesis investigated how refining the design of biological networks influences their structure, and how this can be used to improve the specificity of pathway analyses. First, we investigated the potential to use more detailed molecular data in current human biological pathways. We verified that there are enough proteoform annotations, i.e. information about proteins in specific post-translational states, for systematic analyses, and characterized the structure of gene-centric versus proteoform-centric network representations of pathways. Next, we enabled the programmatic search and mining of pathways using different models for biomolecules, including proteoforms. Notably, we designed a generic proteoform matching algorithm enabling the flexible mapping of experimental data to the theoretical representation in reference databases. Finally, we constructed pathway-based networks using different degrees of detail in the representation of biochemical reactions. We included information overlooked in most standard network representations: small molecules, isoforms, and post-translational modifications. Structural properties such as network size, degree distribution, and connectivity in both global and local subnetworks were analysed to quantify the impact of the added molecular entities.

    Systems Analytics and Integration of Big Omics Data

    A “genotype” is essentially an organism's full hereditary information, which is obtained from its parents. A “phenotype” is an organism's actual observed physical and behavioral properties. These may include traits such as morphology, size, height, eye color, metabolism, etc. One of the pressing challenges in computational and systems biology is genotype-to-phenotype prediction. This is challenging given the amount of data generated by modern Omics technologies. This “Big Data” is so large and complex that traditional data processing applications are not up to the task. Challenges arise in collection, analysis, mining, sharing, transfer, visualization, archiving, and integration of these data. In this Special Issue, there is a focus on the systems-level analysis of Omics data, recent developments in gene ontology annotation, and advances in biological pathways and network biology. The integration of Omics data with clinical and biomedical data using machine learning is explored. This Special Issue covers new methodologies in the context of gene–environment interactions, tissue-specific gene expression, and how external factors or host genetics impact the microbiome.