
217 research outputs found

Provenance, propagation and quality of biological annotation

Author: Bell Michael James
Publication venue: Newcastle University
Publication date: 01/01/2014

PhD Thesis

Biological databases have become an integral part of the life sciences, being used to store, organise and share ever-increasing quantities and types of data. Biological databases are typically centred around raw data, with individual entries being assigned to a single piece of biological data, such as a DNA sequence. Although raw data are essential, a reader can obtain little information from them alone. Therefore, many databases aim to supplement their entries with annotation, allowing the current knowledge about the underlying data to be conveyed to a reader. Although annotations come in many different forms, most databases provide some form of free text annotation. Given that annotations can form the foundations of future work, it is important that a user is able to evaluate the quality and correctness of an annotation. However, this is rarely straightforward. The amount of annotation, and the way in which it is curated, varies between databases. For example, the production of an annotation in some databases is entirely automated, without any manual intervention. Further, sections of annotations may be reused, being propagated between entries and, potentially, external databases. This provenance and curation information is not always apparent to a user. The work described within this thesis explores issues relating to biological annotation quality. While the most valuable annotation is often contained within free text, its lack of structure makes it hard to assess. Initially, this work describes a generic approach that allows textual annotations to be quantitatively measured. This approach is based upon the application of Zipf's Law to words within textual annotation, resulting in a single value, α. The relationship between the α value and Zipf's principle of least effort provides an indication as to the annotation's quality, whilst also allowing annotations to be quantitatively compared.
Secondly, the thesis focuses on determining annotation provenance and tracking any subsequent propagation. This is achieved through the development of a visualisation framework, which exploits the reuse of sentences within annotations. Utilising this framework, a number of propagation patterns were identified, which on analysis appear to indicate low quality and erroneous annotation. Together, these approaches increase our understanding of the textual characteristics of biological annotation and suggest that this understanding can be used to increase the overall quality of these resources.
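
As a rough illustration of the Zipf's Law measure described in the abstract above, the sketch below estimates a single exponent (often written α) from the words of a text. It is an assumption-laden toy, not the thesis's fitting procedure: the function name `zipf_exponent`, the simple regex tokeniser, and the least-squares fit on log rank against log frequency are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def zipf_exponent(text: str) -> float:
    """Estimate a Zipf-style exponent from a rank-frequency fit.

    Illustrative only: ordinary least squares on log(rank) vs
    log(frequency). The thesis may use a different estimator.
    """
    # Hypothetical tokeniser: lower-case runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Word frequencies, most frequent first (rank 1 is the commonest word).
    freqs = sorted(Counter(words).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # frequency ~ rank ** (-alpha), so alpha is the negated slope.
    return -sxy / sxx
```

Applied to two annotation texts, the two returned values can be compared directly, which is the kind of quantitative comparison the abstract refers to. How a given value maps onto annotation quality is a matter for the thesis itself.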

Newcastle University eTheses

Pre-validation of a MALDI MS proteomics-based method for the reliable detection of blood and blood provenance

Author: Clark Tom
Clench Malcolm R.
Cole Laura
Francese Simona
Heaton Cameron
Kennedy Katie
Langenburg Glenn
McColm Richard
Sealey Mark
Sears Vaughn
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/10/2020

Abstract: The reliable identification of blood, as well as the determination of its origin (human or animal) is of great importance in a forensic investigation. Whilst presumptive tests are rapid and deployed in situ, their very nature requires confirmatory tests to be performed remotely. However, only serological tests can determine blood provenance. The present study improves on a previously devised Matrix Assisted Laser Desorption Ionisation Mass Spectrometry (MALDI MS)—proteomics based method for the reliable detection of blood by enabling the determination of blood provenance. The overall protocol was developed to be more specific than presumptive tests and faster/easier than the gold standard liquid chromatography (LC) MS/MS analysis. This is considered a pre-validation study that has investigated stains and fingermarks made in blood, other biofluids and substances that can elicit a false-positive response to colorimetric or presumptive tests, in a blind fashion. Stains and marks were either untreated or enhanced with a range of presumptive tests. Human and animal blood were correctly discriminated from other biofluids and non-biofluid related matrices; animal species determination was also possible within the system investigated. The procedure is compatible with the prior application of presumptive tests. The refined strategy resulting from iterative improvements through a trial and error study of 56 samples was applied to a final set of 13 blind samples. This final study yielded 12/13 correct identifications with the 13th sample being correctly identified as animal blood but with no species attribution. This body of work will contribute towards the validation of MALDI MS based methods and deployment in violent crimes involving bloodshed

Sheffield Hallam University Research Archive

Resolution of the type material of the Asian elephant, Elephas maximus Linnaeus, 1758 (Proboscidea, Elephantidae)

Author: Adrian M. Lister
Alfred L. Roca
Anna-Marie Roos
Anthea Gentry
Bo Fernholm
Christian D. Kelstrup
D. Tim J. Littlewood
David Cram
Eleftheria Palkopoulou
Enrico Cappellini
Fausto Barbagli
Jesper V. Olsen
Love Dalén
M. Thomas P. Gilbert
Mick Watson
Paolo Agnelli
Ulf S. Johansson
Yasuko Ishida
Publication venue: 'Wiley'
Publication date: 01/01/2014

The understanding of Earth’s biodiversity depends critically on the accurate identification and nomenclature of species. Many species were described centuries ago, and in a surprising number of cases their nomenclature or type material remain unclear or inconsistent. A prime example is provided by Elephas maximus, one of the most iconic and well-known mammalian species, described and named by Linnaeus (1758) and today designating the Asian elephant. We used morphological, ancient DNA (aDNA), and high-throughput ancient proteomic analyses to demonstrate that a widely discussed syntype specimen of E. maximus, a complete foetus preserved in ethanol, is actually an African elephant, genus Loxodonta. We further discovered that an additional E. maximus syntype, mentioned in a description by John Ray (1693) cited by Linnaeus, has been preserved as an almost complete skeleton at the Natural History Museum of the University of Florence. Having confirmed its identity as an Asian elephant through both morphological and ancient DNA analyses, we designate this specimen as the lectotype of E. maximus

University of Lincoln Institutional Repository

Crossref

Copenhagen University Research Information System

espace@Curtin

Protein Bioinformatics Infrastructure for the Integration and Analysis of Multiple High-Throughput “omics” Data

Author: Chen Chuming
Huang Hongzhan
McGarvey Peter B.
Wu Cathy H.
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2010

High-throughput “omics” technologies bring new opportunities for biological and biomedical researchers to ask complex questions and gain new scientific insights. However, the voluminous, complex, and context-dependent data maintained in heterogeneous and distributed environments, together with the lack of well-defined data standards and standardized nomenclature, pose a major challenge. Meeting it requires advanced computational methods and bioinformatics infrastructures for integration, mining, visualization, and comparative analysis to facilitate data-driven hypothesis generation and biological knowledge discovery. In this paper, we present the challenges in high-throughput “omics” data integration and analysis, introduce a protein-centric approach for systems integration of large and heterogeneous high-throughput “omics” data including microarray, mass spectrometry, protein sequence, protein structure, and protein interaction data, and use a scientific case study to illustrate how one can use varied “omics” data from different laboratories to make useful connections that could lead to new biological knowledge.

Crossref

Directory of Open Access Journals

PubMed Central

Scop3P: a comprehensive resource of human phosphosites within their full context

Author: Hulstaert Niels
Martens Lennart
Ramasamy Pathmanaban
Tichshenko Natalia
Turan Demet
Vandermarliere Elien
Vranken Wim
Publication venue: 'American Chemical Society (ACS)'
Publication date: 01/01/2020

Protein phosphorylation is a key post-translational modification in many biological processes and is associated with human diseases such as cancer and metabolic disorders. The accurate identification, annotation, and functional analysis of phosphosites are therefore crucial to understand their various roles. Phosphosites are mainly analyzed through phosphoproteomics, which has led to increasing amounts of publicly available phosphoproteomics data. Several resources have been built around the resulting phosphosite information, but these are usually restricted to the protein sequence and basic site metadata. What is often missing from these resources, however, is context, including protein structure mapping, experimental provenance information, and biophysical predictions. We therefore developed Scop3P: a comprehensive database of human phosphosites within their full context. Scop3P integrates sequences (UniProtKB/Swiss-Prot), structures (PDB), and uniformly reprocessed phosphoproteomics data (PRIDE) to annotate all known human phosphosites. Furthermore, these sites are put into biophysical context by annotating each phosphoprotein with per-residue structural propensity, solvent accessibility, disordered probability, and early folding information. Scop3P, available at https://iomics.ugent.be/scop3p, presents a unique resource for visualization and analysis of phosphosites and for understanding of phosphosite structure–function relationships.

Ghent University Academic Bibliography

UniCarbKB: building a knowledge platform for glycoproteomics

Author: Akune Yukie
Aoki-Kinoshita Kiyoko F.
Campbell Matthew P.
Gasteiger Elisabeth
Lisacek Frederique
Mariethoz Julien
Packer Nicolle H.
Peterson Robyn
Publication date: 02/08/2017

The UniCarb KnowledgeBase (UniCarbKB; http://unicarbkb.org) offers public access to a growing, curated database of information on the glycan structures of glycoproteins. UniCarbKB is an international effort that aims to further our understanding of structures, pathways and networks involved in glycosylation and glyco-mediated processes by integrating structural, experimental and functional glycoscience information. This initiative builds upon the success of the glycan structure database GlycoSuiteDB, together with the informatic standards introduced by EUROCarbDB, to provide a high-quality and updated resource to support glycomics and glycoproteomics research. UniCarbKB provides comprehensive information concerning glycan structures, and published glycoprotein information including global and site-specific attachment information. For the first release over 890 references, 3740 glycan structure entries and 400 glycoproteins have been curated. Further, 598 protein glycosylation sites have been annotated with experimentally confirmed glycan structures from the literature. Among these are 35 glycoproteins, 502 structures and 60 publications previously not included in GlycoSuiteDB. This article provides an update on the transformation of GlycoSuiteDB (featured in previous NAR Database issues and hosted by ExPASy since 2009) to UniCarbKB and its integration with UniProtKB and GlycoMod. Here, we introduce a refactored database, supported by substantial new curated data collections and intuitive user-interfaces that improve database searchin

RERO DOC Digital Library

The neXtProt knowledgebase on human proteins: current status

Author: Bairoch Amos
Cusin Isabelle
Duek Paula D.
Evalet Olivier
Gateau Alain
Gaudet Pascale
Gleizes Anne
Lane Lydie
Michel Pierre-André
Pereira Mario
Teixeira Daniel
Zahn-Zabal Monique
Zhang Ying
Publication date: 02/08/2017

neXtProt (http://www.nextprot.org) is a human protein-centric knowledgebase developed at the SIB Swiss Institute of Bioinformatics. Focused solely on human proteins, neXtProt aims to provide a state of the art resource for the representation of human biology by capturing a wide range of data, precise annotations, fully traceable data provenance and a web interface which enables researchers to find and view information in a comprehensive manner. Since the introductory neXtProt publication, significant advances have been made on three main aspects: the representation of proteomics data, an extended representation of human variants and the development of an advanced search capability built around semantic technologies. These changes are presented in the current neXtProt updat

RERO DOC Digital Library


Towards comprehensive annotation of Drosophila melanogaster enzymes in FlyBase.

Author: Garapati Phani V
Marygold Steven J
Rey Alix J
Zhang Jingyao
Publication venue: Database (Oxford)
Publication date: 01/01/2019

The catalytic activities of enzymes can be described using Gene Ontology (GO) terms and Enzyme Commission (EC) numbers. These annotations are available from numerous biological databases and are routinely accessed by researchers and bioinformaticians to direct their work. However, enzyme data may not be congruent between different resources, while the origin, quality and genomic coverage of these data within any one resource are often unclear. GO/EC annotations are assigned either manually by expert curators or inferred computationally, and there is potential for errors in both types of annotation. If such errors remain unchecked, false positive annotations may be propagated across multiple resources, significantly degrading the quality and usefulness of these data. Similarly, the absence of annotations (false negatives) from any one resource can lead to incorrect inferences or conclusions. We are systematically reviewing and enhancing the functional annotation of the enzymes of Drosophila melanogaster, focusing on improvements within the FlyBase (www.flybase.org) database. We have reviewed four major enzyme groups to date: oxidoreductases, lyases, isomerases and ligases. Herein, we describe our review workflow, the improvement in the quality and coverage of enzyme annotations within FlyBase and the wider impact of our work on other related databases

Apollo (Cambridge)

cisPath: an R/Bioconductor package for cloud users for visualization and management of functional protein interaction networks

Author: Dan Lu
Likun Wang
Luhe Yang
Michael McNutt
Yan Jin
Yuxin Yin
Zuohan Peng
Publication venue: Springer Nature
Publication date: 01/01/2015

Background: With the burgeoning development of cloud technology and services, an increasing number of users prefer to run their applications in the cloud. All software and associated data are hosted on the cloud, allowing users to access them via a web browser from any computer, anywhere. This paper presents cisPath, an R/Bioconductor package deployed on cloud servers for client users to visualize, manage, and share functional protein interaction networks. Results: With this R package, users can easily integrate downloaded protein-protein interaction information from different online databases with private data to construct new and personalized interaction networks. Additional functions allow users to generate specific networks based on private databases. Since the results produced with the use of this package are in the form of web pages, cloud users can easily view and edit the network graphs via the browser, using a mouse or touch screen, without the need to download them to a local computer. This package can also be installed and run on a local desktop computer. Depending on user preference, results can be publicized or shared by uploading to a web server or cloud driver, allowing other users to directly access results via a web browser. Conclusions: This package can be installed and run on a variety of platforms. Since all network views are shown in web pages, this package is particularly useful for cloud users. Easy installation and operation are attractive qualities for R beginners and users with no previous experience with cloud services.

Springer - Publisher Connector