
    Yeast Features: Identifying Significant Features Shared Among Yeast Proteins for Functional Genomics

    Background
High-throughput yeast functional genomics experiments are revealing associations among tens to hundreds of genes across numerous experimental conditions. To fully understand how the identified genes might be involved in the observed system, it is essential to consider the widest possible range of biological annotation. Biologists often start their search by collating the annotation provided for each protein within databases such as the Saccharomyces Genome Database, manually comparing proteins for similar features, and empirically assessing the significance of those features. Such tasks can be automated, and the significance can be computed more precisely using established probability measures.
Results
We developed Yeast Features, an intuitive online tool to help establish the significance of finding a diverse set of shared features among a collection of yeast proteins. A total of 18,786 features from the Saccharomyces Genome Database are considered, including annotation based on the Gene Ontology’s molecular function, biological process and cellular compartment, as well as conserved domains, protein-protein and genetic interactions, complexes, metabolic pathways, phenotypes and publications. The significance of shared features is estimated using a hypergeometric probability, and novel options exist to sharpen the estimate by adding background knowledge of the experimental system. For instance, increased statistical significance is achieved in gene deletion experiments because interactions with essential genes will never be observed. We further demonstrate the tool’s utility by suggesting functional roles for the indirect targets of an aminoglycoside with a known mechanism of action, and for the targets of an herbal extract with a previously unknown mode of action. The identification of shared functional features may also be used to propose novel roles for proteins of unknown function, including a role in protein synthesis for YKL075C.
Conclusions
Yeast Features (YF) is an easy-to-use web-based application (http://software.dumontierlab.com/yeastfeatures/) that can identify and prioritize features shared among a set of yeast proteins. This approach is shown to be valuable in the analysis of complex data sets, in which the extracted associations reveal significant functional relationships among the gene products.
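As a rough illustration of the significance calculation described above, the sketch below computes a hypergeometric enrichment p-value and shows how restricting the background population (as in the paper's gene-deletion example) can sharpen it. All counts are hypothetical, and the exact procedure used by Yeast Features may differ.

```python
# A minimal sketch with hypothetical counts; Yeast Features' exact
# calculation may differ.
from scipy.stats import hypergeom

def shared_feature_pvalue(N, K, n, k):
    """Probability of observing at least k proteins with a feature in a
    sample of n, when K of the N background proteins carry that feature."""
    return hypergeom.sf(k - 1, N, K, n)

# Hypothetical example: 6,000 yeast proteins, 150 annotated with a GO term,
# and 8 of 20 hit proteins sharing that term.
print(shared_feature_pvalue(6000, 150, 20, 8))

# Excluding, say, 1,000 essential genes that a deletion screen can never
# report shrinks both N and K, which can strengthen the significance.
print(shared_feature_pvalue(5000, 120, 20, 8))
```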

    A linked data representation for summary statistics and grouping criteria

    Summary statistics are fundamental to data science and are the building blocks of statistical reasoning. Most of the data and statistics made available on government web sites are aggregate; however, until now, no suitable linked data representation has been available. We propose a way to express summary statistics across aggregate groups as linked data using Web Ontology Language (OWL) class-based sets, where members of the set contribute to the overall aggregate value. Additionally, many clinical studies in the biomedical field rely on demographic summaries of their study cohorts and the patients assigned to each arm. While most data query languages, including SPARQL, allow for computation of summary statistics, they do not provide a way to integrate those values back into the RDF graphs they were computed from. We represent this knowledge, which would otherwise be lost, through the use of OWL 2 punning semantics, the expression of aggregate grouping criteria as OWL classes with variables, and constructs from the Semanticscience Integrated Ontology (SIO) and the World Wide Web Consortium’s provenance ontology, PROV-O, providing interoperable representations that are well supported across the web of Linked Data. We evaluate these semantics using a Resource Description Framework (RDF) representation of patient case information from the Genomic Data Commons, a data portal from the National Cancer Institute.
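The central idea, computing an aggregate with SPARQL and then asserting it back into the graph it came from, can be sketched in a few lines of Python with rdflib. This is only an illustration under assumed names: the ex: namespace and ex:meanAge predicate stand in for the SIO and PROV-O constructs the paper actually uses.

```python
# A minimal sketch, not the paper's exact modeling.
from rdflib import Graph, Literal, Namespace, RDF, URIRef, XSD

EX = Namespace("http://example.org/")
g = Graph()
for i, age in enumerate([34, 51, 47]):          # toy patient cohort
    p = URIRef(f"http://example.org/patient/{i}")
    g.add((p, RDF.type, EX.Patient))
    g.add((p, EX.age, Literal(age, datatype=XSD.integer)))

# SPARQL computes the aggregate but cannot, by itself, store it back.
res = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT (AVG(?age) AS ?meanAge) WHERE { ?p a ex:Patient ; ex:age ?age . }
""")
mean_age = next(iter(res))[0]

# Re-assert the value on the class itself: the "punning" step, in which
# ex:Patient acts both as a class and as an individual carrying the statistic.
g.add((EX.Patient, EX.meanAge, Literal(float(mean_age), datatype=XSD.double)))
```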

    Privacy-Preserving Generalized Linear Models using Distributed Block Coordinate Descent

    Combining data from varied sources has considerable potential for knowledge discovery: collaborating data parties can mine data in an expanded feature space, allowing them to explore a larger range of scientific questions. However, data sharing among different parties is highly restricted by legal conditions, ethical concerns, and/or data volume. Fueled by these concerns, the fields of cryptography and distributed learning have made great progress towards privacy-preserving and distributed data mining. However, practical implementations have been hampered by the limited scope or computational complexity of these methods. In this paper, we greatly extend the range of analyses available for vertically partitioned data, i.e., data collected by separate parties with different features on the same subjects. To this end, we present a novel approach for privacy-preserving generalized linear models, a fundamental and powerful framework underlying many prediction and classification procedures. We base our method on a distributed block coordinate descent algorithm to obtain parameter estimates, and we develop an extension to compute accurate standard errors without additional communication cost. We critically evaluate the information transfer for semi-honest collaborators and show that our protocol is secure against data reconstruction. Through both simulated and real-world examples we illustrate the functionality of our proposed algorithm. Without leaking information, our method performs as well on vertically partitioned data as existing methods on combined data, all within mere minutes of computation time. We conclude that our method is a viable approach for vertically partitioned data analysis with a wide range of real-world applications.
Comment: Fully reproducible code for all results and images can be found at https://github.com/vankesteren/privacy-preserving-glm, and the software package can be found at https://github.com/vankesteren/privre
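To make the optimization skeleton concrete, here is a single-machine simulation of block coordinate descent for logistic regression over vertically partitioned features. It deliberately omits the paper's privacy protocol and standard-error extension; the function names and toy data are mine, and each "party" is simply a column block of the design matrix.

```python
# A sketch of block coordinate descent for a logistic GLM on vertically
# partitioned data. No privacy machinery is included here.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def block_cd_logistic(blocks, y, n_outer=50, lr=0.1):
    """blocks: list of (n, p_j) arrays, one per party; y: 0/1 labels."""
    n = y.shape[0]
    betas = [np.zeros(X.shape[1]) for X in blocks]
    # Each party only ever contributes its partial linear predictor
    # X_j @ beta_j; summing the parts yields the full predictor.
    partials = [X @ b for X, b in zip(blocks, betas)]
    for _ in range(n_outer):
        for j, X in enumerate(blocks):
            eta = sum(partials)            # aggregated linear predictor
            resid = y - sigmoid(eta)       # shared residual-like signal
            grad = -X.T @ resid / n        # party j's local NLL gradient
            betas[j] -= lr * grad
            partials[j] = X @ betas[j]     # refresh party j's share
    return betas

# Toy data: two parties holding 3 and 2 features on the same 200 subjects.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(200, 3)), rng.normal(size=(200, 2))
y = (X1[:, 0] - X2[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(float)
b1, b2 = block_cd_logistic([X1, X2], y)
```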

    Cloudy, increasingly FAIR; Revisiting the FAIR Data guiding principles for the European Open Science Cloud

    The FAIR Data Principles propose that all scholarly output should be Findable, Accessible, Interoperable, and Reusable. As a set of guiding principles, they express only the kinds of behaviours that researchers should expect from contemporary data resources, so how they should manifest in reality was largely left open to interpretation. As support for the Principles has spread, so has the breadth of these interpretations. In observing this creeping spread of interpretation, several of the original authors felt it was now appropriate to revisit the Principles, to clarify both what FAIRness is and what it is not.

    Prototype semantic infrastructure for automated small molecule classification and annotation in lipidomics

    Background
The development of high-throughput experimentation has led to astronomical growth in biologically relevant lipids and lipid derivatives identified, screened, and deposited in numerous online databases. Unfortunately, efforts to annotate, classify, and analyze these chemical entities have largely remained in the hands of human curators using manual or semi-automated protocols, leaving many novel entities unclassified. Since chemical function is often closely linked to structure, accurate structure-based classification and annotation of chemical entities is imperative to understanding their functionality.
Results
As part of an exploratory study, we have investigated the utility of semantic web technologies in automated chemical classification and annotation of lipids. Our prototype framework consists of two components: an ontology and a set of federated web services that operate upon it. The formal lipid ontology we use here extends a part of the LiPrO ontology and draws on the lipid hierarchy in the LIPID MAPS database, as well as literature-derived knowledge. The federated semantic web services that operate upon this ontology are deployed within the Semantic Annotation, Discovery, and Integration (SADI) framework. Structure-based lipid classification is enacted by two core services. Firstly, a structural annotation service detects and enumerates relevant functional groups for a specified chemical structure. A second service reasons over lipid ontology class descriptions using the attributes obtained from the annotation service and identifies the appropriate lipid classification. We extend the utility of these core services by combining them with additional SADI services that retrieve associations between lipids and proteins and identify publications related to specified lipid types. We analyze the performance of SADI-enabled eicosanoid classification relative to the LIPID MAPS classification and reflect on the contribution of our integrative methodology in the context of high-throughput lipidomics.
Conclusions
Our prototype framework is capable of accurate automated classification of lipids and facile integration of lipid class information with additional data obtained with SADI web services. The potential of programming-free integration of external web services through the SADI framework offers an opportunity for development of powerful novel applications in lipidomics. We conclude that semantic web technologies can provide an accurate and versatile means of classification and annotation of lipids.
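The two-step classification the abstract describes (annotate functional groups, then reason over class definitions) can be mimicked in miniature. The sketch below uses RDKit SMARTS matching with made-up group patterns and class rules; it is an illustrative stand-in for the paper's SADI annotation and reasoning services, not a reimplementation of them.

```python
# A toy annotate-then-classify pipeline; patterns and rules are hypothetical.
from rdkit import Chem

FUNCTIONAL_GROUPS = {
    "carboxylic_acid": Chem.MolFromSmarts("C(=O)[OH]"),
    "hydroxyl":        Chem.MolFromSmarts("[CX4][OH]"),
}
# Hypothetical class definitions: each class requires a set of groups.
LIPID_CLASSES = {
    "fatty_acid": {"carboxylic_acid"},
    "hydroxy_fatty_acid": {"carboxylic_acid", "hydroxyl"},
}

def annotate(smiles):
    """Detect which functional groups a structure contains."""
    mol = Chem.MolFromSmiles(smiles)
    return {name for name, patt in FUNCTIONAL_GROUPS.items()
            if mol.HasSubstructMatch(patt)}

def classify(smiles):
    """Return the most specific class whose requirements are all met."""
    groups = annotate(smiles)
    hits = [c for c, req in LIPID_CLASSES.items() if req <= groups]
    return max(hits, key=lambda c: len(LIPID_CLASSES[c]), default=None)

print(classify("CCCCCCCCCCCCCCCC(=O)O"))   # palmitic acid -> fatty_acid
```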

    Provenance-Centered Dataset of Drug-Drug Interactions

    Over the years, several studies have demonstrated the ability to identify potential drug-drug interactions via data mining from the literature (MEDLINE), electronic health records, public databases (Drugbank), etc. While each of these approaches is properly statistically validated, they do not take into consideration the overlap between them as one of their decision-making variables. In this paper we present LInked Drug-Drug Interactions (LIDDI), a public nanopublication-based RDF dataset with trusty URIs that encompasses some of the most cited prediction methods and sources, to provide researchers a resource for leveraging the work of others in their own prediction methods. Since one of the main obstacles to using external resources is mapping among the drug names and identifiers each source uses, we also provide the set of mappings we curated to make the multiple sources aggregated in our dataset comparable.
Comment: In Proceedings of the 14th International Semantic Web Conference (ISWC) 201
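Trusty URIs, mentioned above, make references self-verifying by embedding a cryptographic hash of the artifact's content in its identifier. The toy function below conveys the idea only; real trusty URIs additionally define content normalization and module codes, which this sketch omits.

```python
# A simplified, illustrative version of the trusty URI idea.
import base64
import hashlib

def toy_trusty_uri(base: str, content: bytes) -> str:
    """Append a base64url-encoded SHA-256 digest of the content to the URI."""
    digest = hashlib.sha256(content).digest()
    code = base64.urlsafe_b64encode(digest).decode().rstrip("=")
    return f"{base}{code}"

uri = toy_trusty_uri("http://example.org/np/", b"...nanopublication content...")
# Any consumer can re-hash the retrieved content and compare it with the
# suffix of the URI to detect tampering or corruption.
print(uri)
```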

    Integrating systems biology models and biomedical ontologies

    BACKGROUND: Systems biology is an approach to biology that emphasizes the structure and dynamic behavior of biological systems and the interactions that occur within them. To succeed, systems biology crucially depends on the accessibility and integration of data across domains and levels of granularity. Biomedical ontologies were developed to facilitate such integration of data and are often used to annotate biosimulation models in systems biology. RESULTS: We provide a framework to integrate representations of in silico systems biology with those of in vivo biology as described by biomedical ontologies, and we demonstrate this framework using the Systems Biology Markup Language (SBML). We developed the SBML Harvester software, which automatically converts annotated SBML models into OWL, and we applied it to the biosimulation models contained in the BioModels Database. We utilize the resulting knowledge base for complex biological queries that can bridge levels of granularity, verify models based on the biological phenomena they represent, and provide a means to establish a basic qualitative layer on which to express the semantics of biosimulation models. CONCLUSIONS: We establish an information flow between biomedical ontologies and biosimulation models, and we demonstrate that the integration of annotated biosimulation models and biomedical ontologies enables the verification of models as well as expressive queries. Establishing a bi-directional information flow between systems biology and biomedical ontologies has the potential to enable large-scale analyses of biological systems that span levels of granularity from molecules to organisms.
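A converter like SBML Harvester must first extract the ontology annotations attached to model elements. The fragment below sketches just that first step, assuming the python-libsbml package; the OWL generation and reasoning described in the abstract are far richer than this extraction.

```python
# A minimal sketch of pulling annotation blocks out of an SBML model.
# Assumes python-libsbml; not SBML Harvester's actual conversion code.
import libsbml

def species_annotations(path):
    """Map each species ID to its raw annotation string, if any."""
    doc = libsbml.readSBML(path)
    model = doc.getModel()
    out = {}
    for i in range(model.getNumSpecies()):
        sp = model.getSpecies(i)
        # MIRIAM-style annotations embed ontology URIs (GO, ChEBI, ...)
        # inside the species' RDF annotation block.
        out[sp.getId()] = (sp.getAnnotationString()
                           if sp.isSetAnnotation() else None)
    return out
```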

    Interoperability and FAIRness through a novel combination of Web technologies

    Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories, ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.
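One of the off-the-shelf patterns alluded to above is plain HTTP content negotiation: the same identifier serves a human-readable page or machine-readable metadata depending on the Accept header. The URL below is a placeholder, not one of the paper's actual endpoints.

```python
# A small sketch of content negotiation against a hypothetical repository URI.
import requests

url = "https://example.org/dataset/42"
html = requests.get(url, headers={"Accept": "text/html"})
rdf = requests.get(url, headers={"Accept": "text/turtle"})
# A FAIR-friendly server returns the landing page for the first request and
# Turtle-serialized metadata for the second, from one and the same identifier.
```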

    Power grip, pinch grip, manual muscle testing or thenar atrophy - which should be assessed as a motor outcome after carpal tunnel decompression? A systematic review

    Background
Objective assessment of motor function is frequently used to evaluate outcome after surgical treatment of carpal tunnel syndrome (CTS). However, a range of outcome measures is used and there appears to be no consensus on which measure of motor function effectively captures change. The purpose of this systematic review was to identify the methods used to assess motor function in randomized controlled trials of surgical interventions for CTS. A secondary aim was to evaluate which instruments reflect clinical change and are psychometrically robust.
Methods
The bibliographic databases Medline, AMED and CINAHL were searched for randomized controlled trials of surgical interventions for CTS. Data on instruments used, methods of assessment and results of tests of motor function were extracted by two independent reviewers.
Results
Twenty-two studies that included performance-based assessments of motor function were retrieved. Nineteen studies assessed power grip dynamometry, fourteen used both power and pinch grip dynamometry, eight used manual muscle testing and five assessed the presence or absence of thenar atrophy. Several studies used multiple tests of motor function. Two studies included both power and pinch strength and reported descriptive statistics, enabling calculation of effect sizes to compare the relative responsiveness of grip and pinch strength within study samples. The study findings suggest that tip pinch is more responsive than lateral pinch or power grip up to 12 weeks following surgery for CTS.
Conclusion
Although used most frequently and known to be reliable, power and key pinch dynamometry are not the most valid or responsive tools for assessing motor outcome up to 12 weeks following surgery for CTS. Tip pinch dynamometry more specifically targets the thenar musculature and appears to be more responsive. Manual muscle testing, which in theory is most specific to the thenar musculature, may be more sensitive if assessed using a hand-held dynamometer, the Rotterdam Intrinsic Handheld Myometer. However, further research is needed to evaluate its reliability and responsiveness and to establish the most efficient and psychometrically robust method of evaluating motor function following surgery for CTS.
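The effect-size comparison the review relies on can be reproduced from reported descriptive statistics alone. The sketch below computes a standardized response mean (SRM) for two outcome measures; the change scores are invented for illustration and are not taken from the included trials.

```python
# A hedged sketch of comparing responsiveness via the standardized
# response mean. All numbers below are made up.
def standardized_response_mean(mean_change, sd_change):
    """SRM = mean of change scores / SD of change scores; a larger
    magnitude indicates a more responsive outcome measure."""
    return mean_change / sd_change

# Hypothetical 12-week post-operative change scores (kg):
srm_tip_pinch = standardized_response_mean(-1.8, 1.5)    # |SRM| = 1.2
srm_power_grip = standardized_response_mean(-3.0, 4.0)   # |SRM| = 0.75
# With these invented values, tip pinch registers the change more
# consistently than power grip, mirroring the review's conclusion.
print(srm_tip_pinch, srm_power_grip)
```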