
    Towards an interoperable information infrastructure providing decision support for genomic medicine

    Genetic dispositions play a major role in individual disease risk and treatment response. Genomic medicine, in which medical decisions are refined by the genetic information of individual patients, is becoming increasingly important. Here we describe our work and future visions around the creation of a distributed infrastructure for pharmacogenetic data and medical decision support, based on industry standards such as the Web Ontology Language (OWL) and the Arden Syntax.

    The emergence and evolution of the research fronts in HIV/AIDS research

    In this paper, we have identified and analyzed the emergence, structure and dynamics of the paradigmatic research fronts that established the fundamentals of biomedical knowledge on HIV/AIDS. A search for papers with the identifiers "HIV/AIDS", "Human Immunodeficiency Virus", "HIV-1" and "Acquired Immunodeficiency Syndrome" in the Web of Science (Thomson Reuters) was carried out. A citation network of those papers was constructed. Then, a sub-network of the papers with the highest number of inter-citations (with a minimal in-degree of 28) was selected, and a combination of network clustering and text mining was applied to identify the paradigmatic research fronts and analyze their dynamics. Thirteen research fronts were identified in this sub-network. The biggest and oldest front relates to clinical knowledge of the disease in the patient. Nine of the fronts concern specific molecular structures and mechanisms, and two of these relate to the development of drugs. The remaining fronts study the disease at the cellular level. Interestingly, these fronts emerged in successive "waves" over time, which suggests a transition in the paradigmatic focus. The emergence and evolution of the biomedical fronts in HIV/AIDS research is explained not just by the partition of the problem into elements and interactions, leading to increasingly specialized communities, but also by changes in the technological context of this health problem and the dramatic changes in the epidemiological reality of HIV/AIDS that occurred between 1993 and 1995.
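The sub-network selection step described above can be sketched in a few lines. The threshold of 28 comes from the abstract; the single-pass in-degree computation is a simplification of the authors' inter-citation criterion:

```python
from collections import defaultdict

def citation_core(citations, min_in_degree=28):
    """Select the highly cited core of a citation network.

    citations: iterable of (citing_id, cited_id) pairs.
    Returns the set of papers whose in-degree meets the
    threshold (28 in the study; lowered here for the toy data).
    """
    in_degree = defaultdict(int)
    for citing, cited in citations:
        in_degree[cited] += 1
    return {p for p, d in in_degree.items() if d >= min_in_degree}

# Toy example with a threshold of 2 instead of 28:
edges = [("a", "x"), ("b", "x"), ("c", "y"), ("a", "y"), ("b", "z")]
core = citation_core(edges, min_in_degree=2)
# core == {"x", "y"}
```

Clustering and text mining would then run only on the papers in `core`, which is what keeps the research-front analysis tractable.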

    Is the crowd better as an assistant or a replacement in ontology engineering? An exploration through the lens of the Gene Ontology

    Biomedical ontologies contain errors. Crowdsourcing, defined as taking a job traditionally performed by a designated agent and outsourcing it to an undefined, large group of people, provides scalable access to human judgment. The crowd therefore has the potential to overcome the limited accuracy and scalability of current ontology quality-assurance approaches. Crowd-based methods have identified errors in SNOMED CT, a large clinical ontology, with an accuracy similar to that of experts, suggesting that crowdsourcing is indeed a feasible approach for identifying ontology errors. This work uses that same crowd-based methodology, as well as a panel of experts, to verify a subset of the Gene Ontology (200 relationships). Experts identified 16 errors, generally in relationships referencing acids and metals. The crowd performed poorly in identifying those errors, with an area under the receiver operating characteristic curve ranging from 0.44 to 0.73, depending on the method's configuration. However, when the crowd verified what experts considered to be easy relationships with useful definitions, they performed reasonably well. Notably, there are significantly fewer Google search results for Gene Ontology concepts than for SNOMED CT concepts. This disparity may account for the difference in performance: fewer search results indicate a more difficult task for the worker. The number of Internet search results could thus serve as a way to assess which tasks are appropriate for the crowd. These results suggest that the crowd fits better as an expert assistant, completing the easy verification tasks so that experts can focus on the difficult ones, rather than as an expert replacement.
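Area-under-the-ROC-curve figures like those quoted above can be computed from raw crowd scores with the rank-sum (Mann-Whitney) formulation: the probability that a randomly chosen error receives a higher crowd score than a randomly chosen non-error, counting ties as one half. The labels and scores below are invented for illustration, not the study's data:

```python
def roc_auc(labels, scores):
    """AUC via the rank-sum formulation: fraction of
    (positive, negative) pairs ranked correctly, ties = 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical crowd scores for 6 relationships; 1 = expert-flagged error:
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.2]
auc = roc_auc(labels, scores)  # 7 of 9 pairs ranked correctly ~ 0.78
```

An AUC near 0.5, as in the worst crowd configurations reported, means the crowd's scores ranked errors no better than chance.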

    SPARQL-enabled identifier conversion with Identifiers.org

    Motivation: On the Semantic Web, and in the life sciences in particular, data are often distributed across multiple resources. Each of these sources is likely to use its own Internationalized Resource Identifier (IRI) for conceptually the same resource or database record. The lack of correspondence between identifiers introduces a barrier when executing federated SPARQL queries across life science data. Results: We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate a plurality of identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier-integration approach by answering queries across major producers of life science Linked Data. Availability and implementation: The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql. Contact: [email protected]
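The variant-generation idea can be sketched as follows. The URI templates below are illustrative stand-ins for the per-collection patterns the Identifiers.org Registry actually defines; the point is that one local identifier expands into every known URI form, so federated queries can match records across sources:

```python
import re

# Hypothetical URI templates for one namespace (ChEBI); the real
# registry defines such patterns per data collection.
CHEBI_TEMPLATES = [
    "http://identifiers.org/chebi/CHEBI:{n}",
    "http://purl.obolibrary.org/obo/CHEBI_{n}",
    "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:{n}",
]

def chebi_variants(uri):
    """Extract the local ChEBI number from any known URI form and
    emit the full set of equivalent identifier variants."""
    m = re.search(r"CHEBI[:_](\d+)", uri)
    if not m:
        return set()
    return {t.format(n=m.group(1)) for t in CHEBI_TEMPLATES}

variants = chebi_variants("http://purl.obolibrary.org/obo/CHEBI_36927")
```

In the actual service this expansion happens inside a SPARQL `SERVICE` call, so a query written against one source's identifiers can still join against another's.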

    Interoperability and FAIRness through a novel combination of Web technologies

    Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories, ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task that does not scale. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.
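Resource-oriented patterns of this kind typically rest on ordinary HTTP content negotiation: one URI per resource, with the representation chosen by the client's Accept header. A minimal sketch, with an invented URI, of how a client would ask for a machine-readable (Turtle) description of a single cell-level resource:

```python
from urllib.request import Request

# Illustrative URI: one addressable resource per spreadsheet cell.
uri = "https://example.org/dataset/42/record/7/cell/B3"

# Ask for RDF (Turtle); a browser sending Accept: text/html would
# receive a human-readable page for the very same URI instead.
req = Request(uri, headers={"Accept": "text/turtle"})

# The request is constructed but not sent here; a FAIR-style
# repository would answer it with a Turtle description of the cell.
accept = req.get_header("Accept")
```

The design choice is that discovery, access, and interoperability all hang off the same identifier, so no side-channel API is needed to reach finer-grained data.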

    Semantic Web integration of Cheminformatics resources with the SADI framework

    BACKGROUND: The diversity and the largely independent nature of chemical research efforts over the past half century are, most likely, the major contributors to the current poor state of chemical computational resource and database interoperability. While open software for chemical format interconversion and database entry cross-linking has partially addressed database interoperability, computational resource integration is hindered by the great diversity of software interfaces, languages, access methods, and platforms, among others. This has, in turn, translated into limited reproducibility of computational experiments and the need for application-specific computational workflow construction and semi-automated enactment by human experts, especially where emerging interdisciplinary fields, such as systems chemistry, are pursued. Fortunately, the advent of the Semantic Web, and the very recent introduction of RESTful Semantic Web Services (SWS), may present an opportunity to integrate all of the existing computational and database resources in chemistry into a machine-understandable, unified system that draws on the entirety of the Semantic Web. RESULTS: We have created a prototype set of Semantic Automated Discovery and Integration (SADI) framework SWS that exposes the QSAR descriptor functionality of the Chemistry Development Kit. Since each of these services has formal, ontology-defined input and output classes, and each service consumes and produces RDF graphs, clients can automatically reason about the services and the reference information needed to complete a given overall computational task specified through a simple SPARQL query. We demonstrate this capability by carrying out a QSAR analysis, backed by a simple formal ontology, to determine whether a given molecule is drug-like. Further, we discuss parameter-based control over the execution of SADI SWS. Finally, we demonstrate the value of enveloping computational resources as SADI services through service reuse and the ease of integrating computational functionality into formal ontologies. CONCLUSIONS: The work we present here may trigger a major paradigm shift in the distribution of computational resources in chemistry. We conclude that envelopment of chemical computational resources as SADI SWS facilitates interdisciplinary research by enabling the definition of computational problems in terms of ontologies and formal logical statements instead of cumbersome and application-specific tasks and workflows.
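The SADI pattern described above, where a service's contract is "typed RDF instances in, the same instances decorated with new properties out", can be sketched without any Semantic Web tooling. Everything below (the namespace, property names, and the toy descriptor) is illustrative, not the actual CDK-backed services:

```python
# Minimal sketch of the SADI pattern: the service consumes a graph
# of input instances (here, triples as tuples) and returns new
# triples attaching a computed property to each input node.

EX = "http://example.org/"  # hypothetical namespace

def heavy_atoms(smiles):
    # Toy descriptor: count upper-case symbols in a SMILES string.
    # A stand-in for a real CDK QSAR descriptor, not real chemistry.
    return sum(c.isupper() for c in smiles)

def sadi_service(input_triples):
    """For every node with an ex:smiles property, emit a triple
    attaching the computed ex:heavyAtomCount descriptor."""
    out = []
    for s, p, o in input_triples:
        if p == EX + "smiles":
            out.append((s, EX + "heavyAtomCount", heavy_atoms(o)))
    return out

graph = [(EX + "mol1", EX + "smiles", "CCO")]
result = sadi_service(graph)
```

Because input and output shapes are declared up front, a client can chain such services automatically to satisfy a SPARQL query, which is the capability the abstract demonstrates.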

    Semantic modelling of common data elements for rare disease registries, and a prototype workflow for their deployment over registry data

    BACKGROUND: The European Platform on Rare Disease Registration (EU RD Platform) aims to address the fragmentation of European rare disease (RD) patient data, scattered among hundreds of independent and non-coordinating registries, by establishing standards for integration and interoperability. The first practical output of this effort was a set of 16 Common Data Elements (CDEs) that should be implemented by all RD registries. Interoperability, however, requires decisions beyond data elements, including data models, formats, and semantics. Within the European Joint Programme on Rare Diseases (EJP RD), we aim to further the goals of the EU RD Platform by generating reusable RD semantic model templates that follow the FAIR Data Principles. RESULTS: Through a team-based iterative approach, we created semantically grounded models to represent each of the CDEs, using the Semanticscience Integrated Ontology as the core framework for representing the entities and their relationships. Within that framework, we mapped the concepts represented in the CDEs, and their possible values, into domain ontologies such as the Orphanet Rare Disease Ontology, Human Phenotype Ontology and National Cancer Institute Thesaurus. Finally, we created an exemplar, reusable ETL pipeline that we will be deploying over these non-coordinating data repositories to assist them in creating model-compliant FAIR data without requiring site-specific coding or expertise in Linked Data or FAIR. CONCLUSIONS: Within the EJP RD project, we determined that creating reusable, expert-designed templates reduced or eliminated the requirement for our participating biomedical domain experts and rare disease data hosts to understand OWL semantics. This enabled them to publish highly expressive FAIR data using tools and approaches that were already familiar to them.
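A model template of the kind described might be instantiated as in this sketch, which renders a single "date of diagnosis" CDE as N-Triples. The SIO-style property URIs and the template shape are assumptions for illustration, not the project's published model:

```python
# Hypothetical SIO-style properties (labels assumed, verify against SIO):
HAS_ATTRIBUTE = "http://semanticscience.org/resource/SIO_000008"  # 'has attribute'
HAS_VALUE = "http://semanticscience.org/resource/SIO_000300"      # 'has value'

def cde_triples(patient_uri, diagnosis_date):
    """Render a 'date of diagnosis' CDE as N-Triples lines,
    attaching a dated attribute node to the patient node."""
    obs = patient_uri + "#diagnosis-date"
    return [
        f"<{patient_uri}> <{HAS_ATTRIBUTE}> <{obs}> .",
        f'<{obs}> <{HAS_VALUE}> "{diagnosis_date}"'
        '^^<http://www.w3.org/2001/XMLSchema#date> .',
    ]

lines = cde_triples("http://example.org/patient/1", "2015-06-01")
```

The value of a fixed template is exactly what the abstract argues: a registry only supplies the patient URI and the date; the OWL-level semantics are frozen in the template.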

    Self-organizing ontology of biochemically relevant small molecules

    BACKGROUND: The advent of high-throughput experimentation in biochemistry has led to the generation of vast amounts of chemical data, necessitating the development of novel analysis, characterization, and cataloguing techniques and tools. Recently, a movement to publicly release such data has advanced biochemical structure-activity relationship research, while providing new challenges, the biggest being the curation, annotation, and classification of this information to facilitate useful biochemical pattern analysis. Unfortunately, the human resources currently employed by the organizations supporting these efforts (e.g. ChEBI) are expanding linearly, while new useful scientific information is being released in a seemingly exponential fashion. Compounding this, existing chemical classification and annotation systems are not amenable to automated classification, formal and transparent chemical class definition axiomatization, facile class redefinition, or novel class integration, further limiting chemical ontology growth by necessitating human involvement in curation. Clearly, there is a need to automate this process, especially for novel chemical entities of biological interest. RESULTS: To address this, we present a formal framework based on Semantic Web technologies for the automatic design of a chemical ontology that can be used for the automated classification of novel entities. We demonstrate the automatic self-assembly of a structure-based chemical ontology based on 60 MeSH and 40 ChEBI chemical classes. This ontology is then used to classify 200 compounds with an accuracy of 92.7%. We extend these structure-based classes with molecular feature information and demonstrate the utility of our framework for the classification of functionally relevant chemicals. Finally, we discuss an iterative approach that we envision for future biochemical ontology development. CONCLUSIONS: We conclude that the proposed methodology can ease the burden on chemical data annotators and dramatically increase their productivity. We anticipate that the use of formal logic in our proposed framework will make chemical classification criteria more transparent to humans and machines alike, and will thus facilitate the development of predictive and integrative bioactivity models.
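The core idea of transparent, axiomatized class definitions can be illustrated with classes expressed as explicit predicates over a molecule's feature set: a molecule belongs to every class whose axiom it satisfies, with no curator in the loop. The classes, features, and threshold below are invented for illustration and are far simpler than real OWL axioms:

```python
# Illustrative class axioms as machine-checkable predicates.
CLASS_AXIOMS = {
    "carboxylic acid": lambda f: "C(=O)O" in f["groups"],
    "metal-containing": lambda f: bool(f["metals"]),
    "small molecule":  lambda f: f["mol_weight"] < 900,
}

def classify(features):
    """Return every class whose defining axiom the molecule satisfies."""
    return {name for name, axiom in CLASS_AXIOMS.items() if axiom(features)}

# Toy feature record for acetic acid:
acetic_acid = {"groups": ["C(=O)O"], "metals": [], "mol_weight": 60.05}
found = classify(acetic_acid)
```

Because each axiom is explicit, adding or redefining a class is a one-line change, which is the "facile class redefinition" property the abstract says current systems lack.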

    Ten simple rules for making training materials FAIR

    Author summary: Everything we do today is becoming more and more reliant on the use of computers. The field of biology is no exception, but most biologists receive little or no formal preparation for the increasingly computational aspects of their discipline. In consequence, informal training courses are often needed to plug the gaps, and the demand for such training is growing worldwide. To meet this demand, some training programs are being expanded, and new ones are being developed. Key to both scenarios is the creation of new course materials. Rather than starting from scratch, however, it's sometimes possible to repurpose materials that already exist. Yet finding suitable materials online can be difficult: they're often widely scattered across the internet or hidden in their home institutions, with no systematic way to find them. This is a common problem for all digital objects. The scientific community has attempted to address this issue by developing a set of rules (which have been called the Findable, Accessible, Interoperable and Reusable [FAIR] principles) to make such objects more findable and reusable. Here, we show how to apply these rules to help make training materials easier to find, (re)use, and adapt, for the benefit of all.

    Supplemental Information 2: Example dataset description

    Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.
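A minimal version-level description along these lines might look like the following sketch, emitted as N-Triples using the widely reused DCTERMS vocabulary; the dataset URI and values are illustrative, and a real HCLS-conformant description would carry many more elements (provenance, distribution, content summary):

```python
DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace

def dataset_description(uri, title, version, issued):
    """Emit a minimal versioned-dataset description as N-Triples."""
    return [
        f'<{uri}> <{DCT}title> "{title}" .',
        f'<{uri}> <{DCT}hasVersion> "{version}" .',
        f'<{uri}> <{DCT}issued> "{issued}"'
        '^^<http://www.w3.org/2001/XMLSchema#date> .',
    ]

desc = dataset_description(
    "http://example.org/dataset/chembl", "ChEMBL", "21", "2016-02-01")
```

Keeping title, version, and issue date as distinct, typed triples is what makes uniform indexing and querying of versioned datasets possible across repositories.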