
    Ontology of core data mining entities

    In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines the most essential data mining entities in a three-layered ontological structure comprising a specification, an implementation and an application layer. It provides a representational framework for describing the mining of structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications and use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. The ontology has been thoroughly assessed following the practices in ontology engineering, is fully interoperable with many domain resources and is easy to extend.
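
The three-layer design described above lends itself to a brief illustration. Below is a minimal sketch, using Python and rdflib, of how an algorithm might be annotated across the specification, implementation and application layers; the ontodm: namespace and its property names are placeholders invented for the example, not OntoDM-core's actual IRIs.

```python
# Sketch: annotating a data mining algorithm across the three layers,
# using rdflib. The ontodm: IRIs below are illustrative placeholders,
# not the identifiers defined by OntoDM-core itself.
from rdflib import Graph, Namespace, Literal, RDF, RDFS

ONTODM = Namespace("http://example.org/ontodm-core#")  # placeholder namespace
EX = Namespace("http://example.org/lab#")

g = Graph()
g.bind("ontodm", ONTODM)

# Specification layer: the abstract algorithm.
g.add((EX.c45_spec, RDF.type, ONTODM.DataMiningAlgorithm))
g.add((EX.c45_spec, RDFS.label, Literal("C4.5 decision tree induction")))

# Implementation layer: a concrete implementation of that specification.
g.add((EX.weka_j48, RDF.type, ONTODM.DataMiningAlgorithmImplementation))
g.add((EX.weka_j48, ONTODM.isImplementationOf, EX.c45_spec))

# Application layer: one execution of the implementation on a dataset.
g.add((EX.run_42, RDF.type, ONTODM.DataMiningAlgorithmExecution))
g.add((EX.run_42, ONTODM.executesImplementation, EX.weka_j48))
g.add((EX.run_42, ONTODM.hasInputDataset, EX.iris_dataset))

print(g.serialize(format="turtle"))
```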

    A Linked Data Approach to Sharing Workflows and Workflow Results

    A bioinformatics analysis pipeline is often highly elaborate, due to the inherent complexity of biological systems and the variety and size of datasets. A digital equivalent of the ‘Materials and Methods’ section in wet-laboratory publications would be highly beneficial to bioinformatics, for evaluating evidence and examining data across related experiments, while introducing the potential to find associated resources and integrate them as data and services. We present initial steps towards preserving bioinformatics ‘materials and methods’ by exploiting the workflow paradigm for capturing the design of a data analysis pipeline, and RDF to link the workflow, its component services, run-time provenance, and a personalized biological interpretation of the results. An example shows the reproduction of the unique graph of an analysis procedure, its results, provenance, and personal interpretation of a text mining experiment. It links data from Taverna, myExperiment.org, BioCatalogue.org, and ConceptWiki.org. The approach is relatively ‘light-weight’ and unobtrusive to bioinformatics users.
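
As a rough illustration of this linking idea, the sketch below builds a small provenance graph with rdflib. The PROV-O vocabulary is real, but the workflow URI and the interpretation resource are hypothetical stand-ins invented for the example; the paper itself links resources from Taverna, myExperiment.org, BioCatalogue.org, and ConceptWiki.org.

```python
# Sketch of the linking idea: one RDF graph ties a workflow definition,
# a concrete run, its result, and a personal interpretation together.
# PROV-O terms are real; the myExperiment URI and the interpretation
# resource are illustrative assumptions.
from rdflib import Graph, Namespace, Literal, RDF, RDFS, URIRef

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/analysis#")

g = Graph()
g.bind("prov", PROV)

workflow = URIRef("http://www.myexperiment.org/workflows/1234")  # hypothetical ID
run, result = EX.run1, EX.result1

g.add((run, RDF.type, PROV.Activity))
g.add((run, PROV.used, workflow))          # the run executed this workflow
g.add((result, PROV.wasGeneratedBy, run))  # ...and produced this result

# A personal biological interpretation, linked to the result it explains.
g.add((EX.note1, RDF.type, PROV.Entity))
g.add((EX.note1, PROV.wasDerivedFrom, result))
g.add((EX.note1, RDFS.comment, Literal("Candidate gene list matches pathway X.")))
```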

    Developing the Quantitative Histopathology Image Ontology: A case study using the hot spot detection problem

    Interoperability across data sets is a key challenge for quantitative histopathological imaging. There is a need for an ontology that can support the effective merging of pathological image data with associated clinical and demographic data. To foster organized, cross-disciplinary, information-driven collaborations in the pathological imaging field, we propose to develop an ontology to represent imaging data and the methods used in pathological imaging and analysis, and call it the Quantitative Histopathological Imaging Ontology (QHIO). We apply QHIO to breast cancer hot-spot detection with the goal of enhancing the reliability of detection by promoting the sharing of data between image analysts.
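
To make the intended interoperability concrete, here is a small, purely illustrative sketch of how a detected hot spot could be linked to its image and to clinical data in one RDF graph. The qhio: terms are invented placeholders; the abstract does not specify QHIO's actual vocabulary.

```python
# Illustrative only: linking a detected hot spot, its source image, and
# an associated clinical record via shared ontology terms. The qhio:
# class and property names are invented placeholders.
from rdflib import Graph, Namespace, Literal, RDF, XSD

QHIO = Namespace("http://example.org/qhio#")  # placeholder namespace
EX = Namespace("http://example.org/case#")

g = Graph()
g.add((EX.hotspot_7, RDF.type, QHIO.HotSpot))
g.add((EX.hotspot_7, QHIO.detectedInImage, EX.slide_image_3))
g.add((EX.hotspot_7, QHIO.mitoticCount, Literal(14, datatype=XSD.integer)))
g.add((EX.slide_image_3, QHIO.fromSpecimenOf, EX.patient_record_A))
```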

    Using Neural Networks for Relation Extraction from Biomedical Literature

    Using different sources of information to support the automated extraction of relations between biomedical concepts contributes to the development of our understanding of biological systems. The primary comprehensive source of these relations is the biomedical literature. Several relation extraction approaches have been proposed to identify relations between concepts in biomedical literature, namely using neural network algorithms. The use of multichannel architectures composed of multiple data representations, as in deep neural networks, is leading to state-of-the-art results. The right combination of data representations can eventually lead us to even higher evaluation scores in relation extraction tasks. Thus, biomedical ontologies play a fundamental role by providing semantic and ancestry information about an entity. The incorporation of biomedical ontologies has already been proven to enhance previous state-of-the-art results.

Comment: Artificial Neural Networks book (Springer), Chapter 1
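
The multichannel idea can be sketched briefly. The PyTorch model below combines a word-embedding channel with an ontology-ancestor channel ahead of a shared relation classifier; the layer sizes, the LSTM encoder and the mean-pooled ancestry channel are illustrative assumptions, not the architecture described in the chapter.

```python
# A minimal multichannel sketch: channel 1 embeds the sentence's words,
# channel 2 embeds the entities' ontology ancestors, and both feed one
# relation classifier. Dimensions and encoder choice are assumptions.
import torch
import torch.nn as nn

class MultiChannelRE(nn.Module):
    def __init__(self, vocab_size, onto_size, emb_dim=100, n_relations=2):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)  # channel 1: words
        self.onto_emb = nn.Embedding(onto_size, emb_dim)   # channel 2: ancestors
        self.encoder = nn.LSTM(emb_dim, emb_dim, batch_first=True)
        self.classifier = nn.Linear(2 * emb_dim, n_relations)

    def forward(self, word_ids, ancestor_ids):
        # Encode the word channel with an LSTM; mean-pool the ontology channel.
        _, (h, _) = self.encoder(self.word_emb(word_ids))
        onto_vec = self.onto_emb(ancestor_ids).mean(dim=1)
        features = torch.cat([h[-1], onto_vec], dim=-1)
        return self.classifier(features)

# Toy usage: batch of 4 sentences (20 tokens) with 6 ancestors per pair.
model = MultiChannelRE(vocab_size=5000, onto_size=800)
logits = model(torch.randint(0, 5000, (4, 20)), torch.randint(0, 800, (4, 6)))
```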

    The INCF Digital Atlasing Program: Report on Digital Atlasing Standards in the Rodent Brain

    The goal of the INCF Digital Atlasing Program is to provide the vision and direction necessary to make the rapidly growing collection of multidimensional data of the rodent brain (images, gene expression, etc.) widely accessible and usable to the international research community. The Digital Brain Atlasing Standards Task Force was formed in May 2008 to investigate the state of rodent brain digital atlasing, and to formulate standards, guidelines, and policy recommendations.

Our first objective has been the preparation of a detailed document that includes the vision and a specific description of an infrastructure, systems and methods capable of serving the scientific goals of the community, as well as the practical issues involved in achieving those goals. This report builds on the 1st INCF Workshop on Mouse and Rat Brain Digital Atlasing Systems (Boline et al., 2007, _Nature Precedings_, doi:10.1038/npre.2007.1046.1) and includes a more detailed analysis of both the current state and the desired state of digital atlasing, along with specific recommendations for achieving these goals.

    UIMA in the Biocuration Workflow: A coherent framework for cooperation between biologists and computational linguists

    As collaborating partners, Barcelona Media Innovation Centre and GRIB (Universitat Pompeu Fabra) seek to combine strengths from Computational Linguistics and Biomedicine to produce a robust Text Mining system to generate data that will help biocurators in their daily work. The first version of this system will focus on the discovery of relationships between genes, SNPs (Single Nucleotide Polymorphisms) and diseases from the literature.

A first challenge that we faced during the setup of this project is the fact that most current tools supporting the curation workflow are complex, ad hoc applications which sometimes hinder interoperability and the sharing of results between research groups from different and unrelated fields of expertise. Often, biologists (even computer-savvy ones) are hard-pressed to use and adapt sophisticated Natural Language Processing systems, and computational linguists are challenged by the intricacies of biology when applying their processing pipelines to elicit knowledge from texts. The flow of knowledge (needed to develop a usable, practical tool) to and from the parties involved in the development of such systems is not always easy or straightforward.

The modular and versatile architecture of UIMA (Unstructured Information Management Architecture) provides a framework to address these challenges. UIMA is a component architecture and software framework implementation (including a UIMA SDK) for developing applications that analyse large volumes of unstructured information, and it has been increasingly adopted by a significant part of the BioNLP community that needs industrial-grade, robust applications to exploit the whole bibliome. Using UIMA to develop Text Mining applications for curation purposes allows the combination of diverse expertise that is beyond the individual know-how of biologists, computer scientists or linguists in isolation. A good synergy and circulation of knowledge between these experts is fundamental to the development of a successful curation tool.
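
The value of this modularity is easy to show in miniature. The sketch below is not the UIMA SDK (which is Java-based); it is a plain-Python illustration of the pattern UIMA standardizes, where independent annotators write stand-off annotations into a shared analysis structure (the CAS) and can be freely recombined into pipelines.

```python
# Not the UIMA SDK: a plain-Python sketch of the annotator pattern, in
# which components add stand-off annotations to a shared structure.
from dataclasses import dataclass, field

@dataclass
class Annotation:
    begin: int
    end: int
    label: str

@dataclass
class CAS:  # simplified stand-in for UIMA's Common Analysis Structure
    text: str
    annotations: list = field(default_factory=list)

def tokenizer(cas: CAS) -> None:
    pos = 0
    for tok in cas.text.split():
        start = cas.text.index(tok, pos)
        cas.annotations.append(Annotation(start, start + len(tok), "Token"))
        pos = start + len(tok)

def gene_tagger(cas: CAS) -> None:
    # Toy dictionary lookup standing in for a real NER component.
    for a in [x for x in cas.annotations if x.label == "Token"]:
        if cas.text[a.begin:a.end] in {"BRCA1", "TP53"}:
            cas.annotations.append(Annotation(a.begin, a.end, "Gene"))

cas = CAS("Mutations in BRCA1 are linked to breast cancer .")
for annotator in (tokenizer, gene_tagger):  # the pipeline is just a sequence
    annotator(cas)
```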

    Structuring research methods and data with the research object model: genomics workflows as a case study

    Background: One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments, with clear annotations, is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide the necessary metadata for a scientist to understand and recreate the results of an experiment. To support this we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows.

Results: We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as "which particular data was input to a particular workflow to test a particular hypothesis?" and "which particular conclusions were drawn from a particular workflow?".

Conclusions: Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well.

Availability: The Research Object is available at http://www.myexperiment.org/packs/428. The Wf4Ever Research Object Model is available at http://wf4ever.github.io/r
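
As an illustration of the kind of query quoted above, the sketch below runs SPARQL over a toy RO graph with rdflib. The ex: properties are simplified placeholders invented for the example, not the actual Wf4Ever vocabulary terms.

```python
# Hedged sketch: asking "which data was input to the workflow that tests
# a given hypothesis?" over a toy RO graph. The ex: terms are invented
# placeholders, not the Wf4Ever RO vocabulary.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/ro#")

g = Graph()
g.add((EX.workflow1, EX.hasInput, EX.metabolite_dataset))
g.add((EX.workflow1, EX.testsHypothesis, EX.hypothesis1))

q = """
PREFIX ex: <http://example.org/ro#>
SELECT ?data WHERE {
    ?wf ex:hasInput ?data ;
        ex:testsHypothesis ex:hypothesis1 .
}
"""
for row in g.query(q):
    print(row.data)  # -> http://example.org/ro#metabolite_dataset
```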