31 research outputs found

Combining data mining and ontology engineering to enrich ontologies and linked data

Author: d’Aquin Mathieu
Kronberger Gabriel
Suárez-Figueroa Mari Carmen
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2012

In this position paper, we claim that the need for time-consuming data preparation and result interpretation tasks in knowledge discovery, as well as for costly expert consultation and consensus-building activities required for ontology building, can be reduced by exploiting the interplay of data mining and ontology engineering. The aim is to obtain, in a semi-automatic way, new knowledge from distributed data sources that can be used for inference and reasoning, as well as to guide the extraction of further knowledge from these data sources. The proposed approach is based on the creation of a novel knowledge discovery method relying on the combination, through an iterative "feedback loop", of (a) data mining techniques to bring out implicit models from data and (b) pattern-based ontology engineering to capture these models in reusable, conceptual and inferable artefacts.

Archivo Digital UPM

Boundary heat diffusion classifier for a semi-supervised learning in a multilayer network embedding

Author: d’Aquin Mathieu
Nováček Vít
Timilsina Mohan
Yang Haixuan
Publication venue: Elsevier BV
Publication date: 01/12/2022

The scarcity of high-quality annotations in many application scenarios has recently led to an increasing interest in devising learning techniques that combine unlabeled data with labeled data in a network. In this work, we focus on the label propagation problem in multilayer networks. Our approach is inspired by the heat diffusion model, which has proven useful in machine learning problems such as classification and dimensionality reduction. We propose a novel boundary-based heat diffusion algorithm that guarantees a closed-form solution with an efficient implementation. We experimentally validated our method on synthetic networks and five real-world multilayer network datasets representing scientific coauthorship, spreading drug adoption among physicians, two bibliographic networks, and a movie network. The results demonstrate the benefits of the proposed algorithm, where our boundary-based heat diffusion outperforms the state-of-the-art methods.

INRIA a CCSD electronic archive server

Crowdsourcing Linked Data on listening experiences through reuse and enhancement of library data

Author: A Basharat
Alessandro Adamou
C Bizer
C Page
Carlo Allocca
H Halpin
Helen Barlow
M Bradley
Mathieu d’Aquin
P Golden
PN Juslin
RC Wegman
S Burstyn
Simon Brown
Publication venue: Springer Science and Business Media LLC
Publication date: 06/02/2018

Research has approached the practice of musical reception in a multitude of ways, such as the analysis of professional critique, sales figures and psychological processes activated by the act of listening. Studies in the Humanities, on the other hand, have been hindered by the lack of structured evidence of actual experiences of listening as reported by the listeners themselves, a concern that has been voiced since the early Web era. It was however assumed that such evidence existed, albeit in pure textual form, but could not be leveraged until it was digitised and aggregated. The Listening Experience Database (LED) responds to this research need by providing a centralised hub for evidence of listening in the literature. Not only does LED support search and reuse across nearly 10,000 records, but it also provides machine-readable structured data of the knowledge around the contexts of listening. To take advantage of the mass of formal knowledge that already exists on the Web concerning these contexts, the entire framework adopts Linked Data principles and technologies. This also allows LED to directly reuse open data from the British Library for the source documentation that is already published. Reused data are re-published as open data with enhancements obtained by expanding over the model of the original data, such as the partitioning of published books and collections into individual stand-alone documents. The database was populated through crowdsourcing and seamlessly incorporates data reuse from the very early data entry phases. As the sources of the evidence often contain vague, fragmentary or uncertain information, facilities were put in place to generate structured data out of such fuzziness.
Alongside elaborating on these functionalities, this article provides insights into the most recent features of the latest instalment of the dataset and portal, such as the interlinking with the MusicBrainz database, the relaxation of geographical input constraints through text mining, and the plotting of key locations in an interactive geographical browser.

Crossref

Open Research Online (The Open University)

TaBIIC: Taxonomy Building Through Iterative and Interactive Clustering

Author: d’Aquin Mathieu
Publication venue: IOS Press
Publication date: 18/09/2023

Building taxonomies is often a significant part of building an ontology, and many attempts have been made to automate the creation of such taxonomies from relevant data. The idea in such approaches is either that relevant definitions of the intension of concepts can be extracted as patterns in the data (e.g. in formal concept analysis) or that their extension can be built from grouping data objects based on similarity (clustering). In both cases, the process leads to an automatically constructed structure, which can be either too coarse and lacking in definition, or too fine-grained and detailed, and therefore needs to be refined into the desired taxonomy. In this paper, we explore a method that takes inspiration from both approaches in an iterative and interactive process, so that refinement and definition of the concepts in the taxonomy occur at the time of identifying those concepts in the data. We show that this method is applicable to a variety of data sources and leads to taxonomies that can be more directly integrated into ontologies.

INRIA a CCSD electronic archive server

Finding Concept Representations in Neural Networks with Self-Organizing Maps

Author: d’Aquin Mathieu
Publication venue: ACM
Publication date: 05/12/2023

In sufficiently complex tasks, it is expected that as a side effect of learning to solve a problem, a neural network will learn relevant abstractions of the representation of that problem. This has been confirmed in particular in machine vision where a number of works showed that correlations could be found between the activations of specific units (neurons) in a neural network and the visual concepts (textures, colors, objects) present in the image. Here, we explore the use of self-organizing maps as a way to both visually and computationally inspect how activation vectors of whole layers of neural networks correspond to neural representations of abstract concepts such as 'female person' or 'realist painter'. We experiment with multiple measures applied to those maps to assess the level of representation of a concept in a network's layer. We show that, among the measures tested, the relative entropy of the activation map for a concept compared to the map for the whole data is a suitable candidate and can be used as part of a methodology to identify and locate the neural representation of a concept, visualize it, and understand its importance in solving the prediction task at hand. CCS Concepts: Computing methodologies → Artificial intelligence; Neural networks; Knowledge representation and reasoning.
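
The relative-entropy measure mentioned in this abstract can be illustrated with a minimal sketch. It assumes that an "activation map" is a grid of hit counts per self-organizing-map cell and that relative entropy means the Kullback-Leibler divergence between the normalized concept map and the normalized whole-data map; the function names and the toy grids are hypothetical and are not taken from the paper.

```python
import math

def normalize(grid):
    """Turn a grid of non-negative counts into a probability distribution."""
    total = sum(sum(row) for row in grid)
    return [[value / total for value in row] for row in grid]

def relative_entropy(concept_map, whole_map, eps=1e-12):
    """KL divergence D(P || Q) between two maps of the same shape.

    P is the normalized concept map, Q the normalized whole-data map.
    Cells where P is zero contribute nothing; eps guards against Q == 0.
    """
    p = normalize(concept_map)
    q = normalize(whole_map)
    kl = 0.0
    for p_row, q_row in zip(p, q):
        for p_val, q_val in zip(p_row, q_row):
            if p_val > 0:
                kl += p_val * math.log(p_val / max(q_val, eps))
    return kl

# Toy 2x2 maps (hypothetical data): hit counts per SOM cell.
whole = [[10, 10], [10, 10]]        # all data, spread evenly
concentrated = [[8, 0], [0, 0]]     # concept activates a single cell
spread = [[2, 2], [2, 2]]           # concept mirrors the whole-data map

print(relative_entropy(concentrated, whole))
print(relative_entropy(spread, whole))
```

Under these assumptions, a concept whose activations concentrate in a few cells yields a high value, while a concept distributed like the whole data yields a value near zero.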

INRIA a CCSD electronic archive server

Unsupervised learning for understanding student achievement in a distance learning setting

Author: d’Aquin Mathieu
Liu Shuangyan
Publication venue: IEEE
Publication date: 05/10/2017

Many factors could affect the achievement of students in distance learning settings. Internal factors such as age, gender, previous education level and engagement in online learning activities can play an important role in obtaining successful learning outcomes, as can external factors such as the regions students come from and the learning environment that they can access. Identifying the relationships between student characteristics and distance learning outcomes is a central issue in learning analytics. This paper presents a study that applies unsupervised learning to identify how demographic characteristics of students and their engagement in online learning activities can affect their learning achievement. We utilise the K-Prototypes clustering method to identify groups of students based on demographic characteristics and interactions with online learning environments, and also investigate the learning achievement of each group. Knowing which groups of students have successful or poor learning outcomes can help faculty design online courses that adapt to different students' needs. It can also assist students in selecting online courses that are appropriate to them.

Access to Research at National University of Ireland, Galway

Extracting data models from background knowledge graphs

Author: d’Aquin Mathieu
Oliveira Daniela
Publication venue: Elsevier BV
Publication date: 01/02/2022

Knowledge Graphs have emerged as a core technology to aggregate and publish knowledge on the Web. However, integrating knowledge from different sources, not specifically designed to be interoperable, is not a trivial task. Finding the right ontologies to model a dataset is a challenge since several valid data models exist and there is no clear agreement between them. In this paper, we propose to facilitate the selection of a data model with the RICDaM (Recommending Interoperable and Consistent Data Models) framework. RICDaM generates and ranks candidates that match entity types and properties in an input dataset. These candidates are obtained by aggregating freely available domain RDF datasets in a knowledge graph and then enriching the relationships between the graph's entities. The entity type and object property candidates are obtained by exploiting the instances and structure of this knowledge graph to compute a score that considers both the accuracy and interoperability of the candidates. Datatype properties are predicted with a random forest model, trained on the knowledge graph properties and their values, so as to make predictions on candidate properties and rank them according to different measures. We present experiments using multiple datasets from the library domain as a use case and show that our methodology can produce meaningful candidate data models, adaptable to specific scenarios and needs.

INRIA a CCSD electronic archive server

Where to publish and find ontologies? A survey of ontology libraries

Author: Ahmad
Alexander
Allocca
Baclawski
Bizer
Buitelaar
Correndo
Côté
Ding
Ding
d’Aquin
d’Aquin
d’Aquin
d’Aquin
d’Aquin
Euzenat
Farquhar
Fielding
Golbreich
Hartmann
Hepp
Hyvönen
Lewen
Mathieu d’Aquin
Natalya F. Noy
Noy
Noy
Noy
Noy
Nyulas
Palma
Pathak
Rubin
Salton
Shaw
Smith
Thomas
Tudorache
Publication venue: Elsevier BV
Publication date: 01/03/2012

One of the key promises of the Semantic Web is its potential to enable and facilitate data interoperability. The ability of data providers and application developers to share and reuse ontologies is a critical component of this data interoperability: if different applications and data sources use the same set of well-defined terms for describing their domain and data, it will be much easier for them to “talk” to one another. Ontology libraries are the systems that collect ontologies from different sources and facilitate the tasks of finding, exploring, and using these ontologies. Thus ontology libraries can serve as a link in enabling diverse users and applications to discover, evaluate, use, and publish ontologies. In this paper, we provide a survey of the growing, and surprisingly diverse, landscape of ontology libraries. We highlight how the varying scope and intended use of the libraries affect their features, content, and potential exploitation in applications. From reviewing 11 ontology libraries, we identify a core set of questions that ontology practitioners and users should consider in choosing an ontology library for finding ontologies or publishing their own. We also discuss the research challenges that emerge from this survey, for the developers of ontology libraries to address.

Crossref

Open Research Online (The Open University)

PubMed Central

Online Access to Quantitative Data Resources

Author: d’Aquin Mathieu
Gracia Jorge
Mena E.
Publication venue: Sage Publications Ltd
Publication date: 01/01/2009

Nowadays, the increasing amount of semantic data available on the Web leads to a new stage in the potential of Semantic Web applications. However, it also introduces new issues due to the heterogeneity of the available semantic resources. One of the most remarkable is redundancy, that is, the excess of different semantic descriptions, coming from different sources, to describe the same intended meaning. In this paper, we propose a technique to perform a large-scale integration of senses (expressed as ontology terms), in order to cluster the most similar ones, when indexing large amounts of online semantic information. It can dramatically reduce the redundancy problem on the current Semantic Web. In order to make this objective feasible, we have studied the adaptability and scalability of our previous work on sense integration, to be translated to the much larger scenario of the Semantic Web. Our evaluation shows good behaviour of these techniques when used in large-scale experiments, thus making the proposed approach feasible.

Crossref

The University of Manchester - Institutional Repository

Archivo Digital UPM