31 research outputs found

    Combining data mining and ontology engineering to enrich ontologies and linked data

    Get PDF
    In this position paper, we claim that the need for time consuming data preparation and result interpretation tasks in knowledge discovery, as well as for costly expert consultation and consensus building activities required for ontology building can be reduced through exploiting the interplay of data mining and ontology engineering. The aim is to obtain in a semi-automatic way new knowledge from distributed data sources that can be used for inference and reasoning, as well as to guide the extraction of further knowledge from these data sources. The proposed approach is based on the creation of a novel knowledge discovery method relying on the combination, through an iterative ?feedbackloop?, of (a) data mining techniques to make emerge implicit models from data and (b) pattern-based ontology engineering to capture these models in reusable, conceptual and inferable artefacts

    Boundary heat diffusion classifier for a semi-supervised learning in a multilayer network embedding

    Get PDF
    International audienceThe scarcity of high-quality annotations in many application scenarios has recently led to an increasing interest in devising learning techniques that combine unlabeled data with labeled data in a network. In this work, we focus on the label propagation problem in multilayer networks. Our approach is inspired by the heat diffusion model, which shows usefulness in machine learning problems such as classification and dimensionality reduction. We propose a novel boundary-based heat diffusion algorithm that guarantees a closed-form solution with an efficient implementation. We experimentally validated our method on synthetic networks and five real-world multilayer network datasets representing scientific coauthorship, spreading drug adoption among physicians, two bibliographic networks, and a movie network. The results demonstrate the benefits of the proposed algorithm, where our boundary-based heat diffusion dominates the performance of the state-of-the-art methods

    Crowdsourcing Linked Data on listening experiences through reuse and enhancement of library data

    Get PDF
    Research has approached the practice of musical reception in a multitude of ways, such as the analysis of professional critique, sales figures and psychological processes activated by the act of listening. Studies in the Humanities, on the other hand, have been hindered by the lack of structured evidence of actual experiences of listening as reported by the listeners themselves, a concern that was voiced since the early Web era. It was however assumed that such evidence existed, albeit in pure textual form, but could not be leveraged until it was digitised and aggregated. The Listening Experience Database (LED) responds to this research need by providing a centralised hub for evidence of listening in the literature. Not only does LED support search and reuse across nearly 10,000 records, but it also provides machine-readable structured data of the knowledge around the contexts of listening. To take advantage of the mass of formal knowledge that already exists on the Web concerning these contexts, the entire framework adopts Linked Data principles and technologies. This also allows LED to directly reuse open data from the British Library for the source documentation that is already published. Reused data are re-published as open data with enhancements obtained by expanding over the model of the original data, such as the partitioning of published books and collections into individual stand-alone documents. The database was populated through crowdsourcing and seamlessly incorporates data reuse from the very early data entry phases. As the sources of the evidence often contain vague, fragmentary of uncertain information, facilities were put in place to generate structured data out of such fuzziness. Alongside elaborating on these functionalities, this article provides insights into the most recent features of the latest instalment of the dataset and portal, such as the interlinking with the MusicBrainz database, the relaxation of geographical input constraints through text mining, and the plotting of key locations in an interactive geographical browser

    TaBIIC: Taxonomy Building Through Iterative and Interactive Clustering

    No full text
    International audienceBuilding taxonomies is often a significant part of building an ontology, and many attempts have been made to automate the creation of such taxonomies from relevant data. The idea in such approaches is either that relevant definitions of the intension of concepts can be extracted as patterns in the data (e.g. in formal concept analysis) or that their extension can be built from grouping data objects based on similarity (clustering). In both cases, the process leads to an automatically constructed structure, which can either be too coarse and lacking in definition, or too fined-grained and detailed, therefore requiring to be refined into the desired taxonomy. In this paper, we explore a method that takes inspiration from both approaches in an iterative and interactive process, so that refinement and definition of the concepts in the taxonomy occur at the time of identifying those concepts in the data. We show that this method is applicable on a variety of data sources and leads to taxonomies that can be more directly integrated into ontologies

    Finding Concept Representations in Neural Networks with Self-Organizing Maps

    No full text
    International audienceIn sufficiently complex tasks, it is expected that as a side effect of learning to solve a problem, a neural network will learn relevant abstractions of the representation of that problem. This has been confirmed in particular in machine vision where a number of works showed that correlations could be found between the activations of specific units (neurons) in a neural network and the visual concepts (textures, colors, objects) present in the image. Here, we explore the use of self-organizing maps as a way to both visually and computationally inspect how activation vectors of whole layers of neural networks correspond to neural representations of abstract concepts such as 'female person' or 'realist painter'. We experiment with multiple measures applied to those maps to assess the level of representation of a concept in a network's layer. We show that, among the measures tested, the relative entropy of the activation map for a concept compared to the map for the whole data is a suitable candidate and can be used as part of a methodology to identify and locate the neural representation of a concept, visualize it, and understand its importance in solving the prediction task at hand. CCS CONCEPTS • Computing methodologies → Artificial intelligence; Neural networks; Knowledge representation and reasoning

    Unsupervised learning for understanding student achievement in a distance learning setting

    No full text
    Many factors could affect the achievement of students in distance learning settings. Internal factors such as age, gender, previous education level and engagement in online learning activities can play an important role in obtaining successful learning outcomes, as well as external factors such as regions where they come from and the learning environment that they can access. Identifying the relationships between student characteristics and distance learning outcomes is a central issue in learning analytics. This paper presents a study that applies unsupervised learning for identifying how demographic characteristics of students and their engagement in online learning activities can affect their learning achievement. We utilise the K-Prototypes clustering method to identify groups of students based on demographic characteristics and interactions with online learning environments, and also investigate the learning achievement of each group. Knowing these groups of students who have successful or poor learning outcomes can aid faculty for designing online courses that adapt to different students' needs. It can also assist students in selecting online courses that are appropriate to them.peer-reviewe

    Extracting data models from background knowledge graphs

    No full text
    International audienceKnowledge Graphs have emerged as a core technology to aggregate and publish knowledge on the Web. However, integrating knowledge from different sources, not specifically designed to be interoperable, is not a trivial task. Finding the right ontologies to model a dataset is a challenge since several valid data models exist and there is no clear agreement between them. In this paper, we propose to facilitate the selection of a data model with the RICDaM (Recommending Interoperable and Consistent Data Models) framework. RICDaM generates and ranks candidates that match entity types and properties in an input dataset. These candidates are obtained by aggregating freely available domain RDF datasets in a knowledge graph and then enriching the relationships between the graph's entities. The entity type and object property candidates are obtained by exploiting the instances and structure of this knowledge graph to compute a score that considers both the accuracy and interoperability of the candidates. Datatype properties are predicted with a random forest model, trained on the knowledge graph properties and their values, so to make predictions on candidate properties and rank them according to different measures. We present experiments using multiple datasets from the library domain as a use case and show that our methodology can produce meaningful candidate data models, adaptable to specific scenarios and needs

    Where to publish and find ontologies? A survey of ontology libraries

    No full text
    One of the key promises of the Semantic Web is its potential to enable and facilitate data interoperability. The ability of data providers and application developers to share and reuse ontologies is a critical component of this data interoperability: if different applications and data sources use the same set of well defined terms for describing their domain and data, it will be much easier for them to “talk” to one another. Ontology libraries are the systems that collect ontologies from different sources and facilitate the tasks of finding, exploring, and using these ontologies. Thus ontology libraries can serve as a link in enabling diverse users and applications to discover, evaluate, use, and publish ontologies. In this paper, we provide a survey of the growing—and surprisingly diverse—landscape of ontology libraries. We highlight how the varying scope and intended use of the libraries affects their features, content, and potential exploitation in applications. From reviewing 11 ontology libraries, we identify a core set of questions that ontology practitioners and users should consider in choosing an ontology library for finding ontologies or publishing their own. We also discuss the research challenges that emerge from this survey, for the developers of ontology libraries to address

    Online Access to Quantitative Data Resources

    Get PDF
    Nowadays, the increasing amount of semantic data available on the Web leads to a new stage in the potential of Semantic Web applications. However, it also introduces new issues due to the heterogeneity of the available semantic resources. One of the most remarkable is redundancy, that is, the excess of dierent semantic descriptions, coming from dierent sources, to describe the same intended meaning. In this paper, we propose a technique to perform a large scale integration of senses (expressed as ontology terms), in order to cluster the most similar ones, when indexing large amounts of online semantic information. It can dramatically reduce the redundancy problem on the current Semantic Web. In order to make this objective feasible, we have studied the adaptability and scalability of our previous work on sense integration, to be translated to the much larger scenario of the Semantic Web. Our evaluation shows a good behaviour of these techniques when used in large scale experiments, then making feasible the proposed approach
    corecore