Search CORE

2,821 research outputs found

Ontology alignment based on word embedding and random forest classification

Author: Heaven Rachel
Hui Kit-Ying
Massie Stewart
Nkisi-Orji Ikechukwu
Wiratunga Nirmalie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/01/2019

Ontology alignment is crucial for integrating heterogeneous data sources and is an important component in realising the goals of the semantic web. Accordingly, several ontology alignment techniques have been proposed and used for discovering correspondences between the concepts (or entities) of different ontologies. However, these techniques mostly depend on string-based similarities, which cannot handle the vocabulary mismatch problem. In addition, deciding which similarity measures to use and how to combine them effectively in alignment systems are persistent challenges in this area. In this work, we introduce a random forest classifier approach for ontology alignment that relies on word embedding to discover semantic similarities between concepts. Specifically, we combine string-based and semantic similarity measures to form feature vectors that the classifier model uses to determine when concepts match. By harnessing background knowledge and relying on minimal information from the ontologies, our approach can deal with knowledge-light ontological resources. It also eliminates the need to learn aggregation weights for multiple similarity measures. Our experiments using the Ontology Alignment Evaluation Initiative (OAEI) dataset and real-world ontologies highlight the utility of our approach and show that it can outperform state-of-the-art alignment systems.
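As a rough illustration of the feature-vector idea in this abstract, the sketch below builds one vector per candidate concept pair from two string-based measures and one embedding-based measure. It is an assumption-laden sketch, not the authors' implementation: the toy 3-dimensional vectors in TOY_EMBEDDINGS and the helper names are hypothetical, and the specific measures (difflib ratio, token Jaccard, cosine) are stand-ins chosen for illustration.

```python
import math
from difflib import SequenceMatcher

# Hypothetical toy word vectors; a real system would load pre-trained
# embeddings (the abstract does not say which model is used).
TOY_EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "person": [0.0, 0.2, 0.95],
}


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of the lower-cased word tokens of two labels."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def embedding_cosine(a: str, b: str) -> float:
    """Cosine similarity of label embeddings; 0.0 if either label is unknown."""
    va, vb = TOY_EMBEDDINGS.get(a.lower()), TOY_EMBEDDINGS.get(b.lower())
    if va is None or vb is None:
        return 0.0
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def pair_features(a: str, b: str) -> list[float]:
    """Feature vector for one candidate concept pair (input to a classifier)."""
    return [string_similarity(a, b), token_jaccard(a, b), embedding_cosine(a, b)]


# "car" vs "automobile": low string similarity, high embedding similarity.
print(pair_features("car", "automobile"))
```

The pair "car"/"automobile" shows the vocabulary-mismatch case the abstract describes: the string features are low while the embedding feature is high. Labelled vectors of this kind could then be passed to a random forest (for example scikit-learn's RandomForestClassifier), which learns how to weigh the features instead of relying on hand-tuned aggregation weights.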

Open Access Institutional Repository at Robert Gordon University

NERC Open Research Archive

Ontology alignment based on word embedding and random forest classification

Author: Heaven Rachel
Hui Kit-Ying
Massie Stewart
Nkisi-Orji Ikechukwu
Wiratunga Nirmalie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/02/2019

NERC Open Research Archive

Dividing the Ontology Alignment Task with Semantic Embeddings and Logic-based Modules

Author: Agibetov A.
Chen J.
Cross V.
Jimenez-Ruiz E.
Samwald M.
Publication venue: 'IOS Press'
Publication date: 01/01/2020

Large ontologies still pose serious challenges to state-of-the-art ontology alignment systems. In this paper, we present an approach that combines a neural embedding model and logic-based modules to accurately divide an input ontology matching task into smaller and more tractable matching (sub)tasks. We have conducted a comprehensive evaluation using the datasets of the Ontology Alignment Evaluation Initiative. The results are encouraging and suggest that the proposed method is adequate in practice and can be integrated within the workflow of systems unable to cope with very large ontologies.

arXiv.org e-Print Archive

City Research Online

NORA - Norwegian Open Research Archives


Embedding OWL ontologies with OWL2Vec

Author: Chen J.
Holter O. M.
Jimenez-Ruiz E.
Myklebust E. B.
Publication venue
Publication date: 01/01/2019

In this paper, we present a preliminary study to compute embeddings for OWL 2 ontologies by projecting the ontology axioms into a graph and performing (random) walks over the ontology graph to create a corpus of sentences. This corpus is then given to a neural language model to create concept embeddings. The preliminary evaluation shows promising results.
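The axiom-to-graph projection and random-walk corpus construction described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the toy AXIOMS list, the (subject, relation, object) edge projection, and the helper names are hypothetical and not OWL2Vec's actual code; the resulting token sentences are what would be handed to a neural language model (the abstract does not name one; word2vec-style models are a common choice).

```python
import random

# Hypothetical toy ontology: each axiom is projected to a (subject, relation, object) edge.
AXIOMS = [
    ("Cat", "subClassOf", "Mammal"),
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("Dog", "hasPart", "Tail"),
]


def build_graph(axioms):
    """Adjacency list: node -> list of (relation, neighbour) pairs."""
    graph = {}
    for subj, rel, obj in axioms:
        graph.setdefault(subj, []).append((rel, obj))
    return graph


def random_walk(graph, start, depth, rng):
    """Walk up to `depth` edges from `start`, emitting node and relation tokens."""
    sentence = [start]
    node = start
    for _ in range(depth):
        edges = graph.get(node)
        if not edges:  # dead end: stop the walk early
            break
        rel, node = rng.choice(edges)
        sentence.extend([rel, node])
    return sentence


def build_corpus(axioms, walks_per_node=2, depth=3, seed=0):
    """Corpus of token 'sentences' starting from every node with outgoing edges."""
    graph = build_graph(axioms)
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    return [random_walk(graph, node, depth, rng)
            for node in sorted(graph)
            for _ in range(walks_per_node)]


for sentence in build_corpus(AXIOMS):
    print(" ".join(sentence))
```

Emitting relation names as tokens between node names is one design choice among several; a walk that emits only node names would also yield a usable corpus.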

City Research Online

NIVA Open Access Archive

NORA - Norwegian Open Research Archives

Using Artificial Neural Networks to Determine Ontologies Most Relevant to Scientific Texts

Author: Behr Alexander S.
Holeňa Martin
Kockmann Norbert
Korel Lukáš
Publication venue
Publication date: 17/09/2023

This paper explores how artificial neural networks can be used to find the ontologies most relevant to scientific texts. The basic idea of the approach is to select a representative paragraph from a source text file, embed it into a vector space using a pre-trained, fine-tuned transformer, and classify the embedded vector according to its relevance to a target ontology. We considered several classifiers for categorizing the transformer output, in particular random forest, support vector machine, multilayer perceptron, k-nearest neighbors, and Gaussian process classifiers. Their suitability was evaluated in a use case with ontologies and scientific texts concerning catalysis research. In this task, the random forest classifier gave the worst results and the support vector machine classifier gave the best.
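The classification step (embedded paragraph in, ontology label out) can be illustrated with k-nearest neighbors, one of the classifiers the abstract lists. This sketch uses k-NN only because it fits in the standard library; the paper reports that the support vector machine performed best. The 3-dimensional vectors and the labels "catalysis_ontology" and "other_ontology" are hypothetical stand-ins for real transformer embeddings and target ontologies.

```python
import math
from collections import Counter

# Hypothetical 3-d "paragraph embeddings", each labelled with the ontology it is relevant to.
TRAINING = [
    ([0.9, 0.1, 0.0], "catalysis_ontology"),
    ([0.8, 0.2, 0.1], "catalysis_ontology"),
    ([0.1, 0.9, 0.2], "other_ontology"),
    ([0.0, 0.8, 0.3], "other_ontology"),
]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(vector, training, k=3):
    """Majority label among the k training vectors closest to `vector`."""
    nearest = sorted(training, key=lambda item: euclidean(vector, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict([0.85, 0.15, 0.05], TRAINING))  # prints catalysis_ontology
```

In a real pipeline the vectors would come from the fine-tuned transformer, and any of the compared classifiers could be swapped in behind the same "vector in, label out" interface.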

arXiv.org e-Print Archive

Site-Specific Rules Extraction in Precision Agriculture

Author: Espejo García Borja Antonio
López Pellicer Francisco Javier
Zarazaga Soria Francisco Javier
Publication venue: Universidad de Zaragoza, Prensas de la Universidad
Publication date: 01/01/2019

Sustainably increasing food production to meet the needs of a growing world population is a real challenge, given the constant impact of pests and diseases on crops. Because of the significant economic losses they cause, chemical treatments are overused, causing environmental pollution and resistance to different treatments. In this context, the agricultural community envisions applying more site-specific treatments, together with automatic validation of legal compliance. However, these treatments are specified in regulations expressed in natural language. For this reason, translating regulations into a machine-processable representation is becoming increasingly important in precision agriculture. The requirements for translating regulations into formal rules are currently far from being met, and with the rapid development of agricultural science, manual verification of legal compliance is becoming unmanageable. The goal of this thesis is to build and evaluate a rule extraction system that effectively distills the relevant information from regulations and transforms natural-language rules into a structured format that machines can process. To do so, we split rule extraction into two steps. The first is to build a domain ontology: a model describing the disorders that diseases cause in crops and their treatments. The second step is to extract information to populate the ontology. Since we use machine learning techniques, we implemented the MATTER methodology to carry out the annotation of regulations.
Once the corpus was created, we built a rule category classifier that distinguishes between obligations and prohibitions, and a rule constraint extraction system that recognizes the relevant information needed to preserve isomorphism with the original regulation. For these components we employed, among other deep learning techniques, convolutional neural networks and Long Short-Term Memory (LSTM) networks. We also used more traditional algorithms, such as support-vector machines and random forests, as baselines. As a result, we present the PCT-O ontology, which has been aligned with other ontologies such as NCBI, PubChem, ChEBI and Wikipedia. The model can be used to identify disorders, analyze conflicts between treatments, and compare legislation across countries. We evaluated the extraction systems empirically with several metrics, but the F1 score is used to select the best systems. For the rule category classifier, the best system achieves a macro F1 of 92.77% and a binary F1 of 85.71%; it uses a bidirectional long short-term memory network with word embeddings as input. For the rule constraint extractor, the best system achieves a micro F1 of 88.3%; it feeds a combination of character embeddings and word embeddings into a bidirectional long short-term memory network.
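The thesis reports macro, binary, and micro F1 scores, which aggregate errors differently. The sketch below shows the standard definitions on a tiny hypothetical obligation/prohibition example; the labels and predictions are invented for illustration and are not data from the thesis. Which class the thesis treats as positive for its binary F1 is not stated in the abstract.

```python
def f1(tp, fp, fn):
    """F1 = 2PR / (P + R), written as 2TP / (2TP + FP + FN); 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def per_class_counts(y_true, y_pred, label):
    """True positives, false positives and false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = [f1(*per_class_counts(y_true, y_pred, label)) for label in labels]
    return sum(scores) / len(scores)


def micro_f1(y_true, y_pred, labels):
    """F1 computed from TP/FP/FN pooled across all classes."""
    counts = [per_class_counts(y_true, y_pred, label) for label in labels]
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return f1(tp, fp, fn)


# Hypothetical predictions for four rules.
y_true = ["obligation", "obligation", "prohibition", "obligation"]
y_pred = ["obligation", "prohibition", "prohibition", "obligation"]
labels = ["obligation", "prohibition"]
print(round(macro_f1(y_true, y_pred, labels), 4), round(micro_f1(y_true, y_pred, labels), 4))  # prints 0.7333 0.75
```

Macro F1 gives each class equal weight, so a weak minority class pulls it down, while micro F1 pools all decisions and is dominated by the frequent class. A binary F1 is simply the per-class F1 of whichever class is designated positive (here, 0.6667 for "prohibition").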

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Universidad de Zaragoza

An explainable data-driven approach to web directory taxonomy mapping

Author: Elena Daraio
Giuseppe Ricupero
Luca Cagliero
Paolo Garza
Silvia Chiusano
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

The spread of e-commerce and web applications has fostered the integration of cross-domain business activities. To help customers retrieve products and services efficiently, web directories let them browse multi-level taxonomies organized according to a predefined categorization. Providers need to periodically update web directory listings by aligning in-house taxonomies with domain-specific hierarchies from external sources. However, such taxonomy mapping procedures are often semi-automatic and rely on traditional word sense disambiguation techniques to capture the semantics of category and product descriptions. Hence, the flexibility and explainability of the underlying models are quite limited. This paper proposes an automated, explainable approach to web directory taxonomy mapping based on text categorization. It exploits two complementary word-based text representations: a frequency-based representation, which captures syntactic text similarities, and an embedding-based one, which highlights the underlying semantic relationships among words. Since the proposed solution is purely data-driven, it can be applied to business domains that lack semantic models. The frequency-based text representation proved particularly suitable for driving the automated taxonomy mapping procedure, whereas the embedding space was used to provide local explanations of the category assignments.
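To make the frequency-based representation concrete, the sketch below maps an external item description onto the in-house category whose description has the most similar term-frequency vector. This is a simplification: the paper trains a text categorization model, whereas here a nearest-category cosine match stands in for that classifier. The CATEGORIES dictionary and its descriptions are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lower-cased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical in-house categories, each described by representative text.
CATEGORIES = {
    "Electronics > Phones": "smartphone mobile phone android screen battery",
    "Home > Kitchen": "kitchen cookware pan pot knife oven",
}


def map_to_category(description, categories):
    """Assign an external item to the category with the most similar TF vector."""
    item = tf_vector(description)
    return max(categories, key=lambda c: cosine(item, tf_vector(categories[c])))


print(map_to_category("android phone with large screen", CATEGORIES))  # prints Electronics > Phones
```

Because this representation only counts shared surface words, it cannot relate "smartphone" to "phone"; that gap is where an embedding space helps, which the abstract uses for explaining assignments rather than for the mapping itself.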

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Open Access Repository