11 research outputs found

    A neural network for semantic labelling of structured information

    Intelligent systems rely on rich sources of information to make informed decisions. Using information from external sources requires establishing correspondences between the information and known information classes. This can be achieved with semantic labelling, which assigns known labels to structured information by classifying it according to computed features. Existing proposals have explored different sets of features, but have paid little attention to the classification techniques used. In this paper we present three contributions: first, insights on architectural issues that arise when using neural networks for semantic labelling; second, a novel implementation of semantic labelling that uses a state-of-the-art neural network classifier, which achieves significantly better results than four traditional classifiers; third, a comparison of the results obtained by this network when using different subsets of features, comparing textual features to structural ones, and domain-dependent features to domain-independent ones. The experiments were carried out with datasets from three real-world sources. Our results show that there is a need to develop more semantic labelling proposals with sophisticated classification techniques and large feature catalogues. Ministerio de Economía y Competitividad TIN2016-75394-
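
    The abstract gives no implementation details, so the following is only a rough sketch of the kind of pipeline it describes: computing features for structured values and classifying them into known labels with a neural network, compared against a traditional classifier. The feature set and the scikit-learn model choices are illustrative assumptions, not the paper's actual architecture or feature catalogue.

```python
# Hypothetical sketch: semantic labelling as feature-based classification.
# The textual features below are illustrative assumptions, not the paper's catalogue.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def textual_features(value: str) -> list:
    """Compute a small vector of textual features for one structured value."""
    return [
        len(value),                                            # length
        sum(c.isdigit() for c in value) / max(len(value), 1),  # digit ratio
        sum(c.isalpha() for c in value) / max(len(value), 1),  # letter ratio
        value.count(" "),                                      # token count proxy
    ]

# Toy labelled data: values extracted from an external source and their classes.
values = ["2021-05-17", "John Smith", "42.7", "Maria Lopez", "1999-01-01", "13.2"]
labels = ["date", "name", "measure", "name", "date", "measure"]

X = np.array([textual_features(v) for v in values])
y = np.array(labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Neural network classifier vs. one traditional baseline.
nn = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0).fit(X_train, y_train)
rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("NN accuracy:", nn.score(X_test, y_test))
print("RF accuracy:", rf.score(X_test, y_test))
```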

    AYNEC: All you need for evaluating completion techniques in knowledge graphs

    The popularity of knowledge graphs has led to the development of techniques to refine them and increase their quality. One of the main refinement tasks is completion (also known as link prediction for knowledge graphs), which seeks to add missing triples to the graph, usually by classifying potential ones as true or false. While there is a wide variety of graph completion techniques, there is no standard evaluation setup, so each proposal is evaluated using different datasets and metrics. In this paper we present AYNEC, a suite for the evaluation of knowledge graph completion techniques that covers the entire evaluation workflow. It includes a customisable tool for the generation of datasets with multiple variation points related to the preprocessing of graphs, the splitting into training and testing examples, and the generation of negative examples. AYNEC also provides a visual summary of the graph and can optionally export the datasets in an open format for visualisation. We use AYNEC to generate a library of ready-to-use evaluation datasets based on several popular knowledge graphs. Finally, it includes a tool that computes relevant metrics and uses significance tests to compare each pair of techniques. These open-source tools, along with the datasets, are freely available to the research community and will be maintained. Ministerio de Economía y Competitividad TIN2016-75394-
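
    AYNEC's concrete options are not detailed in the abstract; the sketch below only illustrates one of the variation points it mentions, generating negative examples by corrupting the head or tail of known triples. The function name and corruption strategy are assumptions for illustration, not AYNEC's actual API.

```python
# Hypothetical sketch of one variation point from the abstract:
# generating negative examples by corrupting positive triples.
import random

def generate_negatives(triples, negatives_per_positive=1, seed=0):
    """Corrupt the head or tail of each (head, relation, tail) triple,
    making sure the corrupted triple is not a known positive."""
    rng = random.Random(seed)
    positives = set(triples)
    entities = list({e for h, _, t in triples for e in (h, t)})
    negatives = []
    for h, r, t in triples:
        for _ in range(negatives_per_positive):
            while True:
                if rng.random() < 0.5:
                    candidate = (rng.choice(entities), r, t)   # corrupt head
                else:
                    candidate = (h, r, rng.choice(entities))   # corrupt tail
                if candidate not in positives and candidate not in negatives:
                    negatives.append(candidate)
                    break
    return negatives

triples = [("alice", "worksAt", "acme"), ("bob", "worksAt", "initech"),
           ("acme", "locatedIn", "seville")]
print(generate_negatives(triples))
```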

    Atopic dermatitis and indoor use of energy sources in cooking and heating appliances

    Background: Atopic dermatitis (AD) prevalence has increased considerably worldwide in recent years. Studying indoor environments is particularly relevant, especially in industrialised countries, where many people, particularly children, spend 80% of their time at home. This study aimed to identify the potential association between AD and the energy source (biomass, gas and electricity) used for cooking and domestic heating in a Spanish schoolchildren population. Methods: As part of the ISAAC (International Study of Asthma and Allergies in Childhood) phase III study, a cross-sectional population-based survey was conducted with 21,355 6-to-7-year-old children from 8 Spanish ISAAC centres. AD prevalence, environmental risk factors and the use of domestic heating/cooking devices were assessed using the validated ISAAC questionnaire. Crude and adjusted odds ratios (cOR, aOR) and 95% confidence intervals (CIs) were obtained. A logistic regression analysis was performed (Chi-square test, p-value < 0.05). Results: The use of biomass systems gave the highest cORs, but only electric cookers showed a significant cOR of 1.14 (95% CI: 1.01-1.27). When the geographical area and the mother's educational level were included in the logistic model, the obtained aOR values differed moderately from the initial cORs. Electric heating was the only type that obtained a significant aOR (1.13; 95% CI: 1.00-1.27). Finally, the model with all selected confounding variables (sex, BMI, number of siblings, mother's educational level, smoking habits of parents, truck traffic and geographical area) showed aOR values very similar to those obtained in the previous adjusted logistic analysis. None of these results was statistically significant, but the use of electric heating showed an aOR close to significance (1.14; 95% CI: 0.99-1.31). Conclusion: In our study population, no statistically significant associations were found between the type of indoor energy sources used and the presence of AD.
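
    The analysis reported here is standard logistic regression with crude and adjusted odds ratios and 95% confidence intervals. The sketch below shows, on synthetic data, how such cORs and aORs can be computed with statsmodels; the variable names and data are illustrative assumptions, not the ISAAC dataset or the study's actual model.

```python
# Hypothetical sketch: crude vs. adjusted odds ratios with 95% CIs,
# as in the study's logistic regression analysis (synthetic data, not ISAAC).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "ad": rng.binomial(1, 0.1, n),                # atopic dermatitis (outcome)
    "electric_heating": rng.binomial(1, 0.4, n),  # exposure of interest
    "sex": rng.binomial(1, 0.5, n),               # confounders (illustrative)
    "siblings": rng.integers(0, 4, n),
})

# Crude OR: outcome regressed on the exposure only.
crude = smf.logit("ad ~ electric_heating", data=df).fit(disp=False)
# Adjusted OR: exposure plus confounders.
adjusted = smf.logit("ad ~ electric_heating + sex + siblings", data=df).fit(disp=False)

for name, model in [("cOR", crude), ("aOR", adjusted)]:
    odds_ratio = np.exp(model.params["electric_heating"])
    lo, hi = np.exp(model.conf_int().loc["electric_heating"])
    print(f"{name} = {odds_ratio:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```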

    On Data Engineering and Knowledge Graphs: a Context-Aware Proposal for Web-Scale Knowledge Graph Completion

    Nowadays, Knowledge Graphs are a widely used means to store structured information for a variety of different domains and applications. However, because they are usually constructed using automated information extraction techniques, they are often incomplete, either because these techniques failed to extract the relevant information or because it was not present in the original sources at all. The problem that we address in this dissertation is how to find this missing knowledge and complete Knowledge Graphs automatically. In the literature there are already many proposals to perform this task, but they have important drawbacks: they rely on embedded representations, which are computationally expensive to generate and must be regenerated frequently; they require human intervention or human-provided data; they rely on external sources of information; they cannot produce new knowledge on their own; or they do not scale properly to very large Knowledge Graphs. In this dissertation we present a new automated proposal for completing Knowledge Graphs that does not suffer from any of these drawbacks. Our contribution is threefold: CHAI, a technique for automatically generating tractable sets of candidate triples; CAFE, a high-accuracy triple classification proposal; and SciCheck, a technique specifically tailored for completing scientific Knowledge Graphs. Our theoretical and practical validation suggests that our proposal is very efficient and effective in practice, and that it is able to successfully complete Knowledge Graphs of varying natures.

    CAFE: Fact Checking in Knowledge Graphs using Neighborhood-Aware Features

    Knowledge Graphs (KGs) currently contain a vast amount of structured information in the form of entities and relations. Because KGs are often constructed automatically by means of information extraction processes, they may miss information that was either not present in the original source or not successfully extracted. As a result, KGs might lack useful and valuable information. Current approaches that aim to complete missing information in KGs either depend on embedded representations, which hinders their scalability and applicability to different KGs, or are based on long random paths that may fail to cover relevant information by mere chance, since exhaustively analyzing all possible paths of a large length between entities is very time-consuming. In this paper, we present an approach to completing KGs based on evaluating candidate triples using a novel set of features, which exploits the highly relational nature of KGs by analyzing the entities and relations surrounding any given pair of entities. Our results show that our proposal is able to identify correct triples with a higher effectiveness than other state-of-the-art approaches (up to 60% higher precision or 20% higher recall in some datasets). Ministerio de Economía y Competitividad TIN2016-75394-
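
    The paper's exact feature set is not reproduced in the abstract; the sketch below only illustrates the general idea of neighbourhood-aware features, summarising the relations around the head and tail of a candidate triple with networkx. The specific features and the toy graph are assumptions, not CAFE's actual definitions.

```python
# Hypothetical sketch of neighbourhood-aware features for a candidate triple
# (h, r, t): simple counts of surrounding relations, not CAFE's actual feature set.
import networkx as nx

G = nx.MultiDiGraph()
G.add_edge("alice", "acme", relation="worksAt")
G.add_edge("bob", "acme", relation="worksAt")
G.add_edge("acme", "seville", relation="locatedIn")
G.add_edge("alice", "seville", relation="livesIn")

def neighborhood_features(G, h, r, t):
    """Describe the vicinity of h and t: how often r already appears around each
    entity, their degrees, and whether some path already connects them."""
    out_h = [d["relation"] for _, _, d in G.out_edges(h, data=True)]
    in_t = [d["relation"] for _, _, d in G.in_edges(t, data=True)]
    return {
        "r_out_of_h": out_h.count(r),        # times h is already a source of r
        "r_into_t": in_t.count(r),           # times t is already a target of r
        "deg_h": G.degree(h),
        "deg_t": G.degree(t),
        "connected": int(nx.has_path(G, h, t)),
    }

# Feature vectors like these would then feed a binary triple classifier.
print(neighborhood_features(G, "bob", "livesIn", "seville"))
```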

    CAFE: Knowledge graph completion using neighborhood-aware features

    Knowledge Graphs (KGs) currently contain a vast amount of structured information in the form of entities and relations. Because KGs are often constructed automatically by means of information extraction processes, they may miss information that was either not present in the original source or not successfully extracted. As a result, KGs might lack useful and valuable information. Current approaches that aim to complete missing information in KGs have two main drawbacks. First, some depend on embedded representations, which impose a very expensive preprocessing step and need to be recomputed as the KG grows. Second, others are based on long random paths that may not cover all relevant information, whereas exhaustively analyzing all possible paths between entities is very time-consuming. In this paper, we present an approach to completing KGs based on evaluating candidate triples using a set of neighborhood-based features. Our approach exploits the highly connected nature of KGs by analyzing the entities and relations surrounding any given pair of entities, while avoiding full recomputations as new entities are added. Our results indicate that our proposal is able to identify correct triples with a higher effectiveness than other state-of-the-art approaches, achieving higher average F1 scores in all tested datasets. Therefore, we conclude that the information present in the vicinities of the two entities within a candidate triple can be leveraged to determine whether that triple is missing from the KG or not. Ministerio de Economía y Competitividad TIN2016-75394-R. Ministerio de Ciencia, Innovación y Universidades PID2019-105471RB-I0

    AYNEC-DataGen: a tool for generating evaluation datasets for Knowledge Graphs completion

    In the context of knowledge graphs, the task of completion of relations consists in adding missing triples to a knowledge graph, usually by classifying potential candidates as true or false. Creating an evaluation dataset for these techniques is not trivial, since there is a large number of variables to consider which, if not taken into account, may cause misleading results. So far, there is no well-defined workflow that identifies the variation points when creating a dataset, or what strategies can be followed at each step. Furthermore, there are no tools that help create such datasets in an easy way. To address this need, we have created AYNEC-DataGen, a customisable tool for the generation of datasets with multiple variation points related to the preprocessing of the original knowledge graph, the splitting of triples into training and testing sets, and the generation of negative examples. The output of our tool includes the evaluation dataset, an optional export in an open format for its visualisation, and additional files with metadata. Our tool is freely available online. Ministerio de Economía y Competitividad TIN2016-75394-
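
    As with AYNEC above, the tool's concrete options are not listed in the abstract; this sketch only illustrates two other variation points it mentions, splitting triples into training and testing sets per relation and exporting them in an open format. The split strategy and file layout are illustrative assumptions, not AYNEC-DataGen's actual behaviour.

```python
# Hypothetical sketch of two variation points from the abstract:
# a per-relation train/test split and export to an open (TSV) format.
import csv
import random
from collections import defaultdict

def split_per_relation(triples, test_fraction=0.2, seed=0):
    """Split triples into train/test, keeping the split ratio within each relation."""
    rng = random.Random(seed)
    by_relation = defaultdict(list)
    for h, r, t in triples:
        by_relation[r].append((h, r, t))
    train, test = [], []
    for group in by_relation.values():
        rng.shuffle(group)
        cut = max(1, round(len(group) * test_fraction))  # at least one test triple
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test

def export_tsv(triples, path):
    """Write triples as tab-separated values, one triple per row."""
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter="\t").writerows(triples)

triples = [("alice", "worksAt", "acme"), ("bob", "worksAt", "initech"),
           ("carol", "worksAt", "acme"), ("acme", "locatedIn", "seville"),
           ("initech", "locatedIn", "madrid")]
train, test = split_per_relation(triples)
export_tsv(train, "train.tsv")
export_tsv(test, "test.tsv")
```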

    Silence: a framework to support the teaching of web development

    Web development is currently one of the most prominent fields in the computing industry, both local and international. Database management and usage stands out as a highly related field, though it is often taught independently. In this work we introduce Silence, a framework whose goal is to allow for the joint teaching of databases and web development. Silence provides a unified environment for developing relational databases that store an application's data, and web applications that use such databases through a modern RESTful API. Silence greatly simplifies the process of deploying RESTful endpoints to interact with a database, resulting in a more agile web development learning process and in a very clear distinction between back-end and front-end. Furthermore, Silence is oriented towards project-based learning, offering self-contained projects that are easy to develop, deploy and evaluate. Our framework is currently in active use by more than 500 students across three different degree programmes, and it is freely available as open-source software.
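
    Silence's own API is not described in the abstract, so the sketch below is a generic illustration of the pattern it teaches, exposing a relational table through a RESTful endpoint, using Flask and SQLite rather than Silence itself. The route, schema and database file are assumptions for illustration only.

```python
# Hypothetical sketch of the pattern described in the abstract (a RESTful endpoint
# over a relational database), using Flask + SQLite; this is NOT Silence's own API.
import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)
DB = "teaching.db"

def init_db():
    """Create and seed a toy relational table for the example."""
    with sqlite3.connect(DB) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS students (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO students (name) VALUES ('Alice'), ('Bob')")

@app.route("/api/students", methods=["GET"])
def list_students():
    """Back-end endpoint: return every row of the students table as JSON."""
    with sqlite3.connect(DB) as conn:
        conn.row_factory = sqlite3.Row
        rows = conn.execute("SELECT id, name FROM students").fetchall()
    return jsonify([dict(row) for row in rows])

if __name__ == "__main__":
    init_db()
    app.run(debug=True)  # front-end code would then consume /api/students
```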