
    On Verifying Information Extractors

    Ministerio de Economía y Competitividad TIN2013-40848-

    A neural network for semantic labelling of structured information

    Intelligent systems rely on rich sources of information to make informed decisions. Using information from external sources requires establishing correspondences between that information and known information classes. This can be achieved with semantic labelling, which assigns known labels to structured information by classifying it according to computed features. Existing proposals have explored different sets of features without focusing on the classification techniques used. In this paper we present three contributions: first, insights into the architectural issues that arise when using neural networks for semantic labelling; second, a novel implementation of semantic labelling that uses a state-of-the-art neural network classifier, which achieves significantly better results than four other traditional classifiers; third, a comparison of the results obtained by this network with different subsets of features, comparing textual features with structural ones and domain-dependent features with domain-independent ones. The experiments were carried out with datasets from three real-world sources. Our results show that there is a need to develop more semantic labelling proposals with sophisticated classification techniques and large feature catalogues.
    Ministerio de Economía y Competitividad TIN2016-75394-

    TAPON: a two-phase machine learning approach for semantic labelling

    Through semantic labelling we enrich structured information from sources such as HTML pages, tables, or JSON files with labels so that it can be integrated into a local ontology. This process involves measuring some features of the information and then finding the classes that best describe it. The problem with current techniques is that they do not model relationships between classes, so their features fall short when some classes have very similar structures or textual formats. To deal with this problem, we have devised TAPON, a new semantic labelling technique that computes novel features that take these relationships into account. TAPON computes these features by means of a two-phase approach. In the first phase, we compute simple features and obtain a preliminary set of labels (hints). In the second phase, we inject our novel features and obtain a refined set of labels. Our experimental results show that our technique, thanks to its rich feature catalogue and novel modelling, achieves higher accuracy than other state-of-the-art techniques.
    Ministerio de Economía y Competitividad TIN2016-75394-
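    As an illustration of the two-phase idea only (not the authors' implementation), the sketch below assumes a toy setting: each record is a flat list of string values, phase 1 uses two simple features (length and digit ratio), a nearest-centroid classifier stands in for whatever classifier TAPON actually uses, and the phase-2 features append a histogram of the hints predicted for the other values in the same record. All names (`TwoPhaseLabeller`, `simple_features`, `hint_features`) and the training data are hypothetical.

```python
import math
from collections import Counter


def simple_features(value):
    """Phase-1 features (illustrative choice): length and proportion of digits."""
    n = len(value)
    return [float(n), sum(c.isdigit() for c in value) / n if n else 0.0]


def fit_centroids(vectors, labels):
    """Nearest-centroid classifier, a stand-in for any real classifier."""
    sums, counts = {}, Counter()
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] += 1
    centroids = {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}
    # Ties are resolved in favour of the label seen first during training.
    return lambda vec: min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))


def hint_features(hints, index, label_set):
    """Phase-2 features: histogram of the hints assigned to the *other* values."""
    counts = Counter(h for j, h in enumerate(hints) if j != index)
    return [float(counts[lab]) for lab in label_set]


class TwoPhaseLabeller:
    def __init__(self, records):
        values = [[v for v, _ in r] for r in records]
        flat_y = [lab for r in records for _, lab in r]
        self.label_set = sorted(set(flat_y))
        # Phase 1: train on simple features only.
        self.phase1 = fit_centroids([simple_features(v) for r in values for v in r], flat_y)
        # Phase 2: train on simple features plus sibling-hint features.
        x2 = [vec for r in values for vec in self._phase2_vectors(r)]
        self.phase2 = fit_centroids(x2, flat_y)

    def hints(self, values):
        return [self.phase1(simple_features(v)) for v in values]

    def _phase2_vectors(self, values):
        hints = self.hints(values)
        return [simple_features(v) + hint_features(hints, i, self.label_set)
                for i, v in enumerate(values)]

    def label(self, values):
        return [self.phase2(vec) for vec in self._phase2_vectors(values)]


train = [
    [("Alice Smith", "name"), ("555-1234", "phone")],
    [("Bob Jones", "name"), ("555-0000", "phone")],
    [("19.99", "price"), ("555-9876", "sku")],
    [("5.49", "price"), ("555-1111", "sku")],
]
model = TwoPhaseLabeller(train)
print(model.hints(["9.99", "555-2222"]))  # ['price', 'phone']
print(model.label(["9.99", "555-2222"]))  # ['price', 'sku']
```

    In this toy data, phone numbers and SKUs have identical simple features, so phase 1 labels both as `phone`; the sibling hint (`price` rather than `name`) is what lets phase 2 separate them. For brevity, the phase-2 training vectors reuse phase-1 predictions on the training data itself; a careful implementation would likely compute those hints on held-out folds.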

    Neural Networks for Aircraft Trajectory Prediction: Answering Open Questions About Their Performance

    The increase in air traffic in recent years has motivated the development of technologies that monitor airspace and warn about possible collisions by predicting the trajectories that aircraft will follow. In this field, neural networks have become prominent thanks to their potential to learn to predict maneuvers without being given aspects that are difficult to model, such as atmospheric conditions or detailed aircraft parameters. A variety of models have been proposed; however, they are often tested in very limited setups, leaving many unanswered questions about how they perform in certain conditions, or whether their accuracy can be improved by training models for specific trajectories, using additional features, predicting more distant points directly, etc. This may be problematic for researchers and developers of these systems, who have no way of knowing which strategies will yield the best results. We have identified ten open research questions that have not been answered through in-depth testing. This motivated us to carry out a novel experimental study that performs aircraft trajectory prediction with several dozen configuration variants to answer these questions by means of a much more complete evaluation. Some of the conclusions of our study contrast with popular practices in the state of the art, which casts doubt on how straightforward those practices are to apply; for example, differential features are crucial for proper performance but are not mentioned by most studies, while complex, more elaborate models may lead to worse results than simple ones. Other important insights include the benefit of specialized models in more challenging scenarios, the influence of the known trajectory length in those scenarios, the steep degradation of predictions when predicting further into the future, and the detrimental effect of adding extra features. These insights should help guide future research on the application of neural networks to aircraft trajectory prediction and their eventual inclusion in final systems.
    journal article
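    The abstract does not define its differential features. One common interpretation, assumed here, is the change between consecutive trajectory points: the model learns displacements instead of absolute coordinates, and its outputs are turned back into positions by accumulation. The point layout (latitude, longitude, altitude) and both function names are hypothetical.

```python
def to_differences(points):
    """Turn absolute points into per-step differences (one element fewer)."""
    return [tuple(b - a for a, b in zip(p, q)) for p, q in zip(points, points[1:])]


def from_differences(start, deltas):
    """Rebuild absolute points from a start point and (predicted) differences."""
    points = [tuple(start)]
    for d in deltas:
        points.append(tuple(a + b for a, b in zip(points[-1], d)))
    return points


# Hypothetical track: (latitude, longitude, altitude) with integer values for clarity.
track = [(0, 0, 1000), (1, 2, 1100), (3, 5, 1150)]
deltas = to_differences(track)
print(deltas)  # [(1, 2, 100), (2, 3, 50)]
print(from_differences(track[0], deltas) == track)  # True
```

    Accumulating predicted differences also means that errors add up over the horizon, which is one plausible reason why predictions degrade further into the future.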

    LEAPME: Learning-based Property Matching with Embeddings

    Data integration tasks such as the creation and extension of knowledge graphs involve the fusion of heterogeneous entities from many sources. Matching and fusing such entities also requires matching and combining their properties (attributes). However, previous schema matching approaches mostly focus on only two sources and often rely on simple similarity measurements. They thus face problems in challenging use cases such as the integration of heterogeneous product entities from many sources. We therefore present a new machine learning-based property matching approach called LEAPME (LEArning-based Property Matching with Embeddings) that uses numerous features of both property names and instance values. The approach makes heavy use of word embeddings to better exploit the domain-specific semantics of both property names and instance values. The use of supervised machine learning helps exploit the predictive power of word embeddings. Our comparative evaluation against five baselines on several multi-source datasets with real-world data shows the high effectiveness of LEAPME. We also show that our approach is effective even when training data from another domain is used (transfer learning).
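    LEAPME's exact feature set is not described in this abstract. The sketch below only illustrates the general idea of turning word embeddings into features for a supervised property matcher, under stated assumptions: property names are tokenised on non-letters, each name is embedded as the average of its token vectors, and a pair of names is described by the cosine similarity of the two averages plus their element-wise absolute differences. The three-dimensional `TOY_EMBEDDINGS` table is invented for illustration; a real system would load pre-trained vectors.

```python
import math
import re

# Hypothetical toy embeddings, NOT real pre-trained vectors.
TOY_EMBEDDINGS = {
    "screen": [0.9, 0.1, 0.0],
    "display": [0.85, 0.15, 0.05],
    "size": [0.1, 0.9, 0.0],
    "diagonal": [0.2, 0.8, 0.1],
    "weight": [0.0, 0.1, 0.95],
}
DIM = 3


def tokens(text):
    """Lower-case and split on anything that is not a letter (e.g. '_' or ' ')."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text):
    """Average the vectors of known tokens; zero vector if no token is known."""
    vecs = [TOY_EMBEDDINGS[t] for t in tokens(text) if t in TOY_EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def pair_features(name_a, name_b):
    """Feature vector for a candidate property pair, fed to a supervised classifier."""
    ea, eb = embed(name_a), embed(name_b)
    return [cosine(ea, eb)] + [abs(x - y) for x, y in zip(ea, eb)]


print(round(pair_features("screen_size", "display diagonal")[0], 3))
print(round(pair_features("screen_size", "weight")[0], 3))
```

    A labelled set of matching and non-matching pairs would then be used to train a classifier on these vectors; the same construction could be applied to instance values as well as to property names.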

    AYNEC: All you need for evaluating completion techniques in knowledge graphs

    The popularity of knowledge graphs has led to the development of techniques to refine them and increase their quality. One of the main refinement tasks is completion (also known as link prediction for knowledge graphs), which seeks to add missing triples to the graph, usually by classifying candidate triples as true or false. While there is a wide variety of graph completion techniques, there is no standard evaluation setup, so each proposal is evaluated using different datasets and metrics. In this paper we present AYNEC, a suite for the evaluation of knowledge graph completion techniques that covers the entire evaluation workflow. It includes a customisable tool for generating datasets with multiple variation points related to graph preprocessing, the splitting into training and testing examples, and the generation of negative examples. AYNEC also provides a visual summary of the graph and can optionally export the datasets in an open format for visualisation. We use AYNEC to generate a library of ready-to-use evaluation datasets based on several popular knowledge graphs. Finally, it includes a tool that computes relevant metrics and uses significance tests to compare each pair of techniques. These open-source tools, along with the datasets, are freely available to the research community and will be maintained.
    Ministerio de Economía y Competitividad TIN2016-75394-
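    AYNEC's own configuration options are not reproduced here. As a hedged illustration of two of the variation points the abstract mentions, the sketch below splits triples into training and testing sets and generates negative examples by corrupting the tail of each positive triple with a random entity, skipping corruptions that are already known to be true. Tail corruption is only one common strategy, chosen here as an assumption; the function names are hypothetical.

```python
import random


def split_triples(triples, test_fraction, seed=0):
    """Shuffle deterministically and hold out round(n * test_fraction) triples for testing."""
    rng = random.Random(seed)
    shuffled = list(triples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def corrupt_tails(positives, known, entities, per_positive=1, seed=0):
    """Create negatives (h, r, e) by swapping the tail, never producing a known true triple."""
    rng = random.Random(seed)
    negatives = []
    for h, r, t in positives:
        candidates = [e for e in entities if e != t and (h, r, e) not in known]
        for e in rng.sample(candidates, min(per_positive, len(candidates))):
            negatives.append((h, r, e))
    return negatives


# Hypothetical toy graph: a chain e0 -> e1 -> ... -> e10.
triples = [(f"e{i}", "next", f"e{i + 1}") for i in range(10)]
train, test = split_triples(triples, 0.2)
entities = sorted({e for h, _, t in triples for e in (h, t)})
negatives = corrupt_tails(test, set(triples), entities)
print(len(train), len(test), len(negatives))  # 8 2 2
```

    Checking candidates against the whole graph (training and testing triples) avoids labelling a true but held-out triple as negative; whether such "false negatives" are filtered is itself one of the evaluation choices a suite like this makes configurable.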

    Multi-source dataset of e-commerce products with attributes for property matching

    Schema/ontology matching consists of finding matches between types, properties and entities in heterogeneous data sources in order to integrate them, a task that has become increasingly relevant with the development of web technologies and open data initiatives. One of the tasks involved is the matching of data properties, which attempts to find correspondences between the attributes of the entities. This is challenging because equivalent properties sometimes have different names. Furthermore, some properties may not be equivalent but still match in 1..n relationships. These difficulties create the need for varied evaluation datasets, for two reasons. First, such datasets are needed to evaluate existing techniques in a variety of scenarios. Second, they enable the training of supervised techniques, which may even become context-independent if trained with data from sufficiently diverse contexts. To support the evaluation and training of data property matching techniques, we present a dataset collection consisting of product records from four different contexts. These datasets result from transforming two existing datasets; in one of them, some properties were filtered out for being too noisy. The resulting processed dataset consists of JSON files listing the product records and their properties, together with a separate grouping of the properties that determines which ones match. It contains information about 2860 entities, with 4386 properties and 13350 pairwise matches.
    Ministerio de Ciencia, Innovación y Universidades PID2019–105471RB-I00; Junta de Andalucía P18-RT-1060; Junta de Andalucía US-138056
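    The abstract does not state how the pairwise match count relates to the grouping of properties. Assuming, as a sketch, that every two properties in the same group form one match, a group of k properties contributes k(k-1)/2 pairs, and the expansion can be written as below. The property identifiers in the example are hypothetical, not taken from the dataset, and the groups are assumed to be disjoint.

```python
from itertools import combinations


def pairwise_matches(groups):
    """Expand groups of matching properties into unordered pairs (sorted tuples)."""
    pairs = set()
    for group in groups:
        for a, b in combinations(sorted(set(group)), 2):
            pairs.add((a, b))
    return pairs


def expected_pair_count(groups):
    """Closed form for disjoint groups: sum of k * (k - 1) / 2 over group sizes k."""
    return sum(len(set(g)) * (len(set(g)) - 1) // 2 for g in groups)


groups = [
    ["shop_a/screen size", "shop_b/display size", "shop_c/diagonal"],
    ["shop_a/weight", "shop_b/mass"],
]
print(len(pairwise_matches(groups)), expected_pair_count(groups))  # 4 4
```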

    LEAPME: learning-based property matching with embeddings

    Data integration tasks such as the creation and extension of knowledge graphs involve the fusion of heterogeneous entities from many sources. Matching and fusing such entities also requires matching and combining their properties (attributes). However, previous schema matching approaches mostly focus on only two sources and often rely on simple similarity measurements. They thus face problems in challenging use cases such as the integration of heterogeneous product entities from many sources. We therefore present a new machine learning-based property matching approach called LEAPME (LEArning-based Property Matching with Embeddings) that uses numerous features of both property names and instance values. The approach makes heavy use of word embeddings to better exploit the domain-specific semantics of both property names and instance values. The use of supervised machine learning helps exploit the predictive power of word embeddings. Our comparative evaluation against five baselines on several multi-source datasets with real-world data shows the high effectiveness of LEAPME. We also show that our approach is effective even when training data from another domain is used (transfer learning).
    Ministerio de Economía y Competitividad TIN2016-75394-R; Ministerio de Ciencia e Innovación PID2019-105471RB-I00; Junta de Andalucía P18-RT-106

    TAPON-MT: a versatile framework for semantic labelling

    Semantic labelling refers to the problem of assigning known labels to the elements of structured information from a source with unknown semantics, such as an HTML table or an RDF dump. In recent years it has become increasingly relevant due to the growth of structured information available on the Web of data, which needs to be labelled before it can be integrated into data systems. Existing approaches to semantic labelling have several drawbacks that make them unappealing, if not impossible, to use in certain scenarios: not accepting nested structures as input, being unable to label structural elements, not being customisable, requiring groups of instances when labelling, requiring instances to be matched to named entities in a knowledge base, not detecting numeric data, or not supporting complex features. In this article, we propose TAPON-MT, a framework for machine-learning-based semantic labelling. Our framework does not have these limitations, which makes it domain-independent and customisable. We have implemented it with a graphical interface that eases the creation and analysis of models, and we offer a web service API for applying them. We have also validated it with a subset of the National Science Foundation awards dataset, and our conclusion is that TAPON-MT creates labelling models that are effective and efficient in practice.
    Ministerio de Economía y Competitividad TIN2016-75394-
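    Two of the limitations listed in the TAPON-MT abstract, not accepting nested structures and being unable to label structural elements, concern how the input is traversed. The sketch below is not TAPON-MT's code; it is a minimal illustration, assuming JSON-like input already parsed into Python dicts and lists, of enumerating every element with its path so that objects and arrays can be labelled alongside leaf values. Only native JSON numbers are treated as numeric; recognising numbers stored as strings would need extra parsing. The function name and the sample record are hypothetical.

```python
def elements(node, path=()):
    """Yield (path, kind, value) for every element, structural ones included."""
    if isinstance(node, dict):
        yield path, "object", None
        for key, child in node.items():
            yield from elements(child, path + (key,))
    elif isinstance(node, list):
        yield path, "array", None
        for index, child in enumerate(node):
            yield from elements(child, path + (index,))
    elif isinstance(node, bool):  # checked before int: bool is a subclass of int
        yield path, "boolean", node
    elif isinstance(node, (int, float)):
        yield path, "number", node
    elif node is None:
        yield path, "null", None
    else:
        yield path, "text", node


# Hypothetical nested record, loosely shaped like an award entry.
record = {"award": {"title": "Data integration", "amount": 50000,
                    "investigators": [{"name": "Ann"}]}}
for path, kind, value in elements(record):
    print("/".join(map(str, path)) or "<root>", kind, value)
```

    Each yielded element could then be turned into a feature vector and classified, so that, for example, the `investigators` array itself receives a label and not only the names inside it.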

    Introducing asymmetric functionality into MOFs via the generation of metallic Janus MOF particles

    Herein we report a versatile methodology for engineering metallic Janus MOF particles based on desymmetrization at interfaces, whereby each MOF particle is partially coated with a desired metal. We demonstrate that it enables the fabrication of homogeneous Janus MOF particles for a range of MOFs (ZIF-8, UiO-66 or UiO-66-SH), metals (Au, Co or Pt), MOF particle sizes (from the micrometer to the submicrometer regime) and metal-film thicknesses (from 5 nm to 50 nm). We anticipate that our strategy could be applied to impart new functionalities to MOFs, including asymmetric functionalization, magnetic guidance and motorization.