4,451 research outputs found
ColNet: Embedding the Semantics of Web Tables for Column Type Prediction
Automatically annotating column types with knowledge base (KB) concepts is a critical task to gain a basic understanding of web tables. Current methods rely on either table metadata like column name or entity correspondences of cells in the KB, and may fail to deal with growing web tables with incomplete meta information. In this paper we propose a neural network based column type annotation framework named ColNet which is able to integrate KB reasoning and lookup with machine learning and can automatically train Convolutional Neural Networks for prediction. The prediction model not only considers the contextual semantics within a cell using word representation, but also embeds the semantics of a column by learning locality features from multiple cells. The method is evaluated with DBpedia and two different web table datasets, T2Dv2 from the general Web and Limaye from Wikipedia pages, and achieves higher performance than the state-of-the-art approaches
Recommended from our members
Learning Semantic Annotations for Tabular Data
The usefulness of tabular data such as web tables critically depends on understanding their semantics. This study focuses on column type prediction for tables without any metadata. Unlike traditional lexical matching-based methods, we propose a deep prediction model that can fully exploit a table's contextual semantics, including table locality features learned by a Hybrid Neural Network (HNN), and inter-column semantics features learned by a knowledge base (KB) lookup and query answering algorithm. It exhibits good performance not only on individual table sets, but also when transferring from one table set to another
CitySAT: a System for the Semantic Answer Type Prediction Task
This paper describes the CitySAT system that we developed for the DBpedia Answer Type (AT) prediction task of the SMART 2021 challenge. The challenge can be interpreted as a multi-class classification task that takes natural language questions and returns pairs of the predicted answer category and types. For training, we merged the SMART 2021 DBpedia dataset with the 2020 dataset given for the previous year's AT task. In this study, three local Machine Learning (ML) models are deployed to cover the three different types of task and question (category prediction, literal type prediction and resource type prediction). The best model obtains a 98.36% accuracy for the category prediction using a Logistic Regression (LR) classifier. Similarly, another LR model results in 97.90% accuracy for the literal type prediction task. Lastly we also built a Multi-Layer Perceptron (MLP) model to deal with several ontology classes (∼760 classes for DBpedia) in the resource type prediction task. The best MLP model achieves 79.34% on the merged training dataset. The final system output obtained a 98.4% accuracy, 84.2% NDCG@5, and 85.4% NDCG@10 on the (official) test dataset
First steps in the logic-based assessment of post-composed phenotypic descriptions
In this paper we present a preliminary logic-based evaluation of the integration of post-composed phenotypic descriptions with domain ontologies. The evaluation has been performed using a description logic reasoner together with scalable techniques: ontology modularization and approximations of the logical difference between ontologies
Controlled formation of bubbles in a planar co-flow configuration
We present a new method for controlling the bubble size and formation frequency in a planar air-water co-flow configuration by modulating the water velocity at the nozzle exit. The forcing process has been experimentally characterized by determining the amplitude of the water velocity fluctuations from measurements of the pressure variations in the water stream. The effect of the forcing on the bubbling process has been described by analyzing the pressure signals in the air stream in combination with visualizations performed with a high-speed camera. We show that, when the forcing amplitude is sufficiently large, the bubbles can be generated at a rate different from the natural bubbling frequency, $f_n$, which depends on the water-to-air velocity ratio, $\Lambda = u_w/u_a$, and the Weber number, $We = \rho_w u_w^2 H_0/\sigma$, where $H_0$ is the half-thickness of the air stream at the exit slit, $\rho_w$ the water density and $\sigma$ the surface tension coefficient. Consequently, when the forcing is effective, monodisperse bubbles, of sizes smaller than those generated without stimulation, are produced at the prescribed frequency, $f_f > f_n$. The effect of the forcing process on the bubble size is also characterized by measuring the resulting intact length, $l$, i.e. the length of the air stem that remains attached to the injector when a bubble is released. In addition, the physics behind the forcing procedure is explained as a purely kinematic mechanism that is added to the effect of the pressure evolution inside the air stream that would take place in the unforced case. Finally, the downstream position of the maximum perturbation amplitude has been determined by a one-dimensional model, exhibiting good agreement with both experiments and numerical simulations performed with OpenFOAM. This work has been supported by the Spanish MINECO (Subdirección General de Gestión de Ayudas a la Investigación), Junta de Andalucía and European Funds, grants Nos. DPI2014-59292-C3-1-P, DPI2014-59292-C3-3-P, P11-TEP7495. Financial support from the University of Jaén, Project No. UJA2013/08/05, is also acknowledged
INCMap: A Journey towards ontology-based data integration
Ontology-based data integration (OBDI) allows users to federate over heterogeneous data sources using a semantically rich conceptual data model. An important challenge in OBDI is the curation of mappings between the data sources and the global ontology. In recent years, we have built IncMap, a system to semi-automatically create mappings between relational data sources and a global ontology. IncMap has since been put into practice in both academic and industrial applications. Based on the experience of recent years, we have extended the original version of IncMap in several dimensions to enhance the mapping quality: (1) IncMap can detect and leverage semantically rich patterns in the relational data sources, such as inheritance, for the mapping creation. (2) IncMap is able to leverage reasoning rules in the ontology to overcome structural differences from the relational data sources. (3) IncMap now includes a fully automatic mode that is often necessary to bootstrap mappings for a new data source. Our experimental evaluation shows that the new version of IncMap outperforms its previous version as well as other state-of-the-art systems
Evaluating Ontology Matching Systems on Large, Multilingual and Real-world Test Cases
In the field of ontology matching, the most systematic evaluation of matching systems is established by the Ontology Alignment Evaluation Initiative (OAEI), which is an annual campaign for evaluating ontology matching systems organized by different groups of researchers. In this paper, we report on the results of an intermediary OAEI campaign called OAEI 2011.5. The evaluations of this campaign are divided into five tracks. Three of these tracks are new or have been improved compared to previous OAEI campaigns. Overall, we evaluated 18 matching systems. We discuss lessons learned, in terms of scalability, multilingual issues and the ability to deal with real-world cases from different domains
Querying industrial stream-temporal data: An ontology-based visual approach
An increasing number of sensors are being deployed in business-critical environments, systems, and equipment; and stream a vast amount of data. The operational efficiency and effectiveness of business processes rely on domain experts’ agility in interpreting data into actionable business information. A domain expert has extensive domain knowledge but not necessarily skills and knowledge on databases and formal query languages. Therefore, centralised approaches are often preferred. These require IT experts to translate the information needs of domain experts into extract-transform-load (ETL) processes in order to extract and integrate data and then let domain experts apply predefined analytics. Since such a workflow is too time intensive, heavy-weight and inflexible given the high volume and velocity of data, domain experts need to extract and analyse the data of interest directly. Ontologies, i.e., semantically rich conceptual domain models, present an intelligible solution by describing the domain of interest on a higher level of abstraction closer to the reality. Moreover, recent ontology-based data access (OBDA) technologies enable end users to formulate their information needs into queries using a set of terms defined in an ontology. Ontological queries could then be translated into SQL or some other database query languages, and executed over the data in its original place and format automatically. To this end, this article reports an ontology-based visual query system (VQS), namely OptiqueVQS, how it is extended for a stream-temporal query language called STARQL, a user experiment with the domain experts at Siemens AG, and STARQL’s query answering performance over a proof of concept implementation for PostgreSQL
A T-PROPER KARHUNEN-LOÈVE EXPANSION AND ITS APPLICATION TO THE PROBLEM OF SIMULATION
The paper addresses the Karhunen-Loève series representation in the tessarine domain. Based on augmented statistics, a tessarine widely linear Karhunen-Loève expansion is defined. Then, the impact of T-properness on this representation is analyzed, leading to a T-proper Karhunen-Loève expansion that entails a dimensionality reduction. Furthermore, this series representation serves as a versatile simulation tool, valid for both stationary and non-stationary, Gaussian and non-Gaussian random signals. Finally, the applicability of the proposed simulation technique is examined numerically