SemTab 2019: Resources to Benchmark Tabular Data to Knowledge Graph Matching Systems
Tabular data to Knowledge Graph matching is the process of assigning semantic tags from knowledge graphs (e.g., Wikidata or DBpedia) to the elements of a table. This task is challenging for various reasons, including the lack of metadata (e.g., table and column names) and the noisiness, heterogeneity, incompleteness and ambiguity of the data. The results of this task provide significant insights about potentially highly valuable tabular data, as recent works have shown, enabling a new family of data analytics and data science applications. Despite a significant amount of work on various flavors of this problem, there is no common framework for the systematic evaluation of state-of-the-art systems. The Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) aims to fill this gap. In this paper, we report on the datasets, infrastructure and lessons learned from the first edition of the SemTab challenge.
Automatic Construction of Knowledge Graphs from Text and Structured Data: A Preliminary Literature Review
Knowledge graphs have been shown to be an important data structure for many applications, including chatbot development, data integration, and semantic search. In the enterprise domain, such graphs need to be constructed from both structured (e.g. databases) and unstructured (e.g. textual) internal data sources, preferably using automatic approaches because of the costs associated with manual construction of knowledge graphs. However, despite the growing body of research that leverages both structured and textual data sources in the context of automatic knowledge graph construction, the research community has centered on either one type of source or the other. In this paper, we conduct a preliminary literature review to investigate approaches that can be used for the integration of textual and structured data sources in the process of automatic knowledge graph construction. We highlight the solutions currently available for use within enterprises and point out areas that would benefit from further research.
Extracting new knowledge from web tables: Novelty or confidence?
To extend the coverage of Knowledge Bases (KBs), it is useful to integrate factual information from public tabular data. Ideally, the extracted information should not only be correct, but also novel. So far, the evaluation of state-of-the-art techniques for this task has focused primarily on the correctness of the extractions, while their novelty has been less well analysed. To fill this gap, we replicated the evaluation of two state-of-the-art techniques and analysed the amount of novel extractions using two new metrics. We observe that current techniques are biased towards confidence, but this comes at the expense of novelty. We sketch a possible solution for this problem as part of our ongoing research.
TAPON: a two-phase machine learning approach for semantic labelling
Through semantic labelling we enrich structured information from sources such as HTML pages, tables, or JSON files with labels to integrate it into a local ontology. This process involves measuring some features of the information and then finding the classes that best describe it. The problem with current techniques is that they do not model relationships between classes. Their features fall short when some classes have very similar structures or textual formats. In order to deal with this problem, we have devised TAPON: a new semantic labelling technique that computes novel features that take into account the relationships. TAPON computes these features by means of a two-phase approach. In the first phase, we compute simple features and obtain a preliminary set of labels (hints). In the second phase, we inject our novel features and obtain a refined set of labels. Our experimental results show that our technique, thanks to our rich feature catalogue and novel modelling, achieves higher accuracy than other state-of-the-art techniques. Ministerio de Economía y Competitividad TIN2016-75394-
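The two-phase idea described in this abstract can be illustrated with a toy sketch. This is not the TAPON implementation: the hand-written rules below stand in for trained classifiers, and the feature names, the labels (`name`, `number`, `birth_year`, `price`) and the functions `phase_one`, `phase_two` and `label_record` are all hypothetical, chosen only to show how phase-1 hints from sibling fields can feed a phase-2 decision.

```python
from collections import Counter


def simple_features(value):
    """Phase-1 features: computed from the text of one value alone."""
    digits = sum(ch.isdigit() for ch in value)
    return {"digit_ratio": digits / max(len(value), 1), "length": len(value)}


def phase_one(value):
    """Toy stand-in for a phase-1 classifier: returns a coarse hint."""
    features = simple_features(value)
    if features["digit_ratio"] > 0.5:
        return "number"  # ambiguous on its own: a year, a price, ...
    return "name"


def phase_two(record, hints):
    """Refine each hint using the hints assigned to the other fields."""
    refined = {}
    for field, value in record.items():
        # Relational feature: which hints do the sibling fields carry?
        sibling_hints = Counter(h for f, h in hints.items() if f != field)
        label = hints[field]
        if label == "number":
            # Assumed toy rule: a 4-character number next to a name
            # is read as a birth year, anything else as a price.
            if len(value) == 4 and sibling_hints["name"] > 0:
                label = "birth_year"
            else:
                label = "price"
        refined[field] = label
    return refined


def label_record(record):
    """Run both phases on one record (a dict of field -> string value)."""
    hints = {field: phase_one(value) for field, value in record.items()}
    return phase_two(record, hints)


person = {"a": "Ada Lovelace", "b": "1815"}
product = {"x": "19.99", "y": "42"}
print(label_record(person))
print(label_record(product))
```

In this sketch the value `1815` cannot be disambiguated from its own text, and only the hint on the neighbouring field settles its label, which is the kind of class relationship the abstract says single-phase features miss.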
TAKCO: A platform for extracting novel facts from tables
Web tables contain a large amount of useful knowledge. Takco is a new large-scale platform designed for extracting facts from tables that can be added to Knowledge Graphs (KGs) like Wikidata. Because they focus on achieving high precision, current techniques are biased towards extracting redundant facts, i.e., facts already in the KG. Takco aims to find more novel facts, still at high precision. Our demonstration has two goals. The first is to illustrate the main features of Takco's novel interpretation algorithm. The second is to use our platform to show to what extent other state-of-the-art systems are biased towards the extraction of redundant facts, thus raising awareness of this important problem.
Leveraging 2-hop Distant Supervision from Table Entity Pairs for Relation Extraction
Distant supervision (DS) has been widely used to automatically construct (noisy) labeled data for relation extraction (RE). Given two entities, distant supervision exploits sentences that directly mention them for predicting their semantic relation. We refer to this strategy as 1-hop DS, which unfortunately may not work well for long-tail entities with few supporting sentences. In this paper, we introduce a new strategy named 2-hop DS to enhance distantly supervised RE, based on the observation that there exist a large number of relational tables on the Web which contain entity pairs that share common relations. We refer to such entity pairs as anchors for each other, and collect all sentences that mention the anchor entity pairs of a given target entity pair to help relation prediction. We develop a new neural RE method REDS2 in the multi-instance learning paradigm, which adopts a hierarchical model structure to fuse information respectively from 1-hop DS and 2-hop DS. Extensive experimental results on a benchmark dataset show that REDS2 can consistently outperform various baselines across different settings by a substantial margin.
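The difference between 1-hop and 2-hop sentence collection can be sketched as follows. This is only an illustration of the data-gathering step under simplifying assumptions, not the REDS2 model: mentions are detected by plain substring matching, each table is assumed to be a list of entity pairs taken from the same two columns, and the names `one_hop`, `anchors` and `two_hop` are hypothetical.

```python
def mentions(sentence, pair):
    """True if the sentence contains both entities (naive substring match)."""
    head, tail = pair
    return head in sentence and tail in sentence


def one_hop(corpus, target):
    """1-hop DS: sentences that directly mention the target pair."""
    return [s for s in corpus if mentions(s, target)]


def anchors(tables, target):
    """Other pairs that appear in the same table as the target pair."""
    found = []
    for table in tables:
        if target in table:
            found.extend(pair for pair in table if pair != target)
    return found


def two_hop(corpus, tables, target):
    """2-hop DS: sentences that mention any anchor pair of the target."""
    return [s for a in anchors(tables, target) for s in corpus if mentions(s, a)]


corpus = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Canberra hosts a festival.",
]
# One relational table whose rows are (country, city) pairs.
tables = [[("France", "Paris"), ("Germany", "Berlin"), ("Australia", "Canberra")]]
target = ("Australia", "Canberra")

print(one_hop(corpus, target))          # no sentence mentions both entities
print(two_hop(corpus, tables, target))  # sentences about the anchor pairs
```

Here the target pair has no supporting sentence of its own, so 1-hop DS returns nothing, while the anchor pairs from the same table contribute two sentences that express the shared relation.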