Search CORE

301 research outputs found

Discovering Graph Functional Dependencies

Author: Abiteboul S.
Akhtar W.
Calvanese D.
Cortés-Calabuig A.
Eppstein D.
Flum J.
Gallego M. A.
Korf R. E.
Mahdisoltani F.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 27/05/2018

Crossref

Edinburgh Research Explorer

Towards High Quality Semantic Web Data: Detecting Abnormal Data on the Semantic Web

Author: Yu Yang
Publication venue: Lehigh Preserve

Lehigh University: Lehigh Preserve

Extracting and Cleaning RDF Data

Author: Farid Mina
Publication venue: University of Waterloo
Publication date: 01/05/2020

The RDF data model has become a prevalent format for representing heterogeneous data because of its versatility. The capability of dismantling information from its native formats and representing it in triple format offers a simple yet powerful way of modelling data obtained from multiple sources. In addition, the triple format and schema constraints of the RDF model make RDF data easy to process as labeled, directed graphs. This graph representation supports higher-level analytics by enabling querying with different techniques and query languages, e.g., SPARQL. Analytics that require structured data are supported by transforming the graph data on the fly to populate the target schema needed for downstream analysis. These target schemas are defined by downstream applications according to their information needs. The flexibility of RDF data brings two main challenges. First, extracting RDF data is a complex task that may require domain expertise about the information to be extracted for different applications. Second, the quality of RDF data depends on multiple factors, including the reliability of data sources and the accuracy of extraction systems. Because the quality of an analysis depends mainly on the quality of the underlying data, evaluating and improving the quality of RDF data directly affects the correctness of downstream analytics. This work presents multiple approaches to the extraction and quality evaluation of RDF data. To cope with the large amounts of data that need to be extracted, we present DSTLR, a scalable framework for extracting RDF triples from semi-structured and unstructured data sources. For rare entities that fall on the long tail of information, there may not be enough signals to support high-confidence extraction. To address this problem, we present an approach to estimate property values for long-tail entities.
We also present multiple algorithms and approaches that focus on the quality of RDF data. These include discovering quality constraints from RDF data and using machine learning techniques to repair errors in RDF data.

University of Waterloo's Institutional Repository
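The Waterloo abstract above describes RDF triples as a labeled, directed graph and mentions discovering quality constraints from the data. The sketch below illustrates that idea in the simplest possible form; it is not DSTLR or the thesis's discovery algorithm. It assumes a hypothetical "functional predicate" constraint (a predicate that should have one value per subject), discovers such predicates with a made-up support threshold `min_support`, and flags subjects that violate them. The `ex:` identifiers and all function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]          # (subject, predicate, object)
Graph = Dict[str, Dict[str, Set[str]]]  # subject -> predicate (edge label) -> objects


def build_graph(triples: List[Triple]) -> Graph:
    """Index triples as a labeled, directed graph: subject --predicate--> object."""
    graph: Graph = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        graph[s][p].add(o)
    return graph


def discover_functional_predicates(graph: Graph, min_support: float = 0.9,
                                   min_subjects: int = 2) -> Set[str]:
    """Hypothetical heuristic: a predicate is treated as functional if it is
    single-valued for at least `min_support` of the subjects that use it."""
    usage: Dict[str, int] = defaultdict(int)
    single: Dict[str, int] = defaultdict(int)
    for edges in graph.values():
        for p, objs in edges.items():
            usage[p] += 1
            if len(objs) == 1:
                single[p] += 1
    return {p for p, n in usage.items()
            if n >= min_subjects and single[p] / n >= min_support}


def find_violations(graph: Graph, functional: Set[str]) -> List[Tuple[str, str, List[str]]]:
    """Report (subject, predicate, values) wherever a functional predicate has several values."""
    violations = []
    for s, edges in graph.items():
        for p in functional:
            objs = edges.get(p, set())
            if len(objs) > 1:
                violations.append((s, p, sorted(objs)))
    return sorted(violations)


# Illustrative data: ex:dave has two conflicting birth dates.
example = [
    ("ex:alice", "ex:birthDate", "1990-01-01"),
    ("ex:bob", "ex:birthDate", "1985-05-05"),
    ("ex:carol", "ex:birthDate", "1970-07-07"),
    ("ex:dave", "ex:birthDate", "1960-06-06"),
    ("ex:dave", "ex:birthDate", "1961-06-06"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:knows", "ex:carol"),
]
graph = build_graph(example)
functional = discover_functional_predicates(graph, min_support=0.75)
print(sorted(functional))                 # ['ex:birthDate']
print(find_violations(graph, functional))
```

Here `ex:birthDate` is single-valued for 3 of its 4 subjects and passes the 0.75 threshold, while `ex:knows` (1 of 2) does not, so only the conflicting birth dates of `ex:dave` are flagged. A real system would also need to decide which of the conflicting values to repair, which this sketch leaves open.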

Catching Numeric Inconsistencies in Graphs

Author: Fan Wenfei
Liu Xueli
Lu Ping
Tian Chao
Publication venue: Association for Computing Machinery (ACM)
Publication date: 27/05/2018

Edinburgh Research Explorer

Big Graph Analyses: From Queries to Dependencies and Association Rules

Author: Fan Wenfei
Hu Chunming
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Springer - Publisher Connector

Edinburgh Research Explorer

Fusing Automatically Extracted Annotations for the Semantic Web

Author: Nikolov Andriy
Publication date: 01/01/2010

This research focuses on the problem of semantic data fusion. Although various solutions have been developed in the research communities focusing on databases and formal logic, choosing an appropriate algorithm is non-trivial because the performance of each algorithm and its optimal configuration parameters depend on the type of data to which the algorithm is applied. To be reusable, a fusion system must be able to select appropriate techniques and use them in combination. Moreover, because of the varying reliability of data sources and of the algorithms performing fusion subtasks, uncertainty is an inherent feature of semantically annotated data and has to be taken into account by the fusion system. In addition, schema heterogeneity can have a negative impact on fusion performance. To address these issues, we propose KnoFuss: an architecture for Semantic Web data integration based on the principles of problem-solving methods. Algorithms dealing with different fusion subtasks are represented as components of a modular architecture, and their capabilities are described formally. This allows the architecture to select appropriate methods and configure them depending on the processed data. To handle uncertainty, we propose a novel algorithm based on Dempster-Shafer belief propagation. KnoFuss employs this algorithm to reason about uncertain data and method results in order to refine the fused knowledge base. Tests show that these solutions lead to improved fusion performance. Finally, we address the problem of data fusion in the presence of schema heterogeneity. We extend the KnoFuss framework to exploit the results of automatic schema alignment tools and propose our own schema matching algorithm aimed at facilitating data fusion in the Linked Data environment. Experiments with this approach showed a substantial improvement in performance in comparison with public data repositories.

CiteSeerX

Open Research Online (The Open University)

OpenGrey Repository
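The KnoFuss abstract above relies on Dempster-Shafer theory to reason about uncertain evidence from sources and fusion methods. The sketch below shows only Dempster's rule of combination, the standard building block of that theory; it is not the KnoFuss belief-propagation algorithm. The scenario is a hypothetical one: two independent matching methods give evidence on whether two records refer to the same entity, over the frame {"same", "different"}. Function names and mass values are illustrative assumptions; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, FrozenSet, Tuple

Mass = Dict[FrozenSet[str], Fraction]  # focal set -> basic probability mass


def combine(m1: Mass, m2: Mass) -> Tuple[Mass, Fraction]:
    """Dempster's rule: multiply masses of every pair of focal sets, assign the
    product to their intersection, and renormalize by 1 - K, where K is the
    mass that falls on the empty set (the conflict between the sources)."""
    raw: Mass = {}
    conflict = Fraction(0)
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            raw[inter] = raw.get(inter, Fraction(0)) + wa * wb
        else:
            conflict += wa * wb
    if conflict == 1:
        raise ValueError("total conflict: the sources cannot be combined")
    return {s: w / (1 - conflict) for s, w in raw.items()}, conflict


def belief(m: Mass, hyp: FrozenSet[str]) -> Fraction:
    """Bel(A): total mass of focal sets contained in A."""
    return sum((w for s, w in m.items() if s <= hyp), Fraction(0))


def plausibility(m: Mass, hyp: FrozenSet[str]) -> Fraction:
    """Pl(A): total mass of focal sets that intersect A."""
    return sum((w for s, w in m.items() if s & hyp), Fraction(0))


SAME = frozenset({"same"})
DIFF = frozenset({"different"})
THETA = SAME | DIFF  # the whole frame: "don't know"

m1 = {SAME: Fraction(3, 5), THETA: Fraction(2, 5)}
m2 = {SAME: Fraction(7, 10), DIFF: Fraction(1, 5), THETA: Fraction(1, 10)}
combined, k = combine(m1, m2)
print(k)                               # 3/25
print(combined[SAME])                  # 19/22
print(plausibility(combined, SAME))    # 10/11
```

Assigning mass to the whole frame `THETA` is how each source expresses its own unreliability. The conflict K = 3/25 comes from the first source supporting "same" while the second supports "different", and that mass is redistributed by the normalization step.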

Parallel Reasoning of Graph Functional Dependencies

Author: Cao Yingjie
Fan Wenfei
Liu Xueli
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 25/10/2018