266 research outputs found
Logics for Unranked Trees: An Overview
Labeled unranked trees are used as a model of XML documents, and logical
languages for them have been studied actively over the past several years. Such
logics have different purposes: some are better suited for extracting data,
some for expressing navigational properties, and some make it easy to relate
complex properties of trees to the existence of tree automata for those
properties. Furthermore, logics differ significantly in their model-checking
properties, their automata models, and their behavior on ordered and unordered
trees. In this paper we present a survey of logics for unranked trees
Reasoning About Integrity Constraints for Tree-Structured Data
We study a class of integrity constraints for tree-structured data modelled as data trees, whose nodes have a label from a finite alphabet and store a data value from an infinite data domain. The constraints require each tuple of nodes selected by a conjunctive query (using navigational axes and labels) to satisfy a positive combination of equalities and a positive combination of inequalities over the stored data values. Such constraints are instances of the general framework of XML-to-relational constraints proposed recently by Niewerth and Schwentick. They cover some common classes of constraints, including W3C XML Schema key and unique constraints, as well as domain restrictions and denial constraints, but cannot express inclusion constraints, such as reference keys. Our main result is that consistency of such integrity constraints with respect to a given schema (modelled as a tree automaton) is decidable. An easy extension gives decidability for the entailment problem. Equivalently, we show that validity and containment of unions of conjunctive queries using navigational axes, labels, data equalities and inequalities is decidable, as long as none of the conjunctive queries uses both equalities and inequalities; without this restriction, both problems are known to be undecidable. In the context of XML data exchange, our result can be used to establish decidability for a consistency problem for XML schema mappings. All the decision procedures are doubly exponential, with matching lower bounds. The complexity may be lowered to singly exponential, when conjunctive queries are replaced by tree patterns, and the number of data comparisons is bounded
Validation Framework for RDF-based Constraint Languages
In this thesis, a validation framework is introduced that enables to consistently execute RDF-based constraint languages on RDF data and to formulate constraints of any type. The framework reduces the representation of constraints to the absolute minimum, is based on formal logics, consists of a small lightweight vocabulary, and ensures consistency regarding validation results and enables constraint transformations for each constraint type across RDF-based constraint languages
End-to-End Entity Resolution for Big Data: A Survey
One of the most important tasks for improving data quality and the
reliability of data analytics results is Entity Resolution (ER). ER aims to
identify different descriptions that refer to the same real-world entity, and
remains a challenging problem. While previous works have studied specific
aspects of ER (and mostly in traditional settings), in this survey, we provide
for the first time an end-to-end view of modern ER workflows, and of the novel
aspects of entity indexing and matching methods in order to cope with more than
one of the Big Data characteristics simultaneously. We present the basic
concepts, processing steps and execution strategies that have been proposed by
different communities, i.e., database, semantic Web and machine learning, in
order to cope with the loose structuredness, extreme diversity, high speed and
large scale of entity descriptions used by real-world applications. Finally, we
provide a synthetic discussion of the existing approaches, and conclude with a
detailed presentation of open research directions
- …