
    Functional dependencies over XML documents with DTDs

    In this article, an axiomatisation for functional dependencies over XML documents is presented. The approach is based on a representation of XML document type definitions (or XML schemata) by nested attributes using constructors for records, disjoint unions and lists, and a particular null value, which covers optionality. Infinite structures that may result from referencing attributes in XML are captured by rational trees. Using a partial order on nested attributes, we obtain non-distributive Brouwer algebras. The operations of the Brouwer algebra are exploited in the soundness and completeness proofs for derivation rules for functional dependencies.
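    For orientation only, the following sketch shows what "deriving" a functional dependency means in the classical flat relational case, where attribute sets are plain sets and the usual closure computation decides implication. It is a simplification and does not implement the nested-attribute rules or the Brouwer algebra operations of the article; the names `closure` and `implies` and the sample dependencies are hypothetical.

```python
def closure(attrs, fds):
    """Return all attributes determined by `attrs` under the dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs of attribute sets (flat case only).
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency when its left side is already derived.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """Check whether lhs -> rhs follows from `fds` (flat relational case)."""
    return set(rhs) <= closure(lhs, fds)


# Hypothetical dependencies: A -> B and B -> C.
sample_fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(closure({"A"}, sample_fds)))   # ['A', 'B', 'C']
print(implies(sample_fds, {"A"}, {"C"}))    # True
print(implies(sample_fds, {"C"}, {"A"}))    # False
```

    In the nested setting described in the abstract, the set operations used here would have to be replaced by the corresponding operations on subattributes.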

    ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

    Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate representations of the same external entities and merging them into single representations. Relatively recently, declarative rules called "matching dependencies" (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating four components of ER: (a) building a classifier for duplicate/non-duplicate record pairs using machine learning (ML) techniques; (b) using MDs to support the blocking phase of ML; (c) merging records on the basis of the classifier results; and (d) using the declarative language "LogiQL" (an extended form of Datalog supported by the "LogicBlox" platform) for all activities related to data processing, and for the specification and enforcement of MDs. Comment: Final journal version, with some minor technical corrections. Extended version of arXiv:1508.0601
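    The pipeline of blocking, pairwise classification and merging can be sketched in a few lines. This is a toy illustration under loud assumptions: it is written in Python rather than LogiQL, a fixed string-similarity threshold stands in for the trained classifier, the blocking key (normalised city) is a hypothetical substitute for MD-based blocking, and the sample records and all function names are invented for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; attribute names are illustrative only.
records = [
    {"name": "John Smith", "city": "Boston"},
    {"name": "Jon Smith", "city": "boston"},
    {"name": "Jane Doe", "city": "Boston"},
]


def blocks(recs):
    """Blocking: group record indices by a cheap key (normalised city)."""
    grouped = defaultdict(list)
    for i, rec in enumerate(recs):
        grouped[rec["city"].strip().lower()].append(i)
    return grouped.values()


def is_duplicate(r1, r2, threshold=0.8):
    """Stand-in for a trained classifier: name similarity above a threshold."""
    ratio = SequenceMatcher(None, r1["name"].lower(), r2["name"].lower()).ratio()
    return ratio >= threshold


def find_duplicates(recs):
    """Compare only pairs that share a block and keep the predicted duplicates."""
    pairs = []
    for block in blocks(recs):
        for i, j in combinations(block, 2):
            if is_duplicate(recs[i], recs[j]):
                pairs.append((i, j))
    return pairs


def merge(r1, r2):
    """Merge two records attribute by attribute, keeping the longer value."""
    return {key: max(r1[key], r2[key], key=len) for key in r1}


duplicate_pairs = find_duplicates(records)
merged = [merge(records[i], records[j]) for i, j in duplicate_pairs]
print(duplicate_pairs)  # [(0, 1)]
```

    Blocking keeps the number of pairwise comparisons small: only records that share a block are ever passed to the classifier.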

    Keys and Armstrong databases in trees with restructuring

    The definitions of keys, antikeys and Armstrong instances are extended to complex values in the presence of several constructors. These include the tuple, list, set and union constructors. Nested data structures are built using the various constructors in a tree-like fashion. The union constructor complicates all results and proofs significantly, because it comes with non-trivial restructuring rules. In addition, so-called counter attributes need to be introduced. It is shown that keys can be identified with closed sets of subattributes under a certain closure operator. Minimal keys correspond to closed sets that are minimal under set-wise containment. The existence of Armstrong databases for given minimal key systems is investigated. A sufficient condition is given and some necessary conditions are also exhibited. Weak keys can be obtained if functional dependency is replaced by weak functional dependency in the definition. It is shown that this leads to the same concept. Strong keys are defined as principal ideals in the subattribute lattice. A characterization of antikeys for strong keys is given. Some numerical necessary conditions for the existence of Armstrong databases in the case of degenerate keys are shown. This leads to the theory of bounded domain attributes. The complexity of the problem is shown through several examples.
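    To make the terms concrete, the sketch below computes minimal keys and antikeys for a small flat relation by brute force, using the standard relational definitions: a key is an attribute set on which no two tuples agree, and an antikey is a maximal attribute set that is not a key. This is an assumption-laden simplification that ignores the constructors, restructuring rules and counter attributes discussed in the abstract; the relation and all names are hypothetical.

```python
from itertools import combinations

# Hypothetical flat relation over the attributes A, B, C.
schema = ("A", "B", "C")
rows = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 2, "C": 2},
    {"A": 2, "B": 1, "C": 2},
]


def is_key(rows, attrs):
    """True if no two rows agree on every attribute in `attrs`."""
    seen = set()
    for row in rows:
        projection = tuple(row[a] for a in attrs)
        if projection in seen:
            return False
        seen.add(projection)
    return True


def subsets(schema):
    """Yield every attribute subset as a sorted tuple, smallest first."""
    for size in range(len(schema) + 1):
        yield from combinations(schema, size)


def minimal_keys(rows, schema):
    """Keys that contain no smaller key."""
    found = []
    for attrs in subsets(schema):
        candidate = frozenset(attrs)
        # Subsets arrive by increasing size, so any contained key is already found.
        if any(key <= candidate for key in found):
            continue
        if is_key(rows, attrs):
            found.append(candidate)
    return found


def antikeys(rows, schema):
    """Maximal attribute sets that are not keys."""
    non_keys = [frozenset(a) for a in subsets(schema) if not is_key(rows, a)]
    return [s for s in non_keys if not any(s < t for t in non_keys)]


print([sorted(k) for k in minimal_keys(rows, schema)])  # [['A', 'B'], ['A', 'C'], ['B', 'C']]
print([sorted(k) for k in antikeys(rows, schema)])      # [['A'], ['B'], ['C']]
```

    In this example the relation is an Armstrong instance, in the flat sense, for the key system consisting of all two-element attribute sets: exactly those sets and their supersets are keys.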

    Constraint solving over multi-valued logics - application to digital circuits

    Due to usage conditions, hazardous environments or intentional causes, physical and virtual systems are subject to faults in their components, which may affect their overall behaviour. In a ‘black-box’ agent modelled by a set of propositional logic rules, in which just a subset of components is externally visible, such faults may only be recognised by examining some output function of the agent. A (fault-free) model of the agent’s system provides the expected output given some input. If the real output differs from that predicted output, then the system is faulty. However, some faults may only become apparent in the system output when appropriate inputs are given. A number of problems regarding both testing and diagnosis thus arise, such as testing a fault, testing the whole system, finding possible faults and differentiating them to locate the correct one. The corresponding optimisation problems of finding solutions that require minimum resources are also very relevant in industry, as is minimal diagnosis. In this dissertation we use a well-established set of benchmark circuits to address such diagnosis-related problems, and we propose and develop models with different logics that we formalise and generalise as much as possible. We also prove that all techniques generalise to agents and to multiple faults. The developed multi-valued logics extend the usual Boolean logic (suitable for fault-free models) by encoding values with some dependency (usually on faults). Such logics thus allow modelling an arbitrary number of diagnostic theories. Each problem is subsequently solved with CLP solvers that we implement and discuss, together with a new efficient search technique that we present. We compare our results with other approaches such as SAT (which requires substantial duplication of circuits), showing the effectiveness of constraints over multi-valued logics, and also the adequacy of a general set constraint solver (with special inferences over set functions such as cardinality) on other problems. In addition, for an optimisation problem, we integrate local search with a constructive approach (branch-and-bound) using a variety of logics to improve an existing efficient tool based on SAT and ILP.
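    The testing and diagnosis problems named above can be illustrated on a toy circuit. The sketch below is a brute-force enumeration in plain Boolean logic, not the CLP solvers or multi-valued logics of the dissertation; the two-gate circuit, the single stuck-at fault model and all function names are assumptions made for the example.

```python
from itertools import product

# Hypothetical circuit: g1 = a AND b, output g2 = g1 OR c.
GATES = ("g1", "g2")


def simulate(a, b, c, stuck=None):
    """Evaluate the circuit; `stuck` maps a gate name to a forced output value."""
    stuck = stuck or {}
    g1 = stuck.get("g1", a & b)
    g2 = stuck.get("g2", g1 | c)
    return g2


def detecting_inputs(fault):
    """Testing a fault: inputs where faulty and fault-free outputs differ."""
    return [
        vector
        for vector in product((0, 1), repeat=3)
        if simulate(*vector) != simulate(*vector, stuck=fault)
    ]


def diagnose(vector, observed):
    """Diagnosis: single stuck-at faults consistent with one observation."""
    candidates = [(gate, value) for gate in GATES for value in (0, 1)]
    return [
        (gate, value)
        for gate, value in candidates
        if simulate(*vector, stuck={gate: value}) == observed
    ]


print(detecting_inputs({"g1": 0}))  # [(1, 1, 0)]
print(detecting_inputs({"g1": 1}))  # [(0, 0, 0), (0, 1, 0), (1, 0, 0)]
print(diagnose((1, 1, 0), 0))       # [('g1', 0), ('g2', 0)]
```

    The last call shows why differentiating faults matters: a single observation leaves two candidate faults, and a further input such as (0, 0, 1), for which only the g2 stuck-at-0 fault changes the output, would be needed to tell them apart.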