
    Static Analysis of Partial Referential Integrity for Better Quality SQL Data

    Referential integrity ensures the consistency of data between database relations. The SQL standard proposes different semantics for handling partial information under referential integrity: simple semantics neglects tuples with nulls and enjoys built-in support in commercial database systems, while partial semantics does check tuples with nulls but lacks built-in support. We investigate this mismatch between the SQL standard and real database systems, gaining insight into the trade-off between cleaner data under partial semantics and the efficiency of checking simple semantics. The cost of referential integrity checking is evaluated for various dataset sizes, indexing structures and degrees of cleanliness. While the cost of partial semantics exceeds that of simple semantics, their performance follows similar trends as database size grows. Applying multiple index structures and exploiting appropriate validation mechanisms increases the efficiency of checking partial semantics.
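    The difference between the two semantics can be made concrete with a small sketch. The following is a minimal illustration, assuming in-memory relations as Python tuples with None as the SQL null; the relation names and data are invented, not from the paper.

```python
# Sketch: simple vs. partial semantics for a composite foreign key
# (customer_id, city) in `orders` referencing `customers`.

customers = {("c1", "london"), ("c2", "paris")}
orders = [
    ("c1", "london"),  # satisfied under both semantics
    ("c2", "berlin"),  # violates both: no matching referenced tuple
    ("c3", None),      # has a null: ignored by simple, checked by partial
]

def simple_ok(fk, referenced):
    """Simple semantics: a tuple with any null is neglected entirely."""
    if any(v is None for v in fk):
        return True
    return fk in referenced

def partial_ok(fk, referenced):
    """Partial semantics: the non-null components must match some
    referenced tuple on the corresponding positions."""
    return any(
        all(v is None or v == ref[i] for i, v in enumerate(fk))
        for ref in referenced
    )

for fk in orders:
    print(fk, simple_ok(fk, customers), partial_ok(fk, customers))
# ('c3', None) passes simple semantics but fails partial semantics:
# no customer has id 'c3', so the tuple is unclean under partial.
```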

    Information sharing agents in a peer data exchange system

    We present a semantics and answer set programs for relational peer data exchange systems. When a peer answers a query, it exchanges data with other peers in order to supplement or modify its own data source. The data exchange relationships between peers are specified by logical sentences, called data exchange constraints, together with trust relationships; these determine how data is moved around in order to keep the constraints satisfied. This process determines virtual, alternative instances for a peer, which can be specified as the models of an answer set program. The peer consistent answers returned by a peer are those answers that are invariant under all these instances, and the logic program can be used to compute them.
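    The invariance condition amounts to a certain-answers computation: an answer qualifies only if it holds in every alternative instance. A toy sketch of that final step, assuming the alternative instances have already been materialised (the relation and data below are invented):

```python
# Sketch: peer consistent answers as the intersection of the query
# answers over all virtual alternative instances of a peer.

instances = [
    {("a",), ("b",)},
    {("a",), ("c",)},
    {("a",), ("b",), ("c",)},
]

def peer_consistent_answers(query, instances):
    """Keep only the answers invariant under every instance."""
    return set.intersection(*(query(inst) for inst in instances))

# Query: return all tuples of the (unary) relation.
print(peer_consistent_answers(lambda inst: set(inst), instances))
# -> {('a',)}: the only answer that holds in every instance.
```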

    From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back

    In this work we establish and investigate connections between causes for query answers in databases, database repairs w.r.t. denial constraints, and consistency-based diagnosis. The first two are relatively new research areas in databases, and the third is an established subject in knowledge representation. We show how to obtain database repairs from causes, and the other way around. Causality problems are formulated as diagnosis problems, and the diagnoses provide causes and their responsibilities. The vast body of research on database repairs can thus be applied to the newer problems of computing actual causes for query answers and their responsibilities. These connections, which are interesting per se, allow us, after a transition (inspired by consistency-based diagnosis) to computational problems on hitting sets and vertex covers in hypergraphs, to obtain several new algorithmic and complexity results for database causality.

    Comment: To appear in Theory of Computing Systems, by invitation to a special issue with extended papers from ICDT 2015 (paper arXiv:1412.4311).
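    The hitting-set transition can be illustrated on a toy instance: if each violation of a denial constraint is represented as the set of tuples jointly responsible for it, a minimal hitting set of those sets yields a repair, and its members are candidate causes. A brute-force sketch, with invented tuple ids:

```python
from itertools import chain, combinations

# Hypothetical violation sets: each inner set lists the tuple ids that
# together violate one denial constraint.
violations = [{"t1", "t2"}, {"t2", "t3"}, {"t1", "t4"}]

def minimal_hitting_sets(sets):
    """Enumerate inclusion-minimal hitting sets by brute force
    (exponential; fine for illustration only)."""
    universe = sorted(set(chain.from_iterable(sets)))
    minimal = []
    for k in range(1, len(universe) + 1):
        for cand in combinations(universe, k):
            c = set(cand)
            if all(c & s for s in sets) and not any(m <= c for m in minimal):
                minimal.append(c)
    return minimal

# Deleting the tuples in any minimal hitting set restores consistency;
# tuples in a minimum-size hitting set are high-responsibility causes.
print(minimal_hitting_sets(violations))
# -> {'t1', 't2'}, {'t1', 't3'}, {'t2', 't4'}
```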

    Active Integrity Constraints and Revision Programming

    We study active integrity constraints and revision programming, two formalisms designed to describe integrity constraints on databases and to specify policies on preferred ways to enforce them. Unlike other, more commonly accepted approaches, these two formalisms attempt to provide a declarative solution to the problem. However, the original semantics of founded repairs for active integrity constraints and of justified revisions for revision programs differ. Our main goal is to establish a comprehensive framework of semantics for active integrity constraints, to find a parallel framework for revision programs, and to relate the two. By doing so, we demonstrate that the two formalisms, proposed independently of each other and based on different intuitions, turn out to be notational variants of each other when viewed within a broader semantic framework. That lends support to the adequacy of the semantics we develop for each of the formalisms as the foundation for a declarative approach to the problem of database update and repair. In the paper we also study computational properties of the semantics we consider and establish results concerning the concept of minimality of change and invariance under the shifting transformation.

    Comment: 48 pages, 3 figures.
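    The core idea of an active integrity constraint is that a constraint carries, alongside the condition that detects a violation, the update actions that may be used to fix it. A toy sketch of that pairing, with an invented constraint and database; the encoding is illustrative, not the paper's formalism:

```python
# Sketch: a constraint "every employee has a department", paired with
# the single authorized repair action "delete the dangling emp fact".

db = {("emp", "ann"), ("dept", "ann", "sales"), ("emp", "bob")}

def violating_emps(db):
    """Employees with no department fact: the constraint's condition."""
    emps = {t[1] for t in db if t[0] == "emp"}
    depts = {t[1] for t in db if t[0] == "dept"}
    return emps - depts

def repair(db):
    """Apply only the action the constraint authorizes: deletion of
    the violating emp fact (rather than inventing a department)."""
    db = set(db)
    for x in violating_emps(db):
        db.discard(("emp", x))
    return db

print(violating_emps(db))  # {'bob'}
print(repair(db))          # ('emp', 'bob') removed; 'ann' untouched
```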

    Semantically Correct Query Answers in the Presence of Null Values

    For several reasons a database may not satisfy a given set of integrity constraints (ICs), but most of the information in it is likely still consistent with those ICs and could be retrieved when queries are answered. Consistent answers to queries w.r.t. a set of ICs have been characterized as answers that can be obtained from every possible minimally repaired consistent version of the original database. In this paper we consider databases that contain null values and are also repaired, if necessary, using null values. For this purpose, we first propose a precise semantics for IC satisfaction in a database with null values that is compatible with the way null values are treated in commercial database management systems. Next, a precise notion of repair is introduced that privileges the introduction of null values when repairing foreign key constraints, in such a way that these new values do not create an infinite cycle of new inconsistencies. Finally, we analyze how to specify this kind of repair of a database containing null values using disjunctive logic programs with stable model semantics.
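    Why null-based repairs avoid cascading inconsistencies can be seen in a small sketch: a dangling foreign key is repaired by inserting a null-padded referenced tuple rather than by deleting data or inventing values. The relations and None-as-null representation below are illustrative assumptions:

```python
# Sketch: repairing foreign key violations with null values.
# emp(name, dept) references dept(name, city) on the dept name.

depts = {("sales", "london")}
emps = {("ann", "sales"), ("bob", "hr")}   # 'hr' is dangling

def repair_fk(emps, depts):
    """Insert dept(d, NULL) for each dangling department d. The new
    tuple satisfies the FK, and its null attributes are not checked
    further under the null-aware semantics, so no repair cycle arises."""
    depts = set(depts)
    names = {d[0] for d in depts}
    for _, d in emps:
        if d not in names:
            depts.add((d, None))
            names.add(d)
    return depts

print(repair_fk(emps, depts))
# -> {('sales', 'london'), ('hr', None)}
```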

    Processing Uncertain RFID Data in Traceability Supply Chains

    Radio Frequency Identification (RFID) is widely used to track and trace objects in traceability supply chains. However, the massive volumes of uncertain data produced by RFID readers cannot be used effectively or efficiently by RFID application systems. Following an analysis of the key features of RFID objects, this paper proposes a new framework for effectively and efficiently processing uncertain RFID data and for supporting a variety of queries that track and trace RFID objects. We adjust the smoothing window according to the rate of uncertain data, employ different strategies to process uncertain readings, and distinguish ghost, missing and incomplete data according to their apparent positions. We propose a comprehensive data model that suits different application scenarios. In addition, a path coding scheme is proposed that significantly compresses massive data by aggregating the path sequence, the position and the time intervals; the scheme also handles cyclic and long paths. Moreover, we propose a processing algorithm for group and independent objects. Experimental evaluations show that our approach is effective and efficient in terms of compression and traceability queries.
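    One way such aggregation might look is sketched below: raw (location, timestamp) readings are collapsed into (location, first_seen, last_seen) segments, which copes with locations that recur on cyclic paths. This encoding is an invented stand-in for the paper's coding scheme:

```python
# Sketch: aggregating raw RFID readings into compact path segments.

readings = [("gate", 1), ("gate", 2), ("shelf", 5),
            ("shelf", 6), ("gate", 9)]   # cyclic: 'gate' recurs

def encode_path(readings):
    """Collapse consecutive readings at the same location into one
    (location, first_seen, last_seen) segment."""
    segments = []
    for loc, t in readings:
        if segments and segments[-1][0] == loc:
            segments[-1] = (loc, segments[-1][1], t)  # extend interval
        else:
            segments.append((loc, t, t))
    return segments

print(encode_path(readings))
# -> [('gate', 1, 2), ('shelf', 5, 6), ('gate', 9, 9)]
# A trace query such as "was the object ever at 'shelf'?" now scans
# three segments instead of five raw readings.
```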

    Dealing with inconsistent and incomplete data in a semantic technology setting

    Semantic and traditional databases are vulnerable to Inconsistent or Incomplete Data (IID). A data set stored in a traditional or semantic database is queried to retrieve records in a tabular format. Such retrieved records can consist of many rows, where each row contains an object and its associated fields (columns). However, a large set of records retrieved from a noisy data set may be wrongly analysed: a data analyst may treat inconsistent data as consistent, or incomplete data as complete, having failed to identify the inconsistency or incompleteness in the data. Analysis of a large data set can thus be undermined by the presence of IID, and reliance is placed on the data analyst to identify and visualise the IID in the data set. The IID issues are heightened under the open world assumption, as found in semantic or Resource Description Framework (RDF) databases. Under the closed world assumption of traditional databases, data are assumed to be complete, which brings its own issues; under the open world assumption, data may be unknown and IID has to be tolerated at the outset. Formal Concept Analysis (FCA) can be used to deal with IID in such databases, because FCA is a mathematical method that uses a lattice structure to reveal the associations among objects and attributes in a data set. The existing FCA approaches that can be used in dealing with IID in RDF databases include fault tolerance, Dau's approach, and the CUBIST approaches. The new FCA approaches include association rules and the semi-automated and automated methods in FcaBedrock; these new approaches were developed in the course of this study. To underpin this work, a series of empirical studies was carried out based on the single case study methodology, with the Edinburgh Mouse Atlas Gene Expression Database (EMAGE) providing the real-life context. The existing and the new FCA approaches were used to identify and visualise the IID in the EMAGE RDF data set. The empirical studies revealed that the existing approaches used to deal with IID in EMAGE are tedious and do not allow the IID to be easily visualised in the database, and that existing FCA approaches do not exclusively visualise the IID in a data set. This is unlike the new FCA approaches, notably the semi-automated and automated FcaBedrock, which can separate out and thus exclusively visualise IID in objects associated with the many-valued attributes that characterise such data sets. Exclusive visualisation of IID enables the data analyst to identify the IID in the investigated data set holistically, thereby avoiding mistaken conclusions. The aim was to discover how effective each FCA approach is in identifying and visualising IID, answering the research question: "How can FCA tools and techniques be used in identifying and visualising IID in RDF data?" The automated FcaBedrock approach emerged as the best means of visually identifying IID in an RDF data set. The CUBIST approaches and the semi-automated approach ranked 2nd and 3rd respectively, whilst Dau's approach ranked 4th. Whilst the subject of IID in a semantic technology setting could be explored further, it can be concluded that the automated FcaBedrock approach best identifies and visualises the IID in an RDF, and thus semantic, data set.
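    The lattice machinery underlying all of these FCA approaches can be shown in miniature: enumerating the formal concepts of a small binary context, where IID-flagged objects separate out into concepts of their own. The objects and attributes below are invented; tools such as FcaBedrock automate turning real data sets into such contexts:

```python
from itertools import combinations

# A formal context: which objects carry which attributes.
context = {
    "gene1": {"expressed", "complete"},
    "gene2": {"expressed"},
    "gene3": {"incomplete"},   # an IID-flagged object
}
attributes = set().union(*context.values())

def intent(objs):
    """Attributes shared by all objects in objs."""
    return (set.intersection(*(context[o] for o in objs))
            if objs else set(attributes))

def extent(attrs):
    """Objects possessing every attribute in attrs."""
    return {o for o, has in context.items() if attrs <= has}

# A formal concept is a pair (E, I) with extent(I) == E and
# intent(E) == I; the set of all concepts forms the lattice FCA draws.
concepts = set()
for k in range(len(attributes) + 1):
    for attrs in combinations(sorted(attributes), k):
        e = extent(set(attrs))
        concepts.add((frozenset(e), frozenset(intent(e))))

for e, i in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(e), "<->", sorted(i))
# 'gene3' appears in a concept of its own, visually separating the
# incomplete object from the consistent ones.
```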