
    A revival of integrity constraints for data cleaning

    Integrity constraints, a.k.a. data dependencies, are widely used for improving the quality of schemas. Recently, constraints have enjoyed a revival for improving the quality of data. The tutorial aims to provide an overview of recent advances in constraint-based data cleaning.
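The tutorial abstract only names constraint-based data cleaning. As a minimal illustration of its most basic step, the sketch below assumes a relation stored as a list of Python dicts and detects pairs of tuples that violate a functional dependency (FD) lhs -> rhs, i.e., tuples that agree on every lhs attribute but differ on rhs. The function name `fd_violations` and the sample `customers` data are hypothetical and not taken from the tutorial.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Hashable]


def fd_violations(rows: Sequence[Row], lhs: Tuple[str, ...], rhs: str) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, of rows that violate the FD lhs -> rhs."""
    # Group row indices by their lhs values; only rows in the same group can conflict.
    groups: Dict[Tuple[Hashable, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in lhs)].append(i)
    violations: List[Tuple[int, int]] = []
    for idxs in groups.values():
        for a, i in enumerate(idxs):
            for j in idxs[a + 1:]:
                if rows[i][rhs] != rows[j][rhs]:
                    violations.append((i, j))
    return violations


# Hypothetical sample data: zip should determine city.
customers = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94105", "city": "San Francisco"},
]
print(fd_violations(customers, ("zip",), "city"))  # [(0, 1)]
```

Grouping by the lhs values avoids comparing every pair of tuples in the relation; only tuples sharing an lhs key are compared.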

    Intelligent Application of Partial Repair for Handling Inconsistency among Database

    Handling inconsistencies in standalone and integrated databases has been an important issue for database administrators for decades. Today's databases are huge: not only are various types of inconsistencies ubiquitous in them, but applying any repair may also induce new violations of integrity constraints. Resolved data may harm the rest of the database, which makes repairing inconsistent data a costly process, since after every resolution the database must be checked for newly emerged violations. Introducing partial repair, through an approach that measures the tendency of a resolved portion of data to incur new violations, would help any repair algorithm isolate a selection of problematic data (not all of it), resolve it, and protect the database from harm during the repair process. Partial repair keeps the rest of the data unaffected, which eliminates concerns over applying the repair. Partial repair may not resolve all inconsistencies in a database, but it represents a repair that causes minimum harm to the rest of the data while also taking cost into account, which makes it valuable. Keywords: data quality, repair, inconsistency, dependenc
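The abstract does not say how the tendency of a resolved portion of data to incur new violations is measured. The sketch below is one hypothetical reading, assuming functional dependencies as the constraints and single-cell updates as candidate fixes: each fix is scored by the number of FD violations that exist only after it is applied, and only fixes whose score stays within a threshold are applied, leaving the rest of the data untouched. All names (`violating_pairs`, `new_violation_tendency`, `partial_repair`) and the sample data are illustrative, not the authors' method.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Row = Dict[str, Hashable]
FD = Tuple[Tuple[str, ...], str]      # (lhs attributes, rhs attribute)
Fix = Tuple[int, str, Hashable]       # hypothetical encoding: (row index, attribute, new value)


def violating_pairs(rows: Sequence[Row], fds: Sequence[FD]) -> Set[Tuple[int, int, int]]:
    """All (fd index, i, j), i < j, such that rows i and j violate that FD."""
    out: Set[Tuple[int, int, int]] = set()
    for k, (lhs, rhs) in enumerate(fds):
        groups: Dict[Tuple[Hashable, ...], List[int]] = defaultdict(list)
        for i, row in enumerate(rows):
            groups[tuple(row[a] for a in lhs)].append(i)
        for idxs in groups.values():
            for a, i in enumerate(idxs):
                for j in idxs[a + 1:]:
                    if rows[i][rhs] != rows[j][rhs]:
                        out.add((k, i, j))
    return out


def new_violation_tendency(rows: Sequence[Row], fds: Sequence[FD], fix: Fix) -> int:
    """Hypothetical score: number of violations that exist only after applying `fix`."""
    i, attr, value = fix
    patched = [dict(r) for r in rows]
    patched[i][attr] = value
    return len(violating_pairs(patched, fds) - violating_pairs(rows, fds))


def partial_repair(rows: Sequence[Row], fds: Sequence[FD], fixes: Sequence[Fix],
                   max_new: int = 0) -> Tuple[List[Row], List[Fix]]:
    """Apply fixes one at a time, skipping any whose tendency exceeds `max_new`."""
    repaired = [dict(r) for r in rows]   # never mutate the caller's data
    skipped: List[Fix] = []
    for fix in fixes:
        if new_violation_tendency(repaired, fds, fix) <= max_new:
            i, attr, value = fix
            repaired[i][attr] = value
        else:
            skipped.append(fix)
    return repaired, skipped


rows = [
    {"zip": "10001", "city": "New York", "state": "NY"},
    {"zip": "10001", "city": "NYC", "state": "NY"},
    {"zip": "94105", "city": "San Francisco", "state": "CA"},
]
fds = [(("zip",), "city"), (("city",), "state")]
fixes = [(1, "city", "New York"), (2, "city", "New York")]
repaired, skipped = partial_repair(rows, fds, fixes)
print([r["city"] for r in repaired])  # ['New York', 'New York', 'San Francisco']
print(skipped)                        # [(2, 'city', 'New York')]
```

The first fix resolves the zip -> city violation without side effects, so it is applied; the second would create two new city -> state violations, so it is left out rather than harming the rest of the data.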

    The Consistency of Probabilistic Databases with Independent Cells

    A probabilistic database with attribute-level uncertainty consists of relations where cells of some attributes may hold probability distributions rather than deterministic content. Such databases arise, implicitly or explicitly, in the context of noisy operations such as missing data imputation, where we automatically fill in missing values; column prediction, where we predict unknown attributes; and database cleaning (and repairing), where we replace the original values due to detected errors or violations of integrity constraints. We study the computational complexity of problems concerning the selection of cell values in the presence of integrity constraints. More precisely, we focus on functional dependencies and study three problems: (1) deciding whether the constraints can be satisfied by any choice of values, (2) finding a most probable such choice, and (3) calculating the probability of satisfying the constraints. The data complexity of these problems is determined by the combination of the set of functional dependencies and the collection of uncertain attributes. We give full classifications into tractable and intractable complexities for several classes of constraints, including a single dependency, matching constraints, and unary functional dependencies.
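The paper's contribution is a tractability classification, which is not reproduced here. As a hedged, naive baseline that makes the three problems concrete, the sketch below assumes a table of Python dicts in which an uncertain cell is a `{value: probability}` dict, enumerates every possible world (exponential in the number of uncertain cells, so only usable on tiny inputs), and answers (1) satisfiability, (2) a most probable consistent choice, and (3) the probability that the FDs hold. The function names and sample data are hypothetical.

```python
from itertools import product
from typing import Dict, Hashable, Iterator, List, Optional, Sequence, Tuple

Row = Dict[str, object]                   # a cell holds a value or a {value: probability} dict
FD = Tuple[Tuple[str, ...], str]          # (lhs attributes, rhs attribute)


def satisfies(rows: Sequence[Dict[str, Hashable]], fds: Sequence[FD]) -> bool:
    """True if every FD lhs -> rhs holds in the deterministic relation `rows`."""
    for lhs, rhs in fds:
        seen: Dict[Tuple[Hashable, ...], Hashable] = {}
        for row in rows:
            key = tuple(row[a] for a in lhs)
            if key in seen and seen[key] != row[rhs]:
                return False
            seen.setdefault(key, row[rhs])
    return True


def possible_worlds(table: Sequence[Row]) -> Iterator[Tuple[List[Dict[str, Hashable]], float]]:
    """Enumerate every choice of cell values; cells are independent, so probabilities multiply."""
    slots = [(i, attr, list(cell.items()))
             for i, row in enumerate(table)
             for attr, cell in row.items() if isinstance(cell, dict)]
    for choice in product(*(options for _, _, options in slots)):
        world = [dict(row) for row in table]
        p = 1.0
        for (i, attr, _), (value, prob) in zip(slots, choice):
            world[i][attr] = value
            p *= prob
        yield world, p


def analyze(table: Sequence[Row], fds: Sequence[FD]):
    """Return (satisfiable, most probable consistent world, its probability, P(consistent))."""
    best: Optional[List[Dict[str, Hashable]]] = None
    best_p, total = 0.0, 0.0
    for world, p in possible_worlds(table):
        if satisfies(world, fds):
            total += p
            if best is None or p > best_p:
                best, best_p = world, p
    return best is not None, best, best_p, total


table = [
    {"zip": "10001", "city": {"New York": 0.7, "NYC": 0.3}},
    {"zip": "10001", "city": {"New York": 0.5, "NYC": 0.5}},
]
ok, best, best_p, total = analyze(table, [(("zip",), "city")])
print(ok, [r["city"] for r in best], round(best_p, 2), round(total, 2))
# True ['New York', 'New York'] 0.35 0.5
```

Only the worlds where both cells agree satisfy zip -> city (probabilities 0.35 and 0.15), so the constraint holds with probability 0.5. The point of a classification like the paper's is to identify when this exponential enumeration can be avoided.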

    Dimensional Inconsistency Measures and Postulates in Spatio-Temporal Databases

    The problem of managing spatio-temporal data arises in many applications, such as location-based services, environmental monitoring, geographic information systems, and many others. Spatio-temporal data arising from such applications often turn out to be inconsistent, i.e., they represent an impossible situation in the real world. Though several inconsistency measures have been proposed to quantify inconsistency in propositional knowledge bases in a principled way, little effort has been made so far on inconsistency measures tailored to the spatio-temporal setting. In this paper, we define and investigate new measures that are particularly suitable for dealing with inconsistent spatio-temporal information, because they explicitly take into account the spatial and temporal dimensions, as well as the dimension concerning the identifiers of the monitored objects. Specifically, we first define natural measures that look at individual dimensions (time, space, and objects), and then propose measures based on the notion of a repair. We then analyze their behavior w.r.t. common postulates defined for classical propositional knowledge bases, and find that the latter are not suitable for spatio-temporal databases, in that the proposed inconsistency measures often do not satisfy them. In light of this, we argue that postulates should also explicitly take into account the spatial, temporal, and object dimensions, and thus define "dimension-aware" counterparts of common postulates, which are indeed often satisfied by the new inconsistency measures. Finally, we study the complexity of the proposed inconsistency measures.
    Affiliations: Grant, John. Towson University; United States. Martinez, Maria Vanina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina. Molinaro, Cristian. Università della Calabria; Italy. Parisi, Francesco. Università della Calabria; Italy.
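The abstract does not give the paper's consistency notion or its exact measure definitions. The sketch below assumes a deliberately simplified setting: atoms are hypothetical `(object, location, time)` triples, and the only constraint is that an object cannot be at two different locations at the same time point. Under that assumption it shows one measure per individual dimension (time, space, objects) and one repair-based measure (the minimum number of atoms to delete). These are illustrations of the idea only, not the measures defined by the authors.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Atom = Tuple[str, str, int]   # hypothetical encoding: (object id, location, time point)


def conflicts(db: Set[Atom]) -> Dict[Tuple[str, int], Set[str]]:
    """(object, time) pairs that the database places at more than one location."""
    where: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for obj, loc, t in db:
        where[(obj, t)].add(loc)
    return {key: locs for key, locs in where.items() if len(locs) > 1}


def time_measure(db: Set[Atom]) -> int:
    """Number of time points involved in some conflict."""
    return len({t for (_, t) in conflicts(db)})


def object_measure(db: Set[Atom]) -> int:
    """Number of objects involved in some conflict."""
    return len({obj for (obj, _) in conflicts(db)})


def space_measure(db: Set[Atom]) -> int:
    """Number of locations involved in some conflict."""
    return len(set().union(*conflicts(db).values()))


def repair_measure(db: Set[Atom]) -> int:
    """Minimum number of atoms to delete so each object has one location per time point."""
    # Conflict groups are disjoint, and keeping any one atom per group is a repair.
    return sum(len(locs) - 1 for locs in conflicts(db).values())


db = {
    ("bus1", "A", 1), ("bus1", "B", 1), ("bus1", "C", 1),
    ("bus2", "A", 2), ("bus2", "A", 3), ("bus2", "D", 3),
}
print(time_measure(db), object_measure(db), space_measure(db), repair_measure(db))  # 2 2 4 3
```

The dimension measures count how widely the inconsistency spreads along one axis, while the repair-based measure counts how much data must be given up to restore consistency; in this example they disagree, which is why separate dimensions are worth tracking.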

    A Comparative Analysis of Novel Approach for Searching Inconsistent Data in Semantic Web

    The Resource Description Framework (RDF) has been widely used in the Semantic Web to describe resources and their relationships. The RDF graph is one of the most commonly used representations for RDF data. However, in many real applications, such as information extraction/integration, RDF graphs integrated from different data sources may often contain uncertain and inconsistent data (e.g., uncertain labels, or data that violate facts/rules) due to the low quality of the data sources. In this paper, RDF data are formalized as inconsistent probabilistic RDF graphs, which contain both inconsistency and uncertainty. With such a probabilistic graph model, the paper focuses on an important problem, quality-aware subgraph matching over inconsistent probabilistic RDF graphs (QA-g Match), which retrieves subgraphs from inconsistent probabilistic RDF graphs that are isomorphic to a given query graph and have high quality scores (considering both consistency and uncertainty). To answer QA-g Match queries efficiently, two effective pruning techniques are given, namely adaptive label pruning and quality score pruning, which can greatly filter out false alarms among candidate subgraphs. An effective index is also designed to facilitate the proposed pruning techniques, and an efficient approach for processing QA-g Match queries is proposed. Finally, the efficiency and effectiveness of the proposed approaches are demonstrated through extensive experiments.
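The abstract summarizes QA-g Match without giving its scoring function, pruning rules, or index. The sketch below is a heavily simplified stand-in, assuming that each data vertex carries a label distribution, edges are certain, the score of a match is the product of the matched label probabilities, and the consistency part of the quality score is ignored. It performs backtracking subgraph-isomorphism search (non-induced: every query edge must map to a data edge) with two simple prunings in the spirit of the abstract: a label-probability filter on candidates and a score-bound cut during search. All names and data are hypothetical; this is not the paper's adaptive label pruning or index.

```python
from typing import Dict, List, Set, Tuple

LabelDist = Dict[str, float]   # hypothetical: label -> probability for one data vertex
Match = Tuple[Dict[int, int], float]


def match(data_labels: Dict[int, LabelDist], data_edges: Set[Tuple[int, int]],
          query_labels: Dict[int, str], query_edges: Set[Tuple[int, int]],
          threshold: float) -> List[Match]:
    """Injective query->data mappings that keep every (undirected) query edge and whose
    score, the product of matched label probabilities, is at least `threshold`."""
    adj: Dict[int, Set[int]] = {v: set() for v in data_labels}
    for u, v in data_edges:
        adj[u].add(v)
        adj[v].add(u)
    q_adj: Dict[int, Set[int]] = {q: set() for q in query_labels}
    for a, b in query_edges:
        q_adj[a].add(b)
        q_adj[b].add(a)
    order = sorted(query_labels)

    def prob(v: int, q: int) -> float:
        return data_labels[v].get(query_labels[q], 0.0)

    # Label pruning: the final score can never exceed any single label probability.
    cands = {q: [v for v in data_labels if prob(v, q) >= threshold] for q in order}
    results: List[Match] = []

    def extend(k: int, mapping: Dict[int, int], score: float) -> None:
        if k == len(order):
            results.append((dict(mapping), score))
            return
        q = order[k]
        for v in cands[q]:
            if v in mapping.values():
                continue  # keep the mapping injective
            if any(mapping[p] not in adj[v] for p in q_adj[q] if p in mapping):
                continue  # a query edge to an already-matched vertex is missing
            new_score = score * prob(v, q)
            # Score pruning: probabilities are <= 1, so the partial score bounds the final one.
            if new_score < threshold:
                continue
            mapping[q] = v
            extend(k + 1, mapping, new_score)
            del mapping[q]

    extend(0, {}, 1.0)
    return results


data_labels = {1: {"Person": 0.9, "Org": 0.1}, 2: {"Org": 0.8, "Place": 0.2},
               3: {"Person": 0.6, "Org": 0.4}}
data_edges = {(1, 2), (3, 2)}
query_labels = {0: "Person", 1: "Org"}
query_edges = {(0, 1)}
for mapping, score in match(data_labels, data_edges, query_labels, query_edges, 0.5):
    print(mapping, round(score, 2))  # {0: 1, 1: 2} 0.72
```

The alternative mapping of the query's Person vertex to data vertex 3 reaches only 0.6 × 0.8 = 0.48 and is cut by the score bound before it is reported, which is the kind of false alarm the abstract's pruning aims to filter out.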