    A relational model for incomplete information in temporal databases

    In temporal database systems, the time-varying aspects of data are captured by time-stamping data values. Research in temporal databases has concentrated on developing models in which it is essential that all the information be known. In the present work, a relational model for incomplete information is presented. The model allows incomplete temporal information to be stored and provides a powerful, yet simple, algebra to query the incomplete information. The incomplete information model presented here generalizes a well-known model for temporal databases with complete information. The algebraic expressions in the model produce results that are reliable in the sense that they never report incorrect information. This is shown by introducing the notion of completions of relations and databases. It is also shown that, except for certain cases of selection, if the definition of the operators were strengthened to give more information, we could obtain results that are not reliable. This result is obtained by introducing the concepts of extensions of relations and more informative relations. Update operations create, change, and changekey are defined. These operations allow the user to modify the state of the database to reflect changes in the real world, to correct errors in the database, and to increase the information content of incomplete objects as more information becomes available.

    Knowledge-preserving Certain Answers for SQL-like Queries

    Answering queries over incomplete data is based on finding answers that are certainly true, independently of how missing values are interpreted. This informal description has given rise to several different mathematical definitions of certainty. To unify them, a framework based on "explanations", or extra information about incomplete data, was recently proposed. It partly succeeded in justifying query answering methods for relational databases under set semantics, but had two major limitations. First, it was firmly tied to the set data model and to a fixed way of comparing incomplete databases with respect to their information content. These assumptions fail for real-life database queries in languages such as SQL that use bag semantics instead. Second, it was restricted to queries that only manipulate data, while in practice most analytical SQL queries invent new values, typically via arithmetic operations and aggregation. To leverage our understanding of the notion of certainty for queries in SQL-like languages, we consider incomplete databases whose information content may be enriched by additional knowledge. The knowledge order among them is derived from their semantics, rather than being fixed a priori. The resulting framework allows us to capture and justify existing notions of certainty, and extend these concepts to other data models and query languages. As natural applications, we provide for the first time a well-founded definition of certain answers for the relational bag data model and for value-inventing queries on incomplete databases, addressing the key shortcomings of previous approaches.
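
The informal notion of an answer that is "certainly true, independently of how missing values are interpreted" can be illustrated with a minimal sketch. It covers only the classical set-semantics reading, not the knowledge-based framework of the paper, and it assumes a small finite domain for missing values and independent (unmarked) nulls. The names `NULL`, `completions`, `certain_answers`, and the example relation are all hypothetical.

```python
from itertools import product

NULL = None  # marker for a missing value (an assumption of this sketch)


def completions(relation, domain):
    """Yield every complete relation (as a set of tuples) obtained by
    replacing each NULL independently with a value from `domain`."""
    rows = [list(row) for row in relation]
    null_positions = [(i, j) for i, row in enumerate(rows)
                      for j, value in enumerate(row) if value is NULL]
    for values in product(domain, repeat=len(null_positions)):
        filled = [list(row) for row in rows]
        for (i, j), value in zip(null_positions, values):
            filled[i][j] = value
        yield {tuple(row) for row in filled}


def certain_answers(query, relation, domain):
    """Intersect the query result over all completions (set semantics)."""
    result = None
    for world in completions(relation, domain):
        answer = set(query(world))
        result = answer if result is None else result & answer
    return result if result is not None else set()


# Hypothetical example: (name, age) tuples, with bob's age unknown.
people = [("alice", 30), ("bob", NULL)]
ages = [20, 30, 40]


def aged_thirty(world):
    return {(name,) for name, age in world if age == 30}


def at_least_twenty(world):
    return {(name,) for name, age in world if age >= 20}


print(certain_answers(aged_thirty, people, ages))
print(certain_answers(at_least_twenty, people, ages))
```

Under these assumptions, bob is a certain answer to the second query because every value in the assumed domain satisfies the condition, but not to the first. Enumerating completions is exponential in the number of nulls, so this is an illustration of the definition rather than a practical evaluation strategy.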

    Multimodal medical case retrieval using the Dezert-Smarandache theory.

    Most medical images are now digitized and stored with semantic information, leading to medical case databases. These databases may be used as a diagnostic aid, by retrieving cases similar to the one under examination. However, the information is often incomplete, uncertain, and sometimes conflicting, and therefore difficult to use. In this paper, we present a Case-Based Reasoning (CBR) system for medical case retrieval, derived from the Dezert-Smarandache theory (DSmT), which is well suited to handling those problems. We introduce a case-retrieval-specific frame of discernment theta, which associates each element of theta with a case in the database; we take advantage of the flexibility offered by DSmT's hybrid models to finely model the database. The system is designed so that heterogeneous sources of information can be integrated: in particular images, indexed by their digital content, and symbolic information. The method is evaluated on two classified databases: one for diabetic retinopathy follow-up (DRD) and one for screening mammography (DDSM). On these databases, results are promising: the retrieval precision at five reaches 81.8% on DRD and 84.8% on DDSM.
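
The "precision at five" figure quoted above is a standard retrieval metric: the fraction of the five top-ranked retrieved cases that are relevant to the query. A minimal sketch follows, assuming that relevance means sharing the query case's class label (the abstract does not spell out the relevance criterion). The function names and the example labels are hypothetical and unrelated to the DRD or DDSM data.

```python
def precision_at_k(retrieved_labels, query_label, k=5):
    """Fraction of the top-k retrieved cases whose label matches the query's.

    Divides by k even when fewer than k cases were retrieved, which is one
    common convention; k must be positive.
    """
    top = retrieved_labels[:k]
    hits = sum(1 for label in top if label == query_label)
    return hits / k


def mean_precision_at_k(runs, k=5):
    """Average precision at k over (query_label, retrieved_labels) pairs."""
    scores = [precision_at_k(retrieved, query, k) for query, retrieved in runs]
    return sum(scores) / len(scores)


# Hypothetical ranked results for two queries (labels are made up).
runs = [
    ("mild", ["mild", "mild", "severe", "mild", "none", "mild"]),
    ("none", ["none", "none", "none", "none", "mild"]),
]
print(mean_precision_at_k(runs))
```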

    A Semantics-Based Approach to Design of Query Languages for Partial Information

    Most work on partial information in databases asks which operations of standard languages, like relational algebra, can still be performed correctly in the presence of nulls. In this paper a different point of view is advocated. We believe that the semantics of partiality must be clearly understood and that it should give us new design principles for languages for databases with partial information. There are different sources of partial information, such as missing information and conflicts that occur when different databases are merged. In this paper, we develop a common semantic framework for them which can be applied in a context more general than the flat relational model. This ordered semantics, which is based on ideas used in the semantics of programming languages, cleanly integrates all kinds of partial information and serves as a tool to establish connections between them. Analyzing properties of semantic domains of types suitable for representing partial information, we come up with operations that are naturally associated with those types, and we organize programming syntax around these operations. We show how the languages that we obtain can be used to ask typical queries about incomplete information in relational databases, and how they can express some previously proposed languages. Finally, we discuss a few related topics such as mixing traditional constraints with partial information and extending semantics and languages to accommodate bags and recursive types.

    On the accuracy of language trees

    Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different languages. From this perspective, the reconstruction of language trees is an example of an inverse problem: starting from present, incomplete, and often noisy information, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure has an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases. In order to assess the accuracy of the inferred trees, we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores, we quantify the relative performances of the distance-based algorithms considered. Further, we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally, we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the leading directions to improve it. Comment: 36 pages, 14 figures

    An Investigation of the Cost and Accuracy Tradeoffs of Supplanting AFDs with Bayes Network in Query Processing in the Presence of Incompleteness in Autonomous Databases

    As the information available to lay users through autonomous data sources continues to increase, mediators become important to ensure that the wealth of information available is tapped effectively. A key challenge that these information mediators need to handle is the varying levels of incompleteness in the underlying databases in terms of missing attribute values. Existing approaches such as Query Processing over Incomplete Autonomous Databases (QPIAD) aim to mine and use Approximate Functional Dependencies (AFDs) to predict and retrieve relevant incomplete tuples. These approaches make independence assumptions about missing values, which critically hobbles their performance when there are tuples containing missing values for multiple correlated attributes. In this thesis, I present a principled probabilistic alternative that views an incomplete tuple as defining a distribution over the complete tuples that it stands for. I learn this distribution in terms of Bayes networks. My approach involves mining ("learning") Bayes networks from a sample of the database, and using them to do both imputation (predicting a missing value) and query rewriting (retrieving relevant results with incompleteness on the query-constrained attributes, when the data sources are autonomous). I present empirical studies to demonstrate that (i) at higher levels of incompleteness, when multiple attribute values are missing, Bayes networks do provide a significantly higher classification accuracy and (ii) the relevant possible answers retrieved by the queries reformulated using Bayes networks provide higher precision and recall than AFDs while keeping query processing costs manageable. Dissertation/Thesis. M.S. Computer Science 201
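
The idea of viewing an incomplete tuple as a distribution over the complete tuples it stands for can be illustrated with a deliberately simplified sketch. It does not learn a Bayes network and is not the thesis's method: it estimates the distribution of a single missing attribute by conditional frequency counts over a sample, conditioning on all known attributes. The function names, the car attributes, and the sample rows are hypothetical.

```python
from collections import Counter


def completion_distribution(sample, incomplete, missing_attr):
    """Estimate P(missing_attr | known attributes) by frequency counts.

    `sample` is a list of dicts; `incomplete` is a dict in which the
    missing attribute is None. Returns a dict mapping value to probability,
    or an empty dict if no sample row matches the known attributes.
    """
    known = {attr: value for attr, value in incomplete.items()
             if attr != missing_attr and value is not None}
    matches = [row[missing_attr] for row in sample
               if row.get(missing_attr) is not None
               and all(row.get(attr) == value for attr, value in known.items())]
    counts = Counter(matches)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}


def impute(sample, incomplete, missing_attr):
    """Return the most probable value for the missing attribute, or None."""
    dist = completion_distribution(sample, incomplete, missing_attr)
    return max(dist, key=dist.get) if dist else None


# Hypothetical sample of a car database.
sample = [
    {"make": "Honda", "model": "Civic", "body": "sedan"},
    {"make": "Honda", "model": "Civic", "body": "sedan"},
    {"make": "Honda", "model": "Civic", "body": "coupe"},
    {"make": "Honda", "model": "Civic", "body": "sedan"},
    {"make": "Honda", "model": "Accord", "body": "sedan"},
]
incomplete = {"make": "Honda", "model": "Civic", "body": None}

print(completion_distribution(sample, incomplete, "body"))
print(impute(sample, incomplete, "body"))
```

Conditioning on every known attribute at once breaks down when the sample is sparse or when several correlated attributes are missing together, which is the situation the abstract says motivates a learned Bayes network instead.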