
    Kolmogorov Complexity in perspective. Part II: Classification, Information Processing and Duality

    We survey diverse approaches to the notion of information, from Shannon entropy to Kolmogorov complexity. Two of the main applications of Kolmogorov complexity are presented: randomness and classification. The survey is divided into two parts published in the same volume. Part II is dedicated to the relation between logic and information systems, within the scope of Kolmogorov algorithmic information theory. We present a recent application of Kolmogorov complexity: classification using compression, an idea with a provocative implementation by authors such as Bennett, Vitanyi and Cilibrasi. This stresses how Kolmogorov complexity, besides being a foundation for randomness, is also related to classification. Another approach to classification is also considered: the so-called "Google classification". It uses another original and attractive idea which is connected to classification using compression and to Kolmogorov complexity from a conceptual point of view. We present and unify these different approaches to classification in terms of Bottom-Up versus Top-Down operational modes, whose fundamental principles and underlying duality we point out. We look at the way these two dual modes are used in different approaches to information systems, particularly the relational model for databases introduced by Codd in the 1970s. This allows us to point out diverse forms of a fundamental duality. These operational modes are also reinterpreted in the context of the comprehension schema of axiomatic set theory ZF. This leads us to develop how Kolmogorov complexity is linked to intensionality, abstraction, classification and information systems. Comment: 43 pages
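
    The compression-based classification mentioned above is usually realized through the normalized compression distance (NCD) of Cilibrasi and Vitanyi, which replaces the uncomputable Kolmogorov complexity K(x) by the length C(x) of a compressed version of x. A minimal Python sketch, using zlib as the compressor and invented exemplar texts:

    ```python
    import zlib

    def C(data: bytes) -> int:
        """Length of the compressed string: a computable stand-in for K(x)."""
        return len(zlib.compress(data, 9))

    def ncd(x: bytes, y: bytes) -> float:
        """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
        cx, cy = C(x), C(y)
        return (C(x + y) - min(cx, cy)) / max(cx, cy)

    # Toy classification: assign a sample to the class of its nearest exemplar.
    exemplars = {"math": b"theorem proof lemma corollary axiom",
                 "biology": b"cell protein gene enzyme membrane"}
    sample = b"we prove the lemma using the axiom of choice"
    print(min(exemplars, key=lambda label: ncd(sample, exemplars[label])))
    ```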

    bdbms -- A Database Management System for Biological Data

    Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) annotation and provenance management, including storage, indexing, manipulation, and querying of annotations and provenance as first-class objects in bdbms; (2) local dependency tracking, to track the dependencies and derivations among data items; (3) update authorization, to support data curation via content-based authorization, in contrast to identity-based authorization; and (4) new access methods and their supporting operators for pattern matching on various types of compressed biological data. This paper presents the design of bdbms along with the techniques proposed to support these functionalities, including an extension to SQL. We also outline some open issues in building bdbms. Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, USA
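
    As a loose illustration of the content-based authorization idea in item (3), and not of bdbms's actual interface, the sketch below admits an update only if the affected row satisfies a predicate attached to the grant, instead of consulting the user's identity alone; all names in it are hypothetical:

    ```python
    from dataclasses import dataclass
    from typing import Any, Callable, Dict, List

    Row = Dict[str, Any]

    @dataclass
    class Grant:
        """Content-based grant: the holder may update rows matching `predicate`."""
        user: str
        predicate: Callable[[Row], bool]

    grants: List[Grant] = [
        # A curator may only edit annotations produced by their own lab.
        Grant(user="alice", predicate=lambda row: row.get("source") == "alice_lab"),
    ]

    def authorize(user: str, row: Row) -> bool:
        """Admit the update iff some grant for `user` matches the row's content."""
        return any(g.user == user and g.predicate(row) for g in grants)

    print(authorize("alice", {"id": 7, "source": "alice_lab"}))  # True
    print(authorize("alice", {"id": 8, "source": "other_lab"}))  # False
    ```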

    A SAT-based System for Consistent Query Answering

    An inconsistent database is a database that violates one or more integrity constraints, such as functional dependencies. Consistent Query Answering is a rigorous and principled approach to the semantics of queries posed against inconsistent databases. The consistent answers to a query on an inconsistent database are the intersection of the answers to the query on every repair, i.e., on every consistent database that differs from the given inconsistent one in a minimal way. Computing the consistent answers of a fixed conjunctive query on a given inconsistent database can be a coNP-hard problem, even though every fixed conjunctive query is efficiently computable on a given consistent database. We designed, implemented, and evaluated CAvSAT, a SAT-based system for consistent query answering. CAvSAT leverages a set of natural reductions from the complement of consistent query answering to SAT and to Weighted MaxSAT. The system is capable of handling unions of conjunctive queries and arbitrary denial constraints, which include functional dependencies as a special case. We report results from experiments evaluating CAvSAT on both synthetic and real-world databases. These results provide evidence that a SAT-based approach can give rise to a comprehensive and scalable system for consistent query answering. Comment: 25 pages including appendix, to appear in the 22nd International Conference on Theory and Applications of Satisfiability Testing
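
    CAvSAT's actual encodings are more involved, but the flavor of the reduction shows on a toy instance: each tuple of the inconsistent relation becomes a propositional variable, every key group contributes an "exactly one tuple survives" constraint, and an answer is consistent iff those constraints plus the negation of the query are unsatisfiable. The Python sketch below brute-forces satisfiability in place of a real SAT solver:

    ```python
    from itertools import product

    # Inconsistent relation R(name, city) with key {name}: two facts clash on "carol".
    R = [("alice", "paris"), ("carol", "rome"), ("carol", "oslo")]

    def repairs():
        """Enumerate assignments (tuple i is kept iff pick[i]) satisfying the
        repair constraints: exactly one tuple per key group survives."""
        keys = {name for name, _ in R}
        for pick in product([False, True], repeat=len(R)):
            if all(sum(pick[i] for i, (n, _) in enumerate(R) if n == k) == 1
                   for k in keys):
                yield [t for i, t in enumerate(R) if pick[i]]

    # Query q: "carol lives in rome".  q is a consistent answer iff it holds in
    # every repair, i.e. iff "repair constraints AND NOT q" is unsatisfiable.
    counterexample = next((rep for rep in repairs()
                           if ("carol", "rome") not in rep), None)
    print("consistent" if counterexample is None
          else f"not consistent, falsified by repair: {counterexample}")
    ```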

    Eliminating Recursion from Monadic Datalog Programs on Trees

    We study the problem of eliminating recursion from monadic datalog programs on trees with an infinite set of labels. We show that the boundedness problem, i.e., determining whether a datalog program is equivalent to some nonrecursive one, is undecidable, but that decidability is regained if the descendant relation is disallowed. Under similar restrictions we obtain decidability of the problem of equivalence to a given nonrecursive program. We investigate the connection between these two problems in more detail.
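
    For a concrete picture of boundedness (our example, not one from the paper), consider the monadic program with rules P(x) <- a(x) and P(x) <- a(x), child(x, y), P(y). The recursive rule can only rederive facts the first rule already produces, so the program is equivalent to the nonrecursive P(x) <- a(x), i.e., it is bounded. A naive bottom-up evaluator in Python makes this visible:

    ```python
    # A tree as parent -> children edges, plus the unary label predicate "a".
    children = {0: [1, 2], 1: [3], 2: [], 3: []}
    a_nodes = {1, 3}

    def evaluate():
        """Naive bottom-up evaluation of
            P(x) <- a(x).
            P(x) <- a(x), child(x, y), P(y).
        Apply both rules until no new fact is derived."""
        P = set()
        rounds = 0
        while True:
            new = set(a_nodes)                                             # rule 1
            new |= {x for x in children
                    if x in a_nodes and any(y in P for y in children[x])}  # rule 2
            rounds += 1
            if new <= P:
                return P, rounds
            P |= new

    P, rounds = evaluate()
    print(P, "stable after", rounds, "rounds")  # {1, 3}: exactly the a-labeled nodes
    ```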

    Normalized Web Distance and Word Similarity

    There is a great deal of work in cognitive psychology, linguistics, and computer science, about using word (or phrase) frequencies in context in text corpora to develop measures for word similarity or word association, going back to at least the 1960s. The goal of this chapter is to introduce the normalized web distance (NWD) method to determine similarity between words and phrases. It is a general way to tap the amorphous low-grade knowledge available for free on the Internet, typed in by local users aiming at personal gratification of diverse objectives, and yet globally achieving what is effectively the largest semantic electronic database in the world. Moreover, this database is available for all by using any search engine that can return aggregate page-count estimates for a large range of search-queries. In the paper introducing the NWD it was called `normalized Google distance (NGD),' but since Google doesn't allow computer searches anymore, we opt for the more neutral and descriptive NWD. Comment: Latex, 20 pages, 7 figures, to appear in: Handbook of Natural Language Processing, Second Edition, Nitin Indurkhya and Fred J. Damerau Eds., CRC Press, Taylor and Francis Group, Boca Raton, FL, 2010, ISBN 978-142008592
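
    For reference, with f(x), f(y), f(x, y) the aggregate page counts for the two terms and their co-occurrence, and N the total number of indexed pages, the distance is NWD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}). A direct Python transcription, evaluated on invented page counts:

    ```python
    from math import log

    def nwd(f_x: float, f_y: float, f_xy: float, N: float) -> float:
        """Normalized web distance computed from aggregate page counts."""
        lx, ly, lxy = log(f_x), log(f_y), log(f_xy)
        return (max(lx, ly) - lxy) / (log(N) - min(lx, ly))

    # Terms that co-occur on most of their pages are close (NWD near 0);
    # terms that rarely co-occur are far apart.
    print(nwd(f_x=10_000, f_y=8_000, f_xy=6_000, N=10**10))  # ~0.04
    print(nwd(f_x=10_000, f_y=8_000, f_xy=5, N=10**10))      # ~0.54
    ```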

    Instance-Independent View Serializability for Semistructured Databases

    Semistructured databases require tailor-made concurrency control mechanisms, since traditional solutions for the relational model have been shown to be inadequate. Such mechanisms need to take full advantage of the hierarchical structure of semistructured data, for instance allowing concurrent updates of subtrees of, or even individual elements in, XML documents. We present an approach to concurrency control which is document-independent in the sense that two schedules of semistructured transactions are considered equivalent if they are equivalent on all possible documents. We prove that it is decidable in polynomial time whether two given schedules in this framework are equivalent. This also solves view serializability for semistructured schedules in time polynomial in the size of the schedule and exponential in the number of transactions.

    Fixed-parameter tractability, definability, and model checking

    In this article, we study parameterized complexity theory from the perspective of logic, or more specifically, descriptive complexity theory. We propose to consider parameterized model-checking problems for various fragments of first-order logic as generic parameterized problems and show how this approach can be useful in studying both fixed-parameter tractability and intractability. For example, we establish the equivalence between model-checking for existential first-order logic, the homomorphism problem for relational structures, and the substructure isomorphism problem. Our main tractability result shows that model-checking for first-order formulas is fixed-parameter tractable when restricted to a class of input structures with an excluded minor. On the intractability side, for every t >= 0 we prove an equivalence between model-checking for first-order formulas with t quantifier alternations and the parameterized halting problem for alternating Turing machines with t alternations. We discuss the close connection between this alternation hierarchy and Downey and Fellows' W-hierarchy. On a more abstract level, we consider two forms of definability, called Fagin definability and slicewise definability, that are appropriate for describing parameterized problems. We give a characterization of the class FPT of all fixed-parameter tractable problems in terms of slicewise definability in finite-variable least fixed-point logic, which is reminiscent of the Immerman-Vardi Theorem characterizing the class PTIME in terms of definability in least fixed-point logic. Comment: To appear in SIAM Journal on Computing
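
    The equivalence for existential sentences can be made tangible: deciding whether a structure satisfies "there exist x_1, ..., x_k such that phi" with phi quantifier-free amounts to searching for a k-tuple of witnesses, the same n^k-shaped search as in the substructure isomorphism problem. A brute-force Python sketch for graphs:

    ```python
    from itertools import product

    def exists_embedding(pattern_edges, k, graph_edges, n):
        """Model-check the existential FO sentence "there are pairwise distinct
        x_1..x_k with an edge x_i x_j for each pattern edge (i, j)" -- i.e.,
        subgraph isomorphism -- by brute force in time O(n^k * |pattern|)."""
        E = set(graph_edges) | {(v, u) for u, v in graph_edges}
        for tup in product(range(n), repeat=k):
            if len(set(tup)) == k and all((tup[i], tup[j]) in E
                                          for i, j in pattern_edges):
                return True
        return False

    # Does the 5-cycle contain a triangle?  Not until we add a chord.
    C5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    triangle = [(0, 1), (1, 2), (2, 0)]
    print(exists_embedding(triangle, 3, C5, 5))             # False
    print(exists_embedding(triangle, 3, C5 + [(0, 2)], 5))  # True
    ```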

    Relational databases and homogeneity in logics with counting

    We define a new hierarchy in the class of computable queries to relational databases, in terms of the preservation of equality of theories in fragments of first-order logic with a bounded number of variables and the addition of counting quantifiers (C^k). We prove that the hierarchy is strict, and it turns out to be orthogonal to the TIME-SPACE hierarchy defined with respect to Turing machine complexity. To characterize the different layers of our hierarchy, we introduce a model of computation of queries based on the reflective relational machine of S. Abiteboul, C. Papadimitriou, and V. Vianu, in which the databases are represented by their C^k theories. We then define and study several properties of databases related to homogeneity in C^k, obtaining various results on how the computation power of the introduced machine changes when it works on classes of databases with such properties. We study the relation between our hierarchy and a similar one which we defined in previous work, in terms of the preservation of equality of theories in fragments of first-order logic with a bounded number of variables but without counting quantifiers (FO^k). Finally, we give a characterization of the layers of the two hierarchies in terms of the infinitary logics C^k_∞ω and L^k_∞ω, respectively.
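
    To make the C^k theories concrete for k = 2: by a classical result of Immerman and Lander, two graphs satisfy the same C^2 sentences exactly when color refinement (1-dimensional Weisfeiler-Leman) cannot distinguish them, so a machine that sees databases only through their C^2 theories must treat such graphs identically. A Python sketch that runs refinement on the disjoint union of two graphs:

    ```python
    def refine(adj):
        """Color refinement: repeatedly refine vertex colors by the multiset of
        neighbor colors until the number of color classes stops growing."""
        color = {v: 0 for v in adj}
        n_classes = 1
        while True:
            sig = {v: (color[v], tuple(sorted(color[u] for u in adj[v])))
                   for v in adj}
            palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
            color = {v: palette[sig[v]] for v in adj}
            if len(palette) == n_classes:
                return color
            n_classes = len(palette)

    def c2_equivalent(adj1, adj2):
        """The two graphs agree on all C^2 sentences iff refinement on their
        disjoint union gives both sides the same multiset of stable colors."""
        union = {("a", v): [("a", u) for u in vs] for v, vs in adj1.items()}
        union.update({("b", v): [("b", u) for u in vs] for v, vs in adj2.items()})
        color = refine(union)
        return (sorted(color[v] for v in union if v[0] == "a")
                == sorted(color[v] for v in union if v[0] == "b"))

    # One 6-cycle vs. two disjoint triangles: both 2-regular, hence
    # C^2-equivalent, although they are not isomorphic.
    C6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    two_C3 = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
    print(c2_equivalent(C6, two_C3))  # True
    ```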