Search CORE

1,204 research outputs found

Kolmogorov Complexity in perspective. Part II: Classification, Information Processing and Duality

Author: Ferbus-Zanda Marie
Publication venue
Publication date: 01/01/2010
Field of study

We survey diverse approaches to the notion of information: from Shannon entropy to Kolmogorov complexity. Two of the main applications of Kolmogorov complexity are presented: randomness and classification. The survey is divided in two parts published in a same volume. Part II is dedicated to the relation between logic and information system, within the scope of Kolmogorov algorithmic information theory. We present a recent application of Kolmogorov complexity: classification using compression, an idea with provocative implementation by authors such as Bennett, Vitanyi and Cilibrasi. This stresses how Kolmogorov complexity, besides being a foundation to randomness, is also related to classification. Another approach to classification is also considered: the so-called "Google classification". It uses another original and attractive idea which is connected to the classification using compression and to Kolmogorov complexity from a conceptual point of view. We present and unify these different approaches to classification in terms of Bottom-Up versus Top-Down operational modes, of which we point the fundamental principles and the underlying duality. We look at the way these two dual modes are used in different approaches to information system, particularly the relational model for database introduced by Codd in the 70's. This allows to point out diverse forms of a fundamental duality. These operational modes are also reinterpreted in the context of the comprehension schema of axiomatic set theory ZF. This leads us to develop how Kolmogorov's complexity is linked to intensionality, abstraction, classification and information system.Comment: 43 page

arXiv.org e-Print Archive

Hal-Diderot

bdbms -- A Database Management System for Biological Data

Author: Aref Walid G.
Eltabakh Mohamed Y.
Ouzzani Mourad
Publication venue
Publication date: 01/12/2006
Field of study

Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) Annotation and provenance management including storage, indexing, manipulation, and querying of annotation and provenance as first class objects in bdbms, (2) Local dependency tracking to track the dependencies and derivations among data items, (3) Update authorization to support data curation via content-based authorization, in contrast to identity-based authorization, and (4) New access methods and their supporting operators that support pattern matching on various types of compressed biological data types. This paper presents the design of bdbms along with the techniques proposed to support these functionalities including an extension to SQL. We also outline some open issues in building bdbms.Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/.) You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but, you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR) January 710, 2007, Asilomar, California, US

arXiv.org e-Print Archive

CiteSeerX

Purdue E-Pubs

A SAT-based System for Consistent Query Answering

Author: A Fuxman
Akhil A. Dixit
D Lembo
G Greco
IF Ilyas
J Chomicki
J Davies
J Wijsen
J Wijsen
Leopoldo Bertossi
M Arenas
MC Marileo
P Koutris
P Koutris
Pablo Barceló
PG Kolaitis
PG Kolaitis
T Rekatsinas
Publication venue
Publication date: 07/05/2019
Field of study

An inconsistent database is a database that violates one or more integrity constraints, such as functional dependencies. Consistent Query Answering is a rigorous and principled approach to the semantics of queries posed against inconsistent databases. The consistent answers to a query on an inconsistent database is the intersection of the answers to the query on every repair, i.e., on every consistent database that differs from the given inconsistent one in a minimal way. Computing the consistent answers of a fixed conjunctive query on a given inconsistent database can be a coNP-hard problem, even though every fixed conjunctive query is efficiently computable on a given consistent database. We designed, implemented, and evaluated CAvSAT, a SAT-based system for consistent query answering. CAvSAT leverages a set of natural reductions from the complement of consistent query answering to SAT and to Weighted MaxSAT. The system is capable of handling unions of conjunctive queries and arbitrary denial constraints, which include functional dependencies as a special case. We report results from experiments evaluating CAvSAT on both synthetic and real-world databases. These results provide evidence that a SAT-based approach can give rise to a comprehensive and scalable system for consistent query answering.Comment: 25 pages including appendix, to appear in the 22nd International Conference on Theory and Applications of Satisfiability Testin

arXiv.org e-Print Archive

Crossref

Eliminating Recursion from Monadic Datalog Programs on Trees

Author: D Calvanese
Filip Mazowiecki
G Hillebrand
H Gaifman
Henrik Björklund
J Naughton
J Naughton
Michael Benedikt
Mikołaj Bojańczyk
O Shmueli
S Abiteboul
S Ceri
Y Sagiv
Publication venue
Publication date: 10/05/2015
Field of study

We study the problem of eliminating recursion from monadic datalog programs on trees with an infinite set of labels. We show that the boundedness problem, i.e., determining whether a datalog program is equivalent to some nonrecursive one is undecidable but the decidability is regained if the descendant relation is disallowed. Under similar restrictions we obtain decidability of the problem of equivalence to a given nonrecursive program. We investigate the connection between these two problems in more detail

arXiv.org e-Print Archive

Crossref

Normalized Web Distance and Word Similarity

Author: Cilibrasi Rudi L.
Vitanyi Paul M. B.
Publication venue
Publication date: 01/01/2009
Field of study

There is a great deal of work in cognitive psychology, linguistics, and computer science, about using word (or phrase) frequencies in context in text corpora to develop measures for word similarity or word association, going back to at least the 1960s. The goal of this chapter is to introduce the normalizedis a general way to tap the amorphous low-grade knowledge available for free on the Internet, typed in by local users aiming at personal gratification of diverse objectives, and yet globally achieving what is effectively the largest semantic electronic database in the world. Moreover, this database is available for all by using any search engine that can return aggregate page-count estimates for a large range of search-queries. In the paper introducing the NWD it was called `normalized Google distance (NGD),' but since Google doesn't allow computer searches anymore, we opt for the more neutral and descriptive NWD. web distance (NWD) method to determine similarity between words and phrases. ItComment: Latex, 20 pages, 7 figures, to appear in: Handbook of Natural Language Processing, Second Edition, Nitin Indurkhya and Fred J. Damerau Eds., CRC Press, Taylor and Francis Group, Boca Raton, FL, 2010, ISBN 978-142008592

arXiv.org e-Print Archive

CiteSeerX

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

Instance-Independent View Serializability for Semistructured Databases

Author: Dekeyser Stijn
Hidders Jan
Paredaens Jan
Vercammen Roel
Publication venue
Publication date: 26/05/2005
Field of study

Semistructured databases require tailor-made concurrency control mechanisms since traditional solutions for the relational model have been shown to be inadequate. Such mechanisms need to take full advantage of the hierarchical structure of semistructured data, for instance allowing concurrent updates of subtrees of, or even individual elements in, XML documents. We present an approach for concurrency control which is document-independent in the sense that two schedules of semistructured transactions are considered equivalent if they are equivalent on all possible documents. We prove that it is decidable in polynomial time whether two given schedules in this framework are equivalent. This also solves the view serializability for semistructured schedules polynomially in the size of the schedule and exponentially in the number of transactions

arXiv.org e-Print Archive

University of Southern Queensland ePrints

Fixed-parameter tractability, definability, and model checking

Author: Flum Joerg
Grohe Martin
Publication venue
Publication date: 01/01/2001
Field of study

In this article, we study parameterized complexity theory from the perspective of logic, or more specifically, descriptive complexity theory. We propose to consider parameterized model-checking problems for various fragments of first-order logic as generic parameterized problems and show how this approach can be useful in studying both fixed-parameter tractability and intractability. For example, we establish the equivalence between the model-checking for existential first-order logic, the homomorphism problem for relational structures, and the substructure isomorphism problem. Our main tractability result shows that model-checking for first-order formulas is fixed-parameter tractable when restricted to a class of input structures with an excluded minor. On the intractability side, for every t >= 0 we prove an equivalence between model-checking for first-order formulas with t quantifier alternations and the parameterized halting problem for alternating Turing machines with t alternations. We discuss the close connection between this alternation hierarchy and Downey and Fellows' W-hierarchy. On a more abstract level, we consider two forms of definability, called Fagin definability and slicewise definability, that are appropriate for describing parameterized problems. We give a characterization of the class FPT of all fixed-parameter tractable problems in terms of slicewise definability in finite variable least fixed-point logic, which is reminiscent of the Immerman-Vardi Theorem characterizing the class PTIME in terms of definability in least fixed-point logic.Comment: To appear in SIAM Journal on Computin

arXiv.org e-Print Archive

CiteSeerX

Relational databases and homogeneity in logics with counting

Author: Turull Torres José María
Publication venue
Publication date: 01/01/2006
Field of study

We define a new hierarchy in the class of computable queries to relational databases, in terms of the preservation of equality of theories in fragments of first order logic with bounded number of variables with the addition of counting quantifiers (Ck). We prove that the hierarchy is strict, and it turns out that it is orthogonal to the TIME-SPACE hierarchy defined with respect to the Turing machine complexity. We introduce a model of computation of queries to characterize the different layers of our hierarchy which is based on the reflective relational machine of S. Abiteboul, C. Papadimitriou, and V. Vianu. In our model the databases are represented by their Ck theories. Then we define and study several properties of databases related to homogeneity in Ck getting various results on the change in the computation power of the introduced machine, when working on classes of databases with such properties. We study the relation between our hierarchy and a similar one which we defined in a previous work, in terms of the preservation of equality of theories in fragments of first order logic with bounded number of variables, but without counting quantifiers (FOk). Finally, we give a characterization of the layers of the two hierarchies in terms of the infinitary logics CK∞ω and LK∞ω respectively

University of Szeged