Kolmogorov Complexity in perspective. Part II: Classification, Information Processing and Duality
We survey diverse approaches to the notion of information: from Shannon
entropy to Kolmogorov complexity. Two of the main applications of Kolmogorov
complexity are presented: randomness and classification. The survey is divided
into two parts published in the same volume. Part II is dedicated to the relation
between logic and information systems, within the scope of Kolmogorov
algorithmic information theory. We present a recent application of Kolmogorov
complexity: classification using compression, an idea with provocative
implementation by authors such as Bennett, Vitanyi and Cilibrasi. This stresses
how Kolmogorov complexity, besides being a foundation for randomness, is also
related to classification. Another approach to classification is also
considered: the so-called "Google classification". It uses another original and
attractive idea which is connected to the classification using compression and
to Kolmogorov complexity from a conceptual point of view. We present and unify
these different approaches to classification in terms of Bottom-Up versus
Top-Down operational modes, of which we point out the fundamental principles and
the underlying duality. We look at the way these two dual modes are used in
different approaches to information systems, particularly the relational model
for databases introduced by Codd in the 70's. This allows us to point out diverse
forms of a fundamental duality. These operational modes are also reinterpreted
in the context of the comprehension schema of axiomatic set theory ZF. This
leads us to develop how Kolmogorov complexity is linked to intensionality,
abstraction, classification and information systems.
Comment: 43 pages
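The "classification using compression" idea mentioned in this abstract is commonly illustrated with the normalized compression distance (NCD). The sketch below is only an illustration under stated assumptions: the abstract gives no formula, so the NCD form used here is the one usually quoted in the compression-based clustering literature, zlib stands in for the (uncomputable) Kolmogorov complexity, and the helper names `compressed_size` and `ncd` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a rough stand-in for K(data)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance (assumed form):
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    english = b"the quick brown fox jumps over the lazy dog " * 20
    digits = b"3141592653589793238462643383279502884197169399" * 20
    # Values near 0 suggest similar inputs; values near 1 suggest unrelated ones.
    print(ncd(english, english))
    print(ncd(english, digits))
```

Objects can then be grouped by feeding the pairwise distances to any standard clustering method; a real compressor only approximates the ideal distance, so results depend on the compressor and on input length.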
bdbms -- A Database Management System for Biological Data
Biologists are increasingly using databases for storing and managing their
data. Biological databases typically consist of a mixture of raw data,
metadata, sequences, annotations, and related data obtained from various
sources. Current database technology lacks several functionalities that are
needed by biological databases. In this paper, we introduce bdbms, an
extensible prototype database management system for supporting biological data.
bdbms extends the functionalities of current DBMSs to include: (1) Annotation
and provenance management including storage, indexing, manipulation, and
querying of annotation and provenance as first class objects in bdbms, (2)
Local dependency tracking to track the dependencies and derivations among data
items, (3) Update authorization to support data curation via content-based
authorization, in contrast to identity-based authorization, and (4) New access
methods and their supporting operators that support pattern matching on various
types of compressed biological data types. This paper presents the design of
bdbms along with the techniques proposed to support these functionalities
including an extension to SQL. We also outline some open issues in building
bdbms.
Comment: This article is published under a Creative Commons License Agreement
(http://creativecommons.org/licenses/by/2.5/). You may copy, distribute,
display, and perform the work, make derivative works and make commercial use
of the work, but you must attribute the work to the author and CIDR 2007.
3rd Biennial Conference on Innovative Data Systems Research (CIDR) January
7-10, 2007, Asilomar, California, US
A SAT-based System for Consistent Query Answering
An inconsistent database is a database that violates one or more integrity
constraints, such as functional dependencies. Consistent Query Answering is a
rigorous and principled approach to the semantics of queries posed against
inconsistent databases. The consistent answers to a query on an inconsistent
database are the intersection of the answers to the query on every repair, i.e.,
on every consistent database that differs from the given inconsistent one in a
minimal way. Computing the consistent answers of a fixed conjunctive query on a
given inconsistent database can be a coNP-hard problem, even though every fixed
conjunctive query is efficiently computable on a given consistent database.
We designed, implemented, and evaluated CAvSAT, a SAT-based system for
consistent query answering. CAvSAT leverages a set of natural reductions from
the complement of consistent query answering to SAT and to Weighted MaxSAT. The
system is capable of handling unions of conjunctive queries and arbitrary
denial constraints, which include functional dependencies as a special case. We
report results from experiments evaluating CAvSAT on both synthetic and
real-world databases. These results provide evidence that a SAT-based approach
can give rise to a comprehensive and scalable system for consistent query
answering.
Comment: 25 pages including appendix, to appear in the 22nd International
Conference on Theory and Applications of Satisfiability Testing
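The definition of consistent answers given in this abstract (the intersection of the query answers over all repairs) can be made concrete with a brute-force sketch. This is not the SAT-based reduction used by CAvSAT; it simply enumerates repairs for the special case of a single key constraint, which takes exponential time in general. The function names and the sample relation are hypothetical.

```python
from collections import defaultdict
from itertools import product


def repairs(rows, key):
    """Yield every repair of `rows` under a key constraint.

    For a key constraint, a repair keeps exactly one tuple from each
    group of tuples that share the same key value.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    for choice in product(*groups.values()):
        yield set(choice)


def consistent_answers(rows, key, query):
    """Intersect the answers of `query` over all repairs of `rows`."""
    answer_sets = [set(query(repair)) for repair in repairs(rows, key)]
    return set.intersection(*answer_sets)


if __name__ == "__main__":
    # Hypothetical relation Emp(name, dept) with key `name`;
    # the two tuples for "alice" violate the key.
    emp = [("alice", "sales"), ("alice", "hr"), ("bob", "it")]
    departments = lambda db: {dept for _, dept in db}
    print(consistent_answers(emp, key=lambda r: r[0], query=departments))
```

Here only "it" is returned for the department query, since "sales" and "hr" each appear in just one of the two repairs.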
Eliminating Recursion from Monadic Datalog Programs on Trees
We study the problem of eliminating recursion from monadic datalog programs
on trees with an infinite set of labels. We show that the boundedness problem,
i.e., determining whether a datalog program is equivalent to some nonrecursive
one, is undecidable, but decidability is regained if the descendant relation
is disallowed. Under similar restrictions we obtain decidability of the problem
of equivalence to a given nonrecursive program. We investigate the connection
between these two problems in more detail.
Normalized Web Distance and Word Similarity
There is a great deal of work in cognitive psychology, linguistics, and
computer science, about using word (or phrase) frequencies in context in text
corpora to develop measures for word similarity or word association, going back
to at least the 1960s. The goal of this chapter is to introduce the
normalized web distance (NWD) method to determine similarity between words and
phrases. It is a general way to tap the amorphous low-grade knowledge available
for free on the Internet, typed in by local users aiming at personal
gratification of diverse objectives, and yet globally achieving what is
effectively the largest semantic electronic database in the world. Moreover,
this database is available for all by using any search engine that can return
aggregate page-count estimates for a large range of search-queries. In the
paper introducing the NWD it was called `normalized Google distance (NGD),' but
since Google doesn't allow computer searches anymore, we opt for the more
neutral and descriptive NWD.
Comment: Latex, 20 pages, 7 figures, to appear in: Handbook of Natural
Language Processing, Second Edition, Nitin Indurkhya and Fred J. Damerau
Eds., CRC Press, Taylor and Francis Group, Boca Raton, FL, 2010, ISBN
978-142008592
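As a rough illustration of how aggregate page counts can be turned into a distance, the sketch below computes the NWD from four counts. The abstract does not state the formula, so the form used here is the one usually given for the normalized Google distance and should be treated as an assumption. The function name `nwd` and all counts are hypothetical, and `n` is assumed to be larger than every individual count.

```python
import math


def nwd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized web distance from page counts (assumed formula):

        (max(log fx, log fy) - log fxy) / (log n - min(log fx, log fy))

    fx, fy: pages containing term x, term y
    fxy:    pages containing both terms
    n:      total number of indexed pages (assumed > fx and > fy)
    """
    if min(fx, fy, fxy) <= 0:
        # Terms that never occur (or never co-occur) are treated as infinitely far apart.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(nwd(fx=100, fy=100, fxy=10, n=10_000))
    print(nwd(fx=1_000, fy=1_000, fxy=1_000, n=1_000_000))
```

Terms that always co-occur get distance 0, and the distance grows as the co-occurrence count shrinks relative to the individual counts.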
Instance-Independent View Serializability for Semistructured Databases
Semistructured databases require tailor-made concurrency control mechanisms
since traditional solutions for the relational model have been shown to be
inadequate. Such mechanisms need to take full advantage of the hierarchical
structure of semistructured data, for instance allowing concurrent updates of
subtrees of, or even individual elements in, XML documents. We present an
approach for concurrency control which is document-independent in the sense
that two schedules of semistructured transactions are considered equivalent if
they are equivalent on all possible documents. We prove that it is decidable in
polynomial time whether two given schedules in this framework are equivalent.
This also solves the view serializability problem for semistructured schedules
polynomially in the size of the schedule and exponentially in the number of
transactions.
Fixed-parameter tractability, definability, and model checking
In this article, we study parameterized complexity theory from the
perspective of logic, or more specifically, descriptive complexity theory.
We propose to consider parameterized model-checking problems for various
fragments of first-order logic as generic parameterized problems and show how
this approach can be useful in studying both fixed-parameter tractability and
intractability. For example, we establish the equivalence between the
model-checking for existential first-order logic, the homomorphism problem for
relational structures, and the substructure isomorphism problem. Our main
tractability result shows that model-checking for first-order formulas is
fixed-parameter tractable when restricted to a class of input structures with
an excluded minor. On the intractability side, for every t >= 0 we prove an
equivalence between model-checking for first-order formulas with t quantifier
alternations and the parameterized halting problem for alternating Turing
machines with t alternations. We discuss the close connection between this
alternation hierarchy and Downey and Fellows' W-hierarchy.
On a more abstract level, we consider two forms of definability, called Fagin
definability and slicewise definability, that are appropriate for describing
parameterized problems. We give a characterization of the class FPT of all
fixed-parameter tractable problems in terms of slicewise definability in finite
variable least fixed-point logic, which is reminiscent of the Immerman-Vardi
Theorem characterizing the class PTIME in terms of definability in least
fixed-point logic.
Comment: To appear in SIAM Journal on Computing
Relational databases and homogeneity in logics with counting
We define a new hierarchy in the class of computable queries to relational databases, in terms of the preservation of equality of theories in fragments of first order logic with bounded number of variables with the addition of counting quantifiers (Ck). We prove that the hierarchy is strict, and it turns out that it is orthogonal to the TIME-SPACE hierarchy defined with respect to the Turing machine complexity. We introduce a model of computation of queries to characterize the different layers of our hierarchy which is based on the reflective relational machine of S. Abiteboul, C. Papadimitriou, and V. Vianu. In our model the databases are represented by their Ck theories. Then we define and study several properties of databases related to homogeneity in Ck, obtaining various results on the change in the computation power of the introduced machine when working on classes of databases with such properties. We study the relation between our hierarchy and a similar one which we defined in a previous work, in terms of the preservation of equality of theories in fragments of first order logic with bounded number of variables, but without counting quantifiers (FOk). Finally, we give a characterization of the layers of the two hierarchies in terms of the infinitary logics Ck∞ω and Lk∞ω, respectively.