99 research outputs found

Decision Problems in Information Theory

Author: Abo Khamis Mahmoud
Kolaitis Phokion G.
Ngo Hung Q.
Suciu Dan
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)
Publication date: 01/01/2020

Constraints on entropies are considered to be the laws of information theory. Even though the pursuit of their discovery has been a central theme of research in information theory, the algorithmic aspects of constraints on entropies remain largely unexplored. Here, we initiate an investigation of decision problems about constraints on entropies by placing several different such problems into levels of the arithmetical hierarchy. We establish the following results on checking the validity over all almost-entropic functions: first, validity of a Boolean information constraint arising from a monotone Boolean formula is co-recursively enumerable; second, validity of "tight" conditional information constraints is in ???. Furthermore, under some restrictions, validity of conditional information constraints "with slack" is in ???, and validity of information inequality constraints involving max is Turing equivalent to validity of information inequality constraints (with no max involved). We also prove that the classical implication problem for conditional independence statements is co-recursively enumerable.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees

Author: Bezerianos Anastasia
Echihabi Karima
Gogolou Anna
Palpanas Themis
Tsandilas Theophanis
Publication date: 26/12/2022

Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analytic approaches that support exploration and decision making by providing progressive results, before the final and exact ones have been computed. Prior works lack both efficiency and accuracy when applied to large-scale data series collections. We present and experimentally evaluate ProS, a new probabilistic learning-based method that provides quality guarantees for progressive Nearest Neighbor (NN) query answering. We develop our method for k-NN queries and demonstrate how it can be applied with the two most popular distance measures, namely, Euclidean and Dynamic Time Warping (DTW). We provide both initial and progressive estimates of the final answer that improve during the similarity search, as well as suitable stopping criteria for the progressive queries. Moreover, we describe how this method can be used to develop a progressive algorithm for data series classification (based on a k-NN classifier), and we additionally propose a method designed specifically for the classification task. Experiments with several diverse synthetic and real datasets demonstrate that our prediction methods constitute the first practical solutions to the problem, significantly outperforming competing approaches. This paper was published in the VLDB Journal (2022).

arXiv.org e-Print Archive

Graphical Conjunctive Queries

Author: Bonchi Filippo
Seeber Jens
Sobocinski Pawel
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th EACSL Annual Conference on Computer Science Logic (CSL 2018)
Publication date: 01/01/2018

The Calculus of Conjunctive Queries (CCQ) has foundational status in database theory. A celebrated theorem of Chandra and Merlin states that CCQ query inclusion is decidable. Its proof transforms logical formulas to graphs: each query has a natural model - a kind of graph - and query inclusion reduces to the existence of a graph homomorphism between natural models. We introduce the diagrammatic language Graphical Conjunctive Queries (GCQ) and show that it has the same expressivity as CCQ. GCQ terms are string diagrams, and their algebraic structure allows us to derive a sound and complete axiomatisation of query inclusion, which turns out to be exactly Carboni and Walters' notion of cartesian bicategory of relations. Our completeness proof exploits the combinatorial nature of string diagrams as (certain cospans of) hypergraphs: Chandra and Merlin's insights inspire a theorem that relates such cospans with spans. Completeness and decidability of the (in)equational theory of GCQ follow as a corollary. Categorically speaking, our contribution is a model-theoretic completeness theorem of free cartesian bicategories (on a relational signature) for the category of sets and relations.
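
The homomorphism criterion mentioned in the abstract above can be illustrated with a small brute-force sketch. It covers only Boolean conjunctive queries (no free variables) and is not the paper's diagrammatic method. The representation of a query as a list of `(relation, variables)` atoms and the names `has_homomorphism` and `contained_in` are assumptions made for this illustration.

```python
from itertools import product

def variables(query):
    """Collect the variables of a query given as a list of (relation, args) atoms."""
    return sorted({v for _, args in query for v in args})

def has_homomorphism(source, target):
    """True if some variable map sends every atom of `source` to an atom of `target`."""
    src_vars = variables(source)
    tgt_vars = variables(target)
    target_atoms = set(target)
    # Brute force: try every assignment of source variables to target variables.
    for image in product(tgt_vars, repeat=len(src_vars)):
        h = dict(zip(src_vars, image))
        if all((rel, tuple(h[v] for v in args)) in target_atoms
               for rel, args in source):
            return True
    return False

def contained_in(q1, q2):
    """Boolean CQ inclusion: q1 is included in q2 iff q2's natural model
    maps homomorphically into q1's natural model."""
    return has_homomorphism(q2, q1)

# "There is a path of length 2" versus "there is an edge".
path2 = [("E", ("x", "y")), ("E", ("y", "z"))]
edge = [("E", ("u", "v"))]

print(contained_in(path2, edge))  # True: a path of length 2 contains an edge
print(contained_in(edge, path2))  # False: a single edge need not extend to a path
```

The search is exponential in the number of variables, which is acceptable for a toy example; the point is only to show the direction of the homomorphism.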

arXiv.org e-Print Archive

Southampton (e-Prints Soton)

Archivio della Ricerca - Università di Pisa

Dagstuhl Research Online Publication Server

Large Scale Spectral Clustering Using Approximate Commute Time Embedding

Author: C. Fowlkes
D. Achlioptas
D. Mavroeidis
D.A. Spielman
F. Fouss
H. Qiu
I. Koutis
L. Wang
P.G. Doyle
U. von Luxburg
W.Y. Chen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Spectral clustering is a novel clustering method which can detect complex shapes of data clusters. However, it requires the eigendecomposition of the graph Laplacian matrix, whose cost is proportional to O(n^3), and thus it is not suitable for large-scale systems. Recently, many methods have been proposed to reduce the computational time of spectral clustering. These approximate methods usually involve sampling techniques, through which a lot of information in the original data may be lost. In this work, we propose a fast and accurate spectral clustering approach using an approximate commute time embedding, which is similar to the spectral embedding. The method requires neither sampling nor the computation of any eigenvector. Instead, it uses random projection and a linear time solver to find the approximate embedding. Experiments on several synthetic and real datasets show that the proposed approach has better clustering quality and is faster than the state-of-the-art approximate spectral clustering methods.
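
One plausible reading of "random projection and a linear time solver" is sketched below. It assumes the standard identity that the commute time between nodes i and j equals the graph volume times their effective resistance, and it projects the edge-space embedding with a random sign matrix. This is an illustration under those assumptions, not the authors' implementation: the function name `commute_time_embedding` is hypothetical, and a dense pseudoinverse stands in for the linear time solver.

```python
import numpy as np

def commute_time_embedding(edges, weights, n, k, seed=0):
    """Return an n x k embedding whose squared Euclidean distances
    approximate commute times on a connected weighted graph."""
    rng = np.random.default_rng(seed)
    m = len(edges)
    # Signed edge-vertex incidence matrix B (m x n).
    B = np.zeros((m, n))
    for e, (i, j) in enumerate(edges):
        B[e, i], B[e, j] = 1.0, -1.0
    w = np.asarray(weights, dtype=float)
    L = B.T @ (w[:, None] * B)      # graph Laplacian L = B^T W B
    volume = 2.0 * w.sum()          # sum of weighted degrees
    # Random sign projection from m edge coordinates down to k.
    Q = rng.choice([-1.0, 1.0], size=(k, m)) / np.sqrt(k)
    Y = Q @ (np.sqrt(w)[:, None] * B)   # k x n
    # Assumption: pinv replaces the fast Laplacian solver, for brevity only.
    Z = np.sqrt(volume) * (Y @ np.linalg.pinv(L))
    return Z.T

# Two triangles joined by one bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
X = commute_time_embedding(edges, [1.0] * len(edges), n=6, k=200)

within = np.sum((X[0] - X[1]) ** 2)   # nodes in the same triangle
across = np.sum((X[0] - X[5]) ** 2)   # nodes in different triangles
print(within, across)
```

For this graph the exact commute times are 14 * 2/3 within a triangle and 14 * 7/3 between nodes 0 and 5, so the first printed value should be clearly smaller than the second. A clustering step such as k-means on the rows of `X` would follow; it is omitted here.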

arXiv.org e-Print Archive

Crossref

Implementation and applications of query interfaces to constraint databases in a distributed computing environment

Author: Chang Chin-Chih
Publication date: 01/12/2000

SHAREOK repository