
    On the Complexity of Mining Itemsets from the Crowd Using Taxonomies

    We study the problem of frequent itemset mining in domains where data is not recorded in a conventional database but exists only in human knowledge. We provide examples of such scenarios and present a crowdsourcing model for them. The model uses the crowd as an oracle to find out whether an itemset is frequent or not, and relies on a known taxonomy of the item domain to guide the search for frequent itemsets. In the spirit of data mining with oracles, we analyze the complexity of this problem in terms of (i) crowd complexity, which measures the number of crowd questions required to identify the frequent itemsets; and (ii) computational complexity, which measures the computational effort required to choose the questions. We provide lower and upper complexity bounds in terms of the size and structure of the input taxonomy, as well as the size of a concise description of the output itemsets. We also provide constructive algorithms that achieve the upper bounds, and consider more efficient variants for practical situations.
    Comment: 18 pages, 2 figures. To be published in ICDT'13. Added missing acknowledgement.
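
    As a rough illustration of the oracle-guided search described above, here is a minimal Apriori-style sketch in which the crowd is modeled as a yes/no membership oracle and candidates are pruned by subset monotonicity; the paper's actual algorithms exploit the taxonomy structure, and every name here (find_frequent_itemsets, crowd_oracle) is an illustrative assumption, not taken from the paper.

    from itertools import combinations

    def find_frequent_itemsets(items, crowd_oracle, max_size=3):
        """Level-wise search using the crowd as a membership oracle.

        crowd_oracle(itemset) -> bool answers "is this itemset frequent?".
        By monotonicity, a set is asked about only if all of its subsets
        of the previous level were answered 'frequent', which bounds the
        number of crowd questions asked.
        """
        frequent = {frozenset([i]) for i in items if crowd_oracle(frozenset([i]))}
        result = set(frequent)
        for size in range(2, max_size + 1):
            candidates = set()
            for a in frequent:
                for b in frequent:
                    union = a | b
                    if len(union) == size:
                        # prune: every (size-1)-subset must already be frequent
                        if all(frozenset(s) in frequent
                               for s in combinations(union, size - 1)):
                            candidates.add(union)
            frequent = {c for c in candidates if crowd_oracle(c)}
            result |= frequent
        return result

    # Toy usage with a simulated crowd drawn from a small ground truth.
    TRULY_FREQUENT = {frozenset({"tea"}), frozenset({"milk"}),
                      frozenset({"tea", "milk"})}
    oracle = lambda s: s in TRULY_FREQUENT
    print(find_frequent_itemsets({"tea", "milk", "nails"}, oracle))

    The number of oracle calls this search issues corresponds to the crowd complexity the abstract analyzes, while the candidate-generation work corresponds to its computational complexity.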

    Get the Most out of Your Sample: Optimal Unbiased Estimators using Partial Information

    Random sampling is an essential tool in the processing and transmission of data. It is used to summarize data too large to store or manipulate and to meet resource constraints on bandwidth or battery power. Estimators that are applied to the sample facilitate fast approximate processing of queries posed over the original data, and the value of the sample hinges on the quality of these estimators. Our work targets data sets such as request and traffic logs and sensor measurements, where data is repeatedly collected over multiple instances: time periods, locations, or snapshots. We are interested in queries that span multiple instances, such as distinct counts and distance measures over selected records. These queries are used for applications ranging from planning to anomaly and change detection. Unbiased low-variance estimators are particularly effective, as the relative error decreases with the number of selected record keys. The Horvitz-Thompson estimator, known to minimize variance for sampling with "all or nothing" outcomes (where an outcome reveals either the exact value of the estimated quantity or no information about it), is not optimal for multi-instance operations, for which an outcome may provide partial information. We present a general, principled methodology for the derivation of (Pareto) optimal unbiased estimators over sampled instances and aim to understand its potential. We demonstrate significant improvement in the estimate accuracy of fundamental queries for common sampling schemes.
    Comment: This is a full version of a PODS 2011 paper.
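
    For context, the Horvitz-Thompson estimator mentioned above estimates a population total by weighting each sampled value with the inverse of its inclusion probability. The following sketch illustrates this under an assumed independent (Poisson) sampling scheme; the function name and setup are illustrative, not the paper's methodology.

    import random

    def horvitz_thompson_total(population, inclusion_prob):
        """Unbiased estimate of sum(population) from a Poisson sample.

        Each value is included independently with probability
        inclusion_prob(value); a sampled value is weighted by
        1 / inclusion_prob(value), which makes the estimator unbiased.
        (Sampling and estimation are combined here for the toy example.)
        """
        return sum(v / inclusion_prob(v)
                   for v in population
                   if random.random() < inclusion_prob(v))

    # Toy usage: probability-proportional-to-size sampling of a small data set.
    data = [1.0, 5.0, 20.0, 100.0]
    p = lambda v: min(1.0, v / 50.0)        # larger values sampled more often
    estimates = [horvitz_thompson_total(data, p) for _ in range(10000)]
    print(sum(estimates) / len(estimates))  # close to the true total, 126.0

    Averaged over many runs the estimate matches the true total, illustrating unbiasedness; the paper's point is that when outcomes carry partial information, as in multi-instance queries, this estimator is no longer variance-optimal.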

    Bohrification

    New foundations for quantum logic and quantum spaces are constructed by merging algebraic quantum theory and topos theory. Interpreting Bohr's "doctrine of classical concepts" mathematically, given a quantum theory described by a noncommutative C*-algebra A, we construct a topos T(A), which contains the "Bohrification" B of A as an internal commutative C*-algebra. Then B has a spectrum, a locale internal to T(A), the external description S(A) of which we interpret as the "Bohrified" phase space of the physical system. As in classical physics, the open subsets of S(A) correspond to (atomic) propositions, so that the "Bohrified" quantum logic of A is given by the Heyting algebra structure of S(A). The key difference between this logic and its classical counterpart is that the former does not satisfy the law of the excluded middle, and hence is intuitionistic. When A contains sufficiently many projections (e.g. when A is a von Neumann algebra, or, more generally, a Rickart C*-algebra), the intuitionistic quantum logic S(A) of A may also be compared with the traditional quantum logic, i.e. the orthomodular lattice of projections in A. This time, the main difference is that the former is distributive (even when A is noncommutative), while the latter is not. This chapter is a streamlined synthesis of arXiv:0709.4364, arXiv:0902.3201, and arXiv:0905.2275.
    Comment: 44 pages; a chapter of the first author's PhD thesis, to appear in "Deep Beauty" (ed. H. Halvorson).
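
    As a compact restatement of the construction described above, in notation paraphrased from the cited preprints (the symbols below are my rendering, not verbatim from the chapter): writing C(A) for the poset of commutative C*-subalgebras of A ordered by inclusion, the topos and the Bohrification are

        \[
          \mathcal{T}(A) \;=\; \mathbf{Set}^{\mathcal{C}(A)},
          \qquad
          \underline{A} \colon \mathcal{C}(A) \to \mathbf{Set},
          \quad
          \underline{A}(C) = C,
        \]

    where the tautological functor \underline{A} is a commutative C*-algebra internal to T(A); constructive Gelfand duality then yields its internal spectrum, a locale internal to T(A) whose external description is the Bohrified phase space S(A).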

    Complexity and Algorithms for the Discrete Fréchet Distance Upper Bound with Imprecise Input

    Full text link
    We study the problem of computing the upper bound of the discrete Fréchet distance for imprecise input, and prove that the problem is NP-hard. This solves an open problem posed in 2010 by Ahn et al. If shortcuts are allowed, we show that the upper bound of the discrete Fréchet distance with shortcuts for imprecise input can be computed in polynomial time, and we present several efficient algorithms.
    Comment: 15 pages, 8 figures.
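
    For reference, the discrete Fréchet distance itself (for precise input) is computable by the classic quadratic dynamic program of Eiter and Mannila; the sketch below shows that standard algorithm, not the paper's imprecise-input or shortcut variants.

    from math import dist  # Euclidean distance, Python 3.8+

    def discrete_frechet(P, Q):
        """Discrete Fréchet distance between point sequences P and Q.

        ca[i][j] holds the distance for the prefixes P[:i+1], Q[:j+1]:
        the cost of a coupling is the maximum pointwise distance along
        it, minimized over all monotone couplings.
        """
        n, m = len(P), len(Q)
        ca = [[0.0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                d = dist(P[i], Q[j])
                if i == 0 and j == 0:
                    ca[i][j] = d
                elif i == 0:
                    ca[i][j] = max(ca[i][j - 1], d)
                elif j == 0:
                    ca[i][j] = max(ca[i - 1][j], d)
                else:
                    ca[i][j] = max(min(ca[i - 1][j],
                                       ca[i - 1][j - 1],
                                       ca[i][j - 1]), d)
        return ca[n - 1][m - 1]

    # Toy usage: two short polygonal curves in the plane.
    print(discrete_frechet([(0, 0), (1, 0), (2, 0)],
                           [(0, 1), (1, 1), (2, 1)]))  # 1.0

    The paper asks a harder question: with imprecise input, each point is only known to lie in a region, and the upper bound is the worst case of this quantity over all placements, which is what turns out to be NP-hard.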