    On the Complexity of Mining Itemsets from the Crowd Using Taxonomies

    We study the problem of frequent itemset mining in domains where data is not recorded in a conventional database but exists only in human knowledge. We provide examples of such scenarios, and present a crowdsourcing model for them. The model uses the crowd as an oracle to find out whether an itemset is frequent or not, and relies on a known taxonomy of the item domain to guide the search for frequent itemsets. In the spirit of data mining with oracles, we analyze the complexity of this problem in terms of (i) crowd complexity, which measures the number of crowd questions required to identify the frequent itemsets; and (ii) computational complexity, which measures the computational effort required to choose the questions. We provide lower and upper complexity bounds in terms of the size and structure of the input taxonomy, as well as the size of a concise description of the output itemsets. We also provide constructive algorithms that achieve the upper bounds, and consider more efficient variants for practical situations. Comment: 18 pages, 2 figures. To be published to ICDT'13. Added missing acknowledgemen
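
The oracle-plus-taxonomy idea in this abstract can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm: it handles single items rather than itemsets, and it assumes that frequency is monotone along the taxonomy (an infrequent general item has no frequent specialisations). The taxonomy, the `is_frequent` oracle, and the function name `mine_frequent_items` are all hypothetical stand-ins.

```python
def mine_frequent_items(taxonomy, roots, is_frequent):
    """Top-down search over a taxonomy, asking a (hypothetical) crowd oracle.

    taxonomy maps each item to its direct specialisations.
    Returns (set of frequent items, number of oracle questions asked).
    """
    frequent = set()
    asked = set()
    stack = list(roots)
    while stack:
        item = stack.pop()
        if item in asked:
            continue
        asked.add(item)
        if is_frequent(item):
            frequent.add(item)
            # Only frequent items can have frequent specialisations
            # (assumed monotonicity), so only their children are explored.
            stack.extend(taxonomy.get(item, []))
    return frequent, len(asked)


# Toy taxonomy with 7 items; the "crowd" is simulated by a fixed set.
toy_taxonomy = {
    "food": ["fruit", "dairy"],
    "fruit": ["apple", "banana"],
    "dairy": ["milk", "cheese"],
}
truly_frequent = {"food", "fruit", "apple"}

found, questions = mine_frequent_items(
    toy_taxonomy, ["food"], lambda item: item in truly_frequent
)
```

In this toy run the search never asks about "milk" or "cheese", because "dairy" was reported infrequent. The number of questions asked is the quantity the abstract calls crowd complexity.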

    Diamond-free Families

    Given a finite poset $P$, we consider the largest size $La(n,P)$ of a family of subsets of $[n]:=\{1,\dots,n\}$ that contains no subposet $P$. This problem has been studied intensively in recent years, and it is conjectured that $\pi(P):= \lim_{n\rightarrow\infty} La(n,P)/\binom{n}{n/2}$ exists for general posets $P$, and, moreover, it is an integer. For $k\ge 2$ let $\mathcal{D}_k$ denote the $k$-diamond poset $\{A< B_1,\dots,B_k < C\}$. We study the average number of times a random full chain meets a $P$-free family, called the Lubell function, and use it for $P=\mathcal{D}_k$ to determine $\pi(\mathcal{D}_k)$ for infinitely many values $k$. A stubborn open problem is to show that $\pi(\mathcal{D}_2)=2$; here we make progress by proving $\pi(\mathcal{D}_2)\le 2\tfrac{3}{11}$ (if it exists). Comment: 16 pages
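
The Lubell function mentioned in this abstract can be checked by brute force on a small ground set. The sketch below computes it in two ways: with the standard closed form (the sum of 1/C(n, |A|) over the sets A in the family) and as a literal average over all n! full chains. The function names and the example family are illustrative choices, not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, factorial


def lubell(family, n):
    """Closed form: sum of 1 / C(n, |A|) over the distinct sets A in the family."""
    return sum(Fraction(1, comb(n, len(a))) for a in set(family))


def average_chain_hits(family, n):
    """Average number of family members met by a full chain, over all n! chains."""
    members = set(family)
    total = 0
    for order in permutations(range(1, n + 1)):
        # The full chain for this ordering consists of the prefixes
        # {}, {order[0]}, {order[0], order[1]}, ..., [n].
        for i in range(n + 1):
            if frozenset(order[:i]) in members:
                total += 1
    return Fraction(total, factorial(n))


n = 4
# All 2-subsets and all 3-subsets of [4]. Two levels contain no chain of
# three sets, so this family is diamond-free.
two_levels = [
    frozenset(c) for r in (2, 3) for c in combinations(range(1, n + 1), r)
]
closed_form = lubell(two_levels, n)
by_enumeration = average_chain_hits(two_levels, n)
```

Both values equal 2 here, since each full level contributes exactly 1 to the Lubell function.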

    Elementary bounds on Poincaré and log-Sobolev constants for decomposable Markov chains

    We consider finite-state Markov chains that can be naturally decomposed into smaller "projection" and "restriction" chains. Possibly this decomposition will be inductive, in that the restriction chains will be smaller copies of the initial chain. We provide expressions for Poincaré (resp. log-Sobolev) constants of the initial Markov chain in terms of Poincaré (resp. log-Sobolev) constants of the projection and restriction chains, together with a further parameter. In the case of the Poincaré constant, our bound is always at least as good as existing ones and, depending on the value of the extra parameter, may be much better. There appears to be no previously published decomposition result for the log-Sobolev constant. Our proofs are elementary and self-contained. Comment: Published at http://dx.doi.org/10.1214/105051604000000639 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Quantitative Static Analysis of Communication Protocols using Abstract Markov Chains

    In this paper we present a static analysis of probabilistic programs to quantify their performance properties by taking into account both the stochastic aspects of the language and those related to the execution environment. More particularly, we are interested in the analysis of communication protocols in lossy networks and we aim at statically inferring parametric bounds on some important metrics such as the expectation of the throughput or the energy consumption. Our analysis is formalized within the theory of abstract interpretation and soundly takes all possible executions into account. We model the concrete executions as a set of Markov chains and we introduce a novel notion of abstract Markov chains that provides a finite and symbolic representation to over-approximate the (possibly unbounded) set of concrete behaviors. We show that our proposed formalism is expressive enough to handle both probabilistic and pure non-deterministic choices within the same semantics. Our analysis operates in two steps. The first step is a classic abstract interpretation of the source code, using stock numerical abstract domains and a specific automata domain, in order to extract the abstract Markov chain of the program. The second step extracts from this chain particular invariants about the stationary distribution and computes its symbolic bounds using a parametric Fourier-Motzkin elimination algorithm. We present a prototype implementation of the analysis and we discuss some preliminary experiments on a number of communication protocols. We compare our prototype to the state-of-the-art probabilistic model checker Prism and we highlight the advantages and shortcomings of both approaches

    Colouring set families without monochromatic k-chains

    A coloured version of classic extremal problems dates back to Erdős and Rothschild, who in 1974 asked which $n$-vertex graph has the maximum number of 2-edge-colourings without monochromatic triangles. They conjectured that the answer is simply given by the largest triangle-free graph. Since then, this new class of coloured extremal problems has been extensively studied by various researchers. In this paper we pursue the Erdős--Rothschild versions of Sperner's Theorem, the classic result in extremal set theory on the size of the largest antichain in the Boolean lattice, and Erdős' extension to $k$-chain-free families. Given a family $\mathcal{F}$ of subsets of $[n]$, we define an $(r,k)$-colouring of $\mathcal{F}$ to be an $r$-colouring of the sets without any monochromatic $k$-chains $F_1 \subset F_2 \subset \dots \subset F_k$. We prove that for $n$ sufficiently large in terms of $k$, the largest $k$-chain-free families also maximise the number of $(2,k)$-colourings. We also show that the middle level, $\binom{[n]}{\lfloor n/2 \rfloor}$, maximises the number of $(3,2)$-colourings, and give asymptotic results on the maximum possible number of $(r,k)$-colourings whenever $r(k-1)$ is divisible by three. Comment: 30 pages, final version
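
The definition of an $(r,k)$-colouring given in this abstract can be made concrete with an exhaustive counter for very small families. This is only a brute-force illustration of the definition, exponential in the family size, and says nothing about the paper's proof techniques; `longest_chain` and `count_colourings` are names chosen here.

```python
from itertools import combinations, product


def longest_chain(sets):
    """Length of the longest chain F_1 ⊂ F_2 ⊂ ... among distinct frozensets."""
    best = {}
    for s in sorted(sets, key=len):
        # Any proper subset of s is strictly smaller, so it was processed earlier.
        best[s] = 1 + max((best[t] for t in best if t < s), default=0)
    return max(best.values(), default=0)


def count_colourings(family, r, k):
    """Number of r-colourings of the family with no monochromatic k-chain."""
    family = list(family)
    count = 0
    for colours in product(range(r), repeat=len(family)):
        classes = [
            [s for s, c in zip(family, colours) if c == colour]
            for colour in range(r)
        ]
        if all(longest_chain(cls) < k for cls in classes):
            count += 1
    return count


# Middle level of the Boolean lattice on [3]: the three singletons.
middle_level = [frozenset(c) for c in combinations(range(1, 4), 1)]
# The whole Boolean lattice on [2]: {}, {1}, {2}, {1, 2}.
q2 = [frozenset(c) for size in range(3) for c in combinations(range(1, 3), size)]

middle_count = count_colourings(middle_level, 3, 2)
q2_count = count_colourings(q2, 2, 3)
```

An antichain such as the middle level admits every colouring, so `middle_count` is $3^3 = 27$. For the full lattice on $[2]$, 6 of the 16 two-colourings make one of the two 3-chains monochromatic, leaving 10.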

    Poset Ramsey number $R(P,Q_n)$. III. N-shaped poset

    Given partially ordered sets (posets) $(P, \leq_P)$ and $(P', \leq_{P'})$, we say that $P'$ contains a copy of $P$ if for some injective function $f\colon P\rightarrow P'$ and for any $A, B\in P$, $A\leq_P B$ if and only if $f(A)\leq_{P'} f(B)$. For any posets $P$ and $Q$, the poset Ramsey number $R(P,Q)$ is the least positive integer $N$ such that no matter how the elements of an $N$-dimensional Boolean lattice are colored in blue and red, there is either a copy of $P$ with all blue elements or a copy of $Q$ with all red elements. We focus on the poset Ramsey number $R(P, Q_n)$ for a fixed poset $P$ and an $n$-dimensional Boolean lattice $Q_n$, as $n$ grows large. It is known that $n+c_1(P) \leq R(P,Q_n) \leq c_2(P) n$, for positive constants $c_1$ and $c_2$. However, there is no poset $P$ known, for which $R(P, Q_n)> (1+\epsilon)n$, for $\epsilon >0$. This paper is devoted to a new method for finding upper bounds on $R(P, Q_n)$ using a duality between copies of $Q_n$ and sets of elements that cover them, referred to as blockers. We prove several properties of blockers and their direct relation to the Ramsey numbers. Using these properties we show that $R(\mathcal{N},Q_n)=n+\Theta(n/\log n)$, for a poset $\mathcal{N}$ with four elements $A$, $B$, $C$, and $D$, such that $A<C$, $B<D$, $B<C$, and the remaining pairs of elements are incomparable. Comment: 19 pages, 6 figures
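
The notion of a copy used in this abstract (an injective map that preserves the order in both directions) can be tested exhaustively on tiny Boolean lattices. The sketch below only illustrates that definition for the N-shaped poset; it does not compute any Ramsey number, and the helper names are chosen here for illustration.

```python
from itertools import combinations, permutations


def boolean_lattice(m):
    """All subsets of {1, ..., m}, ordered by inclusion."""
    ground = range(1, m + 1)
    return [frozenset(c) for r in range(m + 1) for c in combinations(ground, r)]


def contains_copy(elements, leq, host, host_leq):
    """True if some injective f satisfies: a <= b  iff  f(a) <= f(b)."""
    for image in permutations(host, len(elements)):
        # permutations() never repeats a host element, so f is injective.
        f = dict(zip(elements, image))
        if all(
            leq(a, b) == host_leq(f[a], f[b])
            for a in elements
            for b in elements
        ):
            return True
    return False


# The N-shaped poset: A < C, B < D, B < C, all other pairs incomparable.
N_ELEMENTS = ["A", "B", "C", "D"]
N_STRICT = {("A", "C"), ("B", "D"), ("B", "C")}


def n_leq(x, y):
    return x == y or (x, y) in N_STRICT


def subset_leq(x, y):
    return x <= y


in_q2 = contains_copy(N_ELEMENTS, n_leq, boolean_lattice(2), subset_leq)
in_q3 = contains_copy(N_ELEMENTS, n_leq, boolean_lattice(3), subset_leq)
```

The 2-dimensional lattice has a least element while the N-shaped poset has none, so no copy fits into its four elements. In the 3-dimensional lattice, A = {1}, B = {2}, C = {1,2}, D = {2,3} is one copy.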