
1,085 research outputs found

On the Complexity of Mining Itemsets from the Crowd Using Taxonomies

Author: Amarilli Antoine
Amsterdamer Yael
Milo Tova
Publication date: 16/12/2013

We study the problem of frequent itemset mining in domains where data is not recorded in a conventional database but only exists in human knowledge. We provide examples of such scenarios, and present a crowdsourcing model for them. The model uses the crowd as an oracle to find out whether an itemset is frequent or not, and relies on a known taxonomy of the item domain to guide the search for frequent itemsets. In the spirit of data mining with oracles, we analyze the complexity of this problem in terms of (i) crowd complexity, which measures the number of crowd questions required to identify the frequent itemsets; and (ii) computational complexity, which measures the computational effort required to choose the questions. We provide lower and upper complexity bounds in terms of the size and structure of the input taxonomy, as well as the size of a concise description of the output itemsets. We also provide constructive algorithms that achieve the upper bounds, and consider more efficient variants for practical situations.

Comment: 18 pages, 2 figures. To be published to ICDT'13. Added missing acknowledgemen
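The oracle model in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it ignores the taxonomy entirely and runs a plain level-wise (Apriori-style) search. The names `mine_with_oracle` and `is_frequent` are hypothetical, and the "crowd" is simulated by a Python callable. The point is only to show what counting crowd questions means.

```python
from itertools import combinations


def mine_with_oracle(items, is_frequent):
    """Level-wise search for frequent itemsets using a yes/no oracle.

    `is_frequent` stands in for the crowd (an assumption of this sketch):
    it takes a frozenset and returns True or False. Returns the list of
    frequent itemsets and the number of oracle questions asked.
    """
    questions = 0
    frequent = []
    level = [frozenset([i]) for i in sorted(items)]
    while level:
        survivors = []
        for itemset in level:
            questions += 1  # one crowd question per candidate
            if is_frequent(itemset):
                survivors.append(itemset)
        frequent.extend(survivors)
        survivor_set = set(survivors)
        size = len(level[0])
        candidates = set()
        for a in survivors:
            for b in survivors:
                union = a | b
                # Keep a candidate only if every subset one item smaller
                # is already known to be frequent (monotonicity pruning).
                if len(union) == size + 1 and all(
                    frozenset(sub) in survivor_set
                    for sub in combinations(union, size)
                ):
                    candidates.add(union)
        level = sorted(candidates, key=sorted)
    return frequent, questions


# Simulated crowd: exactly the non-empty subsets of {tea, milk} are frequent.
hidden = frozenset({"tea", "milk"})
found, asked = mine_with_oracle({"tea", "milk", "sugar"}, lambda s: s <= hidden)
```

In this toy run the search asks four questions: three singletons and the one pair that survives pruning.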

arXiv.org e-Print Archive

CiteSeerX

Diamond-free Families

Author: Jerrold R. Griggs
Linyuan Lu
Series A
Wei-tian Li
Publication date: 01/01/2011

Given a finite poset $P$, we consider the largest size $La(n,P)$ of a family of subsets of $[n]:=\{1,\dots,n\}$ that contains no subposet $P$. This problem has been studied intensively in recent years, and it is conjectured that $\pi(P):=\lim_{n\rightarrow\infty} La(n,P)/\binom{n}{\lfloor n/2 \rfloor}$ exists for general posets $P$ and, moreover, that it is an integer. For $k\ge 2$ let $\mathcal{D}_k$ denote the $k$-diamond poset $\{A< B_1,\dots,B_k < C\}$. We study the average number of times a random full chain meets a $P$-free family, called the Lubell function, and use it for $P=\mathcal{D}_k$ to determine $\pi(\mathcal{D}_k)$ for infinitely many values of $k$. A stubborn open problem is to show that $\pi(\mathcal{D}_2)=2$; here we make progress by proving $\pi(\mathcal{D}_2)\le 2\frac{3}{11}$ (if it exists).

Comment: 16 page
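The Lubell function mentioned in this abstract can be made concrete with a small sketch. It relies on the standard identity that the expected number of members of a family met by a uniformly random full chain equals the sum of $1/\binom{n}{|F|}$ over the members $F$. The function names below are hypothetical, and the brute-force version is feasible only for very small $n$.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def lubell(family, n):
    """Lubell function via the closed form: sum of 1 / C(n, |F|)."""
    members = {frozenset(s) for s in family}
    return sum(Fraction(1, comb(n, len(s))) for s in members)


def lubell_by_chains(family, n):
    """Same quantity by enumerating all n! full chains of subsets of [n]."""
    members = {frozenset(s) for s in family}
    hits = 0
    for order in permutations(range(1, n + 1)):
        # The full chain for this ordering: prefixes of length 0..n.
        chain = (frozenset(order[:i]) for i in range(n + 1))
        hits += sum(1 for link in chain if link in members)
    return Fraction(hits, factorial(n))


# A copy of the 2-diamond inside the subsets of [3].
diamond = [set(), {1}, {2}, {1, 2, 3}]
value = lubell(diamond, 3)
```

For this diamond both functions return 8/3: every full chain meets the empty set and $[3]$, and one third of the chains pass through each of $\{1\}$ and $\{2\}$.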

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Elementary bounds on Poincare and log-Sobolev constants for decomposable Markov chains

Author: Jerrum Mark
Son Jung-Bae
Tetali Prasad
Vigoda Eric
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 24/03/2005

We consider finite-state Markov chains that can be naturally decomposed into smaller "projection" and "restriction" chains. Possibly this decomposition will be inductive, in that the restriction chains will be smaller copies of the initial chain. We provide expressions for Poincare (resp. log-Sobolev) constants of the initial Markov chain in terms of Poincare (resp. log-Sobolev) constants of the projection and restriction chains, together with a further parameter. In the case of the Poincare constant, our bound is always at least as good as existing ones and, depending on the value of the extra parameter, may be much better. There appears to be no previously published decomposition result for the log-Sobolev constant. Our proofs are elementary and self-contained.

Comment: Published at http://dx.doi.org/10.1214/105051604000000639 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

Quantitative Static Analysis of Communication Protocols using Abstract Markov Chains

Author: Miné Antoine
Ouadjaout Abdelraouf
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/08/2019

In this paper we present a static analysis of probabilistic programs to quantify their performance properties by taking into account both the stochastic aspects of the language and those related to the execution environment. More particularly, we are interested in the analysis of communication protocols in lossy networks and we aim at inferring statically parametric bounds of some important metrics such as the expectation of the throughput or the energy consumption. Our analysis is formalized within the theory of abstract interpretation and soundly takes all possible executions into account. We model the concrete executions as a set of Markov chains and we introduce a novel notion of abstract Markov chains that provides a finite and symbolic representation to over-approximate the (possibly unbounded) set of concrete behaviors. We show that our proposed formalism is expressive enough to handle both probabilistic and pure non-deterministic choices within the same semantics. Our analysis operates in two steps. The first step is a classic abstract interpretation of the source code, using stock numerical abstract domains and a specific automata domain, in order to extract the abstract Markov chain of the program. The second step extracts from this chain particular invariants about the stationary distribution and computes its symbolic bounds using a parametric Fourier-Motzkin elimination algorithm. We present a prototype implementation of the analysis and we discuss some preliminary experiments on a number of communication protocols. We compare our prototype to the state-of-the-art probabilistic model checker Prism and we highlight the advantages and shortcomings of both approaches.

Colouring set families without monochromatic k-chains

Author: Das Shagnik
Glebov Roman
Sudakov Benny
Tran Tuan
Publication venue: 'Elsevier BV'
Publication date: 08/06/2019

A coloured version of classic extremal problems dates back to Erdős and Rothschild, who in 1974 asked which $n$-vertex graph has the maximum number of 2-edge-colourings without monochromatic triangles. They conjectured that the answer is simply given by the largest triangle-free graph. Since then, this new class of coloured extremal problems has been extensively studied by various researchers. In this paper we pursue the Erdős-Rothschild versions of Sperner's Theorem, the classic result in extremal set theory on the size of the largest antichain in the Boolean lattice, and Erdős' extension to $k$-chain-free families. Given a family $\mathcal{F}$ of subsets of $[n]$, we define an $(r,k)$-colouring of $\mathcal{F}$ to be an $r$-colouring of the sets without any monochromatic $k$-chains $F_1 \subset F_2 \subset \dots \subset F_k$. We prove that for $n$ sufficiently large in terms of $k$, the largest $k$-chain-free families also maximise the number of $(2,k)$-colourings. We also show that the middle level, $\binom{[n]}{\lfloor n/2 \rfloor}$, maximises the number of $(3,2)$-colourings, and give asymptotic results on the maximum possible number of $(r,k)$-colourings whenever $r(k-1)$ is divisible by three.

Comment: 30 pages, final versio
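The definition of an $(r,k)$-colouring in this abstract can be checked mechanically for small families. The sketch below is an illustration only, with hypothetical function names; it represents a colouring as a dictionary from sets to colours in `range(r)` and tests each colour class for a chain of $k$ strictly nested sets.

```python
def longest_chain(sets):
    """Length of the longest chain F_1 ⊂ F_2 ⊂ ... under strict inclusion."""
    ordered = sorted({frozenset(s) for s in sets}, key=len)
    best = {}
    for s in ordered:
        # A chain ending at s extends the best chain ending at a proper subset.
        best[s] = 1 + max((best[t] for t in best if t < s), default=0)
    return max(best.values(), default=0)


def is_rk_colouring(colouring, r, k):
    """True if no colour class contains a monochromatic k-chain."""
    classes = {}
    for member, colour in colouring.items():
        if not 0 <= colour < r:
            raise ValueError(f"colour {colour} is outside range({r})")
        classes.setdefault(colour, []).append(member)
    return all(longest_chain(cls) < k for cls in classes.values())


# All subsets of {1, 2}, with the bottom and top sets sharing a colour.
colouring = {
    frozenset(): 0,
    frozenset({1}): 1,
    frozenset({2}): 1,
    frozenset({1, 2}): 0,
}
```

With this colouring, `is_rk_colouring(colouring, 2, 2)` is false because the empty set and $\{1,2\}$ form a monochromatic 2-chain, while `is_rk_colouring(colouring, 2, 3)` is true.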

arXiv.org e-Print Archive

Repository for Publications and Research Data

Poset Ramsey number $R(P,Q_n)$. III. N-shaped poset

Author: Axenovich Maria
Winter Christian
Publication date: 04/11/2022

Given partially ordered sets (posets) $(P, \leq_P)$ and $(P', \leq_{P'})$, we say that $P'$ contains a copy of $P$ if for some injective function $f\colon P\rightarrow P'$ and for any $A, B\in P$, $A\leq_P B$ if and only if $f(A)\leq_{P'} f(B)$. For any posets $P$ and $Q$, the poset Ramsey number $R(P,Q)$ is the least positive integer $N$ such that no matter how the elements of an $N$-dimensional Boolean lattice are colored in blue and red, there is either a copy of $P$ with all blue elements or a copy of $Q$ with all red elements. We focus on the poset Ramsey number $R(P, Q_n)$ for a fixed poset $P$ and an $n$-dimensional Boolean lattice $Q_n$, as $n$ grows large. It is known that $n+c_1(P) \leq R(P,Q_n) \leq c_2(P) n$, for positive constants $c_1$ and $c_2$. However, there is no poset $P$ known, for which $R(P, Q_n)> (1+\epsilon)n$, for $\epsilon >0$. This paper is devoted to a new method for finding upper bounds on $R(P, Q_n)$ using a duality between copies of $Q_n$ and sets of elements that cover them, referred to as blockers. We prove several properties of blockers and their direct relation to the Ramsey numbers. Using these properties we show that $R(\mathcal{N},Q_n)=n+\Theta(n/\log n)$, for a poset $\mathcal{N}$ with four elements $A$, $B$, $C$, and $D$, such that $A<C$, $B<D$, $B<C$, and the remaining pairs of elements are incomparable.

Comment: 19 pages, 6 figure
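The notion of a "copy" defined in this abstract can be tested by brute force on small Boolean lattices. The sketch below is only an illustration of the definition, not of the paper's blocker method; the function names and the encoding of $\mathcal{N}$ as a set of strict relations are assumptions of the example.

```python
from itertools import combinations, permutations


def boolean_lattice(n):
    """All subsets of {1, ..., n}, ordered by inclusion."""
    ground = range(1, n + 1)
    return [frozenset(c) for size in range(n + 1) for c in combinations(ground, size)]


def contains_copy(p_elements, p_leq, host):
    """Search for an injective f with a <=_P b iff f(a) is a subset of f(b).

    Tries every injective map from P into `host`, so it is practical
    only for very small posets and lattices.
    """
    for image in permutations(host, len(p_elements)):
        f = dict(zip(p_elements, image))
        if all(p_leq(a, b) == (f[a] <= f[b]) for a in p_elements for b in p_elements):
            return True
    return False


# The N-shaped poset: A < C, B < D, B < C, all other pairs incomparable.
N_ELEMENTS = ["A", "B", "C", "D"]
N_STRICT = {("A", "C"), ("B", "D"), ("B", "C")}


def n_leq(a, b):
    """The order relation of the N-shaped poset, including reflexivity."""
    return a == b or (a, b) in N_STRICT
```

Under these assumptions, `contains_copy(N_ELEMENTS, n_leq, boolean_lattice(2))` is false, while the 3-dimensional lattice contains a copy, for example $A=\{1\}$, $B=\{2\}$, $C=\{1,2\}$, $D=\{2,3\}$.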

arXiv.org e-Print Archive