On the Amount of Dependence in the Prime Factorization of a Uniform Random Integer

Author: Arratia, Richard
Publication date: 04/05/2013

How much dependence is there in the prime factorization of a random integer distributed uniformly from 1 to $n$? How much dependence is there in the decomposition into cycles of a random permutation of $n$ points? What is the relation between the Poisson-Dirichlet process and the scale invariant Poisson process? These three questions have essentially the same answers, with respect to total variation distance, considering only small components, and with respect to a Wasserstein distance, considering all components. The Wasserstein distance is the expected number of changes (insertions and deletions) needed to change the dependent system into an independent system. In particular we show that for primes, roughly speaking, $2+o(1)$ changes are necessary and sufficient to convert a uniformly distributed random integer from 1 to $n$ into a random integer $\prod_{p \leq n} p^{Z_p}$ in which the multiplicity $Z_p$ of the factor $p$ is geometrically distributed, with all $Z_p$ independent. The changes are, with probability tending to 1, one deletion, together with a random number of insertions, having expectation $1+o(1)$. The crucial tool for showing that $2+\epsilon$ suffices is a coupling of the infinite independent model of prime multiplicities with the scale invariant Poisson process on $(0,\infty)$. A corollary of this construction is the first metric bound on the distance to the Poisson-Dirichlet in Billingsley's 1972 weak convergence result. Our bound takes the form: there are couplings in which $E \sum |\log P_i(n) - (\log n) V_i| = O(\log \log n)$, where $P_i$ denotes the $i$-th largest prime factor and $V_i$ denotes the $i$-th component of the Poisson-Dirichlet process. It is reasonable to conjecture that $O(1)$ is achievable. Comment: 46 pages, appeared in Contemporary Combinatorics, 29-91, Bolyai Soc. Math. Stud., 10, Janos Bolyai Math. Soc., Budapest, 200

arXiv.org e-Print Archive

Poisson--Dirichlet Limit Theorems in Combinatorial Applications via Multi-Intensities

Authors: Arratia, Richard; Kochman, Fred
Publication date: 07/01/2014

We present new, exceptionally efficient proofs of Poisson--Dirichlet limit theorems for the scaled sizes of irreducible components of random elements in the classic combinatorial contexts of arbitrary assemblies, multisets, and selections, when the components generating functions satisfy certain standard hypotheses. The proofs exploit a new criterion for Poisson--Dirichlet limits, originally designed for rapid proofs of Billingsley's theorem on the scaled sizes of log prime factors of random integers (and some new generalizations). Unexpectedly, the technique applies in the present combinatorial setting as well, giving, perhaps, a long sought-after unifying point of view. The proofs depend also on formulas of Arratia and Tavaré for the mixed moments of counts of components of various sizes, as well as formulas of Flajolet and Soria for the asymptotics of generating function coefficients. Comment: 16 pages

arXiv.org e-Print Archive

On the singularity of random Bernoulli matrices - novel integer partitions and lower bound expansions

Authors: Arratia, Richard; DeSalvo, Stephen
Publication date: 22/05/2012

We prove a lower bound expansion on the probability that a random $\pm 1$ matrix is singular, and conjecture that such expansions govern the actual probability of singularity. These expansions are based on naming the most likely, second most likely, and so on, ways that a Bernoulli matrix can be singular; the most likely way is to have a null vector of the form $e_i \pm e_j$, which corresponds to the integer partition 11, with two parts of size 1. The second most likely way is to have a null vector of the form $e_i \pm e_j \pm e_k \pm e_\ell$, which corresponds to the partition 1111. The fifth most likely way corresponds to the partition 21111. We define and characterize the "novel partitions" which show up in this series. As a family, novel partitions suffice to detect singularity, i.e., any singular Bernoulli matrix has a left null vector whose underlying integer partition is novel. And, with respect to this property, the family of novel partitions is minimal. We prove that the only novel partitions with six or fewer parts are 11, 1111, 21111, 111111, 221111, 311111, and 322111. We prove that there are fourteen novel partitions having seven parts. We formulate a conjecture about which partitions are "first place and runners up," in relation to the Erdős-Littlewood-Offord bound. We prove some bounds on the interaction between left and right null vectors. Comment: v1: 26 pages. Comments v2: 28 pages; rewritten first section, corrected typos and minor errors

arXiv.org e-Print Archive

A countdown process, with application to the rank of random matrices over $\mathbb F_q(n)$

Authors: Arratia, Richard; Earnest, Michael
Publication date: 20/05/2016

Motivated by the work of Fulman and Goldstein, comparing the distribution of the corank of random matrices in $\mathbb F_q[n]$ with the limit distribution as $n \to \infty$, we define a countdown process, driven by independent geometric random variables related to random integer partitions. Analysis of this process leads to sharper bounds on the total variation distance

arXiv.org e-Print Archive

Poisson and independent process approximation for random combinatorial structures with a given number of components, and near-universal behavior for low rank assemblies

Authors: Arratia, Richard; DeSalvo, Stephen
Publication date: 04/07/2016

We give a general framework for approximations to combinatorial assemblies, especially suitable to the situation where the number $k$ of components is specified, in addition to the overall size $n$. This involves a Poisson process, which, with the appropriate choice of parameter, may be viewed as an extension of saddlepoint approximation. We illustrate the use of this by analyzing the component structure when the rank and size are specified, and the rank, $r := n-k$, is small relative to $n$. There is near-universal behavior, in the sense that apart from cases where the exponential generating function has radius of convergence zero, for $\ell=1,2,\dots$, when $r \asymp n^\alpha$ for fixed $\alpha \in (\frac{\ell}{\ell+1}, \frac{\ell+1}{\ell+2})$, the size $L_1$ of the largest component converges in probability to $\ell+2$. Further, when $r \sim t\, n^{\ell/(\ell+1)}$ for a positive integer $\ell$, and $t \in (0,\infty)$, we have $\mathbb{P}\,(L_1 \in \{\ell+1,\ell+2\}) \to 1$, with the choice governed by a Poisson limit distribution for the number of components of size $\ell+2$. This was previously observed, for the case $\ell=1$ and the special cases of permutations and set partitions, using Chen-Stein approximations for the indicators of attacks and alignments, when rooks are placed randomly on a triangular board. The case $\ell=1$ is especially delicate, and was not handled by previous saddlepoint approximations. Comment: 35 pages

arXiv.org e-Print Archive

Probabilistic divide-and-conquer: a new exact simulation method, with integer partitions as an example

Authors: Arratia, Richard; DeSalvo, Stephen
Publication date: 23/11/2015

We propose a new method, probabilistic divide-and-conquer, for improving the success probability in rejection sampling. For the example of integer partitions, there is an ideal recursive scheme which improves the rejection cost from asymptotically order $n^{3/4}$ to a constant. We show other examples for which a non-recursive, one-time application of probabilistic divide-and-conquer removes a substantial fraction of the rejection sampling cost. We also present a variation of probabilistic divide-and-conquer for generating i.i.d. samples that exploits features of the coupon collector's problem, in order to obtain a cost that is sublinear in the number of samples. Comment: 25 pages, revised writing. Added reference. Added Lemmas 3.9 and 3.1
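The abstract gives no algorithmic detail, so the following is only a minimal sketch of what a non-recursive, one-time split could look like for integer partitions. It assumes the familiar setup in which the multiplicity $Z_i$ of part $i$ is geometric with $P(Z_i = k) = (1 - x^i) x^{ik}$ and a sample is kept only when $\sum_i i Z_i = n$. The split used here (sample $Z_2, \dots, Z_n$, then let $Z_1$ be forced to fill the gap, accepting with probability $x^k$) and all function names are assumptions made for illustration, not taken from the listing.

```python
import math
import random


def geometric(rng, q):
    """Sample Z >= 0 with P(Z >= k) = q**k, by inverse transform."""
    if q <= 0.0:  # q can underflow to 0.0 for large i; then Z = 0 almost surely
        return 0
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
    return int(math.log(u) / math.log(q))


def sample_partition(n, rng):
    """Return a uniform random partition of n as {part size: multiplicity}.

    Assumed split: propose Z_2..Z_n independently, let k = n - sum(i * Z_i)
    be the forced number of parts of size 1, and accept with probability
    P(Z_1 = k) / max_j P(Z_1 = j) = x**k.
    """
    # Customary tilt; any x in (0, 1) yields the same conditional law.
    x = math.exp(-math.pi / math.sqrt(6.0 * n))
    while True:
        counts, total = {}, 0
        for i in range(2, n + 1):
            z = geometric(rng, x ** i)
            if z:
                counts[i] = z
                total += i * z
            if total > n:
                break
        if total > n:
            continue  # no non-negative Z_1 can fix this proposal
        k = n - total
        if rng.random() < x ** k:
            if k:
                counts[1] = k
            return counts


if __name__ == "__main__":
    print(sample_partition(30, random.Random(1)))
```

Every proposal whose larger parts do not overshoot $n$ has a chance of being kept, whereas plain rejection would additionally need an independent draw of $Z_1$ to hit $k$ exactly.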

arXiv.org e-Print Archive

Size bias, sampling, the waiting time paradox, and infinite divisibility: when is the increment independent?

Authors: Arratia, Richard; Goldstein, Larry
Publication date: 22/07/2010

With $X^*$ denoting a random variable with the $X$-size bias distribution, what are all distributions for $X$ such that it is possible to have $X^*=X+Y$, $Y\geq 0$, with $X$ and $Y$ independent? We give the answer, due to Steutel, and also discuss the relations of size biasing to the waiting time paradox, renewal theory, sampling, tightness and uniform integrability, compound Poisson distributions, infinite divisibility, and the lognormal distributions. Comment: 30 pages

arXiv.org e-Print Archive

On the Random Sampling of Pairs, with Pedestrian examples

Authors: Arratia, Richard; DeSalvo, Stephen
Publication date: 01/06/2013

Suppose one desires to randomly sample a pair of objects such as socks, hoping to get a matching pair. Even in the simplest situation for sampling, which is sampling with replacement, the innocent phrase "the distribution of the color of a matching pair" is ambiguous. One interpretation is that we condition on the event of getting a match between two random socks; this corresponds to sampling two at a time, over and over without memory, until a matching pair is found. A second interpretation is to sample sequentially, one at a time, with memory, until the same color has been seen twice. We study the difference between these two methods. The input is a discrete probability distribution on colors, describing what happens when one sock is sampled. There are two derived distributions: the pair-color distributions under the two methods of getting a match. The output, a number we call the discrepancy of the input distribution, is the total variation distance between the two derived distributions. It is easy to determine when the two pair-color distributions come out equal, that is, to determine which distributions have discrepancy zero, but hard to determine the largest possible discrepancy. We find the exact extreme for the case of two colors, by analyzing the roots of a fifth-degree polynomial in one variable. We find the exact extreme for the case of three colors, by analyzing the 49 roots of a variety spanned by two seventh-degree polynomials in two variables. We give a plausible conjecture for the general situation of a finite number of colors, and give an exact computation of a constant which is a plausible candidate for the supremum of the discrepancy over all discrete probability distributions. We briefly consider the more difficult case where the objects to be matched into pairs are of two different kinds, such as male-female or left-right. Comment: 22 pages, 5 figures
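The two sampling schemes described above can be computed exactly for a small number of colors. The sketch below is an illustration written for this listing, not code from the paper; the function names are hypothetical. The first scheme gives color $c$ probability $p_c^2 / \sum_j p_j^2$; the second is evaluated by tracking the set of distinct colors seen so far until a repeat occurs.

```python
def conditioned_on_match(p):
    """Pair-color law when two socks are drawn and we condition on a match."""
    norm = sum(q * q for q in p)
    return [q * q / norm for q in p]


def first_repeat(p):
    """Pair-color law when socks are drawn one at a time until a color repeats."""
    m = len(p)
    out = [0.0] * m
    # reach[mask] = probability that the draws so far are all distinct
    # and form exactly the set of colors encoded by the bits of mask.
    reach = [0.0] * (1 << m)
    reach[0] = 1.0
    for mask in range(1 << m):  # adding a color always increases the mask
        r = reach[mask]
        if r == 0.0:
            continue
        for c in range(m):
            if (mask >> c) & 1:
                out[c] += r * p[c]  # color c seen a second time: stop
            else:
                reach[mask | (1 << c)] += r * p[c]
    return out


def discrepancy(p):
    """Total variation distance between the two pair-color laws."""
    a, b = conditioned_on_match(p), first_repeat(p)
    return 0.5 * sum(abs(s - t) for s, t in zip(a, b))


if __name__ == "__main__":
    print(discrepancy([0.5, 0.5]))
    print(discrepancy([0.7, 0.3]))
```

For two colors with probabilities $p$ and $q = 1 - p$, the second scheme returns the first color with probability $p^2(1 + 2q)$, which can be checked by hand against `first_repeat`. The bitmask table has $2^m$ entries, so this is only practical for a handful of colors.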

arXiv.org e-Print Archive

A Simple Direct Proof of Billingsley's Theorem

Authors: Arratia, Richard; Kochman, Fred
Publication date: 07/01/2014

Billingsley's theorem (1972) asserts that the Poisson--Dirichlet process is the limit, as $n \to \infty$, of the process giving the relative log sizes of the largest prime factor, the second largest, and so on, of a random integer chosen uniformly from 1 to $n$. In this paper we give a new proof that directly exploits Dickman's asymptotic formula for the number of such integers with no prime factor larger than $n^{1/u}$, namely $\Psi(n,n^{1/u}) \sim n \rho(u)$, to derive the limiting joint density functions of the finite-dimensional projections of the log prime factor processes. Our main technical tool is a new criterion for the convergence in distribution of non-lattice discrete random variables to continuous random variables. Comment: 13 pages
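Dickman's formula can be checked numerically at $u = 2$, where the Dickman function has the closed form $\rho(u) = 1 - \ln u$ (valid for $1 \le u \le 2$). The sketch below is an illustration added here, with hypothetical function names: it counts the integers up to $n$ with no prime factor above $\sqrt{n}$ using a simple sieve and compares the proportion with $1 - \ln 2 \approx 0.307$.

```python
import math


def smooth_count(n, y):
    """Count integers m in [1, n] all of whose prime factors are <= y."""
    largest = [1] * (n + 1)  # largest[m] = largest prime factor of m (1 for m = 1)
    for p in range(2, n + 1):
        if largest[p] == 1:  # untouched by any smaller prime, so p is prime
            for m in range(p, n + 1, p):
                largest[m] = p  # later (larger) primes overwrite earlier ones
    return sum(1 for m in range(1, n + 1) if largest[m] <= y)


def dickman_rho_on_1_2(u):
    """Dickman's rho on [1, 2], where rho(u) = 1 - ln(u)."""
    if not 1.0 <= u <= 2.0:
        raise ValueError("closed form only valid for 1 <= u <= 2")
    return 1.0 - math.log(u)


if __name__ == "__main__":
    n = 10 ** 5
    ratio = smooth_count(n, math.sqrt(n)) / n
    print(ratio, dickman_rho_on_1_2(2.0))
```

The agreement is only rough at this size, since the error in the asymptotic decays slowly (on the order of $1/\log n$).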

arXiv.org e-Print Archive

Independent Process Approximations for Random Combinatorial Structures

Authors: Arratia, Richard; Tavaré, Simon
Publication date: 14/08/2013

Many random combinatorial objects have a component structure whose joint distribution is equal to that of a process of mutually independent random variables, conditioned on the value of a weighted sum of the variables. It is interesting to compare the combinatorial structure directly to the independent discrete process, without renormalizing. The quality of approximation can often be conveniently quantified in terms of total variation distance, for functionals which observe part, but not all, of the combinatorial and independent processes. Among the examples are combinatorial assemblies (e.g. permutations, random mapping functions, and partitions of a set), multisets (e.g. polynomials over a finite field, mapping patterns and partitions of an integer), and selections (e.g. partitions of an integer into distinct parts, and square-free polynomials over finite fields). We consider issues common to all the above examples, including equalities and upper bounds for total variation distances, existence of limiting processes, heuristics for good approximations, the relation to standard generating functions, moment formulas and recursions for computing densities, refinement to the process which counts the number of parts of each possible type, the effect of further conditioning on events of moderate probability, large deviation theory and nonuniform measures on combinatorial objects, and the possibility of getting useful results by overpowering the conditioning. Comment: 71 pages, and nearly identical to the 1994 Advances in Mathematics article
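For permutations, a standard instance of this conditioning relation takes independent Poisson counts $Z_i$ with mean $1/i$ and conditions on $\sum_i i Z_i = n$; the result is the joint law of the cycle counts of a uniform random permutation of $n$ points. The sketch below, written for this listing with hypothetical function names, verifies the equality exactly for small $n$ by brute-force enumeration and rational arithmetic.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial


def cycle_type(perm):
    """Cycle type of a permutation of range(n) as sorted (length, count) pairs."""
    seen = [False] * len(perm)
    counts = Counter()
    for start in range(len(perm)):
        if seen[start]:
            continue
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        counts[length] += 1
    return tuple(sorted(counts.items()))


def permutation_law(n):
    """Exact law of the cycle type of a uniform permutation, by enumeration."""
    tally = Counter(cycle_type(p) for p in permutations(range(n)))
    return {c: Fraction(k, factorial(n)) for c, k in tally.items()}


def partitions(n, max_part=None):
    """Yield the partitions of n as dicts {part size: multiplicity}."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield {}
        return
    for part in range(min(n, max_part), 0, -1):
        for mult in range(1, n // part + 1):
            for rest in partitions(n - part * mult, part - 1):
                yield {part: mult, **rest}


def conditioned_poisson_law(n):
    """Law of independent Poisson(1/i) counts conditioned on sum(i * Z_i) = n."""
    weights = {}
    for c in partitions(n):
        w = Fraction(1)
        for i, m in c.items():
            # Poisson(1/i) mass at m, without exp(-1/i), which cancels below.
            w *= Fraction(1, i ** m * factorial(m))
        weights[tuple(sorted(c.items()))] = w
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


if __name__ == "__main__":
    print(conditioned_poisson_law(5) == permutation_law(5))
```

Enumerating all $n!$ permutations is only feasible for very small $n$; the point is the exact match, not efficiency.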

arXiv.org e-Print Archive