On the Amount of Dependence in the Prime Factorization of a Uniform Random Integer
How much dependence is there in the prime factorization of a random integer
distributed uniformly from 1 to n? How much dependence is there in the
decomposition into cycles of a random permutation of n points? What is the
relation between the Poisson-Dirichlet process and the scale-invariant Poisson
process? These three questions have essentially the same answers, with respect
to total variation distance when considering only small components, and with
respect to a Wasserstein distance when considering all components. The Wasserstein
distance is the expected number of changes (insertions and deletions)
needed to change the dependent system into an independent system.
In particular we show that for primes, roughly speaking, $2+o(1)$ changes are
necessary and sufficient to convert a uniformly distributed random integer from
1 to $n$ into a random integer $\prod_{p \le n} p^{Z_p}$ in which the multiplicity
$Z_p$ of the factor $p$ is geometrically distributed, with all $Z_p$ independent. The
changes are, with probability tending to 1, one deletion, together with a
random number of insertions, having expectation 1+o(1).
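As a rough illustration of the objects being compared, the sketch below samples the independent model, taking $P(Z_p \ge k) = p^{-k}$ (an assumption here, chosen because it matches $P(p^k \mid N) \approx p^{-k}$ for uniform $N$), and counts insertions plus deletions between two multisets of prime factors. The helper names are hypothetical. The two draws are independent, so this does not reproduce the paper's coupling and says nothing about the $2+o(1)$ bound.

```python
import random

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n (assumes n >= 1)."""
    is_prime = [False, False] + [True] * (n - 1)
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
    return [p for p in range(2, n + 1) if is_prime[p]]

def sample_independent_model(n, rng):
    """Draw {p: Z_p} for p <= n, with Z_p independent and P(Z_p >= k) = p**(-k)."""
    exponents = {}
    for p in primes_up_to(n):
        z = 0
        while rng.random() < 1.0 / p:  # one more factor of p, with probability 1/p
            z += 1
        exponents[p] = z
    return exponents

def factor_exponents(m):
    """Exponents of the prime factorization of m >= 1, by trial division."""
    exponents, d = {}, 2
    while d * d <= m:
        while m % d == 0:
            exponents[d] = exponents.get(d, 0) + 1
            m //= d
        d += 1
    if m > 1:
        exponents[m] = exponents.get(m, 0) + 1
    return exponents

def number_of_changes(a, b):
    """Insertions plus deletions of prime factors needed to turn multiset a into b."""
    return sum(abs(a.get(p, 0) - b.get(p, 0)) for p in set(a) | set(b))

def as_integer(exponents):
    """Rebuild prod p**Z_p from an exponent dictionary."""
    result = 1
    for p, z in exponents.items():
        result *= p ** z
    return result

if __name__ == "__main__":
    rng = random.Random(1)
    n = 1000
    uniform_draw = factor_exponents(rng.randint(1, n))  # uniform integer in [1, n]
    model_draw = sample_independent_model(n, rng)       # independent model, drawn separately
    print(as_integer(model_draw).bit_length(), number_of_changes(uniform_draw, model_draw))
```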
The crucial tool for showing that $2+\epsilon$ suffices is a coupling of the
infinite independent model of prime multiplicities with the scale-invariant
Poisson process on $(0,\infty)$. A corollary of this construction is the first
metric bound on the distance to the Poisson-Dirichlet process in Billingsley's 1972
weak convergence result. Our bound takes the form: there are couplings in which
$\mathbb{E} \sum_i |\log P_i(n) - (\log n) V_i| = O(\log \log n)$, where $P_i(n)$ denotes the
$i$-th largest prime factor and $V_i$ denotes the $i$-th component of the
Poisson-Dirichlet process. It is reasonable to conjecture that $O(1)$ is
achievable.
Comment: 46 pages, appeared in Contemporary Combinatorics, 29-91, Bolyai Soc.
Math. Stud., 10, Janos Bolyai Math. Soc., Budapest, 200
Poisson--Dirichlet Limit Theorems in Combinatorial Applications via Multi-Intensities
We present new, exceptionally efficient proofs of Poisson--Dirichlet limit
theorems for the scaled sizes of irreducible components of random elements in
the classic combinatorial contexts of arbitrary assemblies, multisets, and
selections, when the component generating functions satisfy certain standard
hypotheses. The proofs exploit a new criterion for Poisson--Dirichlet limits,
originally designed for rapid proofs of Billingsley's theorem on the scaled
sizes of log prime factors of random integers (and some new generalizations).
Unexpectedly, the technique applies in the present combinatorial setting as
well, giving, perhaps, a long sought-after unifying point of view. The proofs
depend also on formulas of Arratia and Tavaré for the mixed moments of
counts of components of various sizes, as well as formulas of Flajolet and
Soria for the asymptotics of generating function coefficients.
Comment: 16 pages
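For the simplest combinatorial case, random permutations, the Poisson-Dirichlet limit (with parameter 1) can be seen numerically. The sketch below draws an approximate Poisson-Dirichlet(1) sample by uniform stick-breaking followed by sorting, a standard construction, and compares it with the sorted cycle lengths of a random permutation divided by $n$. The function names and the truncation tolerance are illustrative assumptions, not taken from the paper.

```python
import random

def sample_pd1(rng, tol=1e-12):
    """Approximate Poisson-Dirichlet(1) draw: uniform stick-breaking, then sort.
    Breaking stops once the unbroken remainder is below tol."""
    pieces, remaining = [], 1.0
    while remaining > tol:
        u = rng.random()
        pieces.append(remaining * u)  # break off a uniform fraction of what is left
        remaining *= 1.0 - u
    return sorted(pieces, reverse=True)

def cycle_fractions(n, rng):
    """Cycle lengths of a uniform random permutation of n points, over n, largest first."""
    perm = list(range(n))
    rng.shuffle(perm)
    seen, fractions = [False] * n, []
    for start in range(n):
        length, j = 0, start
        while not seen[j]:  # walk the cycle through start, if not yet visited
            seen[j] = True
            j = perm[j]
            length += 1
        if length:
            fractions.append(length / n)
    return sorted(fractions, reverse=True)

if __name__ == "__main__":
    rng = random.Random(2024)
    trials = 2000
    pd_mean = sum(sample_pd1(rng)[0] for _ in range(trials)) / trials
    perm_mean = sum(cycle_fractions(200, rng)[0] for _ in range(trials)) / trials
    print(pd_mean, perm_mean)
```

Both printed averages estimate the expected largest component; in the limit this is the Golomb-Dickman constant, about 0.6243.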
On the singularity of random Bernoulli matrices - novel integer partitions and lower bound expansions
We prove a lower bound expansion on the probability that a random Bernoulli
matrix is singular, and conjecture that such expansions govern the actual
probability of singularity. These expansions are based on naming the most
likely, second most likely, and so on, ways that a Bernoulli matrix can be
singular; the most likely way is to have a null vector of the form $(1,-1,0,\ldots,0)$, which corresponds to the integer partition 11, with two parts of size 1.
The second most likely way is to have a null vector of the form $(1,1,1,1,0,\ldots,0)$, which corresponds to the partition 1111. The fifth most
likely way corresponds to the partition 21111.
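For very small $n$ the two quantities can be compared by brute force. The sketch below assumes "Bernoulli matrix" means independent entries uniform on $\{+1,-1\}$, enumerates all $2^{n^2}$ such matrices, and reports the exact singularity probability alongside the probability of the partition-11 event: some two rows are equal or opposite, i.e. a left null vector with exactly two nonzero entries, both $\pm 1$. Function names are hypothetical; this is exhaustive checking, not the paper's expansion.

```python
from itertools import combinations, product

def det(m):
    """Integer determinant by Laplace expansion along the first row (tiny n only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def singularity_stats(n):
    """Return (P(singular), P(some two rows equal or opposite)) over all +-1 matrices."""
    singular = pair_event = count = 0
    for entries in product((1, -1), repeat=n * n):
        rows = [list(entries[i * n:(i + 1) * n]) for i in range(n)]
        count += 1
        if det(rows) == 0:
            singular += 1
        if any(r == s or r == [-x for x in s] for r, s in combinations(rows, 2)):
            pair_event += 1  # a left null vector of partition type 11 exists
    return singular / count, pair_event / count

if __name__ == "__main__":
    print(singularity_stats(3))
```

The pair event always forces singularity, so its probability is a lower bound; $n = 4$ is still feasible (65,536 matrices) but slow in pure Python.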
We define and characterize the "novel partitions" which show up in this
series. As a family, novel partitions suffice to detect singularity, i.e., any
singular Bernoulli matrix has a left null vector whose underlying integer
partition is novel. And, with respect to this property, the family of novel
partitions is minimal.
We prove that the only novel partitions with six or fewer parts are 11, 1111,
21111, 111111, 221111, 311111, and 322111. We prove that there are fourteen
novel partitions having seven parts.
We formulate a conjecture about which partitions are "first place and runners
up," in relation to the Erdős-Littlewood-Offord bound.
We prove some bounds on the interaction between left and right null vectors.
Comment: v1: 26 pages. v2: 28 pages; rewritten first section,
corrected typos and minor errors
A countdown process, with application to the rank of random matrices over $\mathrm{GF}(q)$
Motivated by the work of Fulman and Goldstein, comparing the distribution of
the corank of random $n \times n$ matrices over $\mathrm{GF}(q)$ with the limit distribution
as $n \to \infty$, we define a countdown process, driven by independent
geometric random variables related to random integer partitions. Analysis of
this process leads to sharper bounds on the total variation distance
Poisson and independent process approximation for random combinatorial structures with a given number of components, and near-universal behavior for low rank assemblies
We give a general framework for approximations to combinatorial assemblies,
especially suitable to the situation where the number of components is
specified, in addition to the overall size . This involves a Poisson
process, which, with the appropriate choice of parameter, may be viewed as an
extension of saddlepoint approximation.
We illustrate the use of this by analyzing the component structure when the
rank and size are specified, and the rank, , is small relative to
. There is near-universal behavior, in the sense that apart from cases where
the exponential generating function has radius of convergence zero, for
, when for fixed , the size of the largest
component converges in probability to . Further, when for a positive integer , and ,
, with the choice governed by a
Poisson limit distribution for the number of components of size . This
was previously observed, for the case and the special cases of
permutations and set partitions, using Chen-Stein approximations for the
indicators of attacks and alignments, when rooks are placed randomly on a
triangular board. The case is especially delicate, and was not handled
by previous saddlepoint approximations.
Comment: 35 pages
Probabilistic divide-and-conquer: a new exact simulation method, with integer partitions as an example
We propose a new method, probabilistic divide-and-conquer, for improving the
success probability in rejection sampling. For the example of integer
partitions, there is an ideal recursive scheme which improves the rejection
cost from asymptotically order $n^{3/4}$ to a constant. We show other examples
for which a non-recursive, one-time application of probabilistic
divide-and-conquer removes a substantial fraction of the rejection sampling
cost.
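A minimal sketch of the idea, under stated assumptions: it uses the well-known independent model in which $Z_i$ are independent with $P(Z_i \ge k) = x^{ik}$; conditioning on $\sum_i i Z_i = n$ gives a uniform partition of $n$. Here the split is the simplest one-time choice, $A = Z_1$ and $B = (Z_2,\ldots,Z_n)$: sample $B$, then $Z_1$ is forced to equal $k = n - \sum_{i \ge 2} i Z_i$ and is accepted with probability $P(Z_1 = k)/\max_j P(Z_1 = j) = x^k$. This is not the ideal recursive scheme described above, and the tilt $x = e^{-\pi/\sqrt{6n}}$ is an assumed (standard) choice; any $x \in (0,1)$ still yields exact samples, only the cost changes. Function names are hypothetical.

```python
import math
import random

def geometric_from_log(log_q, rng):
    """Z with P(Z >= k) = exp(k * log_q), for log_q < 0, by inversion."""
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is finite
    return int(math.log(u) / log_q)

def pdc_partition(n, rng):
    """Exact uniform random partition of n >= 1 as {part size: multiplicity},
    plus the number of attempts used."""
    c = math.pi / math.sqrt(6 * n)  # x = exp(-c); work with logs to avoid underflow
    attempts = 0
    while True:
        attempts += 1
        counts = {i: geometric_from_log(-c * i, rng) for i in range(2, n + 1)}
        k = n - sum(i * z for i, z in counts.items())  # the only value Z_1 can take
        if k >= 0 and rng.random() < math.exp(-c * k):  # accept w.p. P(Z_1=k)/P(Z_1=0)
            counts[1] = k
            return {i: z for i, z in counts.items() if z > 0}, attempts

if __name__ == "__main__":
    partition, attempts = pdc_partition(30, random.Random(5))
    print(sorted(partition.items()), attempts)
```

Acceptance is proportional to $P(Z_1 = k)$, so an accepted draw has exactly the conditional law of $(Z_1, B)$ given $\sum_i i Z_i = n$; only the first half is ever rejected as a whole.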
We also present a variation of probabilistic divide-and-conquer for
generating i.i.d. samples that exploits features of the coupon collector's
problem, in order to obtain a cost that is sublinear in the number of samples.
Comment: 25 pages, revised writing. Added reference. Added Lemmas 3.9 and 3.1
Size bias, sampling, the waiting time paradox, and infinite divisibility: when is the increment independent?
With $X^*$ denoting a random variable with the $X$-size bias distribution,
what are all distributions for $X$ such that it is possible to have $X^* = X + Y$,
$Y \ge 0$, with $X$ and $Y$ independent? We give the answer, due to
Steutel \cite{steutel}, and also discuss the relations of size biasing to the
waiting time paradox, renewal theory, sampling, tightness and uniform
integrability, compound Poisson distributions, infinite divisibility, and the
lognormal distributions.
Comment: 30 pages
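For a nonnegative integer-valued $X$ with finite positive mean, the size-bias distribution is $P(X^* = k) = k\,P(X = k)/\mathbb{E}X$. A quick numerical check of this definition on the Poisson law, for which $X^* = X + 1$ in distribution, so the increment $Y \equiv 1$ is trivially independent of $X$. The helper names are illustrative.

```python
import math

def poisson_pmf(lam, k):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def size_biased_pmf(pmf, mean, kmax):
    """P(X* = k) = k P(X = k) / E[X], for k = 0..kmax."""
    return [k * pmf(k) / mean for k in range(kmax + 1)]

if __name__ == "__main__":
    lam = 2.5
    biased = size_biased_pmf(lambda k: poisson_pmf(lam, k), lam, 60)
    shifted = [0.0] + [poisson_pmf(lam, k - 1) for k in range(1, 61)]  # law of X + 1
    print(max(abs(a - b) for a, b in zip(biased, shifted)))
```

Algebraically, $k e^{-\lambda}\lambda^k/(k!\,\lambda) = e^{-\lambda}\lambda^{k-1}/(k-1)!$, so the printed difference is only floating-point rounding.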
On the Random Sampling of Pairs, with Pedestrian examples
Suppose one desires to randomly sample a pair of objects such as socks,
hoping to get a matching pair. Even in the simplest situation for sampling,
which is sampling with replacement, the innocent phrase "the distribution of
the color of a matching pair" is ambiguous. One interpretation is that we
condition on the event of getting a match between two random socks; this
corresponds to sampling two at a time, over and over without memory, until a
matching pair is found. A second interpretation is to sample sequentially, one
at a time, with memory, until the same color has been seen twice.
We study the difference between these two methods. The input is a discrete
probability distribution on colors, describing what happens when one sock is
sampled. There are two derived distributions --- the pair-color distributions
under the two methods of getting a match. The output, a number we call the
discrepancy of the input distribution, is the total variation distance between
the two derived distributions.
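To make the definition concrete: under the first method the pair color is $i$ with probability $p_i^2 / \sum_j p_j^2$; under the second it is the color of the first repeat in a sequential draw. The sketch below computes the second law exactly by enumerating every ordered run of distinct colors (exponential cost, so only for a handful of colors) and returns the total variation distance. Function names are hypothetical, and nothing here searches for the extreme discrepancy.

```python
def conditioned_pair_color(p):
    """Method 1: draw two at a time until they match; color law p_i^2 / sum_j p_j^2."""
    s = sum(q * q for q in p)
    return [q * q / s for q in p]

def sequential_pair_color(p):
    """Method 2: draw one at a time, with memory, until some color appears twice."""
    m = len(p)
    result = [0.0] * m

    def extend(seen, prob):
        # prob is the probability of the current run of distinct colors in `seen`
        for c in range(m):
            if c in seen:
                result[c] += prob * p[c]  # this draw repeats color c: stop
            elif p[c] > 0:
                extend(seen | {c}, prob * p[c])  # new color: keep drawing

    extend(frozenset(), 1.0)
    return result

def discrepancy(p):
    """Total variation distance between the two pair-color distributions."""
    a, b = conditioned_pair_color(p), sequential_pair_color(p)
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))

if __name__ == "__main__":
    print(discrepancy([0.8, 0.2]), discrepancy([0.6, 0.3, 0.1]))
```

For two colors with $p = (p, q)$ the sequential method gives color 1 with probability $p^2(1 + 2q)$, which the enumeration reproduces.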
It is easy to determine when the two pair-color distributions come out equal,
that is, to determine which distributions have discrepancy zero, but hard to
determine the largest possible discrepancy. We find the exact extreme for the
case of two colors, by analyzing the roots of a fifth degree polynomial in one
variable. We find the exact extreme for the case of three colors, by analyzing
the 49 roots of a variety spanned by two seventh-degree polynomials in two
variables. We give a plausible conjecture for the general situation of a finite
number of colors, and give an exact computation of a constant which is a
plausible candidate for the supremum of the discrepancy over all discrete
probability distributions.
We briefly consider the more difficult case where the objects to be matched
into pairs are of two different kinds, such as male-female or left-right.
Comment: 22 pages, 5 figures
A Simple Direct Proof of Billingsley's Theorem
Billingsley's theorem (1972) asserts that the Poisson-Dirichlet process is
the limit, as $n \to \infty$, of the process giving the relative log sizes of
the largest prime factor, the second largest, and so on, of a random integer
chosen uniformly from 1 to $n$. In this paper we give a new proof that directly
exploits Dickman's asymptotic formula for the number of such integers with no
prime factor larger than $n^{1/u}$, namely $\Psi(n, n^{1/u}) \sim n\rho(u)$, to
derive the limiting joint density functions of the finite-dimensional
projections of the log prime factor processes. Our main technical tool is a new
criterion for the convergence in distribution of non-lattice discrete random
variables to continuous random variables.
Comment: 13 pages
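Dickman's function is defined by $\rho(u) = 1$ for $0 \le u \le 1$ and $u\rho'(u) = -\rho(u-1)$ for $u > 1$, and $\Psi(n, y)$ counts the integers in $[1, n]$ with no prime factor larger than $y$. The sketch below tabulates $\rho$ with a trapezoid rule on the equivalent integral form $\rho(u) = \rho(u-h) - \int_{u-h}^{u} \rho(t-1)/t\,dt$, and counts $\Psi(n, y)$ with a largest-prime-factor sieve. The grid step and function names are assumptions made for illustration.

```python
import math

def dickman_rho_table(u_max, steps_per_unit=1000):
    """Tabulate rho(u) at u = k / steps_per_unit for 0 <= u <= u_max."""
    n = steps_per_unit
    h = 1.0 / n
    size = int(math.ceil(u_max * n))
    rho = [1.0] * (size + 1)  # rho(u) = 1 for 0 <= u <= 1
    for k in range(n + 1, size + 1):
        # rho(kh) = rho((k-1)h) - integral of rho(t - 1) / t over [(k-1)h, kh]
        f_prev = rho[k - 1 - n] / ((k - 1) * h)
        f_cur = rho[k - n] / (k * h)
        rho[k] = rho[k - 1] - 0.5 * h * (f_prev + f_cur)
    return rho

def smooth_count(n, y):
    """Psi(n, y): number of 1 <= m <= n with no prime factor larger than y."""
    largest = [1] * (n + 1)
    for p in range(2, n + 1):
        if largest[p] == 1:  # never marked by a smaller prime, hence prime
            for multiple in range(p, n + 1, p):
                largest[multiple] = p  # primes arrive in increasing order; last write is largest
    return sum(1 for m in range(1, n + 1) if largest[m] <= y)

if __name__ == "__main__":
    table = dickman_rho_table(3.0)
    print(table[2000], 1 - math.log(2))  # rho(2) = 1 - log 2 exactly
    n = 100000
    print(smooth_count(n, math.isqrt(n)) / n)  # compare with rho(2)
```

The empirical ratio $\Psi(n, \sqrt{n})/n$ approaches $\rho(2)$ only slowly, so moderate $n$ gives a rough agreement rather than a sharp one.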
Independent Process Approximations for Random Combinatorial Structures
Many random combinatorial objects have a component structure whose joint
distribution is equal to that of a process of mutually independent random
variables, conditioned on the value of a weighted sum of the variables. It is
interesting to compare the combinatorial structure directly to the independent
discrete process, without renormalizing. The quality of approximation can often
be conveniently quantified in terms of total variation distance, for
functionals which observe part, but not all, of the combinatorial and
independent processes.
Among the examples are combinatorial assemblies (e.g. permutations, random
mapping functions, and partitions of a set), multisets (e.g. polynomials over a
finite field, mapping patterns and partitions of an integer), and selections
(e.g. partitions of an integer into distinct parts, and square-free polynomials
over finite fields).
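For permutations, the conditioning relation is concrete: the cycle counts $(C_1, \ldots, C_n)$ of a uniform random permutation of $n$ points have the law of independent $Z_j \sim \mathrm{Poisson}(1/j)$ conditioned on $\sum_j j Z_j = n$. The sketch below checks this exactly for small $n$ by comparing Cauchy's formula $\prod_j 1/(j^{c_j} c_j!)$ with the normalized Poisson weights over all cycle types. Function names are illustrative.

```python
import math

def partitions(n, max_part=None):
    """Yield the partitions of n as lists of parts in nonincreasing order."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest

def multiplicities(parts):
    """Cycle type {j: c_j} from a list of cycle lengths."""
    counts = {}
    for j in parts:
        counts[j] = counts.get(j, 0) + 1
    return counts

def cauchy_prob(counts):
    """P(C_j = c_j for all j) for a uniform random permutation (Cauchy's formula)."""
    return math.prod(1.0 / (j ** c * math.factorial(c)) for j, c in counts.items())

def poisson_weight(counts, n):
    """P(Z_j = c_j for j = 1..n), with independent Z_j ~ Poisson(1/j)."""
    w = 1.0
    for j in range(1, n + 1):
        c, lam = counts.get(j, 0), 1.0 / j
        w *= math.exp(-lam) * lam ** c / math.factorial(c)
    return w

def max_conditioning_gap(n):
    """Largest |Cauchy probability - conditioned Poisson probability| over cycle types."""
    types = [multiplicities(p) for p in partitions(n)]
    total = sum(poisson_weight(c, n) for c in types)  # = P(sum_j j Z_j = n)
    return max(abs(cauchy_prob(c) - poisson_weight(c, n) / total) for c in types)

if __name__ == "__main__":
    print(max_conditioning_gap(8))
```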
We consider issues common to all the above examples, including equalities and
upper bounds for total variation distances, existence of limiting processes,
heuristics for good approximations, the relation to standard generating
functions, moment formulas and recursions for computing densities, refinement
to the process which counts the number of parts of each possible type, the
effect of further conditioning on events of moderate probability, large
deviation theory and nonuniform measures on combinatorial objects, and the
possibility of getting useful results by overpowering the conditioning.
Comment: 71 pages, and nearly identical to the 1994 Advances in Mathematics
article