35 research outputs found
Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic Independence
We design fast algorithms for repeatedly sampling from strongly Rayleigh
distributions, which include random spanning tree distributions and
determinantal point processes. For a graph $G$, we show how to approximately
sample uniformly random spanning trees from $G$ in sublinear time per sample
after an initial preprocessing step. For a determinantal point process on
subsets of size $k$ of a ground set of $n$ elements, we show how to
approximately sample in time sublinear in $n$ per sample after an initial
preprocessing step whose cost depends on $\omega$, the matrix multiplication
exponent. We even improve the state of the art for obtaining a single sample
from determinantal point processes over the prior best runtime.
In our main technical result, we achieve the optimal limit on domain
sparsification for strongly Rayleigh distributions. In domain sparsification,
sampling from a distribution $\mu$ on size-$k$ subsets of a ground set of $n$
elements is reduced to sampling from related distributions on size-$k$ subsets
of a much smaller ground set. We show that for strongly Rayleigh distributions,
we can achieve the optimal ground set size. Our reduction involves sampling
from a small number of domain-sparsified distributions, all of which can be
produced efficiently assuming convenient access to approximate overestimates
for the marginals of $\mu$. Having access to marginals is analogous to having
access to the mean and covariance of a continuous distribution, or knowing
"isotropy" for the distribution, the key assumption behind the
Kannan-Lov\'asz-Simonovits (KLS) conjecture and optimal samplers based on it.
We view our result as a moral analog of the KLS conjecture and its
consequences for sampling, for discrete strongly Rayleigh measures.
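For background, a classical way to draw a single uniform spanning tree, against which per-sample costs are compared, is Wilson's algorithm based on loop-erased random walks. A minimal sketch (an illustrative baseline, not the paper's algorithm):

```python
import random

def wilson_spanning_tree(adj, root, rng):
    """Sample a uniformly random spanning tree of a connected graph
    via Wilson's algorithm. adj maps each vertex to its neighbour list.
    Returns the tree as a set of (child, parent) edges toward root."""
    in_tree = {root}
    parent = {}
    for start in adj:
        # Random walk from `start` until hitting the current tree,
        # remembering only the last exit edge from each vertex
        # (this implicitly erases loops from the walk).
        u = start
        while u not in in_tree:
            parent[u] = rng.choice(adj[u])
            u = parent[u]
        # Retrace the loop-erased path and graft it onto the tree.
        u = start
        while u not in in_tree:
            in_tree.add(u)
            u = parent[u]
    return {(u, parent[u]) for u in adj if u != root}
```

Each call pays the full random-walk cost per sample; the amortized preprocessing described in the abstract is what makes repeated sampling cheaper than rerunning such a baseline.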
Quadratic Speedups in Parallel Sampling from Determinantal Distributions
We study the problem of parallelizing sampling from distributions related to
determinants: symmetric, nonsymmetric, and partition-constrained determinantal
point processes, as well as planar perfect matchings. For these distributions,
the partition function, a.k.a. the count, can be obtained via matrix
determinants, a highly parallelizable computation; Csanky proved it is in NC.
However, parallel counting does not automatically translate to parallel
sampling, as classic reductions between the two are inherently sequential. We
show that a nearly quadratic parallel speedup over sequential sampling can be
achieved for all the aforementioned distributions. If the distribution is
supported on subsets of size $k$ of a ground set, we show how to approximately
produce a sample in time $k^{1/2+\epsilon}$ with polynomially many processors
for any constant $\epsilon > 0$. In the two special cases of symmetric
determinantal point processes and planar perfect matchings, our bound
improves, and we show how to sample exactly in these cases.
As our main technical contribution, we fully characterize the limits of
batching for the steps of sampling-to-counting reductions. We observe that
only a small number of steps can be batched together if we strive for exact
sampling, even in the case of nonsymmetric determinantal point processes.
However, we show that for approximate sampling, polynomially many steps
(enough for a nearly quadratic speedup) can be batched together for any
entropically independent distribution, which includes all the mentioned
classes of determinantal point processes. Entropic independence and related
notions have been the source of breakthroughs in Markov chain analysis in
recent years, so we expect our framework to prove useful for distributions
beyond those studied in this work.

Comment: 33 pages, SPAA 202
Domain Sparsification of Discrete Distributions Using Entropic Independence
We present a framework for speeding up the time it takes to sample from discrete distributions $\mu$ defined over subsets of size $k$ of a ground set of $n$ elements, in the regime where $k$ is much smaller than $n$. We show that if one has access to estimates of the marginals $\Pr_{S \sim \mu}[i \in S]$, then the task of sampling from $\mu$ can be reduced to sampling from related distributions $\nu$ supported on size-$k$ subsets of a ground set of only $n^{1-\alpha} \cdot \mathrm{poly}(k)$ elements. Here, $1/\alpha \in [1, k]$ is the parameter of entropic independence for $\mu$. Further, our algorithm only requires sparsified distributions $\nu$ that are obtained by applying a sparse (mostly 0) external field to $\mu$, an operation that, for many distributions $\mu$ of interest, retains algorithmic tractability of sampling from $\nu$. This phenomenon, which we dub domain sparsification, allows us to pay a one-time cost of estimating the marginals of $\mu$, and in return reduce the amortized cost needed to produce many samples from the distribution $\mu$, as is often needed in upstream tasks such as counting and inference.
For a wide range of distributions where $\alpha = \Omega(1)$, our result reduces the domain size, and as a corollary the cost per sample, by a $\mathrm{poly}(n)$ factor. Examples include monomers in a monomer-dimer system, nonsymmetric determinantal point processes, and partition-constrained strongly Rayleigh measures. Our work significantly extends the reach of prior work of Anari and Dereziński, who obtained domain sparsification for distributions with a log-concave generating polynomial (corresponding to $\alpha = 1$). As a corollary of our new analysis techniques, we also obtain a less stringent requirement on the accuracy of marginal estimates, even for the case of log-concave polynomials; roughly speaking, we show that a constant-factor approximation is enough for domain sparsification, improving over the $O(1/k)$ relative error established in prior work.
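The sparsification step itself can be pictured as follows: given marginal overestimates $q_i$, keep a small random ground set drawn proportionally to the $q_i$ (equivalently, apply a sparse external field that zeroes out everything else). A schematic sketch only, not the paper's exact reduction:

```python
import random

def sparsify_domain(q, t, rng):
    """Schematic domain sparsification: draw ~t distinct ground-set
    elements with probability proportional to marginal overestimates
    q[i]; downstream sampling then runs on this smaller ground set.
    Illustrative sketch; assumes t is at most the number of elements
    with q[i] > 0."""
    items = list(q)
    weights = [q[i] for i in items]
    kept = set()
    while len(kept) < t:
        # Draw one element proportionally to its overestimate.
        kept.add(rng.choices(items, weights=weights, k=1)[0])
    return kept
```

The one-time cost of estimating the marginals is paid once; every subsequent sample only touches a ground set of the reduced size.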
Dimension reduction for maximum matchings and the Fastest Mixing Markov Chain
Let $G = (V, E)$ be an undirected graph with maximum degree $\Delta$ and
vertex conductance $\Psi(G)$. We show that there exists a symmetric,
stochastic matrix $P$, with off-diagonal entries supported on $E$, whose
spectral gap is bounded below in terms of $\Psi(G)$ and $\Delta$. Our bound is
optimal under the Small Set Expansion Hypothesis, and answers a question of
Olesker-Taylor and Zanetti, who obtained such a result with a quantitatively
weaker dependence.
In order to obtain our result, we show how to embed a negative-type
semi-metric defined on $V$ into a negative-type semi-metric supported in a
lower-dimensional space, such that the (fractional) matching number of the
weighted graph is approximately equal to that of $G$.

Comment: 6 page
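To make the objects concrete, the sketch below computes the spectral gap of a symmetric stochastic matrix whose off-diagonal entries are supported on a graph's edges, here a lazy random walk on the 4-cycle (an illustrative example, not the paper's construction):

```python
import numpy as np

def spectral_gap(P):
    """Return 1 - lambda_2 for a symmetric stochastic matrix P."""
    assert np.allclose(P, P.T) and np.allclose(P.sum(axis=1), 1.0)
    eigs = np.sort(np.linalg.eigvalsh(P))[::-1]  # descending eigenvalues
    return 1.0 - eigs[1]

# Lazy random walk on the 4-cycle: off-diagonal entries are supported
# on the cycle's edges, with laziness 1/2 on the diagonal.
n = 4
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[i, (i + 1) % n] += 0.25
    P[i, (i - 1) % n] += 0.25
```

The fastest-mixing question asks how large this gap can be made by reweighting the edges while keeping the support fixed.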
Universality of Spectral Independence with Applications to Fast Mixing in Spin Glasses
We study Glauber dynamics for sampling from discrete distributions $\mu$ on
the hypercube $\{\pm 1\}^n$. Recently, techniques based on spectral
independence have successfully yielded optimal relaxation times for a
host of different distributions $\mu$. We show that spectral independence is
universal: a relaxation time of $O(n)$ implies spectral independence.
We then study a notion of tractability for $\mu$, defined in terms of the
smoothness of the multilinear extension of its Hamiltonian, $\log \mu$,
over $[-1, 1]^n$. We show that Glauber dynamics has relaxation time $O(n)$ for
such $\mu$, and using the universality of spectral independence, we conclude
that these distributions are also fractionally log-concave and consequently
satisfy modified log-Sobolev inequalities. We sharpen our estimates and obtain
approximate tensorization of entropy and the optimal $O(n \log n)$ mixing
time for random Hamiltonians, i.e. the classically studied mixed $p$-spin
model at sufficiently high temperature. These results have significant
downstream consequences for concentration of measure, statistical testing,
and learning.
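As an illustration of the dynamics being analyzed, here is a minimal sketch of one Glauber step on the hypercube for a distribution given by an unnormalized log-density (its Hamiltonian); the 2-spin coupling matrix `J` and inverse temperature `beta` below are made-up example values, not from the paper:

```python
import math
import random

def glauber_step(x, log_mu, rng):
    """One Glauber step on {-1,+1}^n: pick a uniformly random
    coordinate and resample it from its conditional under mu, where
    log_mu is an unnormalized log-density."""
    i = rng.randrange(len(x))
    xp, xm = x.copy(), x.copy()
    xp[i], xm[i] = +1, -1
    # Conditional probability that coordinate i equals +1.
    p_plus = 1.0 / (1.0 + math.exp(log_mu(xm) - log_mu(xp)))
    x[i] = +1 if rng.random() < p_plus else -1
    return x

# Illustrative 2-spin (Ising-type) Hamiltonian; J and beta are examples.
beta = 0.2
J = [[0, 1, -1], [1, 0, 1], [-1, 1, 0]]

def ising_log_mu(x):
    n = len(x)
    return beta * sum(J[a][b] * x[a] * x[b]
                      for a in range(n) for b in range(n))

x = [1, 1, 1]
rng = random.Random(0)
for _ in range(100):
    x = glauber_step(x, ising_log_mu, rng)
```

The relaxation and mixing times in the abstract quantify how many such single-coordinate updates are needed before `x` is approximately distributed as $\mu$.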
A Big Data Smart Agricultural System: Recommending Optimum Fertilisers For Crops
Nutrients are important to promote plant growth, and nutrient deficiency is the primary factor limiting crop production. However, excess fertiliser can also have a negative impact on crop quality and yield, increase pollution and decrease producer profit. Hence, determining suitable fertiliser quantities for every crop is very useful. Currently, agricultural systems with the Internet of Things generate very large data volumes, and exploiting this agricultural Big Data helps extract valuable information. However, designing and implementing a large-scale agricultural data warehouse is very challenging, and the data warehouse is a key module in building a smart crop system that makes proficient agronomy recommendations. In our paper, an electronic agricultural record (EAR) is proposed to integrate many separate datasets into a unified dataset. Then, to store and manage the agricultural Big Data, we built an agricultural data warehouse based on Hive and Elasticsearch. Finally, as a case study, we applied some statistical methods on top of our data warehouse to extract fertiliser information: the recommended quantities of fertiliser components, such as nitrogen (N), phosphorus (P) and potassium (K), across a wide range of environmental and crop management conditions for the top ten most popular crops in the EU.
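As an illustration of the kind of statistical method described, the sketch below computes a per-crop median applied rate for each fertiliser component from flattened records; the record keys (`crop`, `n`, `p`, `k`) are a hypothetical schema, not the paper's actual EAR design:

```python
from collections import defaultdict
from statistics import median

def recommend(records):
    """Per-crop median applied rate (kg/ha) for each fertiliser
    component. `records` is a list of dicts with hypothetical keys
    crop, n, p, k -- an illustrative stand-in for the EAR schema."""
    by_crop = defaultdict(lambda: defaultdict(list))
    for r in records:
        for comp in ("n", "p", "k"):
            by_crop[r["crop"]][comp].append(r[comp])
    # Median is robust to outlier applications in the raw records.
    return {crop: {comp: median(vals) for comp, vals in comps.items()}
            for crop, comps in by_crop.items()}
```

In the actual system such aggregations would run as queries against the Hive/Elasticsearch warehouse rather than in-memory Python, but the statistical idea is the same.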