
    Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic Independence

    We design fast algorithms for repeatedly sampling from strongly Rayleigh distributions, which include random spanning tree distributions and determinantal point processes. For a graph $G=(V,E)$, we show how to approximately sample uniformly random spanning trees from $G$ in $\widetilde{O}(\lvert V\rvert)$ time per sample after an initial $\widetilde{O}(\lvert E\rvert)$ time preprocessing. For a determinantal point process on subsets of size $k$ of a ground set of $n$ elements, we show how to approximately sample in $\widetilde{O}(k^{\omega})$ time after an initial $\widetilde{O}(nk^{\omega-1})$ time preprocessing, where $\omega<2.372864$ is the matrix multiplication exponent. We even improve the state of the art for obtaining a single sample from determinantal point processes, from the prior runtime of $\widetilde{O}(\min\{nk^2, n^{\omega}\})$ to $\widetilde{O}(nk^{\omega-1})$. In our main technical result, we achieve the optimal limit on domain sparsification for strongly Rayleigh distributions. In domain sparsification, sampling from a distribution $\mu$ on $\binom{[n]}{k}$ is reduced to sampling from related distributions on $\binom{[t]}{k}$ for $t\ll n$. We show that for strongly Rayleigh distributions, we can achieve the optimal $t=\widetilde{O}(k)$. Our reduction involves sampling from $\widetilde{O}(1)$ domain-sparsified distributions, all of which can be produced efficiently assuming convenient access to approximate overestimates for marginals of $\mu$. Having access to marginals is analogous to having access to the mean and covariance of a continuous distribution, or knowing "isotropy" for the distribution, the key assumption behind the Kannan-Lovász-Simonovits (KLS) conjecture and optimal samplers based on it. We view our result as a moral analog of the KLS conjecture and its consequences for sampling, for discrete strongly Rayleigh measures.
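    For context on the baseline these results improve, the sketch below gives the classical spectral DPP sampler (in the style of Hough et al.) in NumPy. It illustrates the sampling task only; it is not the paper's sublinear algorithm, and it assumes a positive semidefinite kernel matrix `L`.

    ```python
    import numpy as np

    def sample_dpp(L, rng=None):
        """Classical spectral DPP sampler, shown as the baseline whose
        per-sample cost the paper improves. L is a PSD kernel matrix."""
        if rng is None:
            rng = np.random.default_rng()
        lam, V = np.linalg.eigh(L)
        lam = np.clip(lam, 0.0, None)
        # Phase 1: keep eigenvector j independently with prob. lam_j/(1+lam_j).
        V = V[:, rng.random(len(lam)) < lam / (1.0 + lam)]
        sample = []
        # Phase 2: pick one item per step, then project the basis away from it.
        while V.shape[1] > 0:
            probs = (V ** 2).sum(axis=1)
            probs /= probs.sum()
            i = rng.choice(len(probs), p=probs)
            sample.append(int(i))
            j = np.argmax(np.abs(V[i, :]))                # pivot column
            V = V - np.outer(V[:, j] / V[i, j], V[i, :])  # zero out row i
            V = np.delete(V, j, axis=1)
            if V.shape[1] > 0:
                V, _ = np.linalg.qr(V)                    # re-orthonormalize
        return sorted(sample)
    ```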

    Quadratic Speedups in Parallel Sampling from Determinantal Distributions

    We study the problem of parallelizing sampling from distributions related to determinants: symmetric, nonsymmetric, and partition-constrained determinantal point processes, as well as planar perfect matchings. For these distributions, the partition function, a.k.a. the count, can be obtained via matrix determinants, a highly parallelizable computation; Csanky proved it is in NC. However, parallel counting does not automatically translate to parallel sampling, as classic reductions between the two are inherently sequential. We show that a nearly quadratic parallel speedup over sequential sampling can be achieved for all the aforementioned distributions. If the distribution is supported on subsets of size $k$ of a ground set, we show how to approximately produce a sample in $\widetilde{O}(k^{\frac{1}{2}+c})$ time with polynomially many processors for any constant $c>0$. In the two special cases of symmetric determinantal point processes and planar perfect matchings, our bound improves to $\widetilde{O}(\sqrt{k})$ and we show how to sample exactly in these cases. As our main technical contribution, we fully characterize the limits of batching for the steps of sampling-to-counting reductions. We observe that only $O(1)$ steps can be batched together if we strive for exact sampling, even in the case of nonsymmetric determinantal point processes. However, we show that for approximate sampling, $\widetilde{\Omega}(k^{\frac{1}{2}-c})$ steps can be batched together, for any entropically independent distribution, which includes all mentioned classes of determinantal point processes. Entropic independence and related notions have been the source of breakthroughs in Markov chain analysis in recent years, so we expect our framework to prove useful for distributions beyond those studied in this work.
    Comment: 33 pages, SPAA 202
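    To see why the classic reduction is sequential, the schematic below samples from a subset distribution given a counting oracle; `count` is a hypothetical stand-in for, e.g., a determinant-based counter. Each of the $n$ rounds conditions on the previous decision, which is exactly the dependency the paper's batching framework relaxes.

    ```python
    import random

    def sample_via_counting(n, count):
        """Classic sequential sampling-to-counting reduction.

        count(include, exclude) returns the total weight of subsets that
        contain everything in `include` and nothing in `exclude` (for DPPs,
        a determinant). Each round's oracle call depends on the previous
        decision, so the loop is inherently sequential.
        """
        include, exclude = set(), set()
        for i in range(n):
            w_total = count(include, exclude)
            w_with_i = count(include | {i}, exclude)
            if random.random() < w_with_i / w_total:
                include.add(i)
            else:
                exclude.add(i)
        return include
    ```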

    Domain Sparsification of Discrete Distributions Using Entropic Independence

    We present a framework for speeding up the time it takes to sample from discrete distributions $\mu$ defined over subsets of size $k$ of a ground set of $n$ elements, in the regime where $k$ is much smaller than $n$. We show that if one has access to estimates of the marginals $\Pr_{S\sim\mu}[i \in S]$, then the task of sampling from $\mu$ can be reduced to sampling from related distributions $\nu$ supported on size $k$ subsets of a ground set of only $n^{1-\alpha}\cdot\mathrm{poly}(k)$ elements. Here, $1/\alpha \in [1, k]$ is the parameter of entropic independence for $\mu$. Further, our algorithm only requires sparsified distributions $\nu$ that are obtained by applying a sparse (mostly 0) external field to $\mu$, an operation that for many distributions $\mu$ of interest retains algorithmic tractability of sampling from $\nu$. This phenomenon, which we dub domain sparsification, allows us to pay a one-time cost of estimating the marginals of $\mu$, and in return reduce the amortized cost needed to produce many samples from the distribution $\mu$, as is often needed in upstream tasks such as counting and inference. For a wide range of distributions where $\alpha = \Omega(1)$, our result reduces the domain size, and as a corollary, the cost-per-sample, by a $\mathrm{poly}(n)$ factor. Examples include monomers in a monomer-dimer system, non-symmetric determinantal point processes, and partition-constrained Strongly Rayleigh measures. Our work significantly extends the reach of prior work of Anari and Dereziński, who obtained domain sparsification for distributions with a log-concave generating polynomial (corresponding to $\alpha = 1$). As a corollary of our new analysis techniques, we also obtain a less stringent requirement on the accuracy of marginal estimates even for the case of log-concave polynomials; roughly speaking, we show that constant-factor approximation is enough for domain sparsification, improving over the $O(1/k)$ relative error established in prior work.
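    A heavily simplified sketch of the sparsification step (not the paper's exact procedure): given overestimates `q` of the marginals, draw a small ground set biased toward high-marginal elements, after which one samples from the induced externally tilted distribution on the kept elements.

    ```python
    import numpy as np

    def sparsify_domain(q, t, rng):
        """q[i] overestimates the marginal P[i in S]; return a domain of
        at most t elements drawn proportionally to q. Sampling then
        proceeds on this sparsified domain (possibly repeated O~(1) times)."""
        p = np.asarray(q, dtype=float)
        p /= p.sum()
        return np.unique(rng.choice(len(p), size=t, replace=True, p=p))

    # For a DPP with kernel L, exact marginals come from K = L(L + I)^{-1}:
    #   q = np.diag(L @ np.linalg.inv(L + np.eye(len(L))))
    ```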

    Dimension reduction for maximum matchings and the Fastest Mixing Markov Chain

    Let $G = (V,E)$ be an undirected graph with maximum degree $\Delta$ and vertex conductance $\Psi^*(G)$. We show that there exists a symmetric, stochastic matrix $P$, with off-diagonal entries supported on $E$, whose spectral gap $\gamma^*(P)$ satisfies $\Psi^*(G)^{2}/\log\Delta \lesssim \gamma^*(P) \lesssim \Psi^*(G)$. Our bound is optimal under the Small Set Expansion Hypothesis, and answers a question of Olesker-Taylor and Zanetti, who obtained such a result with $\log\Delta$ replaced by $\log\lvert V\rvert$. In order to obtain our result, we show how to embed a negative-type semi-metric $d$ defined on $V$ into a negative-type semi-metric $d'$ supported in $\mathbb{R}^{O(\log\Delta)}$, such that the (fractional) matching number of the weighted graph $(V,E,d)$ is approximately equal to that of $(V,E,d')$.
    Comment: 6 pages
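    The optimization here is over symmetric stochastic matrices supported on $E$; the optimal $P$ is found via semidefinite programming, which the sketch below does not attempt. As a concrete (generally non-optimal) feasible point, one can take Metropolis weights and compute the gap numerically:

    ```python
    import numpy as np

    def metropolis_chain(A):
        """Symmetric stochastic P supported on the edges of a 0/1 adjacency
        matrix A: weight 1/max(deg u, deg v) per edge, remainder on the
        diagonal. A feasible, generally non-optimal, fastest-mixing candidate."""
        deg = A.sum(axis=1)
        n = len(A)
        P = np.zeros((n, n))
        for u in range(n):
            for v in range(n):
                if u != v and A[u, v]:
                    P[u, v] = 1.0 / max(deg[u], deg[v])
            P[u, u] = 1.0 - P[u].sum()
        return P

    def spectral_gap(P):
        lam = np.sort(np.linalg.eigvalsh(P))[::-1]  # real: P is symmetric
        return 1.0 - lam[1]
    ```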

    Universality of Spectral Independence with Applications to Fast Mixing in Spin Glasses

    We study Glauber dynamics for sampling from discrete distributions $\mu$ on the hypercube $\{\pm 1\}^n$. Recently, techniques based on spectral independence have successfully yielded optimal $O(n)$ relaxation times for a host of different distributions $\mu$. We show that spectral independence is universal: a relaxation time of $O(n)$ implies spectral independence. We then study a notion of tractability for $\mu$, defined in terms of smoothness of the multilinear extension of its Hamiltonian $\log \mu$ over $[-1,+1]^n$. We show that Glauber dynamics has relaxation time $O(n)$ for such $\mu$, and using the universality of spectral independence, we conclude that these distributions are also fractionally log-concave and consequently satisfy modified log-Sobolev inequalities. We sharpen our estimates and obtain approximate tensorization of entropy and the optimal $\widetilde{O}(n)$ mixing time for random Hamiltonians, i.e. the classically studied mixed $p$-spin model at sufficiently high temperature. These results have significant downstream consequences for concentration of measure, statistical testing, and learning.
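    For reference, one Glauber update for $\mu(x) \propto e^{H(x)}$ on $\{\pm 1\}^n$ picks a uniform coordinate and resamples it from its conditional law. A minimal sketch follows; the commented 2-spin Hamiltonian is an illustrative stand-in, not the paper's general mixed $p$-spin setting.

    ```python
    import numpy as np

    def glauber_step(x, H, rng):
        """One Glauber update for mu(x) ∝ exp(H(x)) on {±1}^n: pick a
        uniform coordinate, resample it from its conditional distribution."""
        i = rng.integers(len(x))
        xp, xm = x.copy(), x.copy()
        xp[i], xm[i] = 1, -1
        # P(x_i = +1 | rest) = exp(H(xp)) / (exp(H(xp)) + exp(H(xm)))
        p_plus = 1.0 / (1.0 + np.exp(H(xm) - H(xp)))
        x[i] = 1 if rng.random() < p_plus else -1
        return x

    # Illustrative 2-spin Hamiltonian (hypothetical parameters):
    #   J = rng.normal(size=(n, n)); J = (J + J.T) / 2
    #   H = lambda x: beta * (x @ J @ x) / np.sqrt(n)
    ```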

    A Big Data Smart Agricultural System: Recommending Optimum Fertilisers For Crops

    Nutrients are important to promote plant growth, and nutrient deficiency is the primary factor limiting crop production. However, excess fertiliser can also harm crop quality and yield, increase pollution, and decrease producer profit. Hence, determining suitable fertiliser quantities for every crop is very useful. Modern agricultural systems with Internet of Things sensors generate very large data volumes, and exploiting this agricultural Big Data helps extract valuable information. However, designing and implementing a large-scale agricultural data warehouse is very challenging, and the data warehouse is the key module in building a smart crop system that makes proficient agronomy recommendations. In this paper, an electronic agricultural record (EAR) is proposed to integrate many separate datasets into a unified dataset. Then, to store and manage the agricultural Big Data, we built an agricultural data warehouse based on Hive and Elasticsearch. Finally, as a case study, we applied statistical methods over our data warehouse to extract fertiliser information: recommended quantities of fertiliser components such as nitrogen (N), phosphorus (P), and potassium (K), across a wide range of environmental and crop-management conditions, for the top ten most popular crops in the EU.
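    As an illustration of the kind of statistic such a system might compute (the schema and column names below are hypothetical, not the paper's actual EAR warehouse), here is a pandas sketch deriving N/P/K recommendations from the highest-yielding records per crop:

    ```python
    import pandas as pd

    def recommend_npk(ear: pd.DataFrame, crop: str, top_quantile: float = 0.9):
        """One plausible recommendation statistic: median N/P/K application
        rates among the top-yielding records for a crop. The columns 'crop',
        'yield', 'n_kg_ha', 'p_kg_ha', 'k_kg_ha' are hypothetical EAR fields."""
        rows = ear[ear["crop"] == crop]
        best = rows[rows["yield"] >= rows["yield"].quantile(top_quantile)]
        return best[["n_kg_ha", "p_kg_ha", "k_kg_ha"]].median()
    ```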