35 research outputs found
Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic Independence
We design fast algorithms for repeatedly sampling from strongly Rayleigh
distributions, which include random spanning tree distributions and
determinantal point processes. For a graph $G$, we show how to approximately
sample uniformly random spanning trees from $G$ in sublinear time per sample
after an initial preprocessing step. For a determinantal point process on
subsets of size $k$ of a ground set of $n$ elements, we show how to
approximately sample in time sublinear in $n$ per sample after an initial
preprocessing step whose cost depends on $\omega$, the matrix multiplication
exponent. We even improve the state of the art for obtaining a single sample
from determinantal point processes over the prior best runtime.
In our main technical result, we achieve the optimal limit on domain
sparsification for strongly Rayleigh distributions. In domain sparsification,
sampling from a distribution $\mu$ on size-$k$ subsets of a ground set of $n$
elements is reduced to sampling from related distributions on size-$k$ subsets
of a much smaller ground set. We show that for strongly Rayleigh distributions,
we can achieve the optimal ground set size. Our reduction involves sampling
from a small number of domain-sparsified distributions, all of which can be
produced efficiently assuming convenient access to approximate overestimates
for the marginals of $\mu$. Having access to marginals is analogous to having
access to the mean and covariance of a continuous distribution, or knowing
"isotropy" for the distribution, the key assumption behind the
Kannan-Lov\'asz-Simonovits (KLS) conjecture and optimal samplers based on it.
We view our result as a moral analog of the KLS conjecture and its
consequences for sampling, for discrete strongly Rayleigh measures.
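For background, a classical way to draw a single uniform spanning tree, against which per-sample costs are compared, is Wilson's algorithm based on loop-erased random walks. A minimal sketch (an illustrative baseline, not the paper's algorithm):

```python
import random

def wilson_spanning_tree(adj, root, rng):
    """Sample a uniformly random spanning tree of a connected graph
    via Wilson's algorithm. adj maps each vertex to its neighbour list.
    Returns the tree as a set of (child, parent) edges toward root."""
    in_tree = {root}
    parent = {}
    for start in adj:
        # Random walk from `start` until hitting the current tree,
        # remembering only the last exit edge from each vertex
        # (this implicitly erases loops from the walk).
        u = start
        while u not in in_tree:
            parent[u] = rng.choice(adj[u])
            u = parent[u]
        # Retrace the loop-erased path and graft it onto the tree.
        u = start
        while u not in in_tree:
            in_tree.add(u)
            u = parent[u]
    return {(u, parent[u]) for u in adj if u != root}
```

Each call pays the full random-walk cost per sample; the amortized preprocessing described in the abstract is what makes repeated sampling cheaper than rerunning such a baseline.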
Quadratic Speedups in Parallel Sampling from Determinantal Distributions
We study the problem of parallelizing sampling from distributions related to
determinants: symmetric, nonsymmetric, and partition-constrained determinantal
point processes, as well as planar perfect matchings. For these distributions,
the partition function, a.k.a. the count, can be obtained via matrix
determinants, a highly parallelizable computation; Csanky proved it is in NC.
However, parallel counting does not automatically translate to parallel
sampling, as classic reductions between the two are inherently sequential. We
show that a nearly quadratic parallel speedup over sequential sampling can be
achieved for all the aforementioned distributions. If the distribution is
supported on subsets of size $k$ of a ground set, we show how to approximately
produce a sample in time $k^{1/2+\epsilon}$ with polynomially many processors
for any constant $\epsilon > 0$. In the two special cases of symmetric
determinantal point processes and planar perfect matchings, our bound
improves, and we show how to sample exactly in these cases.
As our main technical contribution, we fully characterize the limits of
batching for the steps of sampling-to-counting reductions. We observe that
only a small number of steps can be batched together if we strive for exact
sampling, even in the case of nonsymmetric determinantal point processes.
However, we show that for approximate sampling, polynomially many steps
(enough for a nearly quadratic speedup) can be batched together for any
entropically independent distribution, which includes all the mentioned
classes of determinantal point processes. Entropic independence and related
notions have been the source of breakthroughs in Markov chain analysis in
recent years, so we expect our framework to prove useful for distributions
beyond those studied in this work.

Comment: 33 pages, SPAA 202
Domain Sparsification of Discrete Distributions Using Entropic Independence
We present a framework for speeding up the time it takes to sample from discrete distributions $\mu$ defined over subsets of size $k$ of a ground set of $n$ elements, in the regime where $k$ is much smaller than $n$. We show that if one has access to estimates of the marginals $\Pr_{S \sim \mu}[i \in S]$, then the task of sampling from $\mu$ can be reduced to sampling from related distributions $\nu$ supported on size-$k$ subsets of a ground set of only $n^{1-\alpha} \cdot \mathrm{poly}(k)$ elements. Here, $1/\alpha \in [1, k]$ is the parameter of entropic independence for $\mu$. Further, our algorithm only requires sparsified distributions $\nu$ that are obtained by applying a sparse (mostly 0) external field to $\mu$, an operation that, for many distributions $\mu$ of interest, retains algorithmic tractability of sampling from $\nu$. This phenomenon, which we dub domain sparsification, allows us to pay a one-time cost of estimating the marginals of $\mu$, and in return reduce the amortized cost needed to produce many samples from the distribution $\mu$, as is often needed in upstream tasks such as counting and inference.
For a wide range of distributions where $\alpha = \Omega(1)$, our result reduces the domain size, and as a corollary the cost per sample, by a $\mathrm{poly}(n)$ factor. Examples include monomers in a monomer-dimer system, nonsymmetric determinantal point processes, and partition-constrained strongly Rayleigh measures. Our work significantly extends the reach of prior work of Anari and Dereziński, who obtained domain sparsification for distributions with a log-concave generating polynomial (corresponding to $\alpha = 1$). As a corollary of our new analysis techniques, we also obtain a less stringent requirement on the accuracy of marginal estimates, even for the case of log-concave polynomials; roughly speaking, we show that a constant-factor approximation is enough for domain sparsification, improving over the $O(1/k)$ relative error established in prior work.
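The sparsification step itself can be pictured as follows: given marginal overestimates $q_i$, keep a small random ground set drawn proportionally to the $q_i$ (equivalently, apply a sparse external field that zeroes out everything else). A schematic sketch only, not the paper's exact reduction:

```python
import random

def sparsify_domain(q, t, rng):
    """Schematic domain sparsification: draw ~t distinct ground-set
    elements with probability proportional to marginal overestimates
    q[i]; downstream sampling then runs on this smaller ground set.
    Illustrative sketch; assumes t is at most the number of elements
    with q[i] > 0."""
    items = list(q)
    weights = [q[i] for i in items]
    kept = set()
    while len(kept) < t:
        # Draw one element proportionally to its overestimate.
        kept.add(rng.choices(items, weights=weights, k=1)[0])
    return kept
```

The one-time cost of estimating the marginals is paid once; every subsequent sample only touches a ground set of the reduced size.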
Dimension reduction for maximum matchings and the Fastest Mixing Markov Chain
Let $G = (V, E)$ be an undirected graph with maximum degree $\Delta$ and
vertex conductance $\Psi(G)$. We show that there exists a symmetric,
stochastic matrix $P$, with off-diagonal entries supported on $E$, whose
spectral gap is bounded below in terms of $\Psi(G)$ and $\Delta$. Our bound is
optimal under the Small Set Expansion Hypothesis, and answers a question of
Olesker-Taylor and Zanetti, who obtained such a result with a quantitatively
weaker dependence.
In order to obtain our result, we show how to embed a negative-type
semi-metric defined on $V$ into a negative-type semi-metric supported in a
lower-dimensional space, such that the (fractional) matching number of the
weighted graph is approximately equal to that of $G$.

Comment: 6 page
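To make the objects concrete, the sketch below computes the spectral gap of a symmetric stochastic matrix whose off-diagonal entries are supported on a graph's edges, here a lazy random walk on the 4-cycle (an illustrative example, not the paper's construction):

```python
import numpy as np

def spectral_gap(P):
    """Return 1 - lambda_2 for a symmetric stochastic matrix P."""
    assert np.allclose(P, P.T) and np.allclose(P.sum(axis=1), 1.0)
    eigs = np.sort(np.linalg.eigvalsh(P))[::-1]  # descending eigenvalues
    return 1.0 - eigs[1]

# Lazy random walk on the 4-cycle: off-diagonal entries are supported
# on the cycle's edges, with laziness 1/2 on the diagonal.
n = 4
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[i, (i + 1) % n] += 0.25
    P[i, (i - 1) % n] += 0.25
```

The fastest-mixing question asks how large this gap can be made by reweighting the edges while keeping the support fixed.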
Universality of Spectral Independence with Applications to Fast Mixing in Spin Glasses
We study Glauber dynamics for sampling from discrete distributions $\mu$ on
the hypercube $\{\pm 1\}^n$. Recently, techniques based on spectral
independence have successfully yielded optimal relaxation times for a
host of different distributions $\mu$. We show that spectral independence is
universal: a relaxation time of $O(n)$ implies spectral independence.
We then study a notion of tractability for $\mu$, defined in terms of the
smoothness of the multilinear extension of its Hamiltonian, $\log \mu$,
over $[-1, 1]^n$. We show that Glauber dynamics has relaxation time $O(n)$ for
such $\mu$, and using the universality of spectral independence, we conclude
that these distributions are also fractionally log-concave and consequently
satisfy modified log-Sobolev inequalities. We sharpen our estimates and obtain
approximate tensorization of entropy and the optimal $O(n \log n)$ mixing
time for random Hamiltonians, i.e. the classically studied mixed $p$-spin
model at sufficiently high temperature. These results have significant
downstream consequences for concentration of measure, statistical testing,
and learning.
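As an illustration of the dynamics being analyzed, here is a minimal sketch of one Glauber step on the hypercube for a distribution given by an unnormalized log-density (its Hamiltonian); the 2-spin coupling matrix `J` and inverse temperature `beta` below are made-up example values, not from the paper:

```python
import math
import random

def glauber_step(x, log_mu, rng):
    """One Glauber step on {-1,+1}^n: pick a uniformly random
    coordinate and resample it from its conditional under mu, where
    log_mu is an unnormalized log-density."""
    i = rng.randrange(len(x))
    xp, xm = x.copy(), x.copy()
    xp[i], xm[i] = +1, -1
    # Conditional probability that coordinate i equals +1.
    p_plus = 1.0 / (1.0 + math.exp(log_mu(xm) - log_mu(xp)))
    x[i] = +1 if rng.random() < p_plus else -1
    return x

# Illustrative 2-spin (Ising-type) Hamiltonian; J and beta are examples.
beta = 0.2
J = [[0, 1, -1], [1, 0, 1], [-1, 1, 0]]

def ising_log_mu(x):
    n = len(x)
    return beta * sum(J[a][b] * x[a] * x[b]
                      for a in range(n) for b in range(n))

x = [1, 1, 1]
rng = random.Random(0)
for _ in range(100):
    x = glauber_step(x, ising_log_mu, rng)
```

The relaxation and mixing times in the abstract quantify how many such single-coordinate updates are needed before `x` is approximately distributed as $\mu$.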
A Big Data Smart Agricultural System: Recommending Optimum Fertilisers For Crops
Nutrients are important to promote plant growth, and nutrient deficiency is the primary factor limiting crop production. However, excess fertiliser can also have a negative impact on crop quality and yield, increase pollution and decrease producer profit. Hence, determining suitable fertiliser quantities for every crop is very useful. Currently, agricultural systems with the Internet of Things generate very large data volumes, and exploiting this agricultural Big Data helps extract valuable information. However, designing and implementing a large-scale agricultural data warehouse is very challenging, and the data warehouse is a key module in building a smart crop system that makes proficient agronomy recommendations. In our paper, an electronic agricultural record (EAR) is proposed to integrate many separate datasets into a unified dataset. Then, to store and manage the agricultural Big Data, we built an agricultural data warehouse based on Hive and Elasticsearch. Finally, as a case study, we applied some statistical methods on top of our data warehouse to extract fertiliser information: the recommended quantities of fertiliser components, such as nitrogen (N), phosphorus (P) and potassium (K), across a wide range of environmental and crop management conditions for the top ten most popular crops in the EU.
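As an illustration of the kind of statistical method described, the sketch below computes a per-crop median applied rate for each fertiliser component from flattened records; the record keys (`crop`, `n`, `p`, `k`) are a hypothetical schema, not the paper's actual EAR design:

```python
from collections import defaultdict
from statistics import median

def recommend(records):
    """Per-crop median applied rate (kg/ha) for each fertiliser
    component. `records` is a list of dicts with hypothetical keys
    crop, n, p, k -- an illustrative stand-in for the EAR schema."""
    by_crop = defaultdict(lambda: defaultdict(list))
    for r in records:
        for comp in ("n", "p", "k"):
            by_crop[r["crop"]][comp].append(r[comp])
    # Median is robust to outlier applications in the raw records.
    return {crop: {comp: median(vals) for comp, vals in comps.items()}
            for crop, comps in by_crop.items()}
```

In the actual system such aggregations would run as queries against the Hive/Elasticsearch warehouse rather than in-memory Python, but the statistical idea is the same.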