
    Noisy population recovery in polynomial time

    In the noisy population recovery problem of Dvir et al., the goal is to learn an unknown distribution $f$ on binary strings of length $n$ from noisy samples. For some parameter $\mu \in [0,1]$, a noisy sample is generated by flipping each coordinate of a sample from $f$ independently with probability $(1-\mu)/2$. We assume an upper bound $k$ on the size of the support of the distribution, and the goal is to estimate the probability of any string to within some given error $\varepsilon$. It is known that the algorithmic complexity and the sample complexity of this problem are polynomially related to each other. We show that for $\mu > 0$, the sample complexity (and hence the algorithmic complexity) is bounded by a polynomial in $k$, $n$ and $1/\varepsilon$, improving upon the previous best result of $\mathsf{poly}(k^{\log\log k}, n, 1/\varepsilon)$ due to Lovett and Zhang. Our proof combines ideas from Lovett and Zhang with a noise-attenuated version of Möbius inversion. In turn, the latter crucially uses the construction of a robust local inverse due to Moitra and Saks.
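    To make the noise model concrete, here is a minimal pure-Python sketch, assuming $n$ is small enough to store a vector of length $2^n$. It applies the per-coordinate flip channel to $f$ and then undoes it by exact (naive) inversion. This is not the algorithm of the paper: the helper names `apply_bitwise_channel`, `add_noise` and `naive_denoise` are hypothetical, and the naive inverse is shown only because it makes the difficulty visible.

```python
def apply_bitwise_channel(p, n, a, b):
    """Apply the 2x2 matrix [[a, b], [b, a]] independently to every
    coordinate of a vector p indexed by the strings of {0,1}^n
    (string x is stored at index int(x, base 2))."""
    q = list(p)
    for i in range(n):
        bit = 1 << i
        for x in range(1 << n):
            if x & bit:
                continue  # handle each pair (x, x with bit i set) once
            y = x | bit
            u, v = q[x], q[y]
            q[x] = a * u + b * v
            q[y] = b * u + a * v
    return q


def add_noise(f, n, mu):
    """Distribution of a noisy sample: each bit of a draw from f is
    flipped independently with probability (1 - mu) / 2."""
    flip = (1 - mu) / 2
    return apply_bitwise_channel(f, n, 1 - flip, flip)


def naive_denoise(g, n, mu):
    """Invert the noise exactly (requires mu > 0). Per coordinate, the
    inverse of [[1-t, t], [t, 1-t]] with t = (1-mu)/2 is
    (1/mu) * [[1-t, -t], [-t, 1-t]], because its determinant is mu."""
    t = (1 - mu) / 2
    return apply_bitwise_channel(g, n, (1 - t) / mu, -t / mu)


if __name__ == "__main__":
    n, mu = 3, 0.5
    f = [0.0] * (1 << n)
    f[0b000], f[0b101] = 0.7, 0.3  # hypothetical f with support size k = 2
    g = add_noise(f, n, mu)
    print([round(v, 4) for v in naive_denoise(g, n, mu)])
```

    Each per-coordinate inverse has absolute row sum $1/\mu$, so an error in the empirical noisy distribution can be amplified by up to a factor $\mu^{-n}$ by this naive inversion. That exponential blow-up is what an algorithm with polynomial sample complexity has to avoid.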

    Pauli Error Estimation via Population Recovery

    Motivated by estimation of quantum noise models, we study the problem of learning a Pauli channel, or more generally the Pauli error rates of an arbitrary channel. By employing a novel reduction to the "Population Recovery" problem, we give an extremely simple algorithm that learns the Pauli error rates of an $n$-qubit channel to precision $\epsilon$ in $\ell_\infty$ using just $O(1/\epsilon^2)\log(n/\epsilon)$ applications of the channel. This is optimal up to logarithmic factors. Our algorithm uses only unentangled state preparation and measurements, and the post-measurement classical runtime is just an $O(1/\epsilon)$ factor larger than the measurement data size. It is also impervious to a limited model of measurement noise where heralded measurement failures occur independently with probability $\le 1/4$. We then consider the case where the noise channel is close to the identity, meaning that the no-error outcome occurs with probability $1-\eta$. In the regime of small $\eta$, we extend our algorithm to achieve multiplicative precision $1 \pm \epsilon$ (i.e., additive precision $\epsilon\eta$) using just $O\bigl(\frac{1}{\epsilon^2\eta}\bigr)\log(n/\epsilon)$ applications of the channel.
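    The abstract does not spell out how Pauli errors become binary strings. One standard ingredient, assumed here for illustration and not taken from the paper, is the binary symplectic representation: up to a global phase, an $n$-qubit Pauli is a pair of bit vectors $(x, z) \in \{0,1\}^n \times \{0,1\}^n$, and multiplying Paulis corresponds to XOR of these vectors. The function names below are hypothetical.

```python
# Single-qubit Paulis (ignoring phase) as (x bit, z bit); Y equals XZ up to phase.
_TO_BITS = {"I": (0, 0), "X": (1, 0), "Z": (0, 1), "Y": (1, 1)}
_FROM_BITS = {bits: label for label, bits in _TO_BITS.items()}


def pauli_to_bits(label):
    """Encode an n-qubit Pauli string such as "XIZY" as (x, z) bit tuples."""
    x = tuple(_TO_BITS[c][0] for c in label)
    z = tuple(_TO_BITS[c][1] for c in label)
    return x, z


def bits_to_pauli(x, z):
    """Decode (x, z) bit tuples back into a Pauli string."""
    return "".join(_FROM_BITS[(xi, zi)] for xi, zi in zip(x, z))


def compose(p, q):
    """Product of two Pauli strings up to a global phase: XOR both parts."""
    (x1, z1), (x2, z2) = pauli_to_bits(p), pauli_to_bits(q)
    x = tuple(a ^ b for a, b in zip(x1, x2))
    z = tuple(a ^ b for a, b in zip(z1, z2))
    return bits_to_pauli(x, z)


if __name__ == "__main__":
    print(pauli_to_bits("XIZY"))  # ((1, 0, 0, 1), (0, 0, 1, 1))
    print(compose("XZ", "ZZ"))    # YI (up to phase)
```

    In this encoding the no-error outcome (the identity) is the all-zero string, so in the near-identity regime of the abstract that string carries probability $1-\eta$.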

    A Size-Free CLT for Poisson Multinomials and its Applications

    An $(n,k)$-Poisson Multinomial Distribution (PMD) is the distribution of the sum of $n$ independent random vectors supported on the set $\mathcal{B}_k=\{e_1,\ldots,e_k\}$ of standard basis vectors in $\mathbb{R}^k$. We show that any $(n,k)$-PMD is $\mathrm{poly}(k/\sigma)$-close in total variation distance to the (appropriately discretized) multi-dimensional Gaussian with the same first two moments, removing the dependence on $n$ from the Central Limit Theorem of Valiant and Valiant. Interestingly, our CLT is obtained by bootstrapping the Valiant-Valiant CLT itself through the structural characterization of PMDs shown in recent work by Daskalakis, Kamath, and Tzamos. In turn, our stronger CLT can be leveraged to obtain an efficient PTAS for approximate Nash equilibria in anonymous games, significantly improving the state of the art, and matching qualitatively the running time dependence on $n$ and $1/\varepsilon$ of the best known algorithm for two-strategy anonymous games. Our new CLT also enables the construction of covers for the set of $(n,k)$-PMDs, which are proper and whose size is shown to be essentially optimal. Our cover construction combines our CLT with the Shapley-Folkman theorem and recent sparsification results for Laplacian matrices by Batson, Spielman, and Srivastava. Our cover size lower bound is based on an algebraic geometric construction. Finally, leveraging the structural properties of the Fourier spectrum of PMDs, we show that these distributions can be learned from $O_k(1/\varepsilon^2)$ samples in $\mathrm{poly}_k(1/\varepsilon)$ time, removing the quasi-polynomial dependence of the running time on $1/\varepsilon$ from the algorithm of Daskalakis, Kamath, and Tzamos.
    Comment: To appear in STOC 201
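    The objects in the CLT can be made concrete for small instances. The sketch below (plain Python; the function names are hypothetical, and the brute-force convolution is only feasible for small $n$ and $k$) computes the exact law of an $(n,k)$-PMD and the mean and covariance that determine the matching Gaussian. It does not implement the discretized Gaussian, the cover construction, or the learning algorithm of the paper.

```python
from collections import defaultdict


def pmd_distribution(probs):
    """Exact law of the sum of independent random vectors, where the i-th
    vector equals the basis vector e_j with probability probs[i][j].
    Returns a dict mapping count vectors (tuples) to probabilities."""
    k = len(probs[0])
    dist = {tuple([0] * k): 1.0}
    for p in probs:  # convolve in one summand at a time
        nxt = defaultdict(float)
        for counts, mass in dist.items():
            for j, pj in enumerate(p):
                if pj == 0:
                    continue
                c = list(counts)
                c[j] += 1
                nxt[tuple(c)] += mass * pj
        dist = dict(nxt)
    return dist


def pmd_moments(probs):
    """Mean and covariance of the PMD: each independent summand adds p to
    the mean and diag(p) - p p^T to the covariance."""
    k = len(probs[0])
    mean = [0.0] * k
    cov = [[0.0] * k for _ in range(k)]
    for p in probs:
        for a in range(k):
            mean[a] += p[a]
            for b in range(k):
                cov[a][b] += (p[a] if a == b else 0.0) - p[a] * p[b]
    return mean, cov


if __name__ == "__main__":
    probs = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]  # n = 2, k = 3
    print(pmd_distribution(probs))  # {(2, 0, 0): 0.5, (1, 1, 0): 0.5}
    print(pmd_moments(probs)[0])    # [1.5, 0.5, 0.0]
```

    The covariance of a PMD is always singular (its rows sum to zero, since every sample has coordinates summing to $n$), which is one reason the Gaussian in the theorem has to be taken in an appropriately discretized form.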

    On the Real Number of COVID-19 Infections: Effect of Sensitivity, Specificity and Number of Tests on Prevalence Estimation

    In this report, a formula is given for estimating the prevalence ratio of a disease in a population that is tested with imperfect tests. The formula is expressed in terms of the fraction of positive test results and the test parameters, i.e., the probability of a true positive (sensitivity) and the probability of a true negative (specificity). The motivation for this work arises in the context of the COVID-19 pandemic, in which estimating the number of infected individuals depends on the sensitivity and specificity of the tests. In this context, it is shown that approximating the prevalence ratio by the ratio between the number of positive tests and the total number of tested individuals leads to dramatically high estimation errors and, thus, to ill-suited public health policies. The relevance of estimating the prevalence ratio with the formula presented in this work is that its precision increases with the number of tests. Two conclusions are drawn from this work. First, to ensure that a reliable estimate is achieved with a finite number of tests, testing campaigns must use tests for which the sum of the sensitivity and the specificity is sufficiently different from one. Second, the key parameter for reducing the estimation error is the number of tests. For a large number of tests, as long as the sum of the sensitivity and specificity is different from one, the exact values of these parameters have very little impact on the estimation error.

    This report presents a mathematical formula for estimating the number of SARS-CoV-2 infections in a given population. The formula uses the test results and the test parameters, i.e., the probability of true positives (sensitivity) and of true negatives (specificity). Depending on the sensitivity and specificity of the tests, the number of positive results can be radically different from the number of infected individuals. Thus, the final number of positive results is not a reliable source of information for decision-making or for drafting guidelines. Two conclusions are drawn from this work: to guarantee a reliable estimate, testing campaigns must be carried out with tests for which the sum of the sensitivity and the specificity is significantly different from one; moreover, it is proven that a large number of tests leads to a more precise estimate of the number of infected individuals. For a large number of tests, as long as the sum of the sensitivity and the specificity is not equal to one, the exact values of these parameters have very little impact on the estimation error.
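    The abstract does not print the formula itself. Assuming it coincides with the classical correction for imperfect tests (the Rogan–Gladen estimator), it follows from writing the expected fraction of positive tests as $q = \mathrm{Se}\,p + (1-\mathrm{Sp})(1-p)$ and solving for the prevalence $p$:

$$ p = \frac{q + \mathrm{Sp} - 1}{\mathrm{Se} + \mathrm{Sp} - 1}. $$

    Because the estimate is linear in the observed fraction $\hat q$, whose binomial standard error is $\sqrt{q(1-q)/N}$ for $N$ tests, its standard error scales like $1/(|\mathrm{Se}+\mathrm{Sp}-1|\sqrt{N})$: it shrinks as the number of tests grows and blows up as $\mathrm{Se}+\mathrm{Sp}$ approaches one, matching both conclusions above. A minimal Python sketch (the function name is hypothetical):

```python
import math


def estimate_prevalence(positives, n_tests, sensitivity, specificity):
    """Correct the raw positive fraction q for test error.

    Assumes P(positive) = Se * p + (1 - Sp) * (1 - p); solving for p gives
    p = (q + Sp - 1) / (Se + Sp - 1). The point estimate is clipped to [0, 1].
    Returns (estimate, approximate standard error)."""
    denom = sensitivity + specificity - 1
    if denom == 0:
        raise ValueError("Se + Sp = 1: the test result carries no information")
    q = positives / n_tests
    p_hat = (q + specificity - 1) / denom
    # p_hat is linear in q, and q has binomial variance q(1 - q) / N.
    std_err = math.sqrt(q * (1 - q) / n_tests) / abs(denom)
    return min(max(p_hat, 0.0), 1.0), std_err


if __name__ == "__main__":
    # Hypothetical campaign: 1350 positives out of 10000 tests.
    print(estimate_prevalence(1350, 10000, sensitivity=0.9, specificity=0.95))
```

    For example, with $\mathrm{Se} = 0.9$ and $\mathrm{Sp} = 0.95$, a true prevalence of 10% yields an expected positive fraction of $0.9 \cdot 0.1 + 0.05 \cdot 0.9 = 0.135$. The raw fraction overstates prevalence by 35%, while the corrected estimate recovers 0.1 up to sampling error.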