Noisy population recovery in polynomial time
In the noisy population recovery problem of Dvir et al., the goal is to learn
an unknown distribution $f$ on binary strings of length $n$ from noisy samples.
For some parameter $\mu \in [0,1]$, a noisy sample is generated by flipping
each coordinate of a sample from $f$ independently with probability
$(1-\mu)/2$. We assume an upper bound $k$ on the size of the support of the
distribution, and the goal is to estimate the probability of any string to
within some given error $\varepsilon$. It is known that the algorithmic
complexity and sample complexity of this problem are polynomially related to
each other.
We show that for $\mu > 0$, the sample complexity (and hence the algorithmic
complexity) is bounded by a polynomial in $k$, $n$ and $1/\varepsilon$,
improving upon the previous best result of $k^{O(\log\log k)}$ due to Lovett and Zhang.
Our proof combines ideas from Lovett and Zhang with a \emph{noise attenuated}
version of M\"{o}bius inversion. In turn, the latter crucially uses the
construction of a \emph{robust local inverse} due to Moitra and Saks.
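As an illustration of the noise model (not code from the paper), the following minimal Python sketch shows how a noisy sample is generated: draw a string from the unknown distribution, then flip each coordinate independently with probability $(1-\mu)/2$. The function name `noisy_sample` and the dict representation of the distribution are illustrative choices.

```python
import random

def noisy_sample(dist, mu, rng=random):
    """Draw a string from `dist` (a dict mapping bit tuples to probabilities),
    then flip each coordinate independently with probability (1 - mu) / 2.
    At mu = 1 the sample is noiseless; at mu = 0 the output is uniform noise."""
    strings, probs = zip(*dist.items())
    s = rng.choices(strings, weights=probs, k=1)[0]
    flip_p = (1.0 - mu) / 2.0
    return tuple(int(b) ^ (rng.random() < flip_p) for b in s)
```

Recovering the support probabilities to within $\varepsilon$ from such samples is the algorithmic problem the result above addresses.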
Discrete Fourier Analysis and Its Applications
The topic of discrete Fourier analysis has been extensively studied in recent decades. It plays an important role in theoretical computer science and discrete mathematics. On the one hand, it is interesting to study the structure of boolean functions via discrete Fourier analysis. On the other hand, these structural results also provide a huge number of applications in theoretical computer science, including computational complexity, pseudorandomness, cryptography, and learning theory. In this dissertation, we extend some more connections between discrete Fourier analysis and theoretical computer science. In particular, we study the following questions.
\begin{itemize}
\item Robust sensitivity of boolean functions. In this part, we study the connection between the Fourier tail bound and the sensitivity tail bound of boolean functions, an analogue of the sensitivity conjecture proposed by Nisan \cite{nisan1991crew}.
\item DNF sparsification. The disjunctive normal form (DNF) is a widely used representation of boolean functions, and it is very interesting to study the structure of DNFs. There are two natural ways to measure the complexity of a DNF: its width and its size. In this thesis, we study a connection between these two measures. We propose a new approach combining the switching lemma (a combinatorial tool) with the hypercontractivity inequality (an analytic inequality). This framework also suggests a new approach to the famous sunflower conjecture.
\item Applications in learning theory. In 1989, the first Fourier-based learning algorithm was introduced in a seminal paper of Linial, Mansour and Nisan \cite{linial1989constant}. Through a series of subsequent works, people found that discrete Fourier analysis is a powerful tool for designing learning algorithms: on the one hand, sparse Fourier functions are strong enough to approximate many functions; on the other hand, sparse Fourier functions are relatively easy to learn. Building on this framework, we give a more efficient algorithm for the \emph{population recovery} problem, that is, how to recover an unknown distribution from noisy samples.
\end{itemize}
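The learning-theory connection in the last item can be illustrated with the classical low-degree approach of Linial, Mansour and Nisan: estimate every low-degree Fourier coefficient empirically from uniform random samples. The sketch below is illustrative (the function names are this sketch's own, not the dissertation's algorithm) and works over $\{0,1\}^n$ with characters $\chi_S(x) = (-1)^{\sum_{i\in S} x_i}$.

```python
import itertools
import random

def chi(S, x):
    # Character chi_S(x) = (-1)^{sum_{i in S} x_i} for x in {0,1}^n.
    return -1 if sum(x[i] for i in S) % 2 else 1

def estimate_fourier(f, n, degree, m=2000, rng=random):
    """Empirically estimate the Fourier coefficients hat{f}(S) = E[f(x) chi_S(x)]
    of f: {0,1}^n -> {-1,+1} for all |S| <= degree, from m uniform samples."""
    samples = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(m)]
    est = {}
    for d in range(degree + 1):
        for S in itertools.combinations(range(n), d):
            est[S] = sum(f(x) * chi(S, x) for x in samples) / m
    return est
```

For example, if $f = \chi_{\{0\}}$ itself, the estimate of $\hat{f}(\{0\})$ is exactly 1 on every sample set, since $f \cdot \chi_{\{0\}} \equiv 1$.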
Pauli Error Estimation via Population Recovery
Motivated by estimation of quantum noise models, we study the problem of learning a Pauli channel, or more generally the Pauli error rates of an arbitrary channel. By employing a novel reduction to the "Population Recovery" problem, we give an extremely simple algorithm that learns the Pauli error rates of an $n$-qubit channel to precision $\epsilon$ in $\ell_\infty$ using just $O(1/\epsilon^2)\log(n/\epsilon)$ applications of the channel. This is optimal up to the logarithmic factors. Our algorithm uses only unentangled state preparation and measurements, and the post-measurement classical runtime is just an $O(1/\epsilon)$ factor larger than the measurement data size. It is also impervious to a limited model of measurement noise where heralded measurement failures occur independently with probability $\leq 1/4$.
We then consider the case where the noise channel is close to the identity, meaning that the no-error outcome occurs with probability $1-\epsilon$. In the regime of small $\epsilon$ we extend our algorithm to achieve multiplicative precision $1 \pm \delta$ (i.e., additive precision $\epsilon\delta$) using just $O\bigl(\tfrac{1}{\delta^2\epsilon}\bigr)\log(n/(\delta\epsilon))$ applications of the channel.
A Size-Free CLT for Poisson Multinomials and its Applications
An $(n,k)$-Poisson Multinomial Distribution (PMD) is the distribution of the
sum of $n$ independent random vectors supported on the set $\{e_1,\ldots,e_k\}$ of standard basis vectors in $\mathbb{R}^k$. We show
that any $(n,k)$-PMD is $\mathrm{poly}(k/\sigma)$-close in total
variation distance to the (appropriately discretized) multi-dimensional
Gaussian with the same first two moments, removing the dependence on $n$ from
the Central Limit Theorem of Valiant and Valiant. Interestingly, our CLT is
obtained by bootstrapping the Valiant-Valiant CLT itself through the structural
characterization of PMDs shown in recent work by Daskalakis, Kamath, and
Tzamos. In turn, our stronger CLT can be leveraged to obtain an efficient PTAS
for approximate Nash equilibria in anonymous games, significantly improving the
state of the art, and matching qualitatively the running time dependence on $n$
and $1/\varepsilon$ of the best known algorithm for two-strategy anonymous
games. Our new CLT also enables the construction of covers for the set of
$(n,k)$-PMDs, which are proper and whose size is shown to be essentially
optimal. Our cover construction combines our CLT with the Shapley-Folkman
theorem and recent sparsification results for Laplacian matrices by Batson,
Spielman, and Srivastava. Our cover size lower bound is based on an algebraic
geometric construction. Finally, leveraging the structural properties of the
Fourier spectrum of PMDs, we show that these distributions can be learned from
$O_k(1/\varepsilon^2)$ samples in $\mathrm{poly}_k(1/\varepsilon)$-time, removing
the quasi-polynomial dependence of the running time on $1/\varepsilon$ from the
algorithm of Daskalakis, Kamath, and Tzamos.

Comment: To appear in STOC 2016.
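To make the object concrete, an $(n,k)$-PMD can be sampled by summing $n$ independent draws of standard basis vectors, where the $i$-th draw equals $e_j$ with probability $\pi_i(j)$. This is an illustrative sketch (the function name and representation are this sketch's own, not the paper's).

```python
import random

def sample_pmd(pis, rng=random):
    """Sample from an (n,k)-PMD: the sum of n = len(pis) independent random
    vectors, where vector i equals basis vector e_j with probability pis[i][j]."""
    k = len(pis[0])
    total = [0] * k
    for pi in pis:
        j = rng.choices(range(k), weights=pi, k=1)[0]
        total[j] += 1
    return tuple(total)
```

The coordinates always sum to $n$, so the support lies on a discrete simplex; the CLT above says this distribution is close in total variation to a discretized Gaussian with matching first two moments.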
On the Real Number of COVID-19 Infections: the Effect of Sensitivity, Specificity, and the Number of Tests on Prevalence Estimation
In this report, a formula for estimating the prevalence ratio of a disease in a population that is tested with imperfect tests is given. The formula is in terms of the fraction of positive test results and the test parameters, i.e., the probability of true positives (sensitivity) and the probability of true negatives (specificity). The motivation for this work arises in the context of the COVID-19 pandemic, in which estimating the number of infected individuals depends on the sensitivity and specificity of the tests. In this context, it is shown that approximating the prevalence ratio by the ratio between the number of positive tests and the total number of tested individuals leads to dramatically high estimation errors, and thus to ill-adapted public health policies. The relevance of estimating the prevalence ratio using the formula presented in this work is that its precision increases with the number of tests. Two conclusions are drawn from this work. First, in order to ensure that a reliable estimation is achieved with a finite number of tests, testing campaigns must be implemented with tests for which the sum of the sensitivity and the specificity is sufficiently different from one. Second, the key parameter for reducing the estimation error is the number of tests. For a large number of tests, as long as the sum of the sensitivity and specificity is different from one, the exact values of these parameters have very little impact on the estimation error.

This report presents a mathematical formula for estimating the number of SARS-CoV-2 infections in a given population. The formula uses the test results and the test parameters, that is, the probability of true positives (sensitivity) and of true negatives (specificity). Depending on the sensitivity and specificity of the tests, the number of positive results can be radically different from the number of infected individuals. Thus, the final number of positive results is not a reliable source of information for decision-making or for drafting guidelines. Two conclusions are drawn from this work: in order to guarantee a reliable estimate, testing campaigns must be implemented with tests for which the sum of the sensitivity and the specificity is significantly different from one. Moreover, it is shown that a large number of tests leads to a more precise estimate of the number of infected individuals. For a large number of tests, as long as the sum of the sensitivity and the specificity is not equal to one, the exact values of these parameters have very little impact on the estimation error.
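A minimal sketch of the kind of correction the report describes (the report's exact formula may differ; the function name is this sketch's own). Assuming the observed positive fraction satisfies $p = \mathrm{sens}\cdot\theta + (1-\mathrm{spec})(1-\theta)$, solving for the prevalence $\theta$ gives the classical Rogan-Gladen form, which degenerates exactly when sensitivity + specificity = 1:

```python
def estimate_prevalence(positive_fraction, sensitivity, specificity):
    """Invert p = sens * theta + (1 - spec) * (1 - theta) for theta.
    Undefined when sensitivity + specificity == 1 (an uninformative test)."""
    denom = sensitivity + specificity - 1.0
    if abs(denom) < 1e-12:
        raise ValueError("sensitivity + specificity must differ from one")
    theta = (positive_fraction - (1.0 - specificity)) / denom
    return min(1.0, max(0.0, theta))  # clamp sampling noise into [0, 1]
```

For instance, with sensitivity 0.9 and specificity 0.95, an observed positive fraction of 0.135 corresponds to a true prevalence of 0.10, not 0.135, illustrating why the raw positive-test ratio can badly overestimate prevalence.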