Noisy population recovery in polynomial time
In the noisy population recovery problem of Dvir et al., the goal is to learn
an unknown distribution $f$ on binary strings of length $n$ from noisy samples.
For some parameter $\mu \in [0,1]$, a noisy sample is generated by flipping
each coordinate of a sample from $f$ independently with probability
$(1-\mu)/2$. We assume an upper bound $k$ on the size of the support of the
distribution, and the goal is to estimate the probability of any string to
within some given error $\varepsilon$. It is known that the algorithmic
complexity and sample complexity of this problem are polynomially related to
each other.
We show that for $\mu > 0$, the sample complexity (and hence the algorithmic
complexity) is bounded by a polynomial in $k$, $n$ and $1/\varepsilon$,
improving upon the previous best result of $k^{O(\log\log k)}$ due to Lovett and Zhang.
Our proof combines ideas from Lovett and Zhang with a \emph{noise attenuated}
version of M\"{o}bius inversion. In turn, the latter crucially uses the
construction of a \emph{robust local inverse} due to Moitra and Saks.
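As an illustration of the noise model (not code from the paper), the following minimal Python sketch shows how a noisy sample is generated: draw a string from the unknown distribution, then flip each coordinate independently with probability $(1-\mu)/2$. The function name `noisy_sample` and the dict representation of the distribution are illustrative choices.

```python
import random

def noisy_sample(dist, mu, rng=random):
    """Draw a string from `dist` (a dict mapping bit tuples to probabilities),
    then flip each coordinate independently with probability (1 - mu) / 2.
    At mu = 1 the sample is noiseless; at mu = 0 the output is uniform noise."""
    strings, probs = zip(*dist.items())
    s = rng.choices(strings, weights=probs, k=1)[0]
    flip_p = (1.0 - mu) / 2.0
    return tuple(int(b) ^ (rng.random() < flip_p) for b in s)
```

Recovering the support probabilities to within $\varepsilon$ from such samples is the algorithmic problem the result above addresses.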
Discrete Fourier Analysis and Its Applications
The topic of discrete Fourier analysis has been extensively studied in recent decades. It plays an important role in theoretical computer science and discrete mathematics. On the one hand, it is interesting to study the structure of boolean functions via discrete Fourier analysis. On the other hand, these structural results also provide a huge number of applications in theoretical computer science, including computational complexity, pseudorandomness, cryptography, and learning theory. In this dissertation, we extend some more connections between discrete Fourier analysis and theoretical computer science. In particular, we study the following questions.
\begin{itemize}
\item Robust sensitivity of boolean functions. In this part, we study the connection between the Fourier tail bound and the sensitivity tail bound of boolean functions, an analogue of the sensitivity conjecture proposed by Nisan \cite{nisan1991crew}.
\item DNF sparsification. The disjunctive normal form (DNF) is a widely used representation of boolean functions, and it is very interesting to study the structure of DNFs. There are two natural ways to measure the complexity of a DNF: its width and its size. In this thesis, we study a connection between these two measures. We propose a new approach combining the switching lemma (a combinatorial tool) with the hypercontractivity inequality (an analytic inequality). This framework also suggests a new approach to the famous sunflower conjecture.
\item Applications in learning theory. In 1989, the first Fourier-based learning algorithm was introduced in a seminal paper of Linial, Mansour and Nisan \cite{linial1989constant}. Through a series of subsequent works, people found that discrete Fourier analysis is a powerful tool for designing learning algorithms: on the one hand, sparse Fourier functions are strong enough to approximate many functions; on the other hand, sparse Fourier functions are relatively easy to learn. Building on this framework, we give a more efficient algorithm for the \emph{population recovery} problem, that is, how to recover an unknown distribution from noisy samples.
\end{itemize}
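The learning-theory connection in the last item can be illustrated with the classical low-degree approach of Linial, Mansour and Nisan: estimate every low-degree Fourier coefficient empirically from uniform random samples. The sketch below is illustrative (the function names are this sketch's own, not the dissertation's algorithm) and works over $\{0,1\}^n$ with characters $\chi_S(x) = (-1)^{\sum_{i\in S} x_i}$.

```python
import itertools
import random

def chi(S, x):
    # Character chi_S(x) = (-1)^{sum_{i in S} x_i} for x in {0,1}^n.
    return -1 if sum(x[i] for i in S) % 2 else 1

def estimate_fourier(f, n, degree, m=2000, rng=random):
    """Empirically estimate the Fourier coefficients hat{f}(S) = E[f(x) chi_S(x)]
    of f: {0,1}^n -> {-1,+1} for all |S| <= degree, from m uniform samples."""
    samples = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(m)]
    est = {}
    for d in range(degree + 1):
        for S in itertools.combinations(range(n), d):
            est[S] = sum(f(x) * chi(S, x) for x in samples) / m
    return est
```

For example, if $f = \chi_{\{0\}}$ itself, the estimate of $\hat{f}(\{0\})$ is exactly 1 on every sample set, since $f \cdot \chi_{\{0\}} \equiv 1$.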
Pauli Error Estimation via Population Recovery
Motivated by estimation of quantum noise models, we study the problem of learning a Pauli channel, or more generally the Pauli error rates of an arbitrary channel. By employing a novel reduction to the "Population Recovery" problem, we give an extremely simple algorithm that learns the Pauli error rates of an $n$-qubit channel to precision $\epsilon$ in $\ell_\infty$ using just $O(1/\epsilon^2)\log(n/\epsilon)$ applications of the channel. This is optimal up to the logarithmic factors. Our algorithm uses only unentangled state preparation and measurements, and the post-measurement classical runtime is just an $O(1/\epsilon)$ factor larger than the measurement data size. It is also impervious to a limited model of measurement noise where heralded measurement failures occur independently with probability $\leq 1/4$.
We then consider the case where the noise channel is close to the identity, meaning that the no-error outcome occurs with probability $1-\epsilon$. In the regime of small $\epsilon$ we extend our algorithm to achieve multiplicative precision $1 \pm \delta$ (i.e., additive precision $\epsilon\delta$) using just $O\bigl(\tfrac{1}{\delta^2\epsilon}\bigr)\log(n/(\delta\epsilon))$ applications of the channel.
A Size-Free CLT for Poisson Multinomials and its Applications
An $(n,k)$-Poisson Multinomial Distribution (PMD) is the distribution of the
sum of $n$ independent random vectors supported on the set $\{e_1,\ldots,e_k\}$ of standard basis vectors in $\mathbb{R}^k$. We show
that any $(n,k)$-PMD is $\mathrm{poly}(k/\sigma)$-close in total
variation distance to the (appropriately discretized) multi-dimensional
Gaussian with the same first two moments, removing the dependence on $n$ from
the Central Limit Theorem of Valiant and Valiant. Interestingly, our CLT is
obtained by bootstrapping the Valiant-Valiant CLT itself through the structural
characterization of PMDs shown in recent work by Daskalakis, Kamath, and
Tzamos. In turn, our stronger CLT can be leveraged to obtain an efficient PTAS
for approximate Nash equilibria in anonymous games, significantly improving the
state of the art, and matching qualitatively the running time dependence on $n$
and $1/\varepsilon$ of the best known algorithm for two-strategy anonymous
games. Our new CLT also enables the construction of covers for the set of
$(n,k)$-PMDs, which are proper and whose size is shown to be essentially
optimal. Our cover construction combines our CLT with the Shapley-Folkman
theorem and recent sparsification results for Laplacian matrices by Batson,
Spielman, and Srivastava. Our cover size lower bound is based on an algebraic
geometric construction. Finally, leveraging the structural properties of the
Fourier spectrum of PMDs, we show that these distributions can be learned from
$O_k(1/\varepsilon^2)$ samples in $\mathrm{poly}_k(1/\varepsilon)$-time, removing
the quasi-polynomial dependence of the running time on $1/\varepsilon$ from the
algorithm of Daskalakis, Kamath, and Tzamos.

Comment: To appear in STOC 2016.
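To make the object concrete, an $(n,k)$-PMD can be sampled by summing $n$ independent draws of standard basis vectors, where the $i$-th draw equals $e_j$ with probability $\pi_i(j)$. This is an illustrative sketch (the function name and representation are this sketch's own, not the paper's).

```python
import random

def sample_pmd(pis, rng=random):
    """Sample from an (n,k)-PMD: the sum of n = len(pis) independent random
    vectors, where vector i equals basis vector e_j with probability pis[i][j]."""
    k = len(pis[0])
    total = [0] * k
    for pi in pis:
        j = rng.choices(range(k), weights=pi, k=1)[0]
        total[j] += 1
    return tuple(total)
```

The coordinates always sum to $n$, so the support lies on a discrete simplex; the CLT above says this distribution is close in total variation to a discretized Gaussian with matching first two moments.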
On the Real Number of COVID-19 Infections: the Effect of Sensitivity, Specificity, and the Number of Tests on Prevalence Estimation
In this report, a formula for estimating the prevalence ratio of a disease in a population that is tested with imperfect tests is given. The formula is in terms of the fraction of positive test results and the test parameters, i.e., the probability of true positives (sensitivity) and the probability of true negatives (specificity). The motivation for this work arises in the context of the COVID-19 pandemic, in which estimating the number of infected individuals depends on the sensitivity and specificity of the tests. In this context, it is shown that approximating the prevalence ratio by the ratio between the number of positive tests and the total number of tested individuals leads to dramatically high estimation errors, and thus to ill-adapted public health policies. The relevance of estimating the prevalence ratio using the formula presented in this work is that its precision increases with the number of tests. Two conclusions are drawn from this work. First, in order to ensure that a reliable estimation is achieved with a finite number of tests, testing campaigns must be implemented with tests for which the sum of the sensitivity and the specificity is sufficiently different from one. Second, the key parameter for reducing the estimation error is the number of tests. For a large number of tests, as long as the sum of the sensitivity and specificity is different from one, the exact values of these parameters have very little impact on the estimation error.

This report presents a mathematical formula for estimating the number of SARS-CoV-2 infections in a given population. The formula uses the test results and the test parameters, that is, the probability of true positives (sensitivity) and of true negatives (specificity). Depending on the sensitivity and specificity of the tests, the number of positive results can be radically different from the number of infected individuals. Thus, the final number of positive results is not a reliable source of information for decision-making or for drafting guidelines. Two conclusions are drawn from this work: in order to guarantee a reliable estimate, testing campaigns must be implemented with tests for which the sum of the sensitivity and the specificity is significantly different from one. Moreover, it is shown that a large number of tests leads to a more precise estimate of the number of infected individuals. For a large number of tests, as long as the sum of the sensitivity and the specificity is not equal to one, the exact values of these parameters have very little impact on the estimation error.
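A minimal sketch of the kind of correction the report describes (the report's exact formula may differ; the function name is this sketch's own). Assuming the observed positive fraction satisfies $p = \mathrm{sens}\cdot\theta + (1-\mathrm{spec})(1-\theta)$, solving for the prevalence $\theta$ gives the classical Rogan-Gladen form, which degenerates exactly when sensitivity + specificity = 1:

```python
def estimate_prevalence(positive_fraction, sensitivity, specificity):
    """Invert p = sens * theta + (1 - spec) * (1 - theta) for theta.
    Undefined when sensitivity + specificity == 1 (an uninformative test)."""
    denom = sensitivity + specificity - 1.0
    if abs(denom) < 1e-12:
        raise ValueError("sensitivity + specificity must differ from one")
    theta = (positive_fraction - (1.0 - specificity)) / denom
    return min(1.0, max(0.0, theta))  # clamp sampling noise into [0, 1]
```

For instance, with sensitivity 0.9 and specificity 0.95, an observed positive fraction of 0.135 corresponds to a true prevalence of 0.10, not 0.135, illustrating why the raw positive-test ratio can badly overestimate prevalence.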