
    Learning Discrete Distributions from Untrusted Batches

    We consider the problem of learning a discrete distribution in the presence of an epsilon fraction of malicious data sources. Specifically, we consider the setting where there is some underlying distribution, p, and each data source provides a batch of >= k samples, with the guarantee that at least a (1 - epsilon) fraction of the sources draw their samples from a distribution with total variation distance at most eta from p. We make no assumptions on the data provided by the remaining epsilon fraction of sources; this data can even be chosen as an adversarial function of the (1 - epsilon) fraction of "good" batches. We provide two algorithms. The first has runtime exponential in the support size, n, but polynomial in k, 1/epsilon and 1/eta; it takes O((n + k)/epsilon^2) batches and recovers p to error O(eta + epsilon/sqrt(k)). This recovery accuracy is information-theoretically optimal, to constant factors, even given an infinite number of data sources. Our second algorithm applies to the eta = 0 setting and also achieves an O(epsilon/sqrt(k)) recovery guarantee, though it runs in poly((nk)^k) time. This second algorithm, which approximates a certain tensor via a rank-1 tensor minimizing l_1 distance, is surprising in light of the hardness of many low-rank tensor approximation problems, and may be of independent interest.
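    The batch setting described above can be simulated in a few lines. The Python sketch below is illustrative only and is not either of the paper's algorithms. It assumes eta = 0 and a hypothetical, very simple adversary that places every sample of its batches on symbol 0. It then compares the naive average of per-batch empirical distributions with a coordinate-wise median of those distributions, a simple robust baseline swapped in for illustration. All function names are hypothetical.

```python
import random
from statistics import median


def tv_distance(p, q):
    """Total variation distance between two distributions on {0, ..., n-1}."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def empirical(batch, n):
    """Empirical distribution of a single batch of samples from {0, ..., n-1}."""
    counts = [0] * n
    for x in batch:
        counts[x] += 1
    return [c / len(batch) for c in counts]


def make_batches(p, num_batches, k, epsilon, rng):
    """Hypothetical instance of the setting with eta = 0: good batches hold k
    i.i.d. samples from p; an epsilon fraction of adversarial batches (an
    assumed, very simple adversary) put all k samples on symbol 0."""
    n = len(p)
    num_bad = int(epsilon * num_batches)
    good = [rng.choices(range(n), weights=p, k=k)
            for _ in range(num_batches - num_bad)]
    bad = [[0] * k for _ in range(num_bad)]
    return good + bad


def mean_estimate(batches, n):
    """Naive estimator: average of the per-batch empirical distributions."""
    emps = [empirical(b, n) for b in batches]
    return [sum(e[i] for e in emps) / len(emps) for i in range(n)]


def median_estimate(batches, n):
    """Simple robust baseline, NOT the paper's algorithm: coordinate-wise
    median of the per-batch empirical distributions, renormalized to sum to 1."""
    emps = [empirical(b, n) for b in batches]
    med = [median(e[i] for e in emps) for i in range(n)]
    total = sum(med)
    return [m / total for m in med] if total > 0 else [1.0 / n] * n


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.1, 0.2, 0.3, 0.4]
    k, epsilon = 50, 0.2
    batches = make_batches(p, 2000, k, epsilon, rng)
    print("naive mean TV error:  ", tv_distance(p, mean_estimate(batches, len(p))))
    print("median TV error:      ", tv_distance(p, median_estimate(batches, len(p))))
    print("epsilon / sqrt(k):    ", epsilon / k ** 0.5)
```

    Under this particular adversary the naive average is biased by exactly epsilon * (1 - p[0]) in total variation, apart from sampling noise, no matter how many batches are collected. The median baseline is much less affected here, but no claim is made that it attains the O(eta + epsilon/sqrt(k)) guarantee in general.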

    Sample Efficient Identity Testing and Independence Testing of Quantum States

    In this paper, we study the quantum identity testing problem, i.e., testing whether two given quantum states are identical, and the quantum independence testing problem, i.e., testing whether a given multipartite quantum state is in tensor product form. For the quantum identity testing problem of a $\mathcal{D}(\mathbb{C}^d)$ system, we provide a deterministic measurement scheme that uses $O(d^2/\varepsilon^2)$ copies via independent measurements, with $d$ being the dimension of the state and $\varepsilon$ being the additive error. For the independence testing problem of a $\mathcal{D}(\mathbb{C}^{d_1} \otimes \mathbb{C}^{d_2} \otimes \cdots \otimes \mathbb{C}^{d_m})$ system, we show that the sample complexity is $\tilde{\Theta}(\prod_{i=1}^m d_i/\varepsilon^2)$ via collective measurements, and $O(\prod_{i=1}^m d_i^2/\varepsilon^2)$ via independent measurements. If a randomized choice of independent measurements is allowed, the sample complexity is $\Theta(d^{3/2}/\varepsilon^2)$ for the quantum identity testing problem, and $\tilde{\Theta}(\prod_{i=1}^m d_i^{3/2}/\varepsilon^2)$ for the quantum independence testing problem.
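    The sample-complexity bounds in this abstract can be compared numerically. The Python sketch below evaluates only their leading-order scalings. All constants, and the logarithmic factors hidden by the tilde, are dropped as an assumption, so only ratios between the returned numbers are meaningful. The function names are hypothetical.

```python
from math import prod

# Leading-order sample-complexity scalings for the bounds listed in the
# abstract. Constants and hidden log factors are dropped (assumption), so
# only ratios between these values are meaningful, not absolute counts.


def identity_deterministic(d, eps):
    """O(d^2 / eps^2): identity testing, deterministic independent measurements."""
    return d ** 2 / eps ** 2


def identity_randomized(d, eps):
    """Theta(d^{3/2} / eps^2): identity testing, randomized independent measurements."""
    return d ** 1.5 / eps ** 2


def independence_collective(dims, eps):
    """Tilde-Theta(prod d_i / eps^2): independence testing, collective measurements."""
    return prod(dims) / eps ** 2


def independence_independent(dims, eps):
    """O(prod d_i^2 / eps^2): independence testing, independent measurements."""
    return prod(d ** 2 for d in dims) / eps ** 2


def independence_randomized(dims, eps):
    """Tilde-Theta(prod d_i^{3/2} / eps^2): independence testing, randomized
    independent measurements."""
    return prod(d ** 1.5 for d in dims) / eps ** 2


if __name__ == "__main__":
    eps = 0.1
    # Single system of dimension 16: the ratio should be close to sqrt(16) = 4.
    print(identity_deterministic(16, eps) / identity_randomized(16, eps))
    # Three qubits: independent vs. collective measurements differ by prod d_i = 8.
    dims = [2, 2, 2]
    print(independence_independent(dims, eps) / independence_collective(dims, eps))
```

    Read this way, randomizing the independent measurements saves a factor of sqrt(d) for identity testing, and collective measurements save a factor of prod d_i over independent ones for independence testing, up to the dropped constants and logarithmic factors.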