
    Approximate Sparse Recovery: Optimizing Time and Measurements

    An approximate sparse recovery system consists of parameters $k,N$, an $m$-by-$N$ measurement matrix, $\Phi$, and a decoding algorithm, $\mathcal{D}$. Given a vector, $x$, the system approximates $x$ by $\widehat x =\mathcal{D}(\Phi x)$, which must satisfy $\| \widehat x - x\|_2\le C \|x - x_k\|_2$, where $x_k$ denotes the optimal $k$-term approximation to $x$. For each vector $x$, the system must succeed with probability at least 3/4. Among the goals in designing such systems are minimizing the number $m$ of measurements and the runtime of the decoding algorithm, $\mathcal{D}$. In this paper, we give a system with $m=O(k \log(N/k))$ measurements--matching a lower bound, up to a constant factor--and decoding time $O(k\log^c N)$, matching a lower bound up to $\log(N)$ factors. We also consider the encode time (i.e., the time to multiply $\Phi$ by $x$), the time to update measurements (i.e., the time to multiply $\Phi$ by a 1-sparse $x$), and the robustness and stability of the algorithm (adding noise before and after the measurements). Our encode and update times are optimal up to $\log(N)$ factors.
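The recovery guarantee in this abstract can be made concrete with a small check. The sketch below is illustrative only and is not the paper's decoder: it computes the best $k$-term approximation $x_k$ and tests whether a candidate output $\widehat x$ satisfies $\| \widehat x - x\|_2\le C \|x - x_k\|_2$. The function names (`best_k_term`, `l2_norm`, `satisfies_guarantee`) and the sample vectors are assumptions introduced for this example.

```python
import math

def best_k_term(x, k):
    """Return x_k: keep the k largest-magnitude entries of x, zero the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]

def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))

def satisfies_guarantee(x, x_hat, k, C):
    """Check ||x_hat - x||_2 <= C * ||x - x_k||_2 for a candidate x_hat."""
    tail = l2_norm([a - b for a, b in zip(x, best_k_term(x, k))])
    err = l2_norm([a - b for a, b in zip(x_hat, x)])
    return err <= C * tail

# Hypothetical example: a vector with two large entries and a small tail.
x = [5.0, -0.1, 3.0, 0.2]
good = [5.0, 0.0, 3.0, 0.0]   # recovers the two heavy entries
bad = [0.0, 0.0, 0.0, 0.0]    # misses them entirely
```

Here `satisfies_guarantee(x, good, k=2, C=2.0)` holds while `satisfies_guarantee(x, bad, k=2, C=2.0)` does not, since the error of `bad` is dominated by the two heavy entries rather than by the tail.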

    Recovering Structured Probability Matrices

    We consider the problem of accurately recovering a matrix B of size M by M, which represents a probability distribution over M^2 outcomes, given access to an observed matrix of "counts" generated by taking independent samples from the distribution B. How can structural properties of the underlying matrix B be leveraged to yield computationally efficient and information theoretically optimal reconstruction algorithms? When can accurate reconstruction be accomplished in the sparse data regime? This basic problem lies at the core of a number of questions that are currently being considered by different communities, including building recommendation systems and collaborative filtering in the sparse data regime, community detection in sparse random graphs, learning structured models such as topic models or hidden Markov models, and the efforts from the natural language processing community to compute "word embeddings". Our results apply to the setting where B has a low rank structure. For this setting, we propose an efficient algorithm that accurately recovers the underlying M by M matrix using Theta(M) samples. This result easily translates to Theta(M) sample algorithms for learning topic models and learning hidden Markov models. These linear sample complexities are optimal, up to constant factors, in an extremely strong sense: even testing basic properties of the underlying matrix (such as whether it has rank 1 or 2) requires Omega(M) samples. We provide an even stronger lower bound where distinguishing whether a sequence of observations was drawn from the uniform distribution over M observations versus being generated by an HMM with two hidden states requires Omega(M) observations. This precludes sublinear-sample hypothesis tests for basic properties, such as identity or uniformity, as well as sublinear sample estimators for quantities such as the entropy rate of HMMs.

    Dynamic Graph Stream Algorithms in $o(n)$ Space

    In this paper we study graph problems in the dynamic streaming model, where the input is defined by a sequence of edge insertions and deletions. As many natural problems require $\Omega(n)$ space, where $n$ is the number of vertices, existing works mainly focused on designing $\tilde{O}(n)$ space algorithms. Although sublinear in the number of edges for dense graphs, it could still be too large for many applications (e.g. $n$ is huge or the graph is sparse). In this work, we give single-pass algorithms beating this space barrier for two classes of problems. We present $o(n)$ space algorithms for estimating the number of connected components with additive error $\varepsilon n$ and $(1+\varepsilon)$-approximating the weight of the minimum spanning tree, for any small constant $\varepsilon>0$. The latter improves the previous $\tilde{O}(n)$ space algorithm given by Ahn et al. (SODA 2012) for connected graphs with bounded edge weights. We initiate the study of approximate graph property testing in the dynamic streaming model, where we want to distinguish graphs satisfying the property from graphs that are $\varepsilon$-far from having the property. We consider the problem of testing $k$-edge connectivity, $k$-vertex connectivity, cycle-freeness and bipartiteness (of planar graphs), for which we provide algorithms using roughly $\tilde{O}(n^{1-\varepsilon})$ space, which is $o(n)$ for any constant $\varepsilon$. To complement our algorithms, we present $\Omega(n^{1-O(\varepsilon)})$ space lower bounds for these problems, which show that such a dependence on $\varepsilon$ is necessary. Comment: ICALP 201
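The abstract pairs connected-component counting with minimum spanning tree weight. One well-known link between the two, which may or may not be the route this paper takes, is the identity for a connected graph with integer edge weights in $\{1,\dots,W\}$: the MST weight equals $n - W + \sum_{i=1}^{W-1} c_i$, where $c_i$ is the number of connected components of the subgraph containing only edges of weight at most $i$. The sketch below checks that identity exactly on a small static graph using union-find; it is not a streaming algorithm, it does not handle deletions, and all names (`DSU`, `mst_weight_kruskal`, `mst_weight_from_components`) and the example graph are assumptions made for illustration.

```python
class DSU:
    """Union-find that also tracks the number of components."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.count = n

    def find(self, u):
        while self.parent[u] != u:
            self.parent[u] = self.parent[self.parent[u]]  # path halving
            u = self.parent[u]
        return u

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False
        self.parent[ru] = rv
        self.count -= 1
        return True

def mst_weight_kruskal(n, wedges):
    """Exact MST weight via Kruskal; wedges holds (u, v, weight) triples."""
    dsu, total = DSU(n), 0
    for u, v, w in sorted(wedges, key=lambda e: e[2]):
        if dsu.union(u, v):
            total += w
    return total

def mst_weight_from_components(n, wedges, W):
    """MST weight as n - W + sum of c_i over i = 1..W-1 (connected graph)."""
    total = n - W
    for i in range(1, W):
        dsu = DSU(n)
        for u, v, w in wedges:
            if w <= i:
                dsu.union(u, v)
        total += dsu.count  # c_i: components using edges of weight <= i
    return total

# Hypothetical connected graph on 4 vertices with weights in {1, 2, 3}.
edges = [(0, 1, 1), (1, 2, 3), (2, 3, 2), (0, 3, 3), (0, 2, 1)]
```

On this graph both functions agree. Replacing each exact $c_i$ with an estimate accurate to additive error $\varepsilon n$ perturbs the sum by at most $(W-1)\varepsilon n$, which suggests why bounded edge weights matter for this kind of reduction.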