352 research outputs found

    Sketch-based Randomized Algorithms for Dynamic Graph Regression

    Full text link
    A well-known problem in data science and machine learning is {\em linear regression}, which is recently extended to dynamic graphs. Existing exact algorithms for updating the solution of dynamic graph regression problem require at least a linear time (in terms of nn: the size of the graph). However, this time complexity might be intractable in practice. In the current paper, we utilize {\em subsampled randomized Hadamard transform} and \textsf{CountSketch} to propose the first randomized algorithms. Suppose that we are given an n×mn\times m matrix embedding MM of the graph, where mâ‰Șnm \ll n. Let rr be the number of samples required for a guaranteed approximation error, which is a sublinear function of nn. Our first algorithm reduces time complexity of pre-processing to O(n(m+1)+2n(m+1)log⁥2(r+1)+rm2)O(n(m + 1) + 2n(m + 1) \log_2(r + 1) + rm^2). Then after an edge insertion or an edge deletion, it updates the approximate solution in O(rm)O(rm) time. Our second algorithm reduces time complexity of pre-processing to O(nnz(M)+m3ϔ−2log⁥7(m/Ï”))O \left( nnz(M) + m^3 \epsilon^{-2} \log^7(m/\epsilon) \right), where nnz(M)nnz(M) is the number of nonzero elements of MM. Then after an edge insertion or an edge deletion or a node insertion or a node deletion, it updates the approximate solution in O(qm)O(qm) time, with q=O(m2Ï”2log⁥6(m/Ï”))q=O\left(\frac{m^2}{\epsilon^2} \log^6(m/\epsilon) \right). Finally, we show that under some assumptions, if ln⁥n<ϔ−1\ln n < \epsilon^{-1} our first algorithm outperforms our second algorithm and if ln⁥n≄ϔ−1\ln n \geq \epsilon^{-1} our second algorithm outperforms our first algorithm

    Improved analysis of the subsampled randomized Hadamard transform

    Get PDF
    This paper presents an improved analysis of a structured dimension-reduction map called the subsampled randomized Hadamard transform. This argument demonstrates that the map preserves the Euclidean geometry of an entire subspace of vectors. The new proof is much simpler than previous approaches, and it offers---for the first time---optimal constants in the estimate on the number of dimensions required for the embedding.Comment: 8 pages. To appear, Advances in Adaptive Data Analysis, special issue "Sparse Representation of Data and Images." v2--v4 include minor correction

    Randomized Dynamic Mode Decomposition

    Full text link
    This paper presents a randomized algorithm for computing the near-optimal low-rank dynamic mode decomposition (DMD). Randomized algorithms are emerging techniques to compute low-rank matrix approximations at a fraction of the cost of deterministic algorithms, easing the computational challenges arising in the area of `big data'. The idea is to derive a small matrix from the high-dimensional data, which is then used to efficiently compute the dynamic modes and eigenvalues. The algorithm is presented in a modular probabilistic framework, and the approximation quality can be controlled via oversampling and power iterations. The effectiveness of the resulting randomized DMD algorithm is demonstrated on several benchmark examples of increasing complexity, providing an accurate and efficient approach to extract spatiotemporal coherent structures from big data in a framework that scales with the intrinsic rank of the data, rather than the ambient measurement dimension. For this work we assume that the dynamics of the problem under consideration is evolving on a low-dimensional subspace that is well characterized by a fast decaying singular value spectrum

    Optimal approximate matrix product in terms of stable rank

    Get PDF
    We prove, using the subspace embedding guarantee in a black box way, that one can achieve the spectral norm guarantee for approximate matrix multiplication with a dimensionality-reducing map having m=O(r~/Δ2)m = O(\tilde{r}/\varepsilon^2) rows. Here r~\tilde{r} is the maximum stable rank, i.e. squared ratio of Frobenius and operator norms, of the two matrices being multiplied. This is a quantitative improvement over previous work of [MZ11, KVZ14], and is also optimal for any oblivious dimensionality-reducing map. Furthermore, due to the black box reliance on the subspace embedding property in our proofs, our theorem can be applied to a much more general class of sketching matrices than what was known before, in addition to achieving better bounds. For example, one can apply our theorem to efficient subspace embeddings such as the Subsampled Randomized Hadamard Transform or sparse subspace embeddings, or even with subspace embedding constructions that may be developed in the future. Our main theorem, via connections with spectral error matrix multiplication shown in prior work, implies quantitative improvements for approximate least squares regression and low rank approximation. Our main result has also already been applied to improve dimensionality reduction guarantees for kk-means clustering [CEMMP14], and implies new results for nonparametric regression [YPW15]. We also separately point out that the proof of the "BSS" deterministic row-sampling result of [BSS12] can be modified to show that for any matrices A,BA, B of stable rank at most r~\tilde{r}, one can achieve the spectral norm guarantee for approximate matrix multiplication of ATBA^T B by deterministically sampling O(r~/Δ2)O(\tilde{r}/\varepsilon^2) rows that can be found in polynomial time. The original result of [BSS12] was for rank instead of stable rank. Our observation leads to a stronger version of a main theorem of [KMST10].Comment: v3: minor edits; v2: fixed one step in proof of Theorem 9 which was wrong by a constant factor (see the new Lemma 5 and its use; final theorem unaffected
    • 

    corecore