
    A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

    The CUR matrix decomposition is an important extension of the Nyström approximation to general matrices. It approximates any data matrix in terms of a small number of its columns and rows. In this paper we propose a novel randomized CUR algorithm with an expected relative-error bound. Compared with existing relative-error CUR algorithms, the proposed algorithm has a tighter theoretical bound and lower time complexity, and it avoids maintaining the whole data matrix in main memory. Finally, experiments on several real-world datasets demonstrate significant improvement over the existing relative-error algorithms.
    Comment: accepted by NIPS 201
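    To make the CUR recipe concrete, here is a minimal NumPy sketch of a randomized CUR decomposition: sample columns and rows, then compute the small linking matrix U = C^+ A R^+, which minimizes the Frobenius error for the chosen C and R. The norm-based sampling and the function name are illustrative assumptions, not the paper's specific algorithm.

```python
import numpy as np

def randomized_cur(A, c, r, rng=None):
    # Illustrative sketch: sample c columns and r rows with probability
    # proportional to their squared norms (an assumption for this demo,
    # not the paper's sampling scheme), then solve for the linking matrix.
    rng = np.random.default_rng(rng)
    p_col = np.sum(A**2, axis=0) / np.sum(A**2)
    p_row = np.sum(A**2, axis=1) / np.sum(A**2)
    cols = rng.choice(A.shape[1], size=c, replace=False, p=p_col)
    rows = rng.choice(A.shape[0], size=r, replace=False, p=p_row)
    C, R = A[:, cols], A[rows, :]
    # U = C^+ A R^+ minimizes ||A - C U R||_F for the chosen C and R.
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 60)) @ rng.standard_normal((60, 150))  # rank <= 60
C, U, R = randomized_cur(A, c=50, r=50, rng=1)
print(np.linalg.norm(A - C @ U @ R, 'fro') / np.linalg.norm(A, 'fro'))
```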

    Efficient Algorithms and Error Analysis for the Modified Nystrom Method

    Many kernel methods suffer from high time and space complexities and are thus prohibitive in big-data applications. To tackle the computational challenge, the Nyström method has been extensively used to reduce time and space complexities at the cost of some accuracy. The Nyström method speeds up computation by constructing an approximation of the kernel matrix using only a few of its columns. Recently, a variant called the modified Nyström method has demonstrated significant improvement over the standard Nyström method in approximation accuracy, both theoretically and empirically. In this paper, we propose two algorithms that make the modified Nyström method practical. First, we devise a simple column selection algorithm with a provable error bound; it is more efficient and easier to implement than the state-of-the-art algorithm, and nearly as accurate. Second, with the selected columns at hand, we propose an algorithm that computes the approximation in lower time complexity than the approach in previous work. Furthermore, we prove that the modified Nyström method is exact under certain conditions, and we establish a lower error bound for the modified Nyström method.
    Comment: 9-page paper plus appendix. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS) 2014, Reykjavik, Iceland. JMLR: W&CP volume 3
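    The difference between the standard and modified constructions is easy to state in code. The sketch below contrasts the two on a small PSD kernel matrix; the RBF kernel, the fixed uniformly sampled index set, and the naive computation of U = C^+ K (C^+)^T are illustrative assumptions (the paper's contribution is precisely to compute this approximation more efficiently).

```python
import numpy as np

def nystrom(K, idx, modified=False):
    # Approximate a PSD kernel matrix K from sampled column indices idx.
    C = K[:, idx]  # sampled columns
    if modified:
        # Modified Nystrom: U = C^+ K (C^+)^T (uses all of K; naive here).
        Cp = np.linalg.pinv(C)
        U = Cp @ K @ Cp.T
    else:
        # Standard Nystrom: U = W^+ with W the sampled intersection block.
        U = np.linalg.pinv(K[np.ix_(idx, idx)])
    return C @ U @ C.T

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5))
K = np.exp(-np.sum((X[:, None] - X[None, :])**2, axis=-1))  # RBF kernel
idx = rng.choice(300, size=20, replace=False)
for m in (False, True):
    err = np.linalg.norm(K - nystrom(K, idx, modified=m), 'fro')
    print('modified' if m else 'standard', err)
```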

    Optimal CUR Matrix Decompositions

    The CUR decomposition of an $m \times n$ matrix $A$ finds an $m \times c$ matrix $C$ with a subset of $c < n$ columns of $A$, together with an $r \times n$ matrix $R$ with a subset of $r < m$ rows of $A$, as well as a $c \times r$ low-rank matrix $U$ such that the matrix $CUR$ approximates the matrix $A$, that is, $\|A - CUR\|_F^2 \le (1+\epsilon) \|A - A_k\|_F^2$, where $\|\cdot\|_F$ denotes the Frobenius norm and $A_k$ is the best rank-$k$ approximation of $A$ constructed via the SVD. We present input-sparsity-time and deterministic algorithms for constructing such a CUR decomposition where $c = O(k/\epsilon)$, $r = O(k/\epsilon)$, and $\mathrm{rank}(U) = k$. Up to constant factors, our algorithms are simultaneously optimal in $c$, $r$, and $\mathrm{rank}(U)$.
    Comment: small revision in Lemma 4.
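    The guarantee can be checked numerically: compute the best rank-k error from the SVD and compare it with the error of a CUR built from sampled columns and rows. The sketch below uses uniform sampling with c = r = 4k purely for illustration; it does not implement the paper's optimal deterministic construction, and the ratio it prints is simply the (1 + eps) factor achieved on this instance.

```python
import numpy as np

# Numerically checking ||A - CUR||_F^2 <= (1 + eps) ||A - A_k||_F^2.
rng = np.random.default_rng(0)
U0, _ = np.linalg.qr(rng.standard_normal((120, 40)))
V0, _ = np.linalg.qr(rng.standard_normal((90, 40)))
A = (U0 * 0.8 ** np.arange(40)) @ V0.T  # matrix with a decaying spectrum
k = 10

s = np.linalg.svd(A, compute_uv=False)
best_k = np.sum(s[k:]**2)  # squared error of the best rank-k approximation

cols = rng.choice(A.shape[1], size=4 * k, replace=False)
rows = rng.choice(A.shape[0], size=4 * k, replace=False)
C, R = A[:, cols], A[rows, :]
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
cur_err = np.linalg.norm(A - C @ U @ R, 'fro')**2
print('empirical (1 + eps):', cur_err / best_k)
```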

    On sparse representations of linear operators and the approximation of matrix products

    Thus far, sparse representations have been exploited largely in the context of robustly estimating functions in a noisy environment from a few measurements. In this context, the existence of a basis in which the signal class under consideration is sparse is used to decrease the number of necessary measurements while controlling the approximation error. In this paper, we instead focus on applications in numerical analysis, by way of sparse representations of linear operators, with the objective of minimizing the number of operations needed to perform basic operations (here, multiplication) on these operators. We represent a linear operator as a sum of rank-one operators, and show how a sparse representation that guarantees a low approximation error for the product can be obtained by analyzing an induced quadratic form. This construction in turn yields new algorithms for computing approximate matrix products.
    Comment: 6 pages, 3 figures; presented at the 42nd Annual Conference on Information Sciences and Systems (CISS 2008)
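    A classical point of reference for this line of work is the norm-based sampling estimator of a matrix product: write AB = sum_i A[:, i] B[i, :] as a sum of rank-one terms and keep only a few of them. The sketch below implements that generic estimator; the function name and the i.i.d. norm-proportional sampling are assumptions for illustration, whereas the paper instead derives its sparse representation by analyzing an induced quadratic form.

```python
import numpy as np

def sampled_product(A, B, s, rng=None):
    # Approximate A @ B by sampling s rank-one terms A[:, i] B[i, :]
    # with probability proportional to ||A[:, i]|| * ||B[i, :]||.
    rng = np.random.default_rng(rng)
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p /= p.sum()
    idx = rng.choice(A.shape[1], size=s, p=p)
    # Reweighting by 1 / (s * p_i) makes the estimator unbiased.
    scale = 1.0 / (s * p[idx])
    return (A[:, idx] * scale) @ B[idx, :]

rng = np.random.default_rng(0)
A, B = rng.standard_normal((100, 500)), rng.standard_normal((500, 80))
approx = sampled_product(A, B, s=200, rng=1)
print(np.linalg.norm(A @ B - approx) / np.linalg.norm(A @ B))
```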

    Spatial Random Sampling: A Structure-Preserving Data Sketching Tool

    Random column sampling is not guaranteed to yield data sketches that preserve the underlying structures of the data, and it may not sample sufficiently from less-populated data clusters. Adaptive sampling can often provide accurate low-rank approximations, yet it may fall short of producing descriptive data sketches, especially when the cluster centers are linearly dependent. Motivated by this, the paper introduces a novel randomized column sampling tool dubbed Spatial Random Sampling (SRS), in which data points are sampled based on their proximity to randomly sampled points on the unit sphere. The most compelling feature of SRS is that the probability of sampling from a given data cluster is proportional to the surface area the cluster occupies on the unit sphere, independently of the size of the cluster population. Although it is fully randomized, SRS is shown to provide descriptive and balanced data representations. The proposed idea addresses a pressing need in data science and holds potential to inspire many novel approaches to the analysis of big data.
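    Based on the description above, the SRS idea admits a compact sketch: normalize the data points onto the unit sphere, draw random directions uniformly on the sphere, and for each direction keep the data point most aligned with it. Everything below (function name, column-major data layout, deduplication of repeated picks) is an assumption inferred from the abstract, not the authors' reference implementation.

```python
import numpy as np

def spatial_random_sampling(X, m, rng=None):
    # X holds one data point per column; returns a column subset of X.
    rng = np.random.default_rng(rng)
    # Normalize each data point onto the unit sphere.
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)
    # Normalized Gaussian vectors are uniform random points on the sphere.
    D = rng.standard_normal((X.shape[0], m))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    # For each random point, select the closest (most aligned) data point.
    idx = np.unique(np.argmax(Xn.T @ D, axis=0))
    return X[:, idx]

rng = np.random.default_rng(0)
# Two clusters of very different sizes: 950 points vs 50 points.
X = np.hstack([rng.standard_normal((20, 950)) + 5 * rng.standard_normal((20, 1)),
               rng.standard_normal((20, 50)) + 5 * rng.standard_normal((20, 1))])
sketch = spatial_random_sampling(X, m=30, rng=1)
print(sketch.shape)  # the sketch draws from both clusters
```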