
    Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization

    Stochastic optimization arises naturally in machine learning. However, efficient algorithms with provable guarantees are still largely missing when the objective function is nonconvex and the data points are dependent. This paper studies this fundamental challenge through a streaming PCA problem for stationary time series data. Specifically, our goal is to estimate the principal component of the time series with respect to the covariance matrix of the stationary distribution. Computationally, we propose a variant of Oja's algorithm combined with downsampling to control the bias of the stochastic gradient caused by the data dependency. Theoretically, we quantify the uncertainty of the proposed stochastic algorithm using diffusion approximations. This allows us to prove the asymptotic rate of convergence, which further implies a near-optimal asymptotic sample complexity. Numerical experiments are provided to support our analysis.
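    A minimal sketch of the downsampling idea on synthetic data follows: only every tau-th observation of a stationary VAR(1) stream is fed to an Oja-style update, so successive stochastic gradients are nearly independent. The step size eta, the skip length tau, and the VAR(1) test process are illustrative assumptions, not the schedule analyzed in the paper.

```python
import numpy as np

def var1_stream(A, noise_cov, rng):
    """Stationary VAR(1) process x_{t+1} = A x_t + eps_t (illustrative data)."""
    d = A.shape[0]
    x = np.zeros(d)
    chol = np.linalg.cholesky(noise_cov)
    while True:
        x = A @ x + chol @ rng.normal(size=d)
        yield x

def downsampled_oja(stream, dim, eta=0.05, tau=10, n_updates=2000, rng=None):
    """Oja-style streaming PCA that only uses every tau-th sample, so that
    successive stochastic gradients are nearly independent (a sketch, not
    the paper's exact schedule)."""
    rng = rng or np.random.default_rng(0)
    w = rng.normal(size=dim)
    w /= np.linalg.norm(w)
    for _ in range(n_updates):
        for _ in range(tau - 1):          # discard tau - 1 dependent samples
            next(stream)
        x = next(stream)
        w += eta * x * (x @ w)            # Oja / projected SGD step
        w /= np.linalg.norm(w)            # stay on the unit sphere
    return w

rng = np.random.default_rng(1)
d = 5
A = 0.5 * np.eye(d)                                   # mild temporal dependence
noise_cov = np.diag([5.0, 1.0, 1.0, 1.0, 1.0])
# Stationary covariance is noise_cov / (1 - 0.25), so the top eigenvector is e_1.
w_hat = downsampled_oja(var1_stream(A, noise_cov, rng), d)
print("alignment with e_1:", abs(w_hat[0]))
```

    Increasing tau weakens the temporal dependence between the samples actually used, at the cost of discarding data; this is the bias versus sample-efficiency trade-off that the downsampling is meant to control.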

    Faster Eigenvector Computation via Shift-and-Invert Preconditioning

    We give faster algorithms and improved sample complexities for estimating the top eigenvector of a matrix $\Sigma$, i.e., computing a unit vector $x$ such that $x^T \Sigma x \ge (1-\epsilon)\lambda_1(\Sigma)$.
    Offline Eigenvector Estimation: Given an explicit $A \in \mathbb{R}^{n \times d}$ with $\Sigma = A^T A$, we show how to compute an $\epsilon$-approximate top eigenvector in time $\tilde O\!\left(\left[\mathrm{nnz}(A) + \frac{d \cdot \mathrm{sr}(A)}{\mathrm{gap}^2}\right] \log 1/\epsilon\right)$ and $\tilde O\!\left(\frac{\mathrm{nnz}(A)^{3/4} (d \cdot \mathrm{sr}(A))^{1/4}}{\sqrt{\mathrm{gap}}} \log 1/\epsilon\right)$. Here $\mathrm{nnz}(A)$ is the number of nonzeros in $A$, $\mathrm{sr}(A)$ is the stable rank, and $\mathrm{gap}$ is the relative eigengap. By separating the $\mathrm{gap}$ dependence from the $\mathrm{nnz}(A)$ term, our first runtime improves upon the classical power and Lanczos methods. It also improves prior work using fast subspace embeddings [AC09, CW13] and stochastic optimization [Sha15c], giving significantly better dependencies on $\mathrm{sr}(A)$ and $\epsilon$. Our second running time improves these further when $\mathrm{nnz}(A) \le \frac{d \cdot \mathrm{sr}(A)}{\mathrm{gap}^2}$.
    Online Eigenvector Estimation: Given a distribution $D$ with covariance matrix $\Sigma$ and a vector $x_0$ that is an $O(\mathrm{gap})$-approximate top eigenvector for $\Sigma$, we show how to refine it to an $\epsilon$-approximation using $O\!\left(\frac{\mathrm{var}(D)}{\mathrm{gap} \cdot \epsilon}\right)$ samples from $D$. Here $\mathrm{var}(D)$ is a natural notion of variance. Combining our algorithm with previous work to initialize $x_0$, we obtain improved sample complexity and runtime results under a variety of assumptions on $D$.
    We achieve our results using a general framework that we believe is of independent interest. We give a robust analysis of the classic method of shift-and-invert preconditioning to reduce eigenvector computation to approximately solving a sequence of linear systems. We then apply fast stochastic variance reduced gradient (SVRG) based system solvers to achieve our claims. Comment: Appearing in ICML 2016; combination of work in arXiv:1509.05647 and arXiv:1510.0889.
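    The shift-and-invert idea itself is easy to illustrate: run power iteration on $(\lambda I - \Sigma)^{-1}$ for a shift $\lambda$ slightly above $\lambda_1(\Sigma)$, so each iteration amounts to solving a linear system in a matrix with a much larger relative eigengap than $\Sigma$. The sketch below uses numpy's exact dense solver and a hypothetical shift of $1.01\,\lambda_1$ purely for illustration; the paper's contribution is replacing the exact solve with fast SVRG-based stochastic solvers and choosing the shift robustly.

```python
import numpy as np

def shift_invert_top_eigvec(Sigma, lam_shift, n_iters=20, rng=None):
    """Power iteration on (lam_shift * I - Sigma)^{-1}.

    For a shift slightly above lambda_1(Sigma), this matrix has a far larger
    relative eigengap than Sigma, so a handful of iterations suffice.  Each
    iteration is one linear-system solve; the paper replaces the exact solve
    below with a fast SVRG-based stochastic solver.
    """
    rng = rng or np.random.default_rng(0)
    d = Sigma.shape[0]
    M = lam_shift * np.eye(d) - Sigma
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    for _ in range(n_iters):
        x = np.linalg.solve(M, x)   # inverse power step (one system solve)
        x /= np.linalg.norm(x)
    return x

# Toy check on a random covariance-style matrix.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 50))
Sigma = A.T @ A / 200
lam1 = np.linalg.eigvalsh(Sigma)[-1]
x = shift_invert_top_eigvec(Sigma, lam_shift=1.01 * lam1, rng=rng)
print("Rayleigh quotient / lambda_1:", (x @ Sigma @ x) / lam1)   # close to 1
```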

    A trust-region method for stochastic variational inference with applications to streaming data

    Stochastic variational inference allows for fast posterior inference in complex Bayesian models. However, the algorithm is prone to local optima, which can make the quality of the posterior approximation sensitive to the choice of hyperparameters and initialization. We address this problem by replacing the natural gradient step of stochastic variational inference with a trust-region update. We show that this leads to generally better results and reduced sensitivity to hyperparameters. We also describe a new strategy for variational inference on streaming data and show that here our trust-region method is crucial for good performance. Comment: in Proceedings of the 32nd International Conference on Machine Learning, 2015.
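    To make the contrast concrete, the sketch below runs mean-field SVI on a toy two-component Gaussian mixture with known unit variances. The standard natural-gradient step blends the current global natural parameters with a minibatch estimate; the trust-region-style variant instead iterates a KL-damped fixed point in which the minibatch estimate is recomputed at the candidate parameters. The toy model, the damping constant xi, and the fixed-point solver are illustrative assumptions and need not match the exact subproblem and solver used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two-component Gaussian mixture, unit variances, equal weights.
N = 2000
true_means = np.array([-2.0, 2.0])
x = true_means[rng.integers(0, 2, size=N)] + rng.normal(size=N)

PRIOR_PREC = 0.01            # mu_k ~ N(0, 1 / PRIOR_PREC)
K, BATCH = 2, 50

def responsibilities(xb, lam):
    """Local step: cluster responsibilities under q(mu_k) = N(m_k, 1/prec_k)."""
    prec, mean_times_prec = lam
    m, v = mean_times_prec / prec, 1.0 / prec
    logp = -0.5 * (xb[:, None] - m[None, :]) ** 2 - 0.5 * v[None, :]
    logp -= logp.max(axis=1, keepdims=True)
    r = np.exp(logp)
    return r / r.sum(axis=1, keepdims=True)

def intermediate_global(xb, lam):
    """Minibatch ('intermediate') natural parameters lambda-hat."""
    r = responsibilities(xb, lam)
    scale = N / len(xb)
    prec_hat = PRIOR_PREC + scale * r.sum(axis=0)
    mtp_hat = scale * (r * xb[:, None]).sum(axis=0)       # prior mean is 0
    return np.stack([prec_hat, mtp_hat])

def svi_step(xb, lam, rho):
    """Standard SVI: a single natural-gradient step with learning rate rho."""
    return (1 - rho) * lam + rho * intermediate_global(xb, lam)

def trust_region_step(xb, lam, xi, n_inner=20):
    """KL-damped fixed point: one trust-region-style update (illustrative sketch)."""
    lam_new = lam.copy()
    for _ in range(n_inner):
        lam_new = (intermediate_global(xb, lam_new) + xi * lam) / (1 + xi)
    return lam_new

lam = np.stack([np.full(K, PRIOR_PREC + 1.0), np.array([-0.1, 0.1])])
for t in range(200):
    xb = rng.choice(x, size=BATCH, replace=False)
    lam = trust_region_step(xb, lam, xi=10.0)
    # Standard SVI alternative: lam = svi_step(xb, lam, rho=(t + 1) ** -0.7)

print("estimated component means:", lam[1] / lam[0])
```

    Recomputing the responsibilities inside the inner loop is what distinguishes this update from a plain damped natural-gradient step: the local and global parameters are re-coupled at the candidate point, which is the mechanism the abstract credits with avoiding poor local optima.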