
    Supervised low-rank semi-nonnegative matrix factorization with frequency regularization for forecasting spatio-temporal data

    We propose a novel methodology for forecasting spatio-temporal data using supervised semi-nonnegative matrix factorization (SSNMF) with frequency regularization. Matrix factorization is employed to decompose the spatio-temporal data into spatial and temporal components. To improve the clarity of the temporal patterns, we introduce a nonnegativity constraint on the time domain together with regularization in the frequency domain. Specifically, the frequency-domain regularization selects features in frequency space, making interpretation in the frequency domain more convenient. We propose two frequency-domain regularization schemes, soft and hard, and provide convergence guarantees to first-order stationary points of the corresponding constrained optimization problem. While our primary motivation stems from geophysical data analysis based on GRACE (Gravity Recovery and Climate Experiment) data, the methodology has the potential for wider application. When applied to GRACE data, the results of the proposed methodology are comparable to previous research in the geophysical sciences while offering clearer interpretability.
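    A minimal sketch of the idea, in Python with NumPy, assuming a plain least-squares loss, an alternating update (unconstrained spatial factor, projected-gradient step for the nonnegative temporal factor), and a "soft" frequency-domain regularizer implemented as shrinkage of the temporal factor's Fourier magnitudes; the function name, step size, and update order are illustrative assumptions rather than the paper's exact algorithm:

```python
import numpy as np

def semi_nmf_freq_reg(X, rank, lam=0.1, step=1e-3, n_iter=200, seed=0):
    """Illustrative semi-NMF of X (space x time) as U @ V with V >= 0.
    A 'soft' frequency-domain regularization is mimicked by shrinking the
    magnitudes of V's Fourier coefficients after each update."""
    n_space, n_time = X.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_space, rank))            # spatial factor, unconstrained
    V = np.abs(rng.standard_normal((rank, n_time)))     # temporal factor, nonnegative
    for _ in range(n_iter):
        # Unconstrained least-squares update of the spatial factor
        U = X @ V.T @ np.linalg.pinv(V @ V.T)
        # Projected gradient step keeps the temporal factor nonnegative
        grad = U.T @ (U @ V - X)
        V = np.maximum(V - step * grad, 0.0)
        # Soft regularization in the frequency domain: shrink Fourier magnitudes
        F = np.fft.rfft(V, axis=1)
        F *= np.maximum(1.0 - lam / np.maximum(np.abs(F), 1e-12), 0.0)
        V = np.maximum(np.fft.irfft(F, n=n_time, axis=1), 0.0)
    return U, V
```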

    Optimization Methods for Semi-Supervised Learning

    The goal of this thesis is to provide efficient optimization algorithms for some semi-supervised learning (SSL) tasks in machine learning. For many machine learning tasks, training a classifier requires a large amount of labeled data; however, providing labels typically requires costly manual annotation. Fortunately, an abundance of unlabeled data can be easily collected in many domains. In this thesis, we focus on problems where an underlying structure allows us to leverage large amounts of unlabeled data while requiring only small amounts of labeled data. In particular, we consider low-rank matrix completion problems with applications to recommender systems, and semi-supervised support vector machines (S3VM) for binary classification problems such as digit recognition or disease classification.

    For the first class of problems, we study convex approximations to the low-rank matrix completion problem. Instead of restricting the solution space to low-rank matrices, we use the trace norm as a convex surrogate. Unfortunately, many trace norm minimization algorithms scale very poorly in practice since they require a full singular value decomposition (SVD) at each iteration. Recently, there has been renewed interest in the trace-norm-constrained problem using the Frank-Wolfe algorithm, which only requires computing the leading singular vector pair, an order-of-magnitude improvement in iteration complexity. However, the Frank-Wolfe algorithm empirically converges very slowly and in practice yields high-rank solutions, which greatly increases computational cost. To address this issue, we investigate a rank-drop step for Frank-Wolfe, which solves a subproblem specifically designed to decrease the rank of the iterate, ensuring that the algorithm converges along a low-rank path. We show that this rank-drop subproblem can be decomposed into two cases, each of which can be solved efficiently, and we guarantee that the iterates remain feasible, preserving the projection-free property of Frank-Wolfe.

    Next, we show that these ideas can be used to provide scalable algorithms for simultaneously sparse and low-rank matrix completion problems. We extend the Frank-Wolfe analysis to accommodate nonsmooth objectives, replacing the traditional linear approximation used in Frank-Wolfe with a uniform affine approximation to better address the poor local approximations given by the first-order Taylor expansion. We show that this naturally leads to a sequence of smooth functions that converges uniformly to the original nonsmooth objective, allowing a careful balance between approximation quality and convergence that is closely related to the step sizes of the Frank-Wolfe algorithm. We apply this algorithm to sparse covariance estimation, graph link prediction, and robust matrix completion problems.

    Finally, we propose a variant of self-training for semi-supervised binary classification by leveraging ideas from S3VM. To address common issues associated with self-training, such as error propagation and label imbalance, we propose an adaptive scheme that uses the functional margin of S3VM to construct a confidence measure. The confidence score is used to create rules that adapt the optimization problems to incorporate label uncertainty and class imbalance. Moreover, we show that the incremental training approach leverages warm starts very well, leading to much faster training than standard S3VM methods alone, with much stronger empirical performance on imbalanced datasets.
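    For reference, the Frank-Wolfe step discussed above can be sketched as follows for the trace-norm-constrained matrix-completion problem, where the linear minimization oracle needs only the leading singular vector pair of the gradient. The rank-drop subproblem and the S3VM self-training scheme are not reproduced here, and the function and parameter names (frank_wolfe_trace_norm, mask, delta) are illustrative:

```python
import numpy as np

def frank_wolfe_trace_norm(M_obs, mask, delta, n_iter=100):
    """Classical Frank-Wolfe for min 0.5*||mask*(X - M_obs)||_F^2
    subject to ||X||_* <= delta. Each iteration needs only the leading
    singular vector pair of the gradient (full SVD used here for brevity)."""
    X = np.zeros_like(M_obs)
    for k in range(n_iter):
        G = mask * (X - M_obs)                     # gradient of the smooth loss
        u, s, vt = np.linalg.svd(-G)               # at scale: compute the leading pair only
        S = delta * np.outer(u[:, 0], vt[0, :])    # extreme point of the trace-norm ball
        gamma = 2.0 / (k + 2.0)                    # standard step-size schedule
        X = (1.0 - gamma) * X + gamma * S
    return X
```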

    Dictionary optimization for representing sparse signals using Rank-One Atom Decomposition (ROAD)

    Dictionary learning has attracted growing research interest in recent years. Because it is a bilinear inverse problem, a typical way to address it is to alternate iteratively between two stages: sparse coding and dictionary update. The general principle of the alternating approach is to fix one variable and optimize the other. Unfortunately, an ill-conditioned dictionary encountered during training may not only introduce numerical instability but also trap the overall training process near a singular point. Moreover, the alternation makes convergence difficult to analyze, and few dictionary learning algorithms have been proven to converge globally. For other bilinear inverse problems, such as short-and-sparse deconvolution (SaSD) and convolutional dictionary learning (CDL), the alternating method is still a popular choice, yet these problems are also ill-posed and complicated to handle. Additional inner iterative methods are usually required in both updating stages, which further complicates the convergence analysis of the whole learning process. It is also challenging to choose the number of iterations for each stage, as over-tuning either stage can trap the whole process in a local minimum far from the ground truth.

    To mitigate these issues, this thesis proposes a novel algorithm termed rank-one atom decomposition (ROAD), which recasts a bilinear inverse problem as an optimization problem over a single variable, namely a set of rank-one matrices. The resulting algorithm is single-stage: it minimizes a sparsity measure of the coefficients while maintaining the data-consistency constraint throughout the learning process. Inspired by recent advances in applying the alternating direction method of multipliers (ADMM) to nonconvex, nonsmooth problems, an ADMM solver is adopted for ROAD, and a lower bound on the penalty parameter is derived that guarantees convergence of the augmented Lagrangian despite the nonconvexity of the formulation.

    Compared to two-stage dictionary learning methods, ROAD simplifies the learning process, eases the convergence analysis, and avoids the singular-point issue. From a practical point of view, ROAD requires fewer tuning parameters than other benchmark algorithms. Numerical tests show that ROAD outperforms benchmark algorithms on both synthetic data and single-image super-resolution applications. Beyond dictionary learning, the ROAD formulation extends to the SaSD and CDL problems, again recasting them as one-variable optimization problems, and numerical tests indicate that ROAD estimates convolutional kernels more accurately than recent SaSD and CDL algorithms.
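    To make the idea of optimizing over rank-one matrices with ADMM concrete, here is a toy consensus-ADMM sketch that alternates an l1 proximal step with a projection onto the rank-one set. It only illustrates the flavour of the single-variable formulation; it is not ROAD's actual splitting, constraint set, or penalty-parameter bound:

```python
import numpy as np

def soft_threshold(Z, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft thresholding)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def project_rank_one(Z):
    """Best rank-one approximation of Z, i.e. projection onto the rank-one set."""
    u, s, vt = np.linalg.svd(Z, full_matrices=False)
    return s[0] * np.outer(u[:, 0], vt[0, :])

def rank_one_sparse_admm(Y, rho=1.0, n_iter=200):
    """Toy consensus ADMM: find a sparse X close to Y whose consensus copy Z
    is forced to be rank one. Illustrative only; not ROAD's formulation."""
    X = np.zeros_like(Y)
    Z = Y.copy()
    U = np.zeros_like(Y)
    for _ in range(n_iter):
        # Prox step for ||X||_1 + 0.5||X - Y||_F^2 + (rho/2)||X - Z + U||_F^2
        X = soft_threshold((Y + rho * (Z - U)) / (1.0 + rho), 1.0 / (1.0 + rho))
        # Projection onto the (nonconvex) rank-one set
        Z = project_rank_one(X + U)
        # Dual update for the consensus constraint X = Z
        U = U + X - Z
    return Z
```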

    Proximal Methods for Hierarchical Sparse Coding

    Sparse coding consists in representing signals as sparse linear combinations of atoms selected from a dictionary. We consider an extension of this framework in which the atoms are further assumed to be embedded in a tree. This is achieved using a recently introduced tree-structured sparse regularization norm, which has proven useful in several applications. This norm leads to regularized problems that are difficult to optimize, and in this paper we propose efficient algorithms for solving them. More precisely, we show that the proximal operator associated with this norm can be computed exactly via a dual approach that can be viewed as a composition of elementary proximal operators. Our procedure has complexity that is linear, or close to linear, in the number of atoms, and allows the use of accelerated gradient techniques to solve the tree-structured sparse approximation problem at the same computational cost as traditional approaches based on the L1-norm. Our method is efficient and scales gracefully to millions of variables, which we illustrate in two types of applications: first, we use fixed hierarchical dictionaries of wavelets to denoise natural images; then, we apply our optimization tools to dictionary learning, where the learned dictionary elements naturally organize into a prespecified arborescent structure, leading to better performance in the reconstruction of natural image patches. When applied to text documents, our method learns hierarchies of topics, providing a competitive alternative to probabilistic topic models.
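    The composition result described above admits a short sketch for the l2 case: applying the elementary group proximal operators once each, ordered from the leaves of the tree to its root, yields the proximal operator of the tree-structured norm. Uniform group weights and the list-of-index-arrays representation below are simplifying assumptions:

```python
import numpy as np

def group_soft_threshold(v, idx, lam):
    """Elementary proximal operator of lam * ||v[idx]||_2 (group shrinkage)."""
    norm = np.linalg.norm(v[idx])
    if norm > 0.0:
        v[idx] *= max(1.0 - lam / norm, 0.0)
    return v

def tree_structured_prox(v, groups_leaves_to_root, lam):
    """Prox of the tree-structured sum of group l2 norms, computed as a
    composition of elementary group prox operators, with each group processed
    after all of its descendants (leaves first, root last)."""
    out = np.asarray(v, dtype=float).copy()
    for idx in groups_leaves_to_root:
        out = group_soft_threshold(out, idx, lam)
    return out

# Example: two leaf groups followed by a root group containing all variables
# groups = [np.array([0, 1]), np.array([2, 3]), np.arange(4)]
# w = tree_structured_prox(np.array([3.0, -1.0, 0.5, 2.0]), groups, lam=0.5)
```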