
    Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees

    Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe (FW) algorithms have regained popularity in recent years due to their simplicity, effectiveness, and theoretical guarantees. MP and FW address optimization over the linear span and the convex hull of a set of atoms, respectively. In this paper, we consider the intermediate case of optimization over the convex cone, parametrized as the conic hull of a generic atom set, leading to the first principled definitions of non-negative MP algorithms for which we give explicit convergence rates and demonstrate excellent empirical performance. In particular, we derive sublinear ($\mathcal{O}(1/t)$) convergence on general smooth and convex objectives, and linear convergence ($\mathcal{O}(e^{-t})$) on strongly convex objectives, in both cases for general sets of atoms. Furthermore, we establish a clear correspondence of our algorithms to known algorithms from the MP and FW literature. Our novel algorithms and analyses target general atom sets and general objective functions, and hence are directly applicable to a large variety of learning settings. Comment: NIPS 2017
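    As a concrete illustration of the non-negative MP template the abstract describes, here is a minimal Python sketch for the quadratic objective f(x) = ½‖y − x‖²: greedily pick the atom best aligned with the negative gradient and take a non-negative line-search step, so the iterate stays in the conic hull. The function name `nonneg_mp`, the dictionary layout, and the stopping rule are illustrative assumptions, not the paper's exact algorithm or its rates.

```python
import numpy as np

def nonneg_mp(y, atoms, iters=100):
    """Minimal non-negative Matching Pursuit sketch for
    f(x) = 0.5 * ||y - x||^2 over the conic hull of `atoms`.
    atoms: (d, n) array whose columns are the atom set."""
    x = np.zeros_like(y)
    coef = np.zeros(atoms.shape[1])
    for _ in range(iters):
        r = y - x                      # -grad f(x) = residual
        scores = atoms.T @ r           # correlation with each atom
        j = int(np.argmax(scores))
        if scores[j] <= 0:             # no atom in the cone decreases f
            break
        # exact line search for the quadratic; alpha >= 0 keeps x in the cone
        alpha = scores[j] / (atoms[:, j] @ atoms[:, j])
        coef[j] += alpha
        x += alpha * atoms[:, j]
    return x, coef
```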

    Low Rank Directed Acyclic Graphs and Causal Structure Learning

    Despite several important advances in recent years, learning causal structures represented by directed acyclic graphs (DAGs) remains a challenging task in high dimensional settings when the graphs to be learned are not sparse. In particular, the recent formulation of structure learning as a continuous optimization problem has proved to have considerable advantages over the traditional combinatorial formulation, but the performance of the resulting algorithms is still wanting when the target graph is relatively large and dense. In this paper, we propose a novel approach to mitigate this problem by exploiting a low rank assumption on the (weighted) adjacency matrix of a DAG causal model. We establish several useful results relating interpretable graphical conditions to the low rank assumption, and show how to adapt existing methods for causal structure learning to take advantage of this assumption. We also provide empirical evidence for the utility of our low rank algorithms, especially on graphs that are not sparse: not only do they outperform state-of-the-art algorithms when the low rank condition is satisfied, but their performance on randomly generated scale-free graphs is also very competitive even though the true ranks may not be as low as assumed.
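    To make the low rank idea concrete, the sketch below evaluates a NOTEARS-style penalized score with the weighted adjacency matrix parametrized as W = U Vᵀ for factors U, V of shape (d, r), so a continuous structure-learning method optimizes 2dr parameters instead of d². The specific objective, penalty weights, and the name `lowrank_score` are hedged illustrations of the general approach, not the paper's implementation.

```python
import numpy as np
from scipy.linalg import expm

def lowrank_score(X, U, V, lam=0.1, rho=1.0):
    """NOTEARS-style penalized score under the low rank
    parametrization W = U @ V.T, with U, V of shape (d, r).
    Illustrative objective; not the paper's exact code."""
    n, d = X.shape
    W = U @ V.T
    loss = 0.5 / n * np.linalg.norm(X - X @ W, "fro") ** 2   # least-squares fit
    h = np.trace(expm(W * W)) - d                            # = 0 iff W encodes a DAG
    l1 = lam * np.abs(W).sum()                               # optional sparsity term
    return loss + l1 + 0.5 * rho * h ** 2
```

    Because gradients propagate through U and V, existing first-order solvers for W can be adapted by optimizing the factor entries directly.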

    Optimization Methods for Semi-Supervised Learning

    The goal of this thesis is to provide efficient optimization algorithms for some semi-supervised learning (SSL) tasks in machine learning. For many machine learning tasks, training a classifier requires a large amount of labeled data; however, providing labels typically requires costly manual annotation. Fortunately, there is typically an abundance of unlabeled data that can be easily collected for many domains. In this thesis, we focus on problems where an underlying structure allows us to leverage large amounts of unlabeled data while only requiring small amounts of labeled data. In particular, we consider low-rank matrix completion problems with applications to recommender systems, and semi-supervised support vector machines (S3VM) for binary classification problems such as digit recognition or disease classification.

    For the first class of problems, we study convex approximations to the low-rank matrix completion problem. Instead of restricting the solution space to low-rank matrices, we use the trace norm as a convex surrogate. Unfortunately, many trace norm minimization algorithms scale very poorly in practice since they require a full singular value decomposition (SVD) at each iteration. Recently, there has been renewed interest in the trace norm constrained problem using the Frank-Wolfe algorithm, which only requires calculating the leading singular vector pair, providing an order of magnitude improvement in iteration complexity (a basic version of this step is sketched after this abstract). However, the Frank-Wolfe algorithm empirically exhibits very slow convergence and in practice yields high-rank solutions, which greatly increases computational costs. To address this issue, we investigate a rank-drop step for Frank-Wolfe, which solves a subproblem specifically designed to decrease the rank of the iterate, ensuring that the Frank-Wolfe algorithm converges along a low-rank path. We show that this rank-drop subproblem can be decomposed into two cases, each of which can be solved efficiently, and we guarantee that the iterates remain feasible, preserving the projection-free property of Frank-Wolfe.

    Next, we show that these ideas can be used to provide scalable algorithms for simultaneously sparse and low-rank matrix completion problems. We extend the Frank-Wolfe analysis to accommodate nonsmooth objectives, which can be used to solve the simultaneously sparse and low-rank problem. We replace the traditional linear approximation used in Frank-Wolfe by a uniform affine approximation to better address poor local approximations given by the first-order Taylor approximation. We show that this naturally leads to a sequence of smooth functions that uniformly converges to the original nonsmooth objective, allowing for a careful balance between approximation quality and convergence that is closely related to the step sizes of the Frank-Wolfe algorithm. We apply this algorithm to sparse covariance estimation, graph link prediction, and robust matrix completion problems.

    Finally, we propose a variant of self-training for the semi-supervised binary classification problem by leveraging ideas from S3VM. To address common issues associated with self-training, such as error propagation and label imbalance, we propose an adaptive scheme that uses the functional margin of S3VM to construct a confidence measure. The confidence score is used to create rules that adapt the optimization problems to incorporate label uncertainty and class imbalances. Moreover, we show that the incremental training approach leverages warm-starts very well, leading to much faster training than standard S3VM methods alone, with much stronger empirical performance on imbalanced datasets.
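    The trace-norm-constrained Frank-Wolfe step described above can be sketched in a few lines: the linear minimization oracle over the trace-norm ball is −δ·u₁v₁ᵀ, built from the leading singular pair of the gradient, so no full SVD is needed. This is only the baseline iteration (with a zero start and the standard 2/(t+2) step size); it omits the thesis's rank-drop subproblem, and the name `frank_wolfe_mc` and defaults are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import svds

def frank_wolfe_mc(M, mask, delta, iters=200):
    """Baseline Frank-Wolfe sketch for trace-norm-constrained matrix completion:
    min 0.5 * ||mask * (X - M)||_F^2  s.t.  ||X||_* <= delta.
    M: matrix with observed entries (zeros elsewhere); mask: 0/1 array."""
    X = np.zeros_like(M)
    for t in range(iters):
        G = mask * (X - M)                     # gradient on observed entries only
        u, s, vt = svds(G, k=1)                # leading singular vector pair, no full SVD
        S = -delta * np.outer(u[:, 0], vt[0])  # LMO solution over the trace-norm ball
        gamma = 2.0 / (t + 2.0)                # standard FW step size
        X = (1 - gamma) * X + gamma * S
    return X
```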