
    Surrogate Optimization for p-Norms

    In this paper, we study the effect of surrogate objective functions in optimization problems. We introduce the surrogate ratio as a measure of this effect, defined as the ratio between the optimal values of the original and surrogate objective functions. We prove that the surrogate ratio is at most μ^{|1/p − 1/q|} when the objective functions are p- and q-norms and the feasible region is a μ-dimensional space (i.e., a subspace of R^μ), a μ-intersection of matroids, or a μ-extendible system, and we show that this bound is the best possible. In addition, for μ-systems, we demonstrate that the ratio becomes μ^{1/p} when p ≤ q. Here, a μ-system is an independence system such that, for any subset of the ground set, the ratio of the cardinality of the largest to the smallest maximal independent subset of it is at most μ. We further extend our results to surrogate ratios for approximate solutions.
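    As a concrete illustration of the quantity being bounded, here is a minimal numerical sketch under simplifying assumptions: the feasible region is taken to be a finite set of points in R^μ (not the structured set systems studied in the paper), the objective is maximized, and the q-norm is used as a surrogate for the p-norm. The ratio observed this way is then compared against μ^{|1/p − 1/q|}.

```python
import numpy as np

# Hypothetical setup: a finite feasible set F of points in R^mu, a maximisation
# objective, and the q-norm used as a surrogate for the p-norm. This simplifies
# the structured feasible regions (matroid intersections, mu-extendible systems)
# treated in the paper.
rng = np.random.default_rng(0)
mu, p, q = 8, 1, 2
F = rng.standard_normal((100, mu))                   # feasible region: 100 points in R^mu

opt_p = max(np.linalg.norm(x, p) for x in F)         # true optimum of the p-norm objective
x_surr = max(F, key=lambda x: np.linalg.norm(x, q))  # optimiser of the surrogate q-norm
ratio = opt_p / np.linalg.norm(x_surr, p)            # surrogate ratio for this instance

bound = mu ** abs(1 / p - 1 / q)                     # theoretical bound mu^{|1/p - 1/q|}
print(f"surrogate ratio = {ratio:.3f}, bound = {bound:.3f}")
```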

    Double Greedy Algorithms: Reduced Basis Methods for Transport Dominated Problems

    The central objective of this paper is to develop reduced basis methods for parameter-dependent transport-dominated problems that are rigorously proven to exhibit rate-optimal performance when compared with the Kolmogorov n-widths of the solution sets. The central ingredient is the construction of computationally feasible "tight" surrogates, which in turn are based on deriving a suitable well-conditioned variational formulation for the parameter-dependent problem. The theoretical results are illustrated by numerical experiments for convection-diffusion and pure transport equations. In particular, the latter example sheds some light on the smoothness of the dependence of the solutions on the parameters.
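    To make the role of the surrogate concrete, the following is a schematic sketch of a surrogate-driven weak-greedy basis selection loop. The callables `full_solve(mu)` and `error_surrogate(mu, basis)` are placeholders introduced here for illustration; the paper's actual double greedy additionally constructs well-conditioned test spaces for the transport problem, which is not shown.

```python
import numpy as np

# Schematic weak-greedy loop driven by a computable error surrogate.
# `full_solve` returns a high-fidelity snapshot for a parameter, and
# `error_surrogate` estimates the reduced-space error at a parameter.
def greedy_reduced_basis(train_params, full_solve, error_surrogate,
                         tol=1e-6, max_dim=50):
    basis = []                                        # orthonormal reduced basis
    while len(basis) < max_dim:
        errs = [error_surrogate(mu, basis) for mu in train_params]
        worst = int(np.argmax(errs))                  # parameter worst approximated so far
        if errs[worst] < tol:                         # surrogate certifies the accuracy
            break
        u = np.asarray(full_solve(train_params[worst]), dtype=float)
        for b in basis:                               # Gram-Schmidt against current basis
            u = u - np.dot(u, b) * b
        nrm = np.linalg.norm(u)
        if nrm == 0.0:
            break
        basis.append(u / nrm)
    return basis
```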

    Clustering of Data with Missing Entries

    The analysis of large datasets is often complicated by the presence of missing entries, mainly because most current machine learning algorithms are designed to work with full data. The main focus of this work is to introduce a clustering algorithm that provides good clustering even in the presence of missing data. The proposed technique solves an ℓ0 fusion-penalty-based optimization problem to recover the clusters. We theoretically analyze the conditions needed for successful recovery of the clusters. We also propose an algorithm to solve a relaxation of this problem using saturating non-convex fusion penalties. The method is demonstrated on simulated and real datasets and is observed to perform well in the presence of large fractions of missing entries. Comment: arXiv admin note: substantial text overlap with arXiv:1709.0187
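    A hedged sketch of the general idea follows. The paper poses an ℓ0 fusion penalty and a saturating non-convex relaxation solved by a dedicated algorithm; here a smooth saturating surrogate φ(d²) = γ(1 − exp(−d²/γ)) stands in for that penalty, plain gradient descent replaces the paper's solver, and missing entries are simply excluded from the data-fidelity term via an observation mask.

```python
import numpy as np

# Sketch: fusion-penalty clustering with missing entries, using a smooth
# saturating surrogate penalty and gradient descent (not the paper's exact
# formulation or solver). `mask` is boolean with True = observed entry.
def fusion_cluster(X, mask, lam=1.0, gamma=1.0, lr=0.05, iters=500):
    X0 = np.where(mask, X, 0.0)                          # zero-filled copy of the data
    U = X0.copy()                                        # one cluster-centre estimate per row
    for _ in range(iters):
        diff = U[:, None, :] - U[None, :, :]             # pairwise differences, (n, n, d)
        w = np.exp(-np.sum(diff ** 2, axis=-1) / gamma)  # saturating pair weights, (n, n)
        grad_fuse = 2.0 * np.einsum('ij,ijk->ik', w, diff)  # derivative of the fusion term
        grad_data = mask * (U - X0)                      # gradient only on observed entries
        U -= lr * (grad_data + lam * grad_fuse)
    return U                                             # rows that (nearly) coincide share a cluster
```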

    Robust Low-Rank Subspace Segmentation with Semidefinite Guarantees

    Recently there has been a line of research proposing to employ Spectral Clustering (SC) to segment (group) high-dimensional structural data such as those (approximately) lying on subspaces or low-dimensional manifolds. (Throughout the paper, we use segmentation, clustering, and grouping, and their verb forms, interchangeably. Following [liu2010robust], we use the term "subspace" to denote both linear and affine subspaces; there is a trivial conversion between the two, as mentioned therein.) By learning the affinity matrix in the form of sparse reconstruction, techniques proposed in this vein often considerably boost the performance in subspace settings where traditional SC can fail. Despite this success, fundamental problems have been left unsolved: the spectral properties of the learned affinity matrix cannot be gauged in advance, and there is often one ugly symmetrization step that post-processes the affinity for SC input. We therefore advocate enforcing the symmetric positive semidefinite constraint explicitly during learning (Low-Rank Representation with Positive SemiDefinite constraint, or LRR-PSD), and show that it can in fact be solved efficiently by a dedicated scheme instead of general-purpose SDP solvers, which usually scale poorly. We provide rigorous mathematical derivations to show that, in its canonical form, LRR-PSD is equivalent to the recently proposed Low-Rank Representation (LRR) scheme [liu2010robust], and hence offer theoretical and practical insights into both LRR-PSD and LRR, inviting future research. In terms of computational cost, our proposal is at most comparable to that of LRR, if not lower. We validate our theoretical analysis and optimization scheme by experiments on both synthetic and real data sets. Comment: 10 pages, 4 figures. Accepted by ICDM Workshop on Optimization Based Methods for Emerging Data Mining Problems (OEDM), 2010. Main proof simplified and typos corrected. Experimental data slightly added.
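    The symmetry/PSD point can be seen in the noiseless special case: for clean data X, the minimizer of the nuclear norm of Z subject to X = XZ is the shape-interaction matrix Z = Vr Vrᵀ built from the skinny SVD of X, which is already symmetric positive semidefinite, so no post-hoc symmetrization is needed before spectral clustering. The sketch below shows only this special case; the full LRR/LRR-PSD formulations also handle noise via a sparse error term.

```python
import numpy as np

# Noiseless sketch: build a symmetric PSD affinity from the skinny SVD of the
# data matrix (columns are data points) and use it for spectral clustering.
def lrr_affinity(X, rank=None, tol=1e-10):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = rank if rank is not None else int(np.sum(s > tol * s.max()))
    Vr = Vt[:r].T                       # (n, r) leading right singular vectors
    Z = Vr @ Vr.T                       # symmetric PSD representation matrix
    return np.abs(Z)                    # affinity matrix for spectral clustering

# Usage: W = lrr_affinity(X); feed W to any spectral clustering routine.
```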

    Stochastic Training of Neural Networks via Successive Convex Approximations

    This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of non-convex optimization, going under the general name of successive convex approximation (SCA) techniques. The basic idea is to iteratively replace the original (non-convex, high-dimensional) learning problem with a sequence of (strongly convex) approximations, which are both accurate and simple to optimize. Differently from similar ideas (e.g., quasi-Newton algorithms), the approximations can be constructed using only first-order information of the neural network function, in a stochastic fashion, while exploiting the overall structure of the learning problem for faster convergence. We discuss several use cases, based on different choices for the loss function (e.g., squared loss and cross-entropy loss) and for the regularization of the NN's weights. We experiment on several medium-sized benchmark problems and on a large-scale dataset involving simulated physical data. The results show that the algorithm outperforms state-of-the-art techniques, providing faster convergence to a better minimum. Additionally, we show how the algorithm can easily be parallelized over multiple computational units without hindering its performance. In particular, each computational unit can optimize a tailored surrogate function defined on a randomly assigned subset of the input variables, whose dimension can be selected depending entirely on the available computational power. Comment: Preprint submitted to IEEE Transactions on Neural Networks and Learning Systems
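    A minimal, hedged instance of the stochastic SCA idea is sketched below for an ℓ2-regularized loss: the non-convex loss is replaced at each step by a strongly convex surrogate (its linearization plus a proximal quadratic term, keeping the convex regularizer exact), the surrogate is minimized in closed form, and the iterate moves toward that minimizer with a diminishing step size. The gradient oracle `grad_fn(w, batch)`, the averaging and step-size schedules, and the parameters `tau` and `lam` are illustrative placeholders, not the paper's exact choices.

```python
import numpy as np

# Sketch of stochastic successive convex approximation for an l2-regularised loss.
def sca_train(w0, grad_fn, batches, lam=1e-3, tau=1.0, iters=200):
    w = w0.copy()
    g_avg = np.zeros_like(w)                 # running average of stochastic gradients
    for k in range(iters):
        rho = 1.0 / (k + 1) ** 0.6           # diminishing averaging weight
        alpha = 1.0 / (k + 2) ** 0.51        # diminishing step size
        g = grad_fn(w, batches[k % len(batches)])
        g_avg = (1 - rho) * g_avg + rho * g  # smoothed first-order information
        # Closed-form minimiser of the strongly convex surrogate
        #   g_avg^T (v - w) + (tau/2)||v - w||^2 + (lam/2)||v||^2.
        v = (tau * w - g_avg) / (tau + lam)
        w = w + alpha * (v - w)              # move toward the surrogate minimiser
    return w
```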