Surrogate Optimization for p-Norms
In this paper, we study the effect of using surrogate objective functions in optimization problems. We introduce the surrogate ratio as a measure of this effect, defined as the ratio between the optimal values of the original and surrogate objective functions.
We prove that the surrogate ratio is at most mu^{|1/p - 1/q|} when the objective functions are p- and q-norms and the feasible region is a mu-dimensional space (i.e., a subspace of R^mu), a mu-intersection of matroids, or a mu-extendible system. We also show that this is the best possible bound. In addition, for mu-systems, we demonstrate that the ratio becomes mu^{1/p} when p ≤ q. Here, a mu-system is an independence system such that, for any subset of the ground set, the ratio of the cardinality of the largest to the smallest maximal independent subset of it is at most mu. We further extend our results to surrogate ratios for approximate solutions.
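For concreteness, here is one plausible formalization of the surrogate ratio in LaTeX notation. The abstract only says "the ratio between the optimal values of the original and surrogate objective functions", so the exact definition below (including the symbol rho) is my reading, not a quote from the paper:

```latex
% Hypothetical formalization: F is the feasible region, f the original
% objective, g the surrogate. The ratio compares the true optimum of f
% with the f-value achieved by optimizing the surrogate g instead.
\[
  \rho(f, g; \mathcal{F})
  \;=\;
  \frac{\max_{x \in \mathcal{F}} f(x)}{f(x_g^\ast)},
  \qquad
  x_g^\ast \in \operatorname*{arg\,max}_{x \in \mathcal{F}} g(x).
\]
% The stated bound for p- and q-norm objectives then reads
\[
  \rho \;\le\; \mu^{\left|\frac{1}{p} - \frac{1}{q}\right|}.
\]
```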
Double Greedy Algorithms: Reduced Basis Methods for Transport Dominated Problems
The central objective of this paper is to develop reduced basis methods for
parameter dependent transport dominated problems that are rigorously proven to
exhibit rate-optimal performance when compared with the Kolmogorov n-widths
of the solution sets. The central ingredient is the construction of
computationally feasible "tight" surrogates which in turn are based on deriving
a suitable well-conditioned variational formulation for the parameter dependent
problem. The theoretical results are illustrated by numerical experiments for
convection-diffusion and pure transport equations. In particular, the latter
example sheds some light on the smoothness of the dependence of the solutions
on the parameters.
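To make the greedy reduced basis idea concrete, here is a minimal sketch of the outer (parameter-selection) greedy loop driven by a computable surrogate error indicator. All names are hypothetical; the paper's "tight" surrogates come from a well-conditioned variational formulation, and its double greedy structure also involves an inner loop, neither of which this sketch implements:

```python
import numpy as np

def weak_greedy_rb(training_params, solve_hifi, surrogate_error, n_max, tol):
    """Greedy reduced basis selection driven by a surrogate error indicator.

    training_params : list of parameter values (the training set)
    solve_hifi      : mu -> high-fidelity snapshot vector (hypothetical solver)
    surrogate_error : (basis, mu) -> cheap surrogate for the reduction error
    """
    basis = []
    for _ in range(n_max):
        # Pick the parameter where the surrogate error is largest.
        errors = [surrogate_error(basis, mu) for mu in training_params]
        worst = int(np.argmax(errors))
        if errors[worst] < tol:
            break  # target accuracy reached on the training set
        # Enrich the basis with the snapshot at the worst parameter
        # and re-orthonormalize (Gram-Schmidt).
        u = solve_hifi(training_params[worst])
        for v in basis:
            u = u - np.dot(v, u) * v
        basis.append(u / np.linalg.norm(u))
    return basis
```

Rate-optimality results of the kind the paper proves relate the decay of the greedy errors above to the Kolmogorov n-widths of the solution set, provided the surrogate is tight enough.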
Clustering of Data with Missing Entries
The analysis of large datasets is often complicated by the presence of
missing entries, mainly because most of the current machine learning algorithms
are designed to work with full data. The main focus of this work is to
introduce a clustering algorithm that provides good clusters even in the
presence of missing data. The proposed technique solves a fusion-penalty-based
optimization problem to recover the clusters. We theoretically
analyze the conditions needed for the successful recovery of the clusters. We
also propose an algorithm to solve a relaxation of this problem using
saturating non-convex fusion penalties. The method is demonstrated on simulated
and real datasets, and is observed to perform well in the presence of large
fractions of missing entries.
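As a rough illustration of the fusion-penalty idea, here is a minimal sketch of one gradient step on a convex variant of the objective, with made-up variable names. The paper instead uses saturating non-convex penalties and gives recovery conditions, neither of which is reproduced here:

```python
import numpy as np

def fusion_cluster_step(X, mask, U, lam, lr=0.1):
    """One gradient step on a convex fusion-penalty clustering objective.

    X    : (n, d) data matrix with missing entries set to 0
    mask : (n, d) binary mask, 1 where the entry is observed
    U    : (n, d) cluster representatives, one per data point
    lam  : fusion penalty weight

    Minimizes  sum_i ||mask_i * (U_i - X_i)||^2
             + lam * sum_{i<j} ||U_i - U_j||_2
    (a convex relaxation; points whose representatives fuse
    end up in the same cluster).
    """
    n = U.shape[0]
    grad = 2.0 * mask * (U - X)  # data-fit term, observed entries only
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            diff = U[i] - U[j]
            norm = np.linalg.norm(diff)
            if norm > 1e-12:
                grad[i] += lam * diff / norm  # pulls representatives together
    return U - lr * grad
```

Because the data-fit term is restricted to observed entries via the mask, the fusion penalty is what transfers information across points and compensates for the missing coordinates.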
Robust Low-Rank Subspace Segmentation with Semidefinite Guarantees
Recently there is a line of research proposing to employ Spectral
Clustering (SC) to segment (group; throughout the paper we use segmentation,
clustering, and grouping, and their verb forms, interchangeably)
high-dimensional structural data such as those (approximately) lying on
subspaces (following [liu2010robust], we use the term "subspace" to denote
both linear subspaces and affine subspaces; there is a trivial conversion
between the two, as mentioned therein) or low-dimensional
manifolds. By learning the affinity matrix in the form of sparse
reconstruction, techniques proposed in this vein often considerably boost the
performance in subspace settings where traditional SC can fail. Despite the
success, there are fundamental problems that have been left unsolved: the
spectrum property of the learned affinity matrix cannot be gauged in advance,
and there is often one ugly symmetrization step that post-processes the
affinity for SC input. Hence we advocate enforcing the symmetric positive
semidefinite constraint explicitly during learning (Low-Rank Representation
with Positive SemiDefinite constraint, or LRR-PSD), and show that it can in
fact be solved efficiently by a dedicated scheme rather than by
general-purpose SDP solvers, which usually scale poorly. We provide rigorous mathematical
derivations to show that, in its canonical form, LRR-PSD is equivalent to the
recently proposed Low-Rank Representation (LRR) scheme [liu2010robust], and
hence offer theoretical and practical insights into both LRR-PSD and LRR,
inviting future research. As for computational cost, our proposal is at most
comparable to that of LRR, if not cheaper. We validate our theoretical analysis and
optimization scheme by experiments on both synthetic and real data sets.
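A rough sketch of the constraint being advocated is below, using the generic nuclear-norm LRR objective with an added PSD constraint and hypothetical variable names. Note this deliberately uses a generic conic solver, which is exactly what the paper argues against in favor of a dedicated scheme; it only illustrates the problem being solved:

```python
import cvxpy as cp

def lrr_psd(X, lam=1.0):
    """Low-rank representation with an explicit PSD constraint on Z.

    min_Z ||Z||_* + lam * ||X - X Z||_F   s.t.  Z symmetric PSD.
    Solved here with a generic SDP-capable solver for illustration only.
    """
    n = X.shape[1]
    Z = cp.Variable((n, n), PSD=True)  # symmetry + PSD enforced by construction
    objective = cp.Minimize(cp.normNuc(Z) + lam * cp.norm(X - X @ Z, "fro"))
    cp.Problem(objective).solve()
    return Z.value  # affinity matrix usable directly as SC input
```

With the PSD constraint built in, the returned affinity is symmetric by construction, so the post-hoc symmetrization step criticized above becomes unnecessary.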
Stochastic Training of Neural Networks via Successive Convex Approximations
This paper proposes a new family of algorithms for training neural networks
(NNs). These are based on recent developments in the field of non-convex
optimization, going under the general name of successive convex approximation
(SCA) techniques. The basic idea is to iteratively replace the original
(non-convex, high-dimensional) learning problem with a sequence of (strongly
convex) approximations, which are both accurate and simple to optimize.
Unlike similar approaches (e.g., quasi-Newton algorithms), the
approximations can be constructed using only first-order information of the
neural network function, in a stochastic fashion, while exploiting the overall
structure of the learning problem for a faster convergence. We discuss several
use cases, based on different choices for the loss function (e.g., squared loss
and cross-entropy loss), and for the regularization of the NN's weights. We
experiment on several medium-sized benchmark problems, and on a large-scale
dataset involving simulated physical data. The results show how the algorithm
outperforms state-of-the-art techniques, providing faster convergence to a
better minimum. Additionally, we show how the algorithm can be easily
parallelized over multiple computational units without hindering its
performance. In particular, each computational unit can optimize a tailored
surrogate function defined on a randomly assigned subset of the input
variables, whose dimension can be selected depending entirely on the available
computational power.
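A minimal sketch of one SCA-style update follows. The specific surrogate and all names are my assumptions, not the paper's: a first-order model of the loss plus a quadratic proximal term yields a strongly convex subproblem with a closed-form minimizer, matching the abstract's description of surrogates built from first-order information only:

```python
def sca_step(w, grad_loss, reg_grad=None, rho=1.0, alpha=0.5):
    """One successive-convex-approximation step (illustrative sketch).

    Surrogate at the current point w:
        u(x; w) = loss(w) + g^T (x - w) + (rho/2) ||x - w||^2,
    which is strongly convex and uses only first-order information.
    Its minimizer is x_hat = w - g / rho; the iterate then moves a
    fraction alpha toward x_hat (diminishing alpha is the standard
    choice for convergence guarantees in the SCA literature).
    """
    g = grad_loss(w)                # stochastic or exact gradient at w
    if reg_grad is not None:
        g = g + reg_grad(w)         # e.g. gradient of an L2 weight penalty
    x_hat = w - g / rho             # closed-form surrogate minimizer
    return w + alpha * (x_hat - w)  # convex combination update
```

The parallel variant described in the abstract would have each computational unit run such an update on a surrogate restricted to its own randomly assigned subset of the variables.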