Invariant set of weight of perceptron trained by perceptron training algorithm
In this paper, an invariant set of the weight of the perceptron trained by the perceptron training algorithm is defined and characterized. The dynamic range of the steady-state values of the weight of the perceptron can be evaluated by finding the dynamic range of the weight of the perceptron inside the largest invariant set. Also, the necessary and sufficient condition for the forward dynamics of the weight of the perceptron to be injective, as well as the condition for the invariant set of the weight of the perceptron to be attractive, are derived.
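To make the weight dynamics discussed above concrete, the following is a minimal sketch of the classical perceptron update, recording the weight trajectory so that its range can be inspected. The function names (`perceptron_step`, `train`), the sign convention at zero, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
def sign(v):
    # Assumed convention: an activation of exactly zero maps to +1.
    return 1 if v >= 0 else -1


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def perceptron_step(w, x, t):
    """One classical perceptron update: w <- w + ((t - y) / 2) * x."""
    y = sign(dot(w, x))
    scale = (t - y) / 2  # 0 when the sample is classified correctly, else +1 or -1
    return [wi + scale * xi for wi, xi in zip(w, x)]


def train(samples, w0, epochs):
    """Cycle through the samples and record every weight vector visited."""
    w = list(w0)
    trajectory = [list(w)]
    for _ in range(epochs):
        for x, t in samples:
            w = perceptron_step(w, x, t)
            trajectory.append(list(w))
    return w, trajectory


# Toy, linearly separable data; the first input component acts as a bias term.
samples = [([1.0, 2.0, 1.0], 1), ([1.0, -1.0, -1.5], -1)]
w_final, trajectory = train(samples, [0.0, 0.0, 0.0], epochs=5)
```

On separable data such as this, the weights stop changing once every sample is classified correctly, so the tail of `trajectory` is constant; on non-separable data the weights keep moving, which is the regime where an invariant set is of interest.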
Beyond Convexity: Stochastic Quasi-Convex Optimization
Stochastic convex optimization is a basic and well studied primitive in
machine learning. It is well known that convex and Lipschitz functions can be
minimized efficiently using Stochastic Gradient Descent (SGD). The Normalized
Gradient Descent (NGD) algorithm is an adaptation of Gradient Descent, which
updates according to the direction of the gradients, rather than the gradients
themselves. In this paper we analyze a stochastic version of NGD and prove its
convergence to a global minimum for a wider class of functions: we require the
functions to be quasi-convex and locally-Lipschitz. Quasi-convexity broadens
the concept of unimodality to multiple dimensions and allows for certain types of
saddle points, which are a known hurdle for first-order optimization methods
such as gradient descent. Locally-Lipschitz functions are only required to be
Lipschitz in a small region around the optimum. This assumption circumvents
gradient explosion, which is another known hurdle for gradient descent
variants. Interestingly, unlike the vanilla SGD algorithm, the stochastic
normalized gradient descent algorithm provably requires a minimal minibatch
size.
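The normalized update described above can be sketched as follows. This is a minimal illustration, not the paper's algorithm as published: the objective f(x) = 1 - exp(-||x||^2) is an assumed example of a quasi-convex function with a flat plateau far from the optimum, the Gaussian gradient noise is an assumed noise model, and the sketch returns the last iterate for simplicity. The names `grad_sample` and `sngd` are hypothetical.

```python
import math
import random


def grad_sample(x, rng, noise=1e-3):
    # Noisy gradient of the assumed objective f(x) = 1 - exp(-||x||^2).
    scale = 2.0 * math.exp(-sum(xi * xi for xi in x))
    return [scale * xi + rng.gauss(0.0, noise) for xi in x]


def sngd(x0, steps, lr, batch_size, seed=0):
    """Stochastic normalized gradient descent: step along g / ||g|| only."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        # Average the noisy gradients over a minibatch.
        batch = [grad_sample(x, rng) for _ in range(batch_size)]
        g = [sum(col) / batch_size for col in zip(*batch)]
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm == 0.0:
            break  # no direction to follow
        # Only the direction of the averaged gradient is used.
        x = [xi - lr * gi / norm for xi, gi in zip(x, g)]
    return x


x_final = sngd([2.0, -1.5], steps=200, lr=0.05, batch_size=16)
```

Because each step has fixed length `lr`, the tiny gradients on the plateau do not slow progress. Averaging over a minibatch matters here because normalizing a single very noisy gradient would discard the magnitude information that ordinary SGD relies on to average out noise.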
Surrogate Functions for Maximizing Precision at the Top
The problem of maximizing precision at the top of a ranked list, often dubbed
Precision@k (prec@k), finds relevance in myriad learning applications such as
ranking, multi-label classification, and learning with severe label imbalance.
However, despite its popularity, there exist significant gaps in our
understanding of this problem and its associated performance measure.
The most notable of these is the lack of a convex upper bounding surrogate
for prec@k. We also lack scalable perceptron and stochastic gradient descent
algorithms for optimizing this performance measure. In this paper we make key
contributions in these directions. At the heart of our results is a family of
truly upper bounding surrogates for prec@k. These surrogates are motivated in a
principled manner and enjoy attractive properties such as consistency to prec@k
under various natural margin/noise conditions.
These surrogates are then used to design a class of novel perceptron
algorithms for optimizing prec@k with provable mistake bounds. We also devise
scalable stochastic gradient descent style methods for this problem with
provable convergence bounds. Our proofs rely on novel uniform convergence
bounds which require an in-depth analysis of the structural properties of
prec@k and its surrogates. We conclude with experimental results comparing our
algorithms with state-of-the-art cutting plane and stochastic gradient
algorithms for maximizing prec@k.
Comment: To appear in the proceedings of the 32nd International Conference
on Machine Learning (ICML 2015)
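For reference, the performance measure itself can be computed as in the sketch below. It assumes the common definition of prec@k as the fraction of relevant items among the k highest-scored items, with binary labels (1 for relevant, 0 otherwise); the function name `precision_at_k` and the toy data are illustrative and not taken from the paper, and the paper's surrogates are not reproduced here.

```python
def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scored items whose label is 1."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    # Indices sorted by decreasing score; ties keep their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in order[:k]) / k


scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
p_at_2 = precision_at_k(scores, labels, 2)
```

The measure depends on the scores only through the identity of the top-k set, which makes it piecewise constant in the model parameters and is why surrogate functions are needed for gradient-based training.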
Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning
Stochastic gradient descent algorithms for training linear and kernel
predictors are gaining more and more importance, thanks to their scalability.
While various methods have been proposed to speed up their convergence, the
model selection phase is often ignored. In fact, theoretical works usually
assume prior knowledge of quantities such as the norm of the optimal
solution, while in practice validation methods remain
the only viable approach. In this paper, we propose a new kernel-based
stochastic gradient descent algorithm that performs model selection while
training, with no parameters to tune, nor any form of cross-validation. The
algorithm builds on recent advances in online learning theory for
unconstrained settings, to estimate over time the right regularization in a
data-dependent way. Optimal rates of convergence are proved under standard
smoothness assumptions on the target function, using the range space of the
fractional integral operator associated with the kernel.