Beyond Convexity: Stochastic Quasi-Convex Optimization
Stochastic convex optimization is a basic and well studied primitive in
machine learning. It is well known that convex and Lipschitz functions can be
minimized efficiently using Stochastic Gradient Descent (SGD). The Normalized
Gradient Descent (NGD) algorithm is an adaptation of Gradient Descent, which
updates according to the direction of the gradients, rather than the gradients
themselves. In this paper we analyze a stochastic version of NGD and prove its
convergence to a global minimum for a wider class of functions: we require the
functions to be quasi-convex and locally-Lipschitz. Quasi-convexity broadens
the concept of unimodality to multidimensions and allows for certain types of
saddle points, which are a known hurdle for first-order optimization methods
such as gradient descent. Locally-Lipschitz functions are only required to be
Lipschitz in a small region around the optimum. This assumption circumvents
gradient explosion, which is another known hurdle for gradient descent
variants. Interestingly, unlike the vanilla SGD algorithm, the stochastic
normalized gradient descent algorithm provably requires a minimal minibatch
size.
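As an illustration of the update rule discussed above, here is a minimal NumPy sketch of stochastic normalized gradient descent; the function names, the toy quasi-convex objective, and the step-size and minibatch choices are ours for illustration, not the paper's.

import numpy as np

def stochastic_ngd(grad_fn, x0, lr=0.1, minibatch=128, steps=1000, rng=None):
    """Stochastic NGD sketch: step along the *direction* of a minibatch
    gradient estimate, ignoring its magnitude."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        g = grad_fn(x, minibatch, rng)
        norm = np.linalg.norm(g)
        if norm > 0:                    # skip degenerate zero-gradient batches
            x -= lr * g / norm          # unit-length (normalized) step
    return x

# Toy usage: f(x) = |x - 3| / (1 + |x - 3|) is quasi-convex and Lipschitz but
# not convex, with flat tails where plain SGD would crawl.
def toy_grad(x, minibatch, rng):
    u = x - 3.0
    noise = rng.normal(scale=0.1 / np.sqrt(minibatch), size=x.shape)
    return np.sign(u) / (1.0 + np.abs(u)) ** 2 + noise

print(stochastic_ngd(toy_grad, np.array([10.0])))   # ends near x = 3

The normalization is what lets the method cross the flat tail quickly: the step length is always lr, regardless of how small the raw gradient is.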
On Graduated Optimization for Stochastic Non-Convex Problems
The graduated optimization approach, also known as the continuation method,
is a popular heuristic for solving non-convex problems that has received renewed
interest over the last decade. Despite its popularity, very little is known in
terms of theoretical convergence analysis. In this paper we describe a new
first-order algorithm based on graduated optimization and analyze its
performance. We characterize a parameterized family of non-convex functions
for which this algorithm provably converges to a global optimum. In particular,
we prove that the algorithm converges to an \epsilon-approximate solution
within O(1/\epsilon^2) gradient-based steps. We extend our algorithm and
analysis to the setting of stochastic non-convex optimization with noisy
gradient feedback, attaining the same convergence rate. Additionally, we
discuss the setting of zero-order optimization, and devise a variant of our
algorithm which converges at a rate of O(d^2/\epsilon^4).
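A rough sketch of the graduated-optimization (continuation) idea described above, coarse-to-fine smoothing with warm starts, is given below; the ball-averaging smoother, the halving schedule, and the toy objective are illustrative assumptions, not the exact algorithm analyzed in the paper.

import numpy as np

def smoothed_grad(grad_fn, x, delta, rng, samples=20):
    """Estimate the gradient of a delta-smoothed version of f by averaging
    gradients at points drawn uniformly from a ball of radius delta."""
    g = np.zeros_like(x)
    for _ in range(samples):
        u = rng.normal(size=x.shape)
        u *= delta * rng.uniform() ** (1 / x.size) / np.linalg.norm(u)
        g += grad_fn(x + u)
    return g / samples

def graduated_descent(grad_fn, x0, delta0=4.0, stages=6, steps=200, lr=0.05,
                      rng=None):
    """Continuation method: solve a heavily smoothed problem first, then
    halve the smoothing radius and warm-start from the previous answer."""
    if rng is None:
        rng = np.random.default_rng(0)
    x, delta = np.asarray(x0, float).copy(), delta0
    for _ in range(stages):
        for _ in range(steps):
            x -= lr * smoothed_grad(grad_fn, x, delta, rng)
        delta /= 2.0                      # sharpen the objective each stage
    return x

# Toy usage: non-convex f(x) = x^2/20 + sin(3x), with its gradient below.
grad = lambda x: x / 10.0 + 3.0 * np.cos(3.0 * x)
print(graduated_descent(grad, np.array([8.0])))

The heavy initial smoothing washes out the sine wiggles, so the early stages track the convex envelope; later stages refine within the basin reached so far.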
Spectral Sparsification and Regret Minimization Beyond Matrix Multiplicative Updates
In this paper, we provide a novel construction of the linear-sized spectral
sparsifiers of Batson, Spielman and Srivastava [BSS14]. While previous
constructions required substantially higher running time [BSS14, Zou12], our
sparsification routine can be implemented in almost-quadratic running time.
The fundamental conceptual novelty of our work is the leveraging of a strong
connection between sparsification and a regret minimization problem over
density matrices. This connection was known to provide an interpretation of the
randomized sparsifiers of Spielman and Srivastava [SS11] via the application of
matrix multiplicative weight updates (MWU) [CHS11, Vis14]. In this paper, we
explain how matrix MWU naturally arises as an instance of the
Follow-the-Regularized-Leader framework and generalize this approach to yield a
larger class of updates. This new class allows us to accelerate the
construction of linear-sized spectral sparsifiers, and gives novel insights into
the motivation behind the construction of Batson, Spielman and Srivastava [BSS14].
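To make the regret-minimization view concrete, here is a small NumPy/SciPy sketch of the standard matrix multiplicative weights update, viewed as Follow-the-Regularized-Leader with an entropy regularizer over density matrices; it illustrates the baseline update that the paper generalizes, not the accelerated sparsification routine itself, and the step size and toy losses are arbitrary.

import numpy as np
from scipy.linalg import expm

def matrix_mwu(losses, eta=0.5):
    """Matrix multiplicative weights: at round t, play the density matrix
    proportional to exp(-eta * sum of the loss matrices observed so far),
    then observe L_t. This is Follow-the-Regularized-Leader with the von
    Neumann entropy regularizer over density matrices (PSD, trace one)."""
    d = losses[0].shape[0]
    cumulative = np.zeros((d, d))
    plays = []
    for L in losses:
        W = expm(-eta * cumulative)      # unnormalized exponential weights
        plays.append(W / np.trace(W))    # normalize onto the density-matrix set
        cumulative += L                  # then observe the next loss
    return plays

# Toy usage: random symmetric losses in dimension 3.
rng = np.random.default_rng(0)
losses = [(A + A.T) / 2 for A in rng.normal(size=(5, 3, 3))]
for W in matrix_mwu(losses):
    print(np.trace(W).round(3), np.linalg.eigvalsh(W).min().round(3))

Each play has unit trace and nonnegative eigenvalues, i.e. it stays on the set of density matrices; replacing the entropy regularizer is what yields the larger class of updates the abstract refers to.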
Monte Carlo Simulations for Ghost Imaging Based on Scattered Photons
X-ray based imaging modalities are widely used in research, industry, and in
the medical field. Consequently, there is a strong motivation to improve their
performances with respect to resolution, dose, and contrast. Ghost imaging (GI)
is an imaging technique in which the images are reconstructed from measurements
with a single-pixel detector using correlation between the detected intensities
and the intensity structures of the input beam. The method, which has recently
been extended to X-rays, provides intriguing possibilities for overcoming
several fundamental challenges of X-ray imaging. However, understanding the
potential of the method and designing X-ray GI systems pose challenges since in
addition to geometric optic effects, radiation-matter interactions must be
considered. Such considerations are fundamentally more complex than those at
longer wavelengths as relativistic effects such as Compton scattering become
significant. In this work we present a new method for designing and
implementing GI systems using the particle transport code FLUKA, which relies on
Monte Carlo (MC) sampling. This new approach enables comprehensive
consideration of the radiation-matter interactions, facilitating successful
planning of complex GI systems. As an example of an advanced imaging system, we
simulate a high-resolution scattered-photon GI technique.
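For context, the correlation-based reconstruction that GI relies on can be sketched in a few lines; the speckle patterns, object, and normalization below are illustrative assumptions and do not involve the FLUKA/Monte Carlo radiation transport that is the subject of the paper.

import numpy as np

def ghost_image(patterns, bucket):
    """Basic ghost-imaging reconstruction: correlate the single-pixel (bucket)
    signals with the known intensity structures of the input beam.
    patterns: (N, H, W) illumination patterns; bucket: (N,) detector signals."""
    patterns = np.asarray(patterns, float)
    bucket = np.asarray(bucket, float)
    return (bucket[:, None, None] * patterns).mean(axis=0) \
        - bucket.mean() * patterns.mean(axis=0)

# Toy usage: a binary object imaged with random speckle-like patterns.
rng = np.random.default_rng(0)
obj = np.zeros((16, 16)); obj[4:12, 6:10] = 1.0          # transmissive region
patterns = rng.random((5000, 16, 16))                     # known beam structures
bucket = (patterns * obj).sum(axis=(1, 2))                # single-pixel detector
recon = ghost_image(patterns, bucket)
print(np.corrcoef(recon.ravel(), obj.ravel())[0, 1])      # correlation with truth

The Monte Carlo approach in the paper replaces the idealized bucket signal above with a full simulation of radiation-matter interactions, including Compton scattering.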
Contextual Object Detection with a Few Relevant Neighbors
A natural way to improve the detection of objects is to consider the
contextual constraints imposed by the detection of additional objects in a
given scene. In this work, we exploit the spatial relations between objects in
order to improve detection capacity, as well as analyze various properties of
the contextual object detection problem. To precisely calculate context-based
probabilities of objects, we developed a model that examines the interactions
between objects in an exact probabilistic setting, in contrast to previous
methods that typically utilize approximations based on pairwise interactions.
Such a scheme is facilitated by the realistic assumption that the existence of
an object in any given location is influenced by only a few informative locations
in space. Based on this assumption, we suggest a method for identifying these
relevant locations and integrating them into a mostly exact calculation of
probability based on their raw detector responses. This scheme is shown to
improve detection results and provides unique insights about the process of
contextual inference for object detection. We show that it is generally
difficult to learn that a particular object reduces the probability of another,
and that in cases where the context and the detector strongly disagree, this learning
becomes virtually impossible for the purposes of improving the results of an
object detector. Finally, we demonstrate improved detection results through use
of our approach as applied to the PASCAL VOC and COCO datasets.
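The benefit of exact inference over a handful of relevant locations can be illustrated with a small factor-model sketch; the unary/pairwise factorization, the prior, and the toy numbers below are our assumptions for illustration, not the paper's probabilistic model.

import numpy as np
from itertools import product

def contextual_posterior(scores, context, prior=0.1):
    """Exact probability that an object is present at the query location
    (index 0), combining unary detector factors at the query and its few
    relevant neighbors with pairwise context factors.

    scores[i]     : raw detector response for an object at location i, in (0, 1)
    context[i][j] : >1 if objects at i and j tend to co-occur, <1 if they repel
    Because only a few neighbors are relevant, exact enumeration over all
    joint presence/absence assignments stays cheap (2**n terms)."""
    n = len(scores)
    post = np.zeros(2)
    for y in product((0, 1), repeat=n):
        p = 1.0
        for i in range(n):                         # unary factors
            p *= (prior * scores[i]) if y[i] else (1 - prior) * (1 - scores[i])
        for i in range(n):                         # pairwise context factors
            for j in range(i + 1, n):
                if y[i] and y[j]:
                    p *= context[i][j]
        post[y[0]] += p
    return post[1] / post.sum()

# Toy usage: a weak detection at the query (0.4) next to a confident neighbor
# (0.9) whose class strongly co-occurs with the query's class.
print(contextual_posterior(scores=[0.4, 0.9], context=[[1.0, 5.0], [5.0, 1.0]]))

Restricting the computation to a few relevant neighbors is what keeps the enumeration exact yet tractable, in contrast to pairwise approximations over all objects in the scene.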
Scalable and Interpretable One-class SVMs with Deep Learning and Random Fourier features
The one-class support vector machine (OC-SVM) has long been one of the most
effective anomaly detection methods and is extensively adopted in both research
and industrial applications. The biggest remaining issue for OC-SVM is its
limited ability to operate on large, high-dimensional datasets due to
optimization complexity. These problems might be mitigated via dimensionality
reduction techniques such as manifold learning or autoencoders. However,
previous work often treats representation learning and anomaly prediction
separately. In this paper, we propose the autoencoder-based one-class support
vector machine (AE-1SVM), which brings OC-SVM into the deep learning context:
with the aid of random Fourier features to approximate the radial basis kernel,
it is combined with a representation learning architecture and trained
end-to-end with stochastic gradient descent. Interestingly, this also opens up
the possible use of gradient-based attribution methods to explain the decision
making for anomaly detection, which has long been challenging because of the
implicit mapping between the input space and the kernel space.
To the best of our knowledge, this is the first work to study the
interpretability of deep learning in anomaly detection. We evaluate our method
on a wide range of unsupervised anomaly detection tasks in which our end-to-end
training architecture achieves a performance significantly better than the
previous work using separate training.
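A compact PyTorch sketch in the spirit of AE-1SVM is shown below: an encoder produces a code, random Fourier features approximate the RBF kernel on that code, and a linear one-class SVM objective plus a reconstruction term is minimized end-to-end by SGD. The layer sizes, loss weighting, and hyperparameters are illustrative assumptions, not the authors' architecture.

import torch
import torch.nn as nn

class RFFOneClassSVM(nn.Module):
    """Sketch: autoencoder + random Fourier features + linear one-class SVM,
    all trained jointly with a gradient-based optimizer."""
    def __init__(self, in_dim, code_dim=8, n_features=64, gamma=1.0, nu=0.1):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                 nn.Linear(32, code_dim))
        self.dec = nn.Sequential(nn.Linear(code_dim, 32), nn.ReLU(),
                                 nn.Linear(32, in_dim))
        # Fixed random projection for the RFF map z(x) ~ sqrt(2/D) cos(Wx + b),
        # approximating the RBF kernel exp(-gamma * ||x - x'||^2).
        self.register_buffer("W", torch.randn(code_dim, n_features) * (2 * gamma) ** 0.5)
        self.register_buffer("b", torch.rand(n_features) * 2 * torch.pi)
        self.w = nn.Parameter(torch.zeros(n_features))    # OC-SVM weight vector
        self.rho = nn.Parameter(torch.zeros(()))          # OC-SVM offset
        self.nu = nu

    def forward(self, x):
        code = self.enc(x)
        z = (2.0 / self.W.shape[1]) ** 0.5 * torch.cos(code @ self.W + self.b)
        return z @ self.w - self.rho, self.dec(code)

    def loss(self, x, recon_weight=1.0):
        margin, recon = self(x)
        svm = 0.5 * self.w.pow(2).sum() - self.rho \
            + torch.clamp(-margin, min=0).mean() / self.nu
        return svm + recon_weight * (recon - x).pow(2).mean()

# Toy usage: fit on "normal" Gaussian data, then score an obvious outlier.
torch.manual_seed(0)
x = torch.randn(512, 20)
model = RFFOneClassSVM(in_dim=20)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad(); model.loss(x).backward(); opt.step()
print(model(x)[0].mean().item(), model(10 * torch.ones(1, 20))[0].item())

Because the whole pipeline is differentiable, gradients of the margin with respect to the input are available, which is what makes gradient-based attribution for the anomaly decision possible.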
Private Incremental Regression
Data is continuously generated by modern data sources, and a recent challenge
in machine learning has been to develop techniques that perform well in an
incremental (streaming) setting. In this paper, we investigate the problem of
private machine learning where, as is common in practice, the data is not given
all at once but rather arrives incrementally over time.
We introduce the problems of private incremental ERM and private incremental
regression where the general goal is to always maintain a good empirical risk
minimizer for the history observed under differential privacy. Our first
contribution is a generic transformation of private batch ERM mechanisms into
private incremental ERM mechanisms, based on a simple idea of invoking the
private batch ERM procedure at some regular time intervals. We take this
construction as a baseline for comparison. We then provide two mechanisms for
the private incremental regression problem. Our first mechanism is based on
privately constructing a noisy incremental gradient function, which is then
used in a modified projected gradient procedure at every timestep. This
mechanism has an excess empirical risk bound that grows with the
dimensionality of the data. While the results of [Bassily et al. 2014] imply that
this bound is tight in the worst-case, we show that certain geometric
properties of the input and constraint set can be used to derive significantly
better results for certain interesting regression problems.
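To illustrate the shape of the noisy-incremental-gradient approach (though not its privacy calibration), here is a small NumPy sketch: at each timestep a Gaussian-perturbed gradient drives a projected gradient step onto an L2 ball. The noise scale, step sizes, and constraint set are placeholder assumptions; as written, the noise is not calibrated to the gradient sensitivity or a privacy budget, so this is not a differentially private mechanism.

import numpy as np

def noisy_incremental_regression(stream, dim, radius=1.0, lr=0.1,
                                 noise_scale=1.0, rng=None):
    """Sketch: maintain a regression estimate over a stream of (x_t, y_t)
    pairs with a projected gradient step on the squared loss, adding Gaussian
    noise to each gradient (placeholder scale)."""
    if rng is None:
        rng = np.random.default_rng(0)
    theta = np.zeros(dim)
    for t, (x, y) in enumerate(stream, start=1):
        grad = (theta @ x - y) * x                       # squared-loss gradient
        noisy = grad + rng.normal(scale=noise_scale, size=dim)
        theta -= (lr / np.sqrt(t)) * noisy               # decaying step size
        norm = np.linalg.norm(theta)                     # project back onto the
        if norm > radius:                                # constraint set (L2 ball)
            theta *= radius / norm
        yield theta.copy()                               # current estimate

# Toy usage: a linear stream with true parameter [0.6, -0.3, 0.2].
rng = np.random.default_rng(1)
true = np.array([0.6, -0.3, 0.2])
stream = ((x, x @ true + 0.05 * rng.normal()) for x in rng.normal(size=(2000, 3)))
for est in noisy_incremental_regression(stream, dim=3):
    pass
print(est)

Maintaining the estimate after every arrival, rather than re-running a batch mechanism at fixed intervals, is the incremental setting the abstract contrasts with its baseline construction.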