30 research outputs found
Riemannian Langevin Algorithm for Solving Semidefinite Programs
We propose a Langevin diffusion-based algorithm for non-convex optimization
and sampling on a product manifold of spheres. Under a logarithmic Sobolev
inequality, we establish a guarantee for finite iteration convergence to the
Gibbs distribution in terms of Kullback--Leibler divergence. We show that with
an appropriate temperature choice, the suboptimality gap to the global minimum
is guaranteed to be arbitrarily small with high probability.
As an application, we consider the Burer--Monteiro approach for solving a
semidefinite program (SDP) with diagonal constraints, and analyze the proposed
Langevin algorithm for optimizing the non-convex objective. In particular, we
establish a logarithmic Sobolev inequality for the Burer--Monteiro problem when
there are no spurious local minima, but in the presence of saddle points.
Combining the results, we then provide a global optimality guarantee for the
SDP and the Max-Cut problem. More precisely, we show that the Langevin
algorithm achieves accuracy with high probability in
iterations
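
As a rough illustration of the kind of update the abstract describes, the sketch below runs a Langevin-style iteration on a product of unit spheres for a Burer--Monteiro objective f(X) = <A, X X^T>, where each row of X is constrained to the sphere. It is a minimal sketch under stated assumptions, not the authors' algorithm: the projection-then-normalize retraction, the tangent-projected Gaussian noise, and the names `langevin_step`, `run_langevin`, `step`, and `beta` (inverse temperature) are all choices made here for illustration.

```python
import math
import random


def tangent_project(x, v):
    """Project v onto the tangent space of the unit sphere at x."""
    dot = sum(xi * vi for xi, vi in zip(x, v))
    return [vi - dot * xi for xi, vi in zip(x, v)]


def normalize(x):
    """Map a nonzero vector back onto the unit sphere (assumed retraction)."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]


def objective(A, X):
    """f(X) = sum_{i,j} A[i][j] * <x_i, x_j>, i.e. <A, X X^T>."""
    n = len(X)
    return sum(
        A[i][j] * sum(a * b for a, b in zip(X[i], X[j]))
        for i in range(n)
        for j in range(n)
    )


def langevin_step(A, X, step, beta, rng):
    """One noisy gradient step on every sphere, using the old X for all rows."""
    n, k = len(X), len(X[0])
    scale = math.sqrt(2.0 * step / beta)  # noise level set by the temperature 1/beta
    new_X = []
    for i in range(n):
        # Euclidean gradient w.r.t. row i, assuming A is symmetric.
        grad = [2.0 * sum(A[i][j] * X[j][d] for j in range(n)) for d in range(k)]
        rgrad = tangent_project(X[i], grad)
        noise = tangent_project(X[i], [rng.gauss(0.0, 1.0) for _ in range(k)])
        moved = [x - step * g + scale * z for x, g, z in zip(X[i], rgrad, noise)]
        new_X.append(normalize(moved))
    return new_X


def run_langevin(A, k, steps, step, beta, seed=0):
    """Start from random points on the spheres and iterate langevin_step."""
    rng = random.Random(seed)
    X = [normalize([rng.gauss(0.0, 1.0) for _ in range(k)]) for _ in range(len(A))]
    for _ in range(steps):
        X = langevin_step(A, X, step, beta, rng)
    return X
```

For example, `run_langevin([[0.0, 1.0], [1.0, 0.0]], k=2, steps=500, step=0.1, beta=1e8)` drives the two rows toward antipodal points, which minimizes this two-node objective; a smaller `beta` injects more noise and trades accuracy for exploration.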
Generalization Bounds for Stochastic Gradient Descent via Localized ε-Covers
In this paper, we propose a new covering technique localized for the
trajectories of SGD. This localization provides an algorithm-specific
complexity measured by the covering number, which can have
dimension-independent cardinality in contrast to standard uniform covering
arguments that result in exponential dimension dependency. Based on this
localized construction, we show that if the objective function is a finite
perturbation of a piecewise strongly convex and smooth function with
pieces, i.e. non-convex and non-smooth in general, the generalization error can
be upper bounded by , where is the number of
data samples. In particular, this rate is independent of dimension and does not
require early stopping and decaying step size. Finally, we employ these results
in various contexts and derive generalization bounds for multi-index linear
models, multi-class support vector machines, and K-means clustering for both
hard and soft label setups, improving the known state-of-the-art rates.
SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity
Social networking websites allow users to create and share content. Big
information cascades of post resharing can form as users of these sites reshare
others' posts with their friends and followers. One of the central challenges
in understanding such cascading behaviors is in forecasting information
outbreaks, where a single post becomes widely popular by being reshared by many
users. In this paper, we focus on predicting the final number of reshares of a
given post. We build on the theory of self-exciting point processes to develop
a statistical model that allows us to make accurate predictions. Our model
requires no training or expensive feature engineering. It results in a simple
and efficiently computable formula that allows us to answer questions, in
real-time, such as: Given a post's resharing history so far, what is our
current estimate of its final number of reshares? Is the post resharing cascade
past the initial stage of explosive growth? And, which posts will be the most
reshared in the future? We validate our model using one month of complete
Twitter data and demonstrate a strong improvement in predictive accuracy over
existing approaches. Our model gives only 15% relative error in predicting
final size of an average information cascade after observing it for just one
hour.
Comment: 10 pages, published in KDD 201
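
To make the idea of a closed-form final-size estimate concrete, the sketch below applies a generic branching-process argument to a resharing history. It is a simplified illustration under assumptions, not the formula from the paper: the exponential memory kernel, the constant infectiousness estimate, and the names `kernel_cdf`, `predict_final_reshares`, `mean_degree`, and `rate` are all hypothetical choices made here.

```python
import math


def kernel_cdf(dt, rate):
    """Assumed exponential memory kernel: share of reactions arriving within dt."""
    return 1.0 - math.exp(-rate * dt) if dt > 0 else 0.0


def predict_final_reshares(events, now, mean_degree, rate=1.0):
    """Estimate the final reshare count from the history observed up to `now`.

    events: list of (time, followers); index 0 is the original post and the
            remaining entries are the reshares seen so far.
    mean_degree: assumed average follower count of a future resharer.
    """
    seen = [(t, n) for t, n in events if t <= now]
    reshares = max(len(seen) - 1, 0)
    total_exposed = sum(n for _, n in seen)
    # Followers who have effectively already had their chance to react.
    effective_exposed = sum(n * kernel_cdf(now - t, rate) for t, n in seen)
    if effective_exposed == 0:
        return float(reshares)
    # Infectiousness: reshares per effectively exposed follower.
    p = reshares / effective_exposed
    if p * mean_degree >= 1.0:
        return math.inf  # supercritical regime: the estimate diverges
    # Pending direct reshares, each spawning a subtree of size 1 / (1 - p * mean_degree).
    pending = p * (total_exposed - effective_exposed)
    return reshares + pending / (1.0 - p * mean_degree)
```

Calling `predict_final_reshares([(0, 100), (1, 50)], now=2, mean_degree=10)` returns a value slightly above the one reshare already observed, and the estimate approaches the observed count as `now` grows and the kernel's remaining mass vanishes. An infinite return value flags a cascade that, under these assumptions, is still in its explosive-growth stage.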