
30 research outputs found

Riemannian Langevin Algorithm for Solving Semidefinite Programs

Authors: Erdogdu Murat A.; Li Mufan Bill
Publication date: 22/12/2020

We propose a Langevin diffusion-based algorithm for non-convex optimization and sampling on a product manifold of spheres. Under a logarithmic Sobolev inequality, we establish a guarantee for finite iteration convergence to the Gibbs distribution in terms of Kullback--Leibler divergence. We show that with an appropriate temperature choice, the suboptimality gap to the global minimum is guaranteed to be arbitrarily small with high probability. As an application, we consider the Burer--Monteiro approach for solving a semidefinite program (SDP) with diagonal constraints, and analyze the proposed Langevin algorithm for optimizing the non-convex objective. In particular, we establish a logarithmic Sobolev inequality for the Burer--Monteiro problem when there are no spurious local minima, but in the presence of saddle points. Combining the results, we then provide a global optimality guarantee for the SDP and the Max-Cut problem. More precisely, we show that the Langevin algorithm achieves $\epsilon$ accuracy with high probability in $\widetilde{\Omega}(\epsilon^{-5})$ iterations.
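The kind of update this abstract describes can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a toy objective $f(Y) = \langle A, YY^\top \rangle$ with each row of $Y$ constrained to a unit sphere (the Burer--Monteiro form of a diagonally constrained SDP), and it swaps in a simplified step (tangent-space projection of the gradient and noise, followed by renormalization) in place of an exact manifold update. The names `langevin_step`, `step`, and `beta` (inverse temperature), and the 4-cycle instance, are hypothetical.

```python
import math
import random

def project_to_tangent(y, v):
    """Remove the component of v along y (y is assumed to have unit norm)."""
    dot = sum(a * b for a, b in zip(y, v))
    return [b - dot * a for a, b in zip(y, v)]

def normalize(v):
    """Map a nonzero vector back onto the unit sphere."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def objective(A, Y):
    """f(Y) = <A, Y Y^T> = sum over i, j of A[i][j] * <y_i, y_j>."""
    n = len(Y)
    return sum(A[i][j] * sum(a * b for a, b in zip(Y[i], Y[j]))
               for i in range(n) for j in range(n))

def langevin_step(A, Y, step, beta, rng):
    """One simplified Langevin update on a product of spheres (assumed form)."""
    n, k = len(Y), len(Y[0])
    scale = math.sqrt(2.0 * step / beta)  # noise level set by the temperature
    new_Y = []
    for i in range(n):
        # Euclidean gradient of f with respect to row y_i (A symmetric).
        grad = [2.0 * sum(A[i][j] * Y[j][c] for j in range(n)) for c in range(k)]
        rgrad = project_to_tangent(Y[i], grad)
        noise = project_to_tangent(Y[i], [rng.gauss(0.0, 1.0) for _ in range(k)])
        moved = [y - step * g + scale * z for y, g, z in zip(Y[i], rgrad, noise)]
        new_Y.append(normalize(moved))  # retract back onto the sphere
    return new_Y

rng = random.Random(0)
# Adjacency matrix of a 4-cycle: a hypothetical toy Max-Cut instance.
A = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
Y = [normalize([rng.gauss(0.0, 1.0) for _ in range(3)]) for _ in range(4)]
for _ in range(200):
    Y = langevin_step(A, Y, step=0.01, beta=50.0, rng=rng)
print(objective(A, Y))
```

Minimizing $\langle A, YY^\top \rangle$ over unit-norm rows corresponds to the Max-Cut relaxation for the graph with adjacency matrix `A`. A larger `beta` makes the step closer to plain Riemannian gradient descent, while a smaller `beta` injects more noise.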

arXiv.org e-Print Archive

Generalization Bounds for Stochastic Gradient Descent via Localized $\varepsilon$-Covers

Authors: Erdogdu Murat A.; Park Sejun; Şimşekli Umut
Publication date: 19/09/2022

In this paper, we propose a new covering technique localized for the trajectories of SGD. This localization provides an algorithm-specific complexity measured by the covering number, which can have dimension-independent cardinality in contrast to standard uniform covering arguments that result in exponential dimension dependency. Based on this localized construction, we show that if the objective function is a finite perturbation of a piecewise strongly convex and smooth function with $P$ pieces, i.e. non-convex and non-smooth in general, the generalization error can be upper bounded by $O(\sqrt{(\log n\log(nP))/n})$, where $n$ is the number of data samples. In particular, this rate is independent of dimension and does not require early stopping and decaying step size. Finally, we employ these results in various contexts and derive generalization bounds for multi-index linear models, multi-class support vector machines, and $K$-means clustering for both hard and soft label setups, improving the known state-of-the-art rates.
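To get a feel for how the stated rate $\sqrt{(\log n \log(nP))/n}$ behaves, the short sketch below evaluates it numerically. It drops the constant hidden in the $O(\cdot)$ notation, so the values only indicate scaling and are not actual error bounds; the function name `sgd_generalization_rate` is hypothetical.

```python
import math

def sgd_generalization_rate(n, P):
    """Evaluate sqrt(log(n) * log(n * P) / n), with the O(.) constant dropped."""
    if n < 2 or P < 1:
        raise ValueError("expected n >= 2 samples and P >= 1 pieces")
    return math.sqrt(math.log(n) * math.log(n * P) / n)

# Scaling in the sample size n for a fixed number of pieces P.
for n in (10**3, 10**4, 10**5, 10**6):
    print(n, sgd_generalization_rate(n, P=10))
```

The number of pieces $P$ enters only through a logarithm, so increasing it has a mild effect compared with increasing $n$.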

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity

Authors: Erdogdu Murat A.; He Hera Y.; Leskovec Jure; Rajaraman Anand; Zhao Qingyuan
Publication venue: Association for Computing Machinery (ACM)
Publication date: 08/06/2015

Social networking websites allow users to create and share content. Big information cascades of post resharing can form as users of these sites reshare others' posts with their friends and followers. One of the central challenges in understanding such cascading behaviors is in forecasting information outbreaks, where a single post becomes widely popular by being reshared by many users. In this paper, we focus on predicting the final number of reshares of a given post. We build on the theory of self-exciting point processes to develop a statistical model that allows us to make accurate predictions. Our model requires no training or expensive feature engineering. It results in a simple and efficiently computable formula that allows us to answer questions, in real-time, such as: Given a post's resharing history so far, what is our current estimate of its final number of reshares? Is the post resharing cascade past the initial stage of explosive growth? And, which posts will be the most reshared in the future? We validate our model using one month of complete Twitter data and demonstrate a strong improvement in predictive accuracy over existing approaches. Our model gives only 15% relative error in predicting the final size of an average information cascade after observing it for just one hour. Comment: 10 pages, published in KDD 201

arXiv.org e-Print Archive

CiteSeerX