On the linear convergence of the stochastic gradient method with constant step-size
The strong growth condition (SGC) is known to be a sufficient condition for
linear convergence of the stochastic gradient method using a constant step-size
(SGM-CS). In this paper, we provide a necessary condition for the
linear convergence of SGM-CS that is weaker than SGC. Moreover, when this
necessary condition is violated up to an additive perturbation, we show that both
the projected stochastic gradient method using a constant step-size (PSGM-CS)
and the proximal stochastic gradient method exhibit linear convergence to a
noise-dominated region, whose distance to the optimal solution is proportional
to the magnitude of the perturbation.
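The behavior described in the abstract can be illustrated on a toy problem (the problem, names, and constants below are my own, not the paper's): with a constant step-size, SGD contracts quickly toward the solution but then stalls in a noise-dominated region whose radius shrinks with the step-size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical least-squares problem: f(w) = (1/n) * sum_i (x_i^T w - y_i)^2.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)  # noisy labels

def sgm_cs(step, iters=5000):
    """Stochastic gradient method with a constant step-size (SGM-CS)."""
    w = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)
        grad = 2.0 * (X[i] @ w - y[i]) * X[i]  # gradient of sample i's loss
        w -= step * grad
    return w

# With label noise, the iterates settle near w_star rather than converging
# exactly; re-running with a smaller step shrinks the residual region.
w = sgm_cs(step=0.01)
print(np.linalg.norm(w - w_star))
```

Here the noise enters through the labels, so the SGC does not hold exactly at the optimum and exact linear convergence is lost.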
A Parallel SGD method with Strong Convergence
Abstract: This paper proposes a novel parallel stochastic gradient descent (SGD) method that is obtained by applying parallel sets of SGD iterations (each set operating on one node, using the data residing in it) to find the direction in each iteration of a batch descent method. The method has strong convergence properties. Experiments on datasets with high-dimensional feature spaces show the value of this method.
Introduction. We are interested in the large-scale learning of linear classifiers. Let {x_i, y_i} be the training set associated with a binary classification problem (y_i ∈ {1, −1}). Consider a linear classification model, y = sgn(w^T x). Let l(w · x_i, y_i) be a continuously differentiable, non-negative, convex loss function with a Lipschitz-continuous gradient. This allows us to consider loss functions such as least squares, logistic loss, and squared hinge loss. Hinge loss is not covered by our theory since it is non-differentiable. Our aim is to minimize the regularized risk functional f(w).
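A minimal sketch of the idea (my own simplified variant with hypothetical data: each node runs SGD from the shared iterate, and plain averaging of the local iterates serves as the outer descent direction; the paper's actual method and its convergence safeguards are richer):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: logistic regression data split across "nodes" (shards).
n, d, nodes = 400, 4, 4
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = np.sign(X @ w_true)  # separable labels
shards = np.array_split(np.arange(n), nodes)
lam = 0.01  # L2 regularization strength

def local_sgd(w, idx, step=0.1, iters=100):
    """Run SGD on one node's shard, starting from the shared iterate w."""
    w = w.copy()
    for _ in range(iters):
        i = rng.choice(idx)
        p = 1.0 / (1.0 + np.exp(-y[i] * (X[i] @ w)))
        grad = -(1.0 - p) * y[i] * X[i] + lam * w  # logistic loss gradient
        w -= step * grad
    return w

# Outer batch-descent loop: each node's SGD endpoint defines a direction,
# and a full step along the averaged direction gives the next shared point.
w = np.zeros(d)
for _ in range(20):
    local_ws = [local_sgd(w, idx) for idx in shards]
    direction = np.mean(local_ws, axis=0) - w
    w = w + direction

acc = np.mean(np.sign(X @ w) == y)
print(acc)
```

Only the d-dimensional iterates cross node boundaries per outer round, which is what makes the scheme attractive when the data is too large to move.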
Structure and Dynamics of Information Pathways in Online Media
Diffusion of information, spread of rumors and infectious diseases are all
instances of stochastic processes that occur over the edges of an underlying
network. Many times networks over which contagions spread are unobserved, and
such networks are often dynamic and change over time. In this paper, we
investigate the problem of inferring dynamic networks based on information
diffusion data. We assume there is an unobserved dynamic network that changes
over time, while we observe the results of a dynamic process spreading over the
edges of the network. The task then is to infer the edges and the dynamics of
the underlying network.
We develop an on-line algorithm that relies on stochastic convex optimization
to efficiently solve the dynamic network inference problem. We apply our
algorithm to information diffusion among 3.3 million mainstream media and blog
sites and experiment with more than 179 million different pieces of information
spreading over the network in a one year period. We study the evolution of
information pathways in the online media space and find interesting insights.
Information pathways for general recurrent topics are more stable across time
than for on-going news events. Clusters of news media sites and blogs often
emerge and vanish in a matter of days for on-going news events. Major social
movements and events involving the civil population, such as the Libyan civil war
or the Syrian uprising, lead to an increase in the number of information pathways
among blogs, as well as an overall increase in the network centrality of blogs and
social media sites.
Comment: To appear at the 6th International Conference on Web Search and Data Mining (WSDM '13).
An Approximate Shapley-Folkman Theorem
The Shapley-Folkman theorem shows that Minkowski averages of uniformly
bounded sets tend to be convex when the number of terms in the sum becomes much
larger than the ambient dimension. In optimization, Aubin and Ekeland [1976]
show that this produces an a priori bound on the duality gap of separable
nonconvex optimization problems involving finite sums. This bound is highly
conservative and depends on unstable quantities, and we relax it in several
directions to show that nonconvexity can have a much milder impact on finite
sum minimization problems such as empirical risk minimization and multi-task
classification. As a byproduct, we show a new version of Maurey's classical
approximate Carath\'eodory lemma where we sample a significant fraction of the
coefficients, without replacement, as well as a result on sampling constraints
using an approximate Helly theorem, both of independent interest.
Comment: Added constraint sampling result, simplified sampling results, reformat, etc.
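As background, the two classical results the abstract builds on can be sketched as follows (notation is mine, not the paper's):

```latex
% Shapley-Folkman: for sets V_1, ..., V_n \subset \mathbb{R}^d,
\[
x \in \mathrm{conv}\Big(\sum_{i=1}^n V_i\Big)
\;\Longrightarrow\;
x = \sum_{i=1}^n x_i, \quad x_i \in \mathrm{conv}(V_i),
\]
% with x_i \in V_i for all but at most d indices i, so the Minkowski
% sum of many sets is "nearly convex" when n \gg d.
%
% Aubin-Ekeland a priori duality-gap bound for the separable problem
% \min_x \sum_i f_i(x_i) subject to \sum_i A_i x_i \le b, with m
% coupling constraints:
\[
0 \;\le\; \text{duality gap} \;\le\; (m+1)\,\max_i \rho(f_i),
\qquad
\rho(f) = \sup_x \big(f(x) - f^{**}(x)\big),
\]
% where f^{**} is the biconjugate (convex envelope) of f, so \rho(f)
% measures the lack of convexity of f.
```

The abstract's point is that the quantities max_i ρ(f_i) entering this bound are conservative and unstable, and the paper relaxes them.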
Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields
We apply stochastic average gradient (SAG) algorithms for training
conditional random fields (CRFs). We describe a practical implementation that
uses structure in the CRF gradient to reduce the memory requirement of this
linearly-convergent stochastic gradient method, propose a non-uniform sampling
scheme that substantially improves practical performance, and analyze the rate
of convergence of the SAGA variant under non-uniform sampling. Our experimental
results reveal that our method often significantly outperforms existing methods
in terms of the training objective, and performs as well as or better than
optimally-tuned stochastic gradient methods in terms of test error.
Comment: AISTATS 2015, 24 pages.
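A rough sketch of non-uniform sampling in a SAGA-style method, run on a ridge-regression stand-in rather than a CRF (the data, names, and constants are my own, and the importance-weighted update below is a simplification of the paper's scheme): samples with larger per-sample Lipschitz constants are visited more often.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical ridge-regression problem standing in for a CRF objective.
n, d = 300, 10
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star + 0.05 * rng.normal(size=n)
lam = 0.1

def grad_i(w, i):
    """Gradient of sample i's regularized squared loss."""
    return 2.0 * (X[i] @ w - y[i]) * X[i] + lam * w

# Non-uniform sampling: probabilities proportional to the per-sample
# Lipschitz constants L_i = 2*||x_i||^2 + lam.
L = 2.0 * np.sum(X**2, axis=1) + lam
p = L / L.sum()
step = 1.0 / (3.0 * L.max())

# SAGA-style update with importance weighting so the step stays an
# unbiased estimate of the full gradient under sampling distribution p.
w = np.zeros(d)
memory = np.array([grad_i(w, i) for i in range(n)])  # stored gradients
avg = memory.mean(axis=0)
for _ in range(20000):
    j = rng.choice(n, p=p)
    g = grad_i(w, j)
    w -= step * ((g - memory[j]) / (n * p[j]) + avg)
    avg += (g - memory[j]) / n  # keep the running average in sync
    memory[j] = g

print(np.linalg.norm(w - w_star))
```

As in the paper's setting, the stored per-sample gradients are what make the method linearly convergent; the memory-reduction trick for CRF structure is not reproduced here.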