Stochastic Optimization with Importance Sampling
Uniform sampling of training data has been commonly used in traditional
stochastic optimization algorithms such as Proximal Stochastic Gradient Descent
(prox-SGD) and Proximal Stochastic Dual Coordinate Ascent (prox-SDCA). Although
uniform sampling can guarantee that the sampled stochastic quantity is an
unbiased estimate of the corresponding true quantity, the resulting estimator
may have a rather high variance, which negatively affects the convergence of
the underlying optimization procedure. In this paper we study stochastic
optimization with importance sampling, which improves the convergence rate by
reducing the stochastic variance. Specifically, we study prox-SGD (actually,
stochastic mirror descent) with importance sampling and prox-SDCA with
importance sampling. For prox-SGD, instead of adopting uniform sampling
throughout the training process, the proposed algorithm employs importance
sampling to minimize the variance of the stochastic gradient. For prox-SDCA,
the proposed importance sampling scheme aims to achieve a higher expected dual
value at each dual coordinate ascent step. We provide an extensive theoretical
analysis to show that the convergence rates with the proposed importance
sampling methods can be significantly improved under suitable conditions both
for prox-SGD and for prox-SDCA. Experiments are provided to verify the
theoretical analysis.
Comment: 29 pages
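As a rough illustration of the variance-reduction idea behind prox-SGD with importance sampling (a minimal sketch, not the paper's algorithm: the gradient oracle grad_i, the per-example bounds L, and the step size are illustrative assumptions), sampling example i with probability p_i proportional to L_i and reweighting the gradient by 1/(n p_i) keeps the estimate unbiased while shrinking its variance:

```python
import numpy as np

def importance_sampled_sgd(grad_i, w0, L, steps=1000, lr=0.01, seed=0):
    """Sketch of SGD with importance sampling.

    grad_i(w, i) -- gradient of the i-th loss term at w (assumed given)
    L            -- per-example smoothness/Lipschitz bounds; sampling
                    proportional to L is one common choice
    """
    rng = np.random.default_rng(seed)
    n = len(L)
    p = np.asarray(L, dtype=float)
    p /= p.sum()                             # nonuniform sampling distribution
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        i = rng.choice(n, p=p)               # sample by importance, not uniformly
        w -= lr * grad_i(w, i) / (n * p[i])  # 1/(n p_i) reweighting keeps the
    return w                                 # estimate of the full gradient unbiased
```

With uniform sampling, p_i = 1/n, the update reduces to standard SGD; the variance of the reweighted gradient is smallest when p_i tracks the per-example gradient magnitudes, which the bounds L_i approximate.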
Active Learning with Expert Advice
Conventional learning-with-expert-advice methods assume that the learner always
receives the outcome (e.g., the class label) of every incoming training instance
at the end of each trial. In real applications, acquiring the outcome from an
oracle can be costly or time-consuming. In this paper, we address a new problem
of active learning with expert advice, where the outcome of an instance is
disclosed only when it is requested by the online learner. Our goal is to learn
an accurate prediction model while asking the oracle as few questions as
possible. To address this challenge, we propose a framework of active
forecasters for online active learning with expert advice, which attempts to
extend two regular forecasters, i.e., Exponentially Weighted Average Forecaster
and Greedy Forecaster, to tackle the task of active learning with expert
advice. We prove that the proposed algorithms satisfy Hannan consistency under
suitable assumptions, and we validate the efficacy of our technique through an
extensive set of experiments.
Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty
in Artificial Intelligence (UAI 2013).
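As a loose sketch of how an exponentially weighted forecaster can be made label-efficient (illustrative only: the margin-based query rule and the squared loss below are assumptions, not the paper's active forecaster or its analysis), the learner asks the oracle only on rounds where its weighted forecast is uncertain:

```python
import numpy as np

def active_ewaf(expert_preds, oracle, eta=0.5, margin=0.2):
    """Exponentially weighted average forecaster that queries the
    oracle only on uncertain rounds (illustrative query rule).

    expert_preds -- (T, K) array of expert predictions in [0, 1]
    oracle(t)    -- returns the true outcome of round t in {0, 1}
    """
    T, K = expert_preds.shape
    w = np.ones(K)                              # one weight per expert
    forecasts, n_queries = np.empty(T), 0
    for t in range(T):
        p = w @ expert_preds[t] / w.sum()       # weighted-average forecast
        forecasts[t] = p
        if abs(p - 0.5) < margin:               # uncertain round: ask oracle
            y = oracle(t)
            n_queries += 1
            loss = (expert_preds[t] - y) ** 2   # per-expert squared loss
            w *= np.exp(-eta * loss)            # exponential weight update
    return forecasts, n_queries
```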
Parallel Graph Connectivity in Log Diameter Rounds
We study the graph connectivity problem in the MPC model. On an undirected
graph with n nodes and m edges, O(log n)-round connectivity algorithms have
been known for over 35 years. However, no algorithms with better complexity
bounds were known. In this work, we give fully scalable, faster algorithms for
the connectivity problem by parameterizing the time complexity as a function of
the diameter of the graph. Our main result is an O(log D · log log_{m/n} n)
time connectivity algorithm for diameter-D graphs, using Θ(m) total memory. If
our algorithm can use more memory, it can terminate in fewer rounds, and there
is no lower bound on the memory per processor.
We extend our results to related graph problems such as spanning forest,
finding a DFS sequence, exact/approximate minimum spanning forest, and
bottleneck spanning forest. We also show that achieving similar bounds for
reachability in directed graphs would imply faster Boolean matrix
multiplication algorithms.
We introduce several new algorithmic ideas. We describe a general technique
called double exponential speed problem size reduction, which roughly means
that if we can use N total memory to reduce a problem from size n to n/k, for
k = (N/n)^Θ(1), in one phase, then we can solve the problem in
O(log log_{N/n} n) phases. In order to achieve this fast reduction for graph
connectivity, we use a multistep algorithm. One key step is a carefully
constructed truncated broadcasting scheme, where each node broadcasts neighbor
sets to its neighbors in a way that limits the size of the resulting neighbor
sets. Another key step is random leader contraction, where we choose a smaller
set of leaders than many previous works do.
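To make the leader-contraction step concrete, here is a toy, sequential simulation of one round (illustrative only: the sampling probability p and the min-based tie-breaking are our choices, and a real MPC implementation would process all nodes in parallel under per-machine memory limits):

```python
import random
from collections import defaultdict

def leader_contraction_round(adj, p=0.3, seed=0):
    """Simulate one round of random leader contraction on an
    undirected graph given as adj: node -> set of neighbors.

    Returns the contracted adjacency and a node -> leader map.
    """
    rng = random.Random(seed)
    leaders = {v for v in adj if rng.random() < p}   # sample a leader set
    label = {}
    for v in adj:
        if v in leaders:
            label[v] = v                             # leaders keep their name
        else:
            near = leaders & adj[v]                  # adjacent leaders, if any
            label[v] = min(near) if near else v      # join one, else stay put
    contracted = defaultdict(set)                    # rebuild edges between
    for u in adj:                                    # distinct leader labels
        for v in adj[u]:
            if label[u] != label[v]:
                contracted[label[u]].add(label[v])
                contracted[label[v]].add(label[u])
    return dict(contracted), label
```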