Convergence of Distributed Stochastic Variance Reduced Methods without Sampling Extra Data
Stochastic variance reduced methods have recently gained considerable interest
for empirical risk minimization due to their appealing run-time complexity. When
the data size is large and disjointly stored on different machines, it becomes
imperative to distribute the implementation of such variance reduced methods.
In this paper, we consider a general framework that directly distributes
popular stochastic variance reduced methods in the master/slave model, by
assigning outer loops to the parameter server, and inner loops to worker
machines. This framework is natural and easy to implement, but its
theoretical convergence is not well understood. We obtain a comprehensive
understanding of algorithmic convergence with respect to data homogeneity by
measuring the smoothness of the discrepancy between the local and global loss
functions. We establish the linear convergence of distributed versions of a
family of stochastic variance reduced algorithms, including those using
accelerated and recursive gradient updates, for minimizing strongly convex
losses. Our theory captures how the convergence of distributed algorithms
behaves as the number of machines and the size of local data vary. Furthermore,
we show that when the data are less balanced, regularization can be used to
ensure convergence at a slower rate. We also demonstrate that our analysis can
be further extended to handle nonconvex loss functions.
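
A minimal single-process sketch of the master/worker pattern described above, written in NumPy: the parameter server runs the outer loop and aggregates a full gradient at a snapshot, while each worker runs an SVRG-style inner loop on its local shard. The ridge-regularized least-squares loss, step size, and loop lengths are illustrative assumptions, not the paper's setup.

```python
# Sketch of distributed SVRG in the master/worker model (single process simulation).
import numpy as np

def local_grad(w, X, y, lam):
    # gradient of the regularized least-squares loss on one data block
    return X.T @ (X @ w - y) / len(y) + lam * w

def distributed_svrg(X_parts, y_parts, lam=0.1, eta=0.05, outer=20, inner=50):
    d = X_parts[0].shape[1]
    w_snapshot = np.zeros(d)
    for _ in range(outer):                       # outer loop on the parameter server
        full_grad = np.mean(
            [local_grad(w_snapshot, X, y, lam) for X, y in zip(X_parts, y_parts)],
            axis=0)                              # aggregated full gradient at the snapshot
        local_iterates = []
        for X, y in zip(X_parts, y_parts):       # inner loops on the workers
            w = w_snapshot.copy()
            for _ in range(inner):
                i = np.random.randint(len(y))
                xi, yi = X[i:i+1], y[i:i+1]
                # variance-reduced stochastic gradient
                g = (local_grad(w, xi, yi, lam)
                     - local_grad(w_snapshot, xi, yi, lam)
                     + full_grad)
                w -= eta * g
            local_iterates.append(w)
        w_snapshot = np.mean(local_iterates, axis=0)  # server averages worker iterates
    return w_snapshot
```

Averaging the worker iterates back into the snapshot is one simple aggregation choice; the framework in the paper also covers other variance-reduced inner loops, such as accelerated and recursive gradient updates.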
Model-Free Reinforcement Learning for Financial Portfolios: A Brief Survey
Financial portfolio management is one of the problems that are most
frequently encountered in the investment industry. Nevertheless, it is not
widely recognized that both the Kelly Criterion and Risk Parity collapse into Mean
Variance under some conditions, which implies that a universal solution to the
portfolio optimization problem could potentially exist. In fact, the process of
sequential computation of optimal component weights that maximize the
portfolio's expected return subject to a certain risk budget can be
reformulated as a discrete-time Markov Decision Process (MDP) and hence as a
stochastic optimal control problem, where the system being controlled is a portfolio
consisting of multiple investment components, and the control is its component
weights. Consequently, the problem could be solved using model-free
Reinforcement Learning (RL) without knowing specific component dynamics. By
examining existing methods of both value-based and policy-based model-free RL
for the portfolio optimization problem, we identify some of the key unresolved
questions and difficulties that today's portfolio managers face in applying
model-free RL to their investment portfolios.
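
A minimal sketch of the MDP framing described above: the state is the vector of most recent asset returns, the action is a vector of portfolio weights, and the reward is the realized portfolio return penalized for risk. The Gaussian return model, long-only projection, and penalty coefficient are illustrative assumptions, not part of the survey.

```python
# Toy portfolio MDP: state = recent returns, action = weights, reward = return - risk penalty.
import numpy as np

class PortfolioMDP:
    def __init__(self, n_assets=3, risk_aversion=5.0, seed=0):
        self.n = n_assets
        self.risk_aversion = risk_aversion
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.returns = self.rng.normal(0.001, 0.01, self.n)   # last observed asset returns
        return self.returns

    def step(self, weights):
        weights = np.clip(weights, 0, None)
        weights = weights / max(weights.sum(), 1e-12)          # long-only, fully invested
        next_returns = self.rng.normal(0.001, 0.01, self.n)    # placeholder return dynamics
        port_ret = float(weights @ next_returns)
        reward = port_ret - self.risk_aversion * port_ret ** 2  # mean-variance style reward
        self.returns = next_returns
        return next_returns, reward
```

Any model-free RL agent, value-based or policy-based, can then be trained against `step` without knowledge of the component dynamics.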
SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms
SARAH and SPIDER are two recently developed stochastic variance-reduced
algorithms, and SPIDER has been shown to achieve a near-optimal first-order
oracle complexity in smooth nonconvex optimization. However, SPIDER uses an
accuracy-dependent stepsize that slows down the convergence in practice, and
cannot handle objective functions that involve nonsmooth regularizers. In this
paper, we propose SpiderBoost as an improved scheme, which allows a much
larger constant-level stepsize while maintaining the same near-optimal oracle
complexity, and can be extended with proximal mapping to handle composite
optimization (which is nonsmooth and nonconvex) with provable convergence
guarantee. In particular, we show that proximal SpiderBoost achieves a
near-optimal oracle complexity in composite nonconvex optimization, improving
upon the best previously known result. We further develop a
novel momentum scheme to accelerate SpiderBoost for composite optimization,
which achieves the near-optimal oracle complexity in theory and substantial
improvement in experiments.
Comment: Appears in NeurIPS 2019.
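
A minimal sketch of the recursive (SPIDER/SARAH-style) gradient estimator combined with a constant step size and a proximal step for an l1 regularizer, in the spirit of proximal SpiderBoost. The least-squares loss, epoch length of roughly sqrt(n), and step size are illustrative assumptions; the momentum variant is not shown.

```python
# Proximal SpiderBoost-style loop: recursive gradient estimate + constant step + l1 prox.
import numpy as np

def soft_threshold(w, t):
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)   # prox of t * ||.||_1

def prox_spiderboost(X, y, lam=0.01, eta=0.1, epochs=10, q=None):
    n, d = X.shape
    q = q or int(np.sqrt(n))            # epoch length ~ sqrt(n)
    w = np.zeros(d)
    v = np.zeros(d)
    for k in range(epochs * q):
        if k % q == 0:
            v = X.T @ (X @ w - y) / n   # full gradient at the start of each epoch
        else:
            i = np.random.randint(n)
            xi = X[i]
            # recursive update: correct the previous estimate with a fresh sample
            v = v + xi * (xi @ w - y[i]) - xi * (xi @ w_prev - y[i])
        w_prev = w.copy()
        w = soft_threshold(w - eta * v, eta * lam)  # proximal gradient step, constant eta
    return w
```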
Algorithmic Bio-surveillance For Precise Spatio-temporal Prediction of Zoonotic Emergence
Viral zoonoses have emerged as the key drivers of recent pandemics. Human
infections by zoonotic viruses are either spillover events -- isolated
infections that fail to cause a widespread contagion -- or species jumps, where
successful adaptation to the new host leads to a pandemic. Despite expensive
bio-surveillance efforts, emergence response has historically been reactive
and post hoc. Here we use machine inference to demonstrate a high-accuracy
predictive bio-surveillance capability, designed to proactively localize an
impending species jump via automated interrogation of massive sequence
databases of viral proteins. Our results suggest that a jump might not purely
be the result of an isolated unfortunate cross-infection localized in space and
time; there are subtle yet detectable patterns of genotypic changes
accumulating in the global viral population leading up to emergence. Using tens
of thousands of protein sequences simultaneously, we train models that track
maximum achievable accuracy for disambiguating host tropism from the primary
structure of surface proteins, and show that the inverse classification
accuracy is a quantitative indicator of jump risk. We validate our claim in the
context of the 2009 swine flu outbreak and the 2004 emergence of the H5N1
subspecies of Influenza A from avian reservoirs, illustrating that
interrogation of the global viral population can unambiguously track a near
monotonic risk elevation over several preceding years leading to eventual
emergence.
Comment: 8 pages, 5 figures.
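
A minimal sketch of the core signal described above: featurize surface-protein sequences with k-mer counts, train a classifier to disambiguate host tropism, and read the cross-validated misclassification rate as a proxy for jump risk. The k-mer length, the logistic-regression model, and the input format are illustrative assumptions, not the paper's pipeline.

```python
# Jump-risk proxy: how hard is it to tell hosts apart from surface-protein sequences?
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def kmerize(seq, k=3):
    # turn a protein string into space-separated overlapping k-mers
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))

def jump_risk_signal(sequences, hosts, k=3):
    # sequences: list of protein strings; hosts: 1 = human, 0 = animal reservoir
    X = CountVectorizer(analyzer="word").fit_transform(kmerize(s, k) for s in sequences)
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, hosts, cv=5).mean()
    return 1.0 - acc   # higher misclassification = hosts harder to disambiguate = higher risk
```

Tracking this signal over successive years of sequences is one way to read a monotone risk elevation of the kind the abstract describes.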
Accelerating Gradient Boosting Machine
Gradient Boosting Machine (GBM) is an extremely powerful supervised learning
algorithm that is widely used in practice. GBM routinely features as a leading
algorithm in machine learning competitions such as Kaggle and the KDDCup. In
this work, we propose Accelerated Gradient Boosting Machine (AGBM) by
incorporating Nesterov's acceleration techniques into the design of GBM. The
difficulty in accelerating GBM lies in the fact that weak (inexact) learners
are commonly used, and therefore the errors can accumulate in the momentum
term. To overcome this, we design a "corrected pseudo residual" and fit the best
weak learner to this corrected pseudo residual, in order to perform the z-update.
Thus, we are able to derive novel computational guarantees for AGBM. This is
the first GBM type of algorithm with theoretically-justified accelerated
convergence rate. Finally, we demonstrate with a number of numerical experiments
the effectiveness of AGBM over conventional GBM in obtaining a model with good
training and/or testing data fidelity.
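
A minimal sketch of gradient boosting with a Nesterov-style momentum sequence, to illustrate the structure AGBM builds on: two ensembles are maintained and mixed before each weak learner is fit. The residual below is the ordinary least-squares pseudo residual; the corrected pseudo residual and the exact update weights of AGBM are not reproduced, so this is a simplified illustration rather than the paper's algorithm.

```python
# Gradient boosting with a Nesterov-style momentum sequence (simplified illustration).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def momentum_gbm(X, y, n_rounds=100, eta=0.1, depth=2):
    f_pred = np.zeros(len(y))       # primary ensemble predictions
    h_pred = np.zeros(len(y))       # momentum ensemble predictions
    learners = []
    for m in range(n_rounds):
        theta = 2.0 / (m + 2)                       # Nesterov-style mixing weight
        g_pred = (1 - theta) * f_pred + theta * h_pred
        residual = y - g_pred                       # pseudo residual for squared loss
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, residual)
        update = tree.predict(X)
        f_pred = g_pred + eta * update              # gradient step from the momentum point
        h_pred = h_pred + (eta / theta) * update    # momentum sequence update
        learners.append((theta, tree))
    return learners
```

The accumulation of weak-learner error in the momentum sequence is exactly the difficulty the corrected pseudo residual in AGBM is designed to control.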
SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient
In this paper, we propose a StochAstic Recursive grAdient algoritHm (SARAH),
as well as its practical variant SARAH+, as a novel approach to the finite-sum
minimization problems. Unlike vanilla SGD and other modern
stochastic methods such as SVRG, S2GD, SAG and SAGA, SARAH admits a simple
recursive framework for updating stochastic gradient estimates; compared
to SAG/SAGA, SARAH does not require storage of past gradients. The linear
convergence rate of SARAH is proven under a strong convexity assumption. We also
prove a linear convergence rate (in the strongly convex case) for an inner loop
of SARAH, a property that SVRG does not possess. Numerical experiments
demonstrate the efficiency of our algorithm.
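
A minimal sketch of the SARAH recursion for a least-squares finite sum: the gradient estimate is updated recursively from the previous estimate, so no table of past gradients is kept. The quadratic loss, step size, and inner-loop length are illustrative assumptions.

```python
# SARAH: recursive stochastic gradient estimates, no stored gradient table.
import numpy as np

def sarah(X, y, eta=0.1, outer=10, inner=None):
    n, d = X.shape
    inner = inner or n
    w = np.zeros(d)
    for _ in range(outer):
        v = X.T @ (X @ w - y) / n            # full gradient at the start of the outer loop
        w_prev, w = w.copy(), w - eta * v
        for _ in range(inner):
            i = np.random.randint(n)
            xi = X[i]
            # recursive estimate: swap the old sample gradient for the fresh one
            v = v + xi * (xi @ w - y[i]) - xi * (xi @ w_prev - y[i])
            w_prev, w = w.copy(), w - eta * v
    return w
```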
From single attitudes to belief systems: Examining the centrality of STEM attitudes using belief network analysis
Many achievement and motivation theories claim that a specific set of beliefs, interests or values plays a central role in determining career choice and behavior. To investigate how attitudes determine behaviors, researchers generally examine each attitude in isolation. This article argues that studying belief systems rather than single attitudes has several explanatory advantages. In particular, a system-level approach can provide clear definitions and measures of attitude importance. Using a nationally representative sample of 13,283 9th graders and measures of 136 STEM-related attitudes, I implement a belief network analysis to investigate which attitudes are most influential in determining STEM career choice. The results suggest that identity beliefs, educational expectations and ability-related beliefs play central roles in individuals’ belief systems.
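
A minimal sketch of the belief-network idea in the article: treat each attitude item as a node, connect items by the strength of their pairwise association, and rank items by centrality. Using absolute correlations and strength centrality here is an illustrative simplification of the analysis, and the tabular input format is an assumption.

```python
# Rank attitude items by their centrality in a correlation-based belief network.
import numpy as np
import pandas as pd

def attitude_centrality(responses: pd.DataFrame) -> pd.Series:
    # responses: one column per attitude item, one row per respondent
    corr = responses.corr().abs().to_numpy()
    np.fill_diagonal(corr, 0.0)                    # no self-loops
    strength = pd.Series(corr.sum(axis=1), index=responses.columns)
    return strength.sort_values(ascending=False)   # most central attitudes first
```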
Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization
Due to their simplicity and excellent performance, parallel asynchronous
variants of stochastic gradient descent have become popular methods to solve a
wide range of large-scale optimization problems on multi-core architectures.
Yet, despite their practical success, support for nonsmooth objectives is still
lacking, making them unsuitable for many problems of interest in machine
learning, such as the Lasso, group Lasso or empirical risk minimization with
convex constraints.
In this work, we propose and analyze ProxASAGA, a fully asynchronous sparse
method inspired by SAGA, a variance reduced incremental gradient algorithm. The
proposed method is easy to implement and significantly outperforms the state of
the art on several nonsmooth, large-scale problems. We prove that our method
achieves a theoretical linear speedup with respect to the sequential version
under assumptions on the sparsity of gradients and block-separability of the
proximal term. Empirical benchmarks on a multi-core architecture illustrate
practical speedups of up to 12x on a 20-core machine.
Comment: Appears in Advances in Neural Information Processing Systems 30 (NIPS 2017), 28 pages.
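
A minimal sketch of the sequential proximal SAGA update that ProxASAGA parallelizes: a table of per-sample gradients provides variance reduction, and a proximal step handles the nonsmooth l1 term. The least-squares loss and step size are illustrative assumptions, and the asynchronous, sparse multi-core execution that yields the reported speedups is not shown.

```python
# Sequential proximal SAGA (the building block ProxASAGA runs asynchronously).
import numpy as np

def soft_threshold(w, t):
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def prox_saga(X, y, lam=0.01, eta=0.05, n_steps=10000):
    n, d = X.shape
    w = np.zeros(d)
    grad_table = np.zeros((n, d))                 # stored per-sample gradients
    grad_avg = np.zeros(d)
    for _ in range(n_steps):
        i = np.random.randint(n)
        g_new = X[i] * (X[i] @ w - y[i])          # fresh gradient of sample i
        v = g_new - grad_table[i] + grad_avg      # SAGA variance-reduced estimate
        grad_avg += (g_new - grad_table[i]) / n   # keep the running average in sync
        grad_table[i] = g_new
        w = soft_threshold(w - eta * v, eta * lam)  # proximal (l1) step
    return w
```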
Statistical Inference for the Population Landscape via Moment Adjusted Stochastic Gradients
Modern statistical inference tasks often require iterative optimization
methods to compute the solution. Convergence analysis from an optimization
viewpoint only informs us how well the solution is approximated numerically but
overlooks the sampling nature of the data. In contrast, recognizing the
randomness in the data, statisticians are keen to provide uncertainty
quantification, or confidence, for the solution obtained using iterative
optimization methods. This paper makes progress in this direction by
introducing moment-adjusted stochastic gradient descent, a new stochastic
optimization method for statistical inference. We establish non-asymptotic
theory that characterizes the statistical distribution for certain iterative
methods with optimization guarantees. On the statistical front, the theory
allows for model mis-specification, with very mild conditions on the data. For
optimization, the theory is flexible for both convex and non-convex cases.
Remarkably, the moment-adjusting idea, motivated by "error standardization" in
statistics, achieves an effect similar to acceleration in first-order
optimization methods used to fit generalized linear models. We also demonstrate
this acceleration effect in the non-convex setting through numerical
experiments.
Comment: Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2019, to appear.
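
A heavily simplified sketch of one reading of the "error standardization" idea: rescale each stochastic gradient by the inverse square root of a running estimate of its second-moment matrix before stepping. This is an illustrative interpretation, not the paper's exact update; the step-size schedule, the regularization, and the user-supplied `grad_fn` callable are all assumptions.

```python
# SGD with a moment-based standardization of the stochastic gradient (illustrative only).
import numpy as np

def moment_adjusted_sgd(grad_fn, theta0, n_steps=1000, eta=0.1, batch=32, reg=1e-3):
    # grad_fn(theta, batch) is a hypothetical callable returning a minibatch gradient
    theta = theta0.copy()
    d = len(theta)
    M = np.eye(d)                                  # running second-moment estimate
    for t in range(1, n_steps + 1):
        g = grad_fn(theta, batch)                  # stochastic gradient on a fresh batch
        M = M + (np.outer(g, g) - M) / t           # update the moment estimate
        # standardize the gradient: multiply by M^{-1/2} (regularized for stability)
        vals, vecs = np.linalg.eigh(M + reg * np.eye(d))
        M_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
        theta = theta - eta / np.sqrt(t) * (M_inv_sqrt @ g)
    return theta
```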
A Unified Framework for Stochastic Matrix Factorization via Variance Reduction
We propose a unified framework to speed up the existing stochastic matrix
factorization (SMF) algorithms via variance reduction. Our framework is general
and it subsumes several well-known SMF formulations in the literature. We
perform a non-asymptotic convergence analysis of our framework and derive
computational and sample complexities for our algorithm to converge to an
ε-stationary point in expectation. In addition, extensive experiments
for a wide class of SMF formulations demonstrate that our framework
consistently yields faster convergence and a more accurate output dictionary
vis-à-vis state-of-the-art frameworks.
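
A minimal sketch of an SVRG-style variance-reduced update for plain matrix factorization Y ≈ U V: full gradients are computed at a snapshot each epoch, and per-column stochastic gradients are corrected against the snapshot. The squared-error objective, dimensions, and step size are illustrative assumptions; the framework in the paper covers more general SMF formulations.

```python
# Variance-reduced stochastic matrix factorization (SVRG-style, squared-error objective).
import numpy as np

def svrg_matrix_factorization(Y, rank=5, eta=0.01, outer=20, inner=None):
    m, n = Y.shape
    inner = inner or n
    rng = np.random.default_rng(0)
    U = rng.normal(size=(m, rank)) * 0.1
    V = rng.normal(size=(rank, n)) * 0.1
    for _ in range(outer):
        U0, V0 = U.copy(), V.copy()                      # snapshot
        R0 = U0 @ V0 - Y
        GU_full = R0 @ V0.T / n                          # full gradient w.r.t. U at the snapshot
        for _ in range(inner):
            j = rng.integers(n)                          # sample one column of Y
            r = U @ V[:, j] - Y[:, j]
            r0 = U0 @ V0[:, j] - Y[:, j]
            # variance-reduced gradient for U; the sampled column's own gradient is exact
            gU = np.outer(r, V[:, j]) - np.outer(r0, V0[:, j]) + GU_full
            gVj = U.T @ r                                # column j is touched ~1/n of the time
            U -= eta * gU
            V[:, j] -= eta * gVj
    return U, V
```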