Accelerating Parallel Stochastic Gradient Descent via Non-blocking Mini-batches
State-of-the-art decentralized SGD algorithms can overcome the bandwidth bottleneck at
the parameter server by using communication collectives like Ring All-Reduce
for synchronization. While the parameter updates in distributed SGD may happen
asynchronously, there is still a synchronization barrier to ensure that the
local training epoch at every learner is complete before the learners can
advance to the next epoch. The delays incurred waiting for the slowest
learners (stragglers) remain a problem in the synchronization steps of
these state-of-the-art decentralized frameworks. In this paper, we propose
(de)centralized Non-blocking SGD (Non-blocking SGD), which can address the
straggler problem in a heterogeneous environment. The main idea of Non-blocking
SGD is to split the original batch into mini-batches, then accumulate the
gradients and update the model based on the finished mini-batches. The
non-blocking idea can be implemented on top of decentralized algorithms
including Ring All-Reduce, D-PSGD, and MATCHA to solve the straggler problem.
Moreover, using gradient accumulation to update the model also guarantees
convergence and avoids gradient staleness. A run-time analysis with random
straggler delays and varying computational efficiency/throughput of devices is
also presented to show the advantage of Non-blocking SGD. Experiments on a
suite of datasets and deep learning networks validate the theoretical analyses
and demonstrate that Non-blocking SGD speeds up training and accelerates
convergence. Compared with state-of-the-art decentralized asynchronous
algorithms like D-PSGD and MATCHA, Non-blocking SGD takes up to 2x less time
to reach the same training loss in a heterogeneous environment.
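The following is a minimal single-node sketch of the gradient-accumulation idea the abstract describes, not the authors' distributed implementation: the model, data, and learning rate are illustrative, and the distributed synchronization step (Ring All-Reduce, D-PSGD, MATCHA) is omitted.

```python
# Sketch: split the original batch into mini-batches, accumulate gradients
# as mini-batches finish, and update the model once from the accumulated sum.
# In the distributed setting each finished mini-batch would be reduced across
# learners without blocking on stragglers.
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(5)                          # linear-regression weights
X = rng.normal(size=(64, 5))             # one "original batch" of 64 samples
y = X @ np.array([1., -2., 0.5, 3., -1.]) + 0.1 * rng.normal(size=64)

lr, n_mini = 0.1, 4
for epoch in range(100):
    grad_sum, seen = np.zeros_like(w), 0
    for Xb, yb in zip(np.array_split(X, n_mini), np.array_split(y, n_mini)):
        resid = Xb @ w - yb
        grad_sum += Xb.T @ resid         # accumulate this mini-batch's gradient
        seen += len(yb)
    w -= lr * grad_sum / seen            # one update from accumulated gradients

print("learned weights:", np.round(w, 2))
```

Because the update uses only gradients computed against the current model, the accumulated sum matches the full-batch gradient, which is why accumulation avoids the staleness that plagues fully asynchronous updates.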
Boosting insights in insurance tariff plans with tree-based machine learning methods
Pricing actuaries typically operate within the framework of generalized
linear models (GLMs). With the upswing of data analytics, our study focuses
on machine learning methods to develop full tariff plans built from both the
frequency and the severity of claims. We adapt the loss functions used in the
algorithms such that the specific characteristics of insurance data are
carefully incorporated: highly unbalanced count data with excess zeros and
varying exposure on the frequency side, combined with scarce but potentially
long-tailed data on the severity side. A key requirement is the need for
transparent and interpretable pricing models which are easily explainable to
all stakeholders. We therefore focus on machine learning with decision trees:
starting from simple regression trees, we work towards more advanced ensembles
such as random forests and boosted trees. We show how to choose the optimal
tuning parameters for these models in an elaborate cross-validation scheme,
present visualization tools to obtain insights from the resulting models, and
evaluate the economic value of these new modeling approaches. Boosted trees
outperform the classical GLMs, allowing the insurer to form profitable
portfolios and to guard against potential adverse risk selection.
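As an illustrative sketch of the frequency side (not the paper's own implementation), one can fit a boosted tree with a Poisson deviance loss and handle varying exposure by modeling the claim rate with exposure as a sample weight; the data and feature names below are synthetic, and the cross-validation grid is kept deliberately tiny.

```python
# Sketch: boosted claim-frequency model with Poisson loss and exposure weights.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
n = 10_000
X = rng.normal(size=(n, 3))                        # e.g. age, bonus-malus, power
exposure = rng.uniform(0.1, 1.0, size=n)           # policy years at risk
true_rate = np.exp(0.3 * X[:, 0] - 0.2 * X[:, 1])  # latent claim frequency
counts = rng.poisson(true_rate * exposure)         # observed claim counts

freq = HistGradientBoostingRegressor(loss="poisson")
# Tune via cross-validation, in the spirit of the paper's tuning scheme.
grid = GridSearchCV(freq,
                    {"max_depth": [2, 3], "learning_rate": [0.05, 0.1]},
                    scoring="neg_mean_poisson_deviance", cv=3)
grid.fit(X, counts / exposure, sample_weight=exposure)
print("best params:", grid.best_params_)
```

A severity model would follow the same pattern on the positive claim amounts, with a loss suited to scarce, long-tailed data.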
High-Probability Risk Bounds via Sequential Predictors
Online learning methods yield sequential regret bounds under minimal
assumptions and provide in-expectation risk bounds for statistical learning.
However, despite the apparent advantage of online guarantees over their
statistical counterparts, recent findings indicate that in many important
cases, regret bounds may not guarantee tight high-probability risk bounds in
the statistical setting. In this work we show that online-to-batch conversions
applied to general online learning algorithms can bypass this limitation. Via a
general second-order correction to the loss function defining the regret, we
obtain nearly optimal high-probability risk bounds for several classical
statistical estimation problems, such as discrete distribution estimation,
linear regression, logistic regression, and conditional density estimation. Our
analysis relies on the fact that many online learning algorithms are improper,
as they are not restricted to predictors from a given reference class. The
improper nature of our estimators enables significant improvements in the
dependence on various problem parameters. Finally, we discuss some
computational advantages of our sequential algorithms over their existing batch
counterparts.
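For context, the sketch below shows the standard online-to-batch conversion that such results build on: run an online learner over the sample and average its iterates. The learner here is plain online gradient descent on the logistic loss; the paper's second-order correction to the loss is not reproduced.

```python
# Sketch: online-to-batch conversion by averaging the iterates of an
# online gradient descent learner for binary logistic loss.
import numpy as np

rng = np.random.default_rng(2)
n, d = 500, 3
X = rng.normal(size=(n, d))
y = np.sign(X @ np.array([1.0, -1.0, 0.5]) + 0.3 * rng.normal(size=n))

w = np.zeros(d)                          # current online iterate
avg = np.zeros(d)                        # running average = batch predictor
for t, (x, label) in enumerate(zip(X, y), start=1):
    # gradient of log(1 + exp(-y <w, x>)) at the current iterate
    grad = -label * x / (1.0 + np.exp(label * (w @ x)))
    w -= grad / np.sqrt(t)               # step size on the order of 1/sqrt(t)
    avg += (w - avg) / t                 # incremental average of iterates

print("averaged predictor:", np.round(avg, 2))
```

The averaged predictor inherits the online learner's regret guarantee as an in-expectation risk bound; the point of the paper is how to strengthen such guarantees to hold with high probability.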
Bounded Rationality and Learning in Complex Markets
This chapter reviews work on bounded rationality, expectation formation, and learning in complex markets, using the familiar demand-supply cobweb model. We emphasize two stories of bounded rationality: one of adaptive learning and one of evolutionary selection. According to the adaptive-learning story, agents are identical and can be represented by an "average agent", who adapts his behavior trying to learn an optimal rule within a class of simple (e.g. linear) rules. The second story is concerned with heterogeneous, interacting agents and evolutionary selection among different forecasting rules. Agents can choose between costly sophisticated forecasting strategies, such as rational expectations, and freely available simple strategies, such as naive expectations, based upon their past performance. We also confront both stories with laboratory experiments on expectation formation. At the end of the chapter, we integrate both stories and consider an economy with evolutionary selection between a costly sophisticated adaptive learning rule and a cheap simple forecasting rule such as naive expectations.
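A stylized simulation of the evolutionary-selection story, a sketch under simplifying assumptions rather than the chapter's exact model: a linear cobweb market where a costly rational predictor (perfect foresight, cost C) competes with a free naive predictor (expected price equals last price), with predictor fractions updated by a logit rule on realized performance.

```python
# Sketch: cobweb market with logit switching between rational and naive
# expectations. All parameter values are illustrative.
import numpy as np

a, b, s = 10.0, 1.0, 1.2        # demand intercept/slope, supply slope
beta, C = 3.0, 1.0              # intensity of choice, cost of rationality
p, n_naive = 4.0, 0.5           # initial price and naive fraction
path = []
for t in range(50):
    p_prev = p
    # Market clearing: a - b*p = n_naive*s*p_prev + (1 - n_naive)*s*p,
    # since rational producers supply against the realized price p.
    p = (a - n_naive * s * p_prev) / (b + (1 - n_naive) * s)
    # Realized performance: negative squared forecast error, minus cost
    u_rat = -C                          # perfect foresight, zero error
    u_naive = -(p - p_prev) ** 2        # naive forecast was p_prev
    # Discrete-choice (logit) updating of predictor fractions
    e_rat, e_naive = np.exp(beta * u_rat), np.exp(beta * u_naive)
    n_naive = e_naive / (e_rat + e_naive)
    path.append(p)

print("last prices:", np.round(path[-5:], 3))
```

With s/b > 1 the pure naive cobweb is unstable, so the simulation exhibits the characteristic tension: near the steady state the cheap naive rule spreads and destabilizes prices, after which growing forecast errors push agents back to the costly rational rule.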