The Exploration-Exploitation Trade-Off in Sequential Decision Making Problems
Sequential decision making problems require an agent to repeatedly choose from
a set of actions. Common to such problems is the exploration-exploitation
trade-off, where an agent must choose between taking the action expected to yield the best
reward (exploitation) and trying an alternative action for potential future benefit (exploration).
The main focus of this thesis is to understand in more detail the role this
trade-off plays in various important sequential decision making problems, in terms
of maximising finite-time reward.
The most common and best studied abstraction of the exploration-exploitation
trade-off is the classic multi-armed bandit problem. In this thesis we study several
important extensions that are better suited than the classic problem to real-world
applications. These extensions include scenarios where the rewards for actions
change over time or where the presence of other agents must be repeatedly considered. In
these contexts, the exploration-exploitation trade-off has a more complicated role
in terms of maximising finite-time performance. For example, the amount of exploration
required will constantly change in a dynamic decision problem, in multi-agent
problems agents can explore by communication, and in repeated games, the
exploration-exploitation trade-off must be jointly considered with game theoretic
reasoning.
Existing techniques for balancing exploration and exploitation focus on achieving
desirable asymptotic behaviour and are generally applicable only to basic decision
problems. The most flexible state-of-the-art approaches, ε-greedy and ε-first,
require exploration parameters to be set a priori, and the optimal values of these
parameters depend strongly on the problem faced. To overcome this, we construct a novel
algorithm, ε-ADAPT, which has no exploration parameters and can adapt exploration
on-line for a wide range of problems. ε-ADAPT is built on newly proven theoretical
properties of the ε-first policy, and we demonstrate that ε-ADAPT can accurately
learn not only how much to explore, but also when to explore and which actions to explore.
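To make the two baseline policies concrete, here is a minimal sketch of ε-greedy and ε-first on a Bernoulli multi-armed bandit. It does not implement ε-ADAPT. The arm model, the "try each arm once" start for ε-greedy, the round-robin exploration phase for ε-first, and all function names (`pull`, `best_arm`, `epsilon_greedy`, `epsilon_first`) are illustrative assumptions. The sketch shows where the a priori exploration parameter ε enters each policy.

```python
import random
from typing import List, Optional, Sequence


def pull(p: float, rng: random.Random) -> int:
    """Hypothetical Bernoulli arm: reward 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


def best_arm(counts: List[int], sums: List[float]) -> int:
    """Arm with the highest empirical mean; untried arms rank last, ties go to the lowest index."""
    means = [s / c if c > 0 else float("-inf") for s, c in zip(sums, counts)]
    return max(range(len(means)), key=lambda a: means[a])


def epsilon_greedy(probs: Sequence[float], horizon: int, epsilon: float, seed: int = 0) -> int:
    """Try each arm once, then at every step explore a uniformly random arm
    with probability epsilon and otherwise exploit the current best arm."""
    rng = random.Random(seed)
    k = len(probs)
    counts, sums, total = [0] * k, [0.0] * k, 0
    for _ in range(horizon):
        untried = [a for a in range(k) if counts[a] == 0]
        if untried:
            arm = untried[0]
        elif rng.random() < epsilon:
            arm = rng.randrange(k)
        else:
            arm = best_arm(counts, sums)
        reward = pull(probs[arm], rng)
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total


def epsilon_first(probs: Sequence[float], horizon: int, epsilon: float, seed: int = 0) -> int:
    """Explore for the first round(epsilon * horizon) steps (round-robin here),
    then commit to the empirically best arm for all remaining steps."""
    rng = random.Random(seed)
    k = len(probs)
    counts, sums, total = [0] * k, [0.0] * k, 0
    n_explore = round(epsilon * horizon)
    committed: Optional[int] = None
    for t in range(horizon):
        if t < n_explore:
            arm = t % k
        else:
            if committed is None:
                committed = best_arm(counts, sums)
            arm = committed
        reward = pull(probs[arm], rng)
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total


if __name__ == "__main__":
    arms = [0.4, 0.5, 0.6]
    for eps in (0.01, 0.1, 0.3):
        print(eps, epsilon_greedy(arms, 1000, eps), epsilon_first(arms, 1000, eps))
```

Running the loop for several values of ε and different arm probabilities or horizons is a quick way to see the point made above: no single ε is best across problems. This dependence is what motivates a parameter-free policy.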
Frequency-Domain Stochastic Modeling of Stationary Bivariate or Complex-Valued Signals
There are three equivalent ways of representing two jointly observed
real-valued signals: as a bivariate vector signal, as a single complex-valued
signal, or as two analytic signals known as the rotary components. Each
representation has unique advantages depending on the system of interest and
the application goals. In this paper we provide a joint framework for all three
representations in the context of frequency-domain stochastic modeling. This
framework allows us to extend many established statistical procedures for
bivariate vector time series to complex-valued and rotary representations.
These include procedures for parametrically modeling signal coherence,
estimating model parameters using the Whittle likelihood, performing
semi-parametric modeling, and choosing between classes of nested models. We
also provide a new method of testing for impropriety in
complex-valued signals, which tests for noncircular or anisotropic second-order
statistical structure when the signal is represented in the complex plane.
Finally, we demonstrate the usefulness of our methodology in capturing the
anisotropic structure of signals observed from fluid dynamic simulations of
turbulence.
Comment: To appear in IEEE Transactions on Signal Processing
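The three equivalent representations can be illustrated with a short sketch. It covers the bivariate pair (x, y), the complex signal z = x + iy, and rotary components obtained by splitting the Fourier transform of z into positive and negative frequencies. The sketch assumes a naive O(N²) DFT for clarity. It also assumes that the zero and Nyquist frequencies are shared equally between the two components, which is one convention among several. The `impropriety_ratio` helper is a simple time-domain circularity summary (|pseudo-covariance| / covariance). It is not the frequency-domain impropriety test described in the abstract. All function names are hypothetical.

```python
import cmath
from typing import List, Sequence, Tuple


def dft(z: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform; the inverse includes the 1/N factor."""
    n = len(z)
    sign = 1 if inverse else -1
    out = [sum(z[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def to_complex(x: Sequence[float], y: Sequence[float]) -> List[complex]:
    """Bivariate (x, y) -> complex-valued z = x + i y."""
    return [complex(a, b) for a, b in zip(x, y)]


def to_bivariate(z: Sequence[complex]) -> Tuple[List[float], List[float]]:
    """Complex-valued z -> bivariate (Re z, Im z)."""
    return [v.real for v in z], [v.imag for v in z]


def rotary_components(z: Sequence[complex]) -> Tuple[List[complex], List[complex]]:
    """Split z into z_plus (positive frequencies, counter-clockwise rotation) and
    z_minus (negative frequencies, clockwise rotation) so that z = z_plus + z_minus.
    Assumed convention: the zero and Nyquist bins are split equally between the two."""
    n = len(z)
    Z = dft(z)
    plus, minus = [0j] * n, [0j] * n
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            plus[k] = minus[k] = Z[k] / 2
        elif k < n / 2:
            plus[k] = Z[k]
        else:
            minus[k] = Z[k]
    return dft(plus, inverse=True), dft(minus, inverse=True)


def impropriety_ratio(z: Sequence[complex]) -> float:
    """|pseudo-covariance| / covariance of the centred signal: near 0 for a proper
    (circular) signal, 1 for a signal confined to a line in the complex plane."""
    n = len(z)
    mu = sum(z) / n
    c = [v - mu for v in z]
    power = sum(abs(v) ** 2 for v in c) / n
    if power == 0:
        raise ValueError("signal is constant")
    pseudo = sum(v * v for v in c) / n
    return abs(pseudo) / power
```

For example, a pure counter-clockwise circle `z_t = exp(2πi t/8)` sits entirely in `z_plus`, and its `impropriety_ratio` is essentially zero. A purely real signal gives a ratio of one. In practice an FFT would replace the naive DFT.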
A Power Variance Test for Nonstationarity in Complex-Valued Signals
We propose a novel algorithm for testing the hypothesis of nonstationarity in
complex-valued signals. The implementation uses both the bootstrap and the Fast
Fourier Transform such that the algorithm can be efficiently implemented in
O(N log N) time, where N is the length of the observed signal. The test procedure
examines the second-order structure and contrasts the observed power variance
(i.e. the variability of the instantaneous variance over time) with the
expected characteristics of stationary signals generated via the bootstrap
method. Our algorithmic procedure is capable of detecting different types of
nonstationarity, such as jumps or strong sinusoidal components. We illustrate
the utility of our test and algorithm through application to turbulent flow
data from fluid dynamics.
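The abstract does not specify the exact bootstrap scheme, so the following is only a minimal sketch of the general idea. It makes two assumptions: stationary surrogates are generated by Fourier phase randomisation, and the "instantaneous variance" is estimated by averaging |z_t − mean|² over non-overlapping blocks. Phase randomisation keeps the periodogram of the signal but spreads its power evenly over time. For complex-valued signals, each Fourier coefficient can receive an independent uniform phase because no Hermitian symmetry has to be preserved. The names `power_variance`, `phase_surrogate`, and `power_variance_test` are hypothetical, as are the defaults `block=16` and `n_boot=99`. The textbook radix-2 FFT makes each surrogate O(N log N) for power-of-two N.

```python
import cmath
import random
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT, O(N log N); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]


def ifft(X: Sequence[complex]) -> List[complex]:
    """Inverse FFT via conjugation: ifft(X) = conj(fft(conj(X))) / N."""
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]


def power_variance(z: Sequence[complex], block: int) -> float:
    """Variance over time of the local power |z_t - mean|^2, averaged in
    non-overlapping blocks of length `block` (any remainder is dropped)."""
    mu = sum(z) / len(z)
    p = [abs(v - mu) ** 2 for v in z]
    local = [sum(p[i:i + block]) / block for i in range(0, len(p) - block + 1, block)]
    m = sum(local) / len(local)
    return sum((v - m) ** 2 for v in local) / len(local)


def phase_surrogate(z: Sequence[complex], rng: random.Random) -> List[complex]:
    """Stationary surrogate with the same periodogram as z: every Fourier
    coefficient of the centred signal gets an independent uniform phase."""
    mu = sum(z) / len(z)
    Z = fft([v - mu for v in z])
    phased = [c * cmath.exp(2j * cmath.pi * rng.random()) for c in Z]
    return [v + mu for v in ifft(phased)]


def power_variance_test(z: Sequence[complex], block: int = 16,
                        n_boot: int = 99, seed: int = 0) -> float:
    """Bootstrap p-value: fraction of stationary surrogates (plus the observed
    signal itself) with at least as much power variance as the observed signal."""
    rng = random.Random(seed)
    observed = power_variance(z, block)
    exceed = sum(power_variance(phase_surrogate(z, rng), block) >= observed
                 for _ in range(n_boot))
    return (1 + exceed) / (1 + n_boot)
```

A signal whose amplitude jumps partway through has far more block-to-block power variation than its phase-randomised surrogates, so it receives a small p-value. The total cost is O(B N log N) for B surrogates.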
Identifying and Responding to Outlier Demand in Revenue Management
Revenue management strongly relies on accurate forecasts. Thus, when extraordinary events cause outlier demand, revenue management systems need to recognise this and adapt both forecasts and controls. Many passenger transport service providers, such as railways and airlines, control the sale of tickets through revenue management. State-of-the-art systems in these industries rely on analyst expertise to identify outlier demand both online (within the booking horizon) and offline (in hindsight). So far, little research has focused on automating and evaluating the detection of outlier demand in this context. To remedy this, we propose a novel approach that detects outliers using functional data analysis in combination with time series extrapolation. We evaluate the approach in a simulation framework, which generates outliers by varying the demand model. The results show that functional outlier detection yields better detection rates than alternative approaches for both online and offline analyses. Depending on the category of outliers, extrapolation further increases online detection performance. We also apply the procedure to a set of empirical data to demonstrate its practical implications. By evaluating the full feedback-driven system of forecast and optimisation, we gain insight into the asymmetric effects of positive and negative demand outliers. We show that identifying instances of outlier demand and adjusting the forecast in a timely fashion substantially increases revenue compared with the revenue earned when outliers are ignored.
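The abstract does not say which functional outlier detector is used, so the following is only a hedged illustration of the general idea. It is a simple stand-in and not the authors' method. Each booking curve (cumulative bookings over the booking horizon) is treated as a function, and its L2 distance to the pointwise median curve is measured. Unusually large distances are then flagged with a robust modified z-score. Truncating every curve to the points observed so far mimics the online case, and using the full curves mimics the offline case. The time series extrapolation step is omitted. The names `pointwise_median`, `l2_distance`, and `flag_outliers` and the threshold 3.5 are hypothetical choices.

```python
from statistics import median
from typing import List, Optional, Sequence


def pointwise_median(curves: Sequence[Sequence[float]]) -> List[float]:
    """Median booking curve: the median across curves at each time point."""
    return [median(col) for col in zip(*curves)]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Discrete L2 distance between two curves sampled at the same points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def flag_outliers(curves: Sequence[Sequence[float]], upto: Optional[int] = None,
                  threshold: float = 3.5) -> List[bool]:
    """Flag curves whose distance to the pointwise median curve is extreme.

    upto=None uses the full curves (offline, in hindsight); upto=d uses only
    the first d points, i.e. the bookings observed so far (online)."""
    obs = [list(c[:upto]) for c in curves]
    center = pointwise_median(obs)
    dist = [l2_distance(c, center) for c in obs]
    med = median(dist)
    mad = median([abs(d - med) for d in dist])
    if mad == 0:
        # Degenerate case: at least half of the distances equal the median.
        return [d > med for d in dist]
    # One-sided modified z-score (0.6745 * deviation / MAD): only curves that
    # are unusually far from the median curve count as outliers.
    return [0.6745 * (d - med) / mad > threshold for d in dist]
```

With five ordinary linear booking curves and one curve with roughly double the demand, the last curve is flagged both after the first few booking days and over the full horizon. Reusing the pointwise median keeps the detector robust to the outliers it is trying to find.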
Estimating the parameters of ocean wave spectra
Wind-generated waves are often treated as stochastic processes. There is particular interest in their spectral density functions, which are often expressed in some parametric form. Such spectral density functions are used as inputs when modelling structural response or other engineering concerns. Therefore, accurate and precise recovery of the parameters of such a form from observed wave records is important. Current techniques are known to struggle with recovering certain parameters, especially the peak enhancement factor and spectral tail decay. We introduce an approach from the statistical literature, known as the de-biased Whittle likelihood, and address some practical concerns regarding its implementation in the context of wind-generated waves. We demonstrate, through numerical simulation, that the de-biased Whittle likelihood outperforms current techniques, such as least squares fitting, in terms of both the accuracy and the precision of the recovered parameters. We also provide a method for estimating the uncertainty of parameter estimates. We perform an example analysis on a dataset recorded off the coast of New Zealand to illustrate some of the additional practical concerns that arise when estimating the parameters of spectra from observed data.
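For reference, the standard Whittle likelihood and its de-biased variant can be written as follows. The notation (a zero-mean record X_0, …, X_{n−1}, autocovariance c(τ;θ), parametric spectrum f(ω;θ), Fourier frequencies Ω) and the unit-sampling, no-2π Fourier convention are assumptions made for this sketch. Other conventions change only scaling constants. In the wave application, f(ω;θ) would be the chosen parametric spectral form, including parameters such as the peak enhancement factor and tail decay. Estimates are obtained by minimising the objective over θ. For real-valued records, the sum is often restricted to Fourier frequencies in (0, π] because the periodogram is symmetric.

```latex
% Periodogram of a zero-mean record X_0, ..., X_{n-1} (unit sampling interval assumed)
I(\omega) = \frac{1}{n}\left|\sum_{t=0}^{n-1} X_t\, e^{-i\omega t}\right|^2,
\qquad \omega \in \Omega = \{2\pi k/n : k = 0,\dots,n-1\}

% Spectrum and autocovariance under the same convention
f(\omega;\theta) = \sum_{\tau=-\infty}^{\infty} c(\tau;\theta)\, e^{-i\omega\tau},
\qquad c(\tau;\theta) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\omega;\theta)\, e^{i\omega\tau}\, d\omega

% Standard Whittle objective (negative log-likelihood, up to constants)
\ell_W(\theta) = \sum_{\omega \in \Omega}\left\{\log f(\omega;\theta) + \frac{I(\omega)}{f(\omega;\theta)}\right\}

% De-biased Whittle: replace f by the expected periodogram of a length-n record
\bar f_n(\omega;\theta) = \mathbb{E}_\theta\!\left[I(\omega)\right]
  = \sum_{\tau=-(n-1)}^{n-1}\left(1 - \frac{|\tau|}{n}\right) c(\tau;\theta)\, e^{-i\omega\tau}

\ell_D(\theta) = \sum_{\omega \in \Omega}\left\{\log \bar f_n(\omega;\theta) + \frac{I(\omega)}{\bar f_n(\omega;\theta)}\right\}
```

The only change is the substitution of \bar f_n for f. Because \bar f_n accounts for the blurring and leakage that a finite record induces in the periodogram, fitting against it removes the bias that affects the standard Whittle fit.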
Separating Mesoscale and Submesoscale Flows from Clustered Drifter Trajectories
Drifters deployed in close proximity collectively provide a unique observational data set with which to separate mesoscale and submesoscale flows. In this paper we provide a principled approach for doing so by fitting observed velocities to a local Taylor expansion of the velocity flow field. We demonstrate how to estimate mesoscale and submesoscale quantities that evolve slowly over time, as well as their associated statistical uncertainty. We show that in practice the mesoscale component of our model can explain much of the first- and second-moment variability in drifter velocities, especially at low frequencies. This results in much lower and more meaningful measures of submesoscale diffusivity, which would otherwise be contaminated by unresolved mesoscale flow. We quantify these effects theoretically by computing Lagrangian frequency spectra, and demonstrate the usefulness of our methodology through simulations as well as with real observations from the LatMix deployment of drifters. The outcome of this method is a full Lagrangian decomposition of each drifter trajectory into three components that represent the background, mesoscale, and submesoscale flow.
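A minimal sketch of the local Taylor-expansion idea at a single time snapshot is shown below. It is an illustration under stated assumptions, not the paper's estimator. The sketch fits a first-order (linear) expansion u_i ≈ u0 + u_x·dx_i + u_y·dy_i (and likewise for v) by least squares across a cluster of drifters, where (dx_i, dy_i) are offsets from the cluster centre. It then labels the uniform term as background, the linear term as mesoscale, and the residual as submesoscale. This assignment of terms, the omission of the slow time evolution and uncertainty estimates described in the abstract, and the name `decompose_snapshot` are all assumptions for illustration. Centring the positions decouples the intercept, so only a 2×2 system per velocity component remains.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, float]


def decompose_snapshot(pos: List[Vec], vel: List[Vec]
                       ) -> Tuple[Vec, List[Vec], List[Vec], Dict[str, float]]:
    """Least-squares fit of u = u0 + ux*dx + uy*dy and v = v0 + vx*dx + vy*dy
    across drifters at one time, with (dx, dy) measured from the cluster centre.
    Returns (background, mesoscale per drifter, submesoscale residual per drifter,
    diagnostics)."""
    m = len(pos)
    if m < 3:
        raise ValueError("need at least 3 drifters to estimate a velocity gradient")
    xc = sum(p[0] for p in pos) / m
    yc = sum(p[1] for p in pos) / m
    d = [(p[0] - xc, p[1] - yc) for p in pos]
    sxx = sum(a * a for a, _ in d)
    syy = sum(b * b for _, b in d)
    sxy = sum(a * b for a, b in d)
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("drifters are collinear; the gradient is not identifiable")
    # With centred offsets, the intercept is simply the mean velocity.
    u0 = sum(v[0] for v in vel) / m
    v0 = sum(v[1] for v in vel) / m

    def gradient(comp: int) -> Tuple[float, float]:
        # Solve [sxx sxy; sxy syy] [gx; gy] = [sum dx*w; sum dy*w] by Cramer's rule.
        sxw = sum(a * v[comp] for (a, _), v in zip(d, vel))
        syw = sum(b * v[comp] for (_, b), v in zip(d, vel))
        return (syy * sxw - sxy * syw) / det, (sxx * syw - sxy * sxw) / det

    ux, uy = gradient(0)
    vx, vy = gradient(1)
    meso = [(ux * a + uy * b, vx * a + vy * b) for a, b in d]
    sub = [(v[0] - u0 - mu, v[1] - v0 - mv) for v, (mu, mv) in zip(vel, meso)]
    diagnostics = {"divergence": ux + vy, "vorticity": vx - uy}
    return (u0, v0), meso, sub, diagnostics
```

For a cluster moving with a uniform velocity plus solid-body rotation, the fit recovers the translation as background and the rotation as mesoscale vorticity, and it leaves a zero submesoscale residual. By construction, background + mesoscale + submesoscale reproduces each observed drifter velocity exactly.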