47 research outputs found

    The exploration-exploitation trade-off in sequential decision making problems

    Sequential decision making problems require an agent to repeatedly choose between a series of actions. Common to such problems is the exploration-exploitation trade-off, where an agent must choose between taking the action expected to yield the best reward (exploitation) and trying an alternative action for potential future benefit (exploration). The main focus of this thesis is to understand in more detail the role this trade-off plays in various important sequential decision making problems, in terms of maximising finite-time reward. The most common and best studied abstraction of the exploration-exploitation trade-off is the classic multi-armed bandit problem. In this thesis we study several important extensions that are more suitable than the classic problem to real-world applications. These extensions include scenarios where the rewards for actions change over time or the presence of other agents must be repeatedly considered. In these contexts, the exploration-exploitation trade-off has a more complicated role in terms of maximising finite-time performance. For example, the amount of exploration required will constantly change in a dynamic decision problem; in multi-agent problems, agents can explore by communication; and in repeated games, the exploration-exploitation trade-off must be jointly considered with game-theoretic reasoning. Existing techniques for balancing exploration-exploitation are focused on achieving desirable asymptotic behaviour and are in general only applicable to basic decision problems. The most flexible state-of-the-art approaches, ε-greedy and ε-first, require exploration parameters to be set a priori, the optimal values of which are highly dependent on the problem faced. To overcome this, we construct a novel algorithm, ε-ADAPT, which has no exploration parameters and can adapt exploration on-line for a wide range of problems.
ε-ADAPT is built on newly proven theoretical properties of the ε-first policy and we demonstrate that ε-ADAPT can accurately learn not only how much to explore, but also when and which actions to explore.
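
For readers unfamiliar with the two baseline policies named in this abstract, the following is a minimal sketch of ε-greedy and ε-first on a Bernoulli multi-armed bandit. It is not ε-ADAPT, whose details are not given in the abstract. The function names, the Bernoulli reward model, and the choice to commit to a single arm after the exploration phase of ε-first are all assumptions made for illustration.

```python
import random


def pull(means, arm, rng):
    """Bernoulli reward: 1 with probability means[arm], otherwise 0."""
    return 1 if rng.random() < means[arm] else 0


def best_arm(totals, counts):
    """Index of the arm with the highest empirical mean (unpulled arms score 0)."""
    def mean(a):
        return totals[a] / counts[a] if counts[a] else 0.0
    return max(range(len(counts)), key=mean)


def epsilon_first(means, horizon, epsilon, rng):
    """Explore uniformly for the first epsilon * horizon rounds, then commit.

    Assumption: after exploring, the policy commits to the arm that looked
    best at that point and stops updating its choice.
    """
    k = len(means)
    counts, totals = [0] * k, [0] * k
    n_explore = int(epsilon * horizon)
    committed = None
    for t in range(horizon):
        if t < n_explore:
            arm = rng.randrange(k)          # exploration phase
        else:
            if committed is None:
                committed = best_arm(totals, counts)
            arm = committed                 # exploitation phase
        reward = pull(means, arm, rng)
        counts[arm] += 1
        totals[arm] += reward
    return sum(totals), counts


def epsilon_greedy(means, horizon, epsilon, rng):
    """Each round: explore with probability epsilon, otherwise exploit."""
    k = len(means)
    counts, totals = [0] * k, [0] * k
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)          # explore a uniformly random arm
        else:
            arm = best_arm(totals, counts)  # exploit the current best estimate
        reward = pull(means, arm, rng)
        counts[arm] += 1
        totals[arm] += reward
    return sum(totals), counts
```

Both functions take the exploration parameter `epsilon` as an input that must be chosen in advance, which is the limitation the abstract says ε-ADAPT is designed to remove.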

    Frequency-Domain Stochastic Modeling of Stationary Bivariate or Complex-Valued Signals

    There are three equivalent ways of representing two jointly observed real-valued signals: as a bivariate vector signal, as a single complex-valued signal, or as two analytic signals known as the rotary components. Each representation has unique advantages depending on the system of interest and the application goals. In this paper we provide a joint framework for all three representations in the context of frequency-domain stochastic modeling. This framework allows us to extend many established statistical procedures for bivariate vector time series to complex-valued and rotary representations. These include procedures for parametrically modeling signal coherence, estimating model parameters using the Whittle likelihood, performing semi-parametric modeling, and choosing between classes of nested models using model choice. We also provide a new method of testing for impropriety in complex-valued signals, which tests for noncircular or anisotropic second-order statistical structure when the signal is represented in the complex plane. Finally, we demonstrate the usefulness of our methodology in capturing the anisotropic structure of signals observed from fluid dynamic simulations of turbulence. Comment: To appear in IEEE Transactions on Signal Processing

    A Power Variance Test for Nonstationarity in Complex-Valued Signals

    We propose a novel algorithm for testing the hypothesis of nonstationarity in complex-valued signals. The implementation uses both the bootstrap and the Fast Fourier Transform such that the algorithm can be efficiently implemented in O(N log N) time, where N is the length of the observed signal. The test procedure examines the second-order structure and contrasts the observed power variance (i.e. the variability of the instantaneous variance over time) with the expected characteristics of stationary signals generated via the bootstrap method. Our algorithmic procedure is capable of learning different types of nonstationarity, such as jumps or strong sinusoidal components. We illustrate the utility of our test and algorithm through application to turbulent flow data from fluid dynamics.

    Identifying and Responding to Outlier Demand in Revenue Management

    Revenue management strongly relies on accurate forecasts. Thus, when extraordinary events cause outlier demand, revenue management systems need to recognise this and adapt both forecast and controls. Many passenger transport service providers, such as railways and airlines, control the sale of tickets through revenue management. State-of-the-art systems in these industries rely on analyst expertise to identify outlier demand both online (within the booking horizon) and offline (in hindsight). So far, little research focuses on automating and evaluating the detection of outlier demand in this context. To remedy this, we propose a novel approach, which detects outliers using functional data analysis in combination with time series extrapolation. We evaluate the approach in a simulation framework, which generates outliers by varying the demand model. The results show that functional outlier detection yields better detection rates than alternative approaches for both online and offline analyses. Depending on the category of outliers, extrapolation further increases online detection performance. We also apply the procedure to a set of empirical data to demonstrate its practical implications. By evaluating the full feedback-driven system of forecast and optimisation, we generate insight on the asymmetric effects of positive and negative demand outliers. We show that identifying instances of outlier demand and adjusting the forecast in a timely fashion substantially increases revenue compared to what is earned when ignoring outliers.

    Estimating the parameters of ocean wave spectra

    Wind-generated waves are often treated as stochastic processes. There is particular interest in their spectral density functions, which are often expressed in some parametric form. Such spectral density functions are used as inputs when modelling structural response or other engineering concerns. Therefore, accurate and precise recovery of the parameters of such a form, from observed wave records, is important. Current techniques are known to struggle with recovering certain parameters, especially the peak enhancement factor and spectral tail decay. We introduce an approach from the statistical literature, known as the de-biased Whittle likelihood, and address some practical concerns regarding its implementation in the context of wind-generated waves. We demonstrate, through numerical simulation, that the de-biased Whittle likelihood outperforms current techniques, such as least squares fitting, both in terms of accuracy and precision of the recovered parameters. We also provide a method for estimating the uncertainty of parameter estimates. We perform an example analysis on a data set recorded off the coast of New Zealand, to illustrate some of the extra practical concerns that arise when estimating the parameters of spectra from observed data.
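
As background to the abstract above, the sketch below illustrates the standard (not de-biased) Whittle likelihood, which compares the periodogram of a record with a parametric spectral density at the Fourier frequencies. Everything specific here is an assumption made for illustration: an AR(1) process with known unit innovation variance stands in for a wave spectrum, the parameter is found by a simple grid search, and the discrete Fourier transform is computed directly rather than with an FFT. The de-biased variant described in the abstract is not reproduced.

```python
import cmath
import math
import random


def periodogram(x):
    """Periodogram |DFT|^2 / n at the Fourier frequencies 2*pi*j/n, 0 < j < n/2."""
    n = len(x)
    freqs, values = [], []
    for j in range(1, n // 2):
        w = 2 * math.pi * j / n
        dft = sum(x[t] * cmath.exp(-1j * w * t) for t in range(n))
        freqs.append(w)
        values.append(abs(dft) ** 2 / n)
    return freqs, values


def ar1_spectrum(w, phi, sigma2=1.0):
    """Spectral density of an AR(1) process (stand-in for a wave spectrum)."""
    return sigma2 / (1.0 - 2.0 * phi * math.cos(w) + phi * phi)


def neg_whittle(phi, freqs, pgram):
    """Negative Whittle log-likelihood: sum over frequencies of log f + I / f."""
    total = 0.0
    for w, i_w in zip(freqs, pgram):
        f_w = ar1_spectrum(w, phi)
        total += math.log(f_w) + i_w / f_w
    return total


def simulate_ar1(n, phi, rng, burn_in=200):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal innovations."""
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out


def fit_phi(x):
    """Grid-search the AR(1) coefficient that minimises the Whittle objective."""
    freqs, pgram = periodogram(x)
    grid = [k / 100 for k in range(-95, 96)]
    return min(grid, key=lambda phi: neg_whittle(phi, freqs, pgram))
```

Calling `fit_phi(simulate_ar1(512, 0.6, random.Random(0)))` returns an estimate of the coefficient used in the simulation. A real wave analysis would replace `ar1_spectrum` with the parametric wave spectrum of interest and use a numerical optimiser over all of its parameters.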

    Separating Mesoscale and Submesoscale Flows from Clustered Drifter Trajectories

    Drifters deployed in close proximity collectively provide a unique observational data set with which to separate mesoscale and submesoscale flows. In this paper we provide a principled approach for doing so by fitting observed velocities to a local Taylor expansion of the velocity flow field. We demonstrate how to estimate mesoscale and submesoscale quantities that evolve slowly over time, as well as their associated statistical uncertainty. We show that in practice the mesoscale component of our model can explain much of the first- and second-moment variability in drifter velocities, especially at low frequencies. This results in much lower and more meaningful measures of submesoscale diffusivity, which would otherwise be contaminated by unresolved mesoscale flow. We quantify these effects theoretically via computing Lagrangian frequency spectra, and demonstrate the usefulness of our methodology through simulations as well as with real observations from the LatMix deployment of drifters. The outcome of this method is a full Lagrangian decomposition of each drifter trajectory into three components that represent the background, mesoscale, and submesoscale flow.
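
The idea of fitting velocities to a local Taylor expansion can be illustrated with a single-time snapshot. The sketch below fits a first-order expansion, u ≈ u0 + u_x (x - x̄) + u_y (y - ȳ), and likewise for v, by ordinary least squares across a cluster of drifters, then derives divergence and vorticity from the fitted gradients. The function names and the snapshot setting are assumptions made for illustration; the paper's time-evolving estimates and uncertainty quantification are not reproduced.

```python
def fit_linear_field(xs, ys, vals):
    """Least-squares fit of vals ≈ c0 + cx*(x - xbar) + cy*(y - ybar).

    Because positions are centred on the cluster centroid, the intercept is
    simply the mean of vals and the gradient solves a 2x2 normal system.
    """
    n = len(xs)
    xbar, ybar, vbar = sum(xs) / n, sum(ys) / n, sum(vals) / n
    dx = [x - xbar for x in xs]
    dy = [y - ybar for y in ys]
    dv = [v - vbar for v in vals]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxv = sum(a * c for a, c in zip(dx, dv))
    syv = sum(b * c for b, c in zip(dy, dv))
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("drifter positions are collinear; gradient not identifiable")
    cx = (sxv * syy - syv * sxy) / det
    cy = (syv * sxx - sxv * sxy) / det
    return vbar, cx, cy


def local_flow(xs, ys, us, vs):
    """Centroid velocity, velocity gradients, divergence and vorticity."""
    u0, ux, uy = fit_linear_field(xs, ys, us)
    v0, vx, vy = fit_linear_field(xs, ys, vs)
    return {
        "u0": u0, "v0": v0,
        "ux": ux, "uy": uy, "vx": vx, "vy": vy,
        "divergence": ux + vy,   # du/dx + dv/dy
        "vorticity": vx - uy,    # dv/dx - du/dy
    }
```

At least three non-collinear drifters are needed for the gradients to be identifiable, which is why the function raises an error when the 2x2 system is singular.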