3 research outputs found

    Adversarial Thresholding Semi-Bandits

    Get PDF
    The classical multi-armed bandit is one of the most common examples of sequential decision-making, either by trading-off between exploiting and exploring arms to maximise some payoff or purely exploring arms until the optimal arm is identified. In particular, a bandit player wanting to only pull arms with stochastic feedback exceeding a given threshold, has been studied extensively in a pure exploration context. However, numerous applications fail to be expressed, where a player wishes to balance the need to observe regions of an uncertain environment that are currently interesting (exploit) and checking if neglected regions have become interesting since last observed (explore). We introduce the adversarial thresholding semi-bandit problem: a non-stochastic bandit model, where a player wants to only pull (potentially several) arms with feedback meeting some threshold condition. Our main objective is to design algorithms that meet the requirements of the adversarial thresholding semi-bandit problem theoretically, empirically and algorithmically, for a given application. In other words, we want to develop a machine that learns to select options according to some threshold condition and adapts quickly if the feedback from selecting an option unexpectedly changes. This work has many real-world applications and is motivated by online detector control monitoring in high-energy physics experiments, on the Large Hadron Collider. We begin by describing the adversarial thresholding semi-bandit problem (ATSBP) in terms of a multi-armed bandit with multiple plays and extending the stochastic thresholding bandit problem to the adversarial setting. The adversarial thresholding exponentially-weighted exploration and exploitation with multiple plays algorithm (T-Exp3.M) and an algorithm combining label efficient prediction (LET-Exp3.M), are introduced that satisfy theoretical and computational Research specifications, but either perform poorly or fail completely under certain threshold conditions. To meet empirical performance requirements, we propose the dynamic label efficient adversarial thresholding exponentially-weighted exploration and exploitation with multiple plays algorithm (dLET-Exp3.M). Whilst computational requirements match those for T-Exp3.M, theoretical upper bounds on performance are proven to be worse. We also introduce an ATSBP algorithm (AliceBandit) that decomposes the action of pulling an arm into selection and observation decisions. Computational complexity and empirical performance under two different threshold conditions are significantly improved, compared with exponentially weighted adversarial thresholding semi-bandits. Theoretical upper bounds on performance are also significantly improved, for certain environments. In the latter part of this thesis, we address the challenge of efficiently monitoring multiple condition parameters in high-energy experimental physics. Due to the extreme conditions experienced in heavy-ion particle colliders, the power supply to any device exceeding safe operating parameters is automatically shut down or tripped, to preserve integrity and functionality of the device. Prior to recent upgrades, a device or channel trip would halt data-taking for the entire experiment. Post-trip recovery requires a costly procedure both in terms of expertise and data-taking time. After the completion of the current upgrading phase (scheduled for 2021), the detector will collect data continuously. In this new regime, a channel trip will result in only the affected components of the experiment being shut down. However, since the new upgraded experiment will enable data-taking to increase by a factor of 100, each trip will have a significant impact on the experiments ability to provide physicists with reliable data to analyse. We demonstrate that adversarial thresholding semi-bandits efficiently identify device channels either exceeding a fixed threshold or deviating by more than a prescribed range prior to a trip, extending the state-of-the-art in high-energy physics detector control.N/

    Adaptive Sampling for Seabed Identification from Ambient Acoustic Noise

    Get PDF
    This thesis discusses distinguishing geoacoustic properties of seabeds by adaptively sampling ambient acoustic measurements using an autonomous underwater vehicle. Modern advancements in seabed characterization motivate an online method for distinguishing between two seabed types using Fourier transformed ambient acoustic noise snapshots. These snapshots are assumed to be associated with their spatial location and seabed, making the goal to organize unlabeled seabed locations. The snapshots are transformed to obtain pairwise similarities between locations. Locations with similarities exceeding a threshold are classified together with the goal of identifying all locations with similarities below this threshold using a process known as level set estimation. We propose an adaptive sampling policy that aims to directly reduce the number of locations with unknown level set membership via a lookahead step in addition to minimizing the distance traveled through a nearest neighbors approach. Results on synthetic, single-boundary fields and multi-boundary realistic world data are evaluated by comparing predicted and true level set assignments and demonstrate the benefits of the proposed algorithm. Furthermore, extensions discussing the effects of a two-step and path planning lookahead in addition to the benefits of bathymetry data are included to motivate further research
    corecore