379 research outputs found

    A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

    Get PDF
    We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications

    Generalized asset integrity games

    Get PDF
    Generalized assets represent a class of multi-scale adaptive state-transition systems with domain-oblivious performance criteria. The governance of such assets must proceed without exact specifications, objectives, or constraints. Decision making must rapidly scale in the presence of uncertainty, complexity, and intelligent adversaries. This thesis formulates an architecture for generalized asset planning. Assets are modelled as dynamical graph structures which admit topological performance indicators, such as dependability, resilience, and efficiency. These metrics are used to construct robust model configurations. A normalized compression distance (NCD) is computed between a given active/live asset model and a reference configuration to produce an integrity score. The utility derived from the asset is monotonically proportional to this integrity score, which represents the proximity to ideal conditions. The present work considers the situation between an asset manager and an intelligent adversary, who act within a stochastic environment to control the integrity state of the asset. A generalized asset integrity game engine (GAIGE) is developed, which implements anytime algorithms to solve a stochastically perturbed two-player zero-sum game. The resulting planning strategies seek to stabilize deviations from minimax trajectories of the integrity score. Results demonstrate the performance and scalability of the GAIGE. This approach represents a first-step towards domain-oblivious architectures for complex asset governance and anytime planning
    corecore