
    Adaptive Regret Minimization in Bounded-Memory Games

    Online learning algorithms that minimize regret provide strong guarantees in situations that involve repeatedly making decisions in an uncertain environment, e.g., a driver deciding which route to take to work every day. While regret minimization has been extensively studied in repeated games, we study regret minimization for a richer class of games called bounded memory games. In each round of a two-player bounded memory-m game, both players simultaneously play an action, observe an outcome, and receive a reward. The reward may depend on the last m outcomes as well as the actions of the players in the current round. The standard notion of regret for repeated games is no longer suitable because actions and rewards can depend on the history of play. To account for this generality, we introduce the notion of k-adaptive regret, which compares the reward obtained by playing actions prescribed by the algorithm against a hypothetical k-adaptive adversary with the reward obtained by the best expert in hindsight against the same adversary. Roughly, a hypothetical k-adaptive adversary adapts her strategy to the defender's actions exactly as the real adversary would within each window of k rounds. Our definition is parametrized by a set of experts, which can include both fixed and adaptive defender strategies. We investigate the inherent complexity of, and design algorithms for, adaptive regret minimization in bounded memory games of perfect and imperfect information. We prove a hardness result showing that, with imperfect information, any k-adaptive regret minimizing algorithm (with fixed strategies as experts) must be inefficient unless NP = RP, even when playing against an oblivious adversary. In contrast, for bounded memory games of perfect and imperfect information, we present approximate 0-adaptive regret minimization algorithms against an oblivious adversary running in time n^{O(1)}.
    Comment: Full version. GameSec 2013 (invited paper).
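
    For reference, the setting above generalizes classic expert-based regret minimization, where the standard full-information baseline is exponential weights (Hedge). The Python sketch below shows that baseline over a fixed expert set; it is illustrative only, not the paper's k-adaptive algorithm, and the learning rate eta and all names are assumptions.

    import math
    import random

    def hedge(num_experts, num_rounds, get_rewards, eta=0.1):
        """Minimal exponential-weights (Hedge) sketch for full-information
        regret minimization over a fixed expert set. Illustrative only;
        NOT the paper's k-adaptive algorithm."""
        weights = [1.0] * num_experts
        total_reward = 0.0
        for t in range(num_rounds):
            total = sum(weights)
            probs = [w / total for w in weights]
            # Play a random expert drawn from the current distribution.
            expert = random.choices(range(num_experts), weights=probs)[0]
            rewards = get_rewards(t)  # full information: every expert's reward, in [0, 1]
            total_reward += rewards[expert]
            # Multiplicative update: reweight each expert by its reward.
            weights = [w * math.exp(eta * r) for w, r in zip(weights, rewards)]
            m = max(weights)
            weights = [w / m for w in weights]  # rescale; Hedge is scale-invariant
        return total_reward

    # Toy usage: three fixed experts with Bernoulli rewards of different means.
    means = [0.3, 0.5, 0.8]
    print(hedge(3, 1000, lambda t: [float(random.random() < m) for m in means]))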

    Chasing Ghosts: Competing with Stateful Policies

    We consider sequential decision making in a setting where regret is measured with respect to a set of stateful reference policies, and feedback is limited to observing the rewards of the actions performed (the so-called "bandit" setting). If either the reference policies are stateless rather than stateful, or the feedback includes the rewards of all actions (the so-called "expert" setting), previous work shows that the optimal regret grows like Θ(√T) in terms of the number of decision rounds T. The difficulty in our setting is that the decision maker unavoidably loses track of the internal states of the reference policies, and thus cannot reliably attribute rewards observed in a certain round to any of the reference policies. In fact, in this setting it is impossible for the algorithm to estimate which policy gives the highest (or even approximately highest) total reward. Nevertheless, we design an algorithm that achieves expected regret that is sublinear in T, of the form O(T / log^{1/4} T). Our algorithm is based on a certain local repetition lemma that may be of independent interest. We also show that no algorithm can guarantee expected regret better than O(T / log^{3/2} T).
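
    The Θ(√T) bandit baseline cited above (stateless reference policies, bandit feedback) is achieved by algorithms in the EXP3 family. The sketch below is a minimal EXP3 variant under that stateless assumption; it does not handle the stateful policies that make this paper's setting hard, and the exploration rate gamma and all names are assumptions.

    import math
    import random

    def exp3(num_arms, num_rounds, pull, gamma=0.07):
        """Minimal EXP3 sketch for the stateless bandit baseline: only the
        pulled arm's reward (in [0, 1]) is observed each round."""
        weights = [1.0] * num_arms
        total_reward = 0.0
        for t in range(num_rounds):
            total = sum(weights)
            # Mix exponential weights with uniform exploration.
            probs = [(1 - gamma) * w / total + gamma / num_arms for w in weights]
            arm = random.choices(range(num_arms), weights=probs)[0]
            reward = pull(arm)  # bandit feedback: one reward only
            total_reward += reward
            # Importance weighting keeps the reward estimate unbiased.
            weights[arm] *= math.exp(gamma * (reward / probs[arm]) / num_arms)
            m = max(weights)
            weights = [w / m for w in weights]  # rescale to avoid overflow
        return total_reward

    # Toy usage: three Bernoulli arms with unknown means.
    means = [0.2, 0.5, 0.9]
    print(exp3(3, 1000, lambda a: float(random.random() < means[a])))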

    Combining Multiple Strategies for Multiarmed Bandit Problems and Asymptotic Optimality

    This brief paper provides a simple algorithm that, at each time step, selects one strategy from a given set of strategies for stochastic multiarmed bandit problems and then plays the arm chosen by that strategy. The algorithm follows the idea of the probabilistic ϵ_t-switching in the ϵ_t-greedy strategy and is asymptotically optimal in the sense that the selected strategy converges to the best in the set, under some conditions on the strategies in the set and on the sequence {ϵ_t}.
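
    The sketch below illustrates the ϵ_t-switching idea: with probability ϵ_t the algorithm explores a uniformly random strategy, and otherwise it exploits the strategy with the best empirical mean reward so far. The schedule ϵ_t = min(1, c/t) and all names are assumptions, not the paper's stated conditions.

    import random

    def epsilon_t_switching(strategies, num_rounds, pull, c=5.0):
        """Sketch of probabilistic eps_t-switching over a set of bandit
        strategies; illustrative only, not the paper's exact algorithm."""
        counts = [0] * len(strategies)
        sums = [0.0] * len(strategies)
        for t in range(1, num_rounds + 1):
            eps_t = min(1.0, c / t)  # assumed decaying exploration schedule
            if random.random() < eps_t or 0 in counts:
                i = random.randrange(len(strategies))  # explore a random strategy
            else:
                i = max(range(len(strategies)),
                        key=lambda j: sums[j] / counts[j])  # exploit the best so far
            arm = strategies[i](t)  # the chosen strategy proposes an arm
            reward = pull(arm)
            counts[i] += 1
            sums[i] += reward

    # Toy usage: two hypothetical strategies over three Bernoulli arms.
    means = [0.2, 0.5, 0.9]
    pull = lambda a: float(random.random() < means[a])
    fixed_on_arm_2 = lambda t: 2
    uniform_random = lambda t: random.randrange(3)
    epsilon_t_switching([fixed_on_arm_2, uniform_random], 1000, pull)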

    Location Privacy Protection in the Mobile Era and Beyond

    As interconnected devices become embedded in every aspect of our lives, they bring many privacy risks. Location privacy is one notable case: consistently recording an individual’s location can lead to tracking, fingerprinting, and profiling. An individual’s location privacy can be compromised when tracked by smartphone apps, in indoor spaces, and/or through Internet of Things (IoT) devices. Recent surveys have indicated that users genuinely value their location privacy and would like to exercise control over who collects and processes their location data. They, however, lack effective and practical tools to protect their location privacy. An effective location privacy protection mechanism requires a real understanding of the underlying threats, and a practical one requires as few changes to the existing ecosystems as possible while ensuring psychological acceptability to the users. This thesis addresses this problem by proposing a suite of effective and practical privacy-preserving mechanisms that address different aspects of real-world location privacy threats. First, we present LP-Guardian, a comprehensive location privacy protection framework for Android smartphone users. LP-Guardian overcomes the shortcomings of existing approaches by addressing the tracking, profiling, and fingerprinting threats posed by different mobile apps while maintaining their functionality. LP-Guardian requires modifying the underlying mobile operating system platform, but no changes to either the apps or the service providers. We then propose LP-Doctor, a lightweight user-level tool which allows Android users to effectively utilize the OS’s location access controls. As opposed to LP-Guardian, LP-Doctor requires no platform changes. It builds on a two-year data collection campaign in which we analyzed the location privacy threats posed by 1160 apps for 100 users. For the case of indoor location tracking, we present PR-LBS (Privacy vs. Reward for Location-Based Service), a system that balances users’ privacy concerns and the benefits of sharing location data in indoor location tracking environments. PR-LBS fits within the existing indoor localization ecosystem, whether it is infrastructure-based or device-based. Finally, we target the privacy threats originating from IoT devices that employ the emerging Bluetooth Low Energy (BLE) protocol through BLE-Guardian. BLE-Guardian is a device-agnostic system that prevents user tracking and profiling while securing access to the user’s BLE-powered devices. We evaluate BLE-Guardian in real-world scenarios and demonstrate its effectiveness in protecting the user along with its low overhead on the user’s devices.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/138563/1/kmfawaz_1.pd
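
    As a generic illustration of one location-obfuscation primitive (not the mechanism of any system described above), the sketch below coarsens a GPS fix to the center of a grid cell before release; the cell size and all names are assumptions.

    import math

    def coarsen_location(lat, lon, cell_deg=0.01):
        """Snap a GPS fix to the center of a grid cell (0.01 degrees is
        roughly 1.1 km of latitude) before releasing it to an app. A
        generic sketch; NOT the thesis's LP-Guardian/LP-Doctor designs."""
        cell_lat = math.floor(lat / cell_deg) * cell_deg + cell_deg / 2
        cell_lon = math.floor(lon / cell_deg) * cell_deg + cell_deg / 2
        return cell_lat, cell_lon

    # Toy usage: two nearby fixes release the same cell center.
    print(coarsen_location(42.29312, -83.71622))
    print(coarsen_location(42.29388, -83.71401))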