Functional Bandits
We introduce the functional bandit problem, where the objective is to find an
arm that optimises a known functional of the unknown arm-reward distributions.
These problems arise in many settings such as maximum entropy methods in
natural language processing, and risk-averse decision-making, but current
best-arm identification techniques fail in these domains. We propose a new
approach that combines functional estimation and arm elimination to tackle
this problem. This method achieves provably efficient performance guarantees.
In addition, we illustrate this method on a number of important functionals in
risk management and information theory, and refine our generic theoretical
results in those cases.
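The idea of combining functional estimation with arm elimination can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the method from the abstract: the functional (an empirical conditional value-at-risk), the batch size, and the confidence width are all hypothetical choices made here for illustration.

```python
import math
import random
from typing import Callable, List, Sequence


def empirical_cvar(samples: Sequence[float], alpha: float = 0.25) -> float:
    """Mean of the worst alpha-fraction of rewards (higher is better)."""
    ordered = sorted(samples)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def eliminate_arms(
    arms: List[Callable[[], float]],
    functional: Callable[[Sequence[float]], float],
    rounds: int = 200,
    batch: int = 20,
    width_scale: float = 1.0,
) -> int:
    """Return the index of the arm whose estimated functional survives elimination."""
    active = list(range(len(arms)))
    samples = {i: [] for i in active}
    for _ in range(rounds):
        if len(active) == 1:
            break
        # Sample every surviving arm equally often.
        for i in active:
            samples[i].extend(arms[i]() for _ in range(batch))
        n = len(samples[active[0]])
        # Assumed confidence width; a real analysis derives it per functional.
        width = width_scale * math.sqrt(math.log(4 * len(arms) * n * n) / n)
        est = {i: functional(samples[i]) for i in active}
        best = max(est.values())
        # Keep an arm only if its upper bound reaches the best lower bound.
        active = [i for i in active if est[i] + width >= best - width]
    if len(active) == 1:
        return active[0]
    return max(active, key=lambda i: functional(samples[i]))
```

For example, with one high-mean, high-variance arm and one slightly lower-mean, low-variance arm, a risk-averse functional such as `empirical_cvar` would be expected to favour the second arm, whereas a mean-based best-arm rule would favour the first.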
Path Planning Problems with Side Observations-When Colonels Play Hide-and-Seek
Resource allocation games such as the famous Colonel Blotto (CB) and
Hide-and-Seek (HS) games are often used to model a large variety of practical
problems, but only in their one-shot versions. Indeed, due to their extremely
large strategy space, it remains an open question how one can efficiently learn
in these games. In this work, we show that the online CB and HS games can be
cast as path planning problems with side-observations (SOPPP): at each stage, a
learner chooses a path on a directed acyclic graph and suffers the sum of
losses that are adversarially assigned to the corresponding edges; and she then
receives semi-bandit feedback with side-observations (i.e., she observes the
losses on the chosen edges plus some others). We propose a novel algorithm,
EXP3-OE, the first-of-its-kind with guaranteed efficient running time for SOPPP
without requiring any auxiliary oracle. We provide an expected-regret bound of
EXP3-OE in SOPPP matching the order of the best benchmark in the literature.
Moreover, we introduce additional assumptions on the observability model under
which we can further improve the regret bounds of EXP3-OE. We illustrate the
benefit of using EXP3-OE in SOPPP by applying it to the online CB and HS games.
Comment: Previously, this work appeared as arXiv:1911.09023, which was
mistakenly submitted as a new article (has been submitted to be withdrawn).
This is a preprint of the work published in Proceedings of the 34th AAAI
Conference on Artificial Intelligence (AAAI
Knapsack based Optimal Policies for Budget-Limited Multi-Armed Bandits
In budget-limited multi-armed bandit (MAB) problems, the learner's actions
are costly and constrained by a fixed budget. Consequently, an optimal
exploitation policy may not be to pull the optimal arm repeatedly, as is the
case in other variants of MAB, but rather to pull the sequence of different
arms that maximises the agent's total reward within the budget. This difference
from existing MABs means that new approaches to maximising the total reward are
required. Given this, we develop two pulling policies, namely: (i) KUBE; and
(ii) fractional KUBE. Whereas the former performs up to 40% better
in our experimental settings, the latter is computationally less expensive. We
also prove logarithmic upper bounds for the regret of both policies, and show
that these bounds are asymptotically optimal (i.e. they only differ from the
best possible regret by a constant factor).
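To make the budget-limited setting concrete, the sketch below spends a fixed budget by repeatedly pulling the affordable arm with the highest ratio of upper confidence bound to cost. It is a simplified, density-based illustration and should not be read as the KUBE or fractional KUBE policies themselves; the function name, the UCB form, and the assumption that pulling costs are known and positive are all choices made here.

```python
import math
from typing import Callable, List, Tuple


def density_greedy_bandit(
    arms: List[Callable[[], float]],
    costs: List[float],
    budget: float,
) -> Tuple[float, List[int]]:
    """Pull arms until no arm is affordable; return (total reward, pull counts)."""
    k = len(arms)
    pulls = [0] * k
    means = [0.0] * k
    total_reward = 0.0
    t = 0
    while True:
        affordable = [i for i in range(k) if costs[i] <= budget]
        if not affordable:
            return total_reward, pulls
        untried = [i for i in affordable if pulls[i] == 0]
        if untried:
            # Try each affordable arm once before trusting the estimates.
            i = untried[0]
        else:
            # Assumed index: optimistic reward estimate per unit of cost.
            i = max(
                affordable,
                key=lambda j: (means[j] + math.sqrt(2 * math.log(t) / pulls[j])) / costs[j],
            )
        reward = arms[i]()
        budget -= costs[i]
        t += 1
        pulls[i] += 1
        means[i] += (reward - means[i]) / pulls[i]
        total_reward += reward
```

Because the remaining budget shrinks after every pull, the set of affordable arms can change over time, which is why the best policy need not pull a single arm repeatedly.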
Designing the Game to Play: Optimizing Payoff Structure in Security Games
Effective game-theoretic modeling of defender-attacker behavior is becoming
increasingly important. In many domains, the defender functions not only as a
player but also as the designer of the game's payoff structure. We study
Stackelberg Security Games where the defender, in addition to allocating
defensive resources to protect targets from the attacker, can strategically
manipulate the attacker's payoff under budget constraints in weighted L^p-norm
form regarding the amount of change. Focusing on problems with weighted
L^1-norm form constraint, we present (i) a mixed integer linear program-based
algorithm with approximation guarantee; (ii) a branch-and-bound based algorithm
with improved efficiency achieved by effective pruning; (iii) a polynomial time
approximation scheme for a special but practical class of problems. In
addition, we show that problems under budget constraints in L^0-norm form and
weighted L^\infty-norm form can be solved in polynomial time. We provide an
extensive experimental evaluation of our proposed algorithms.
Power beacon-assisted energy harvesting in a half-duplex communication network under co-channel interference over a Rayleigh fading environment: Energy efficiency and outage probability analysis
Energy efficiency (EE), measured in bits per Watt, has recently emerged as an important metric in energy-constrained wireless communication networks because of their energy shortage. In this paper, we investigate power beacon (PB)-assisted energy harvesting (EH) in a half-duplex (HD) communication network under a co-channel interferer over a Rayleigh fading environment, using the time switching (TS) protocol. First, exact and asymptotic expressions for the outage probability (OP) are derived. Then the system EE is investigated, together with the influence of the primary system parameters on the system performance. Finally, we verify the analytical expressions using Monte Carlo simulation and find that the simulation and analytical results agree.
Web of Science 12 13 art. no. 257
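The kind of check described above, comparing a closed-form outage probability against Monte Carlo simulation, can be illustrated on a much simpler model. The sketch below assumes a single point-to-point Rayleigh-fading link with no energy harvesting and no interference, so it does not reproduce the paper's system model; the function names and parameters are hypothetical. For this simplified link the channel power gain is exponentially distributed, and the outage probability at target rate R and mean SNR g is 1 - exp(-(2^R - 1)/g).

```python
import math
import random


def outage_closed_form(mean_snr: float, rate: float) -> float:
    """Outage probability of a single Rayleigh link (simplified model)."""
    threshold = 2.0 ** rate - 1.0
    return 1.0 - math.exp(-threshold / mean_snr)


def outage_monte_carlo(
    mean_snr: float, rate: float, trials: int = 200_000, seed: int = 1
) -> float:
    """Estimate the same outage probability by simulating channel gains."""
    rng = random.Random(seed)
    threshold = 2.0 ** rate - 1.0
    outages = 0
    for _ in range(trials):
        gain = rng.expovariate(1.0)  # |h|^2 under Rayleigh fading, unit mean
        if mean_snr * gain < threshold:
            outages += 1
    return outages / trials
```

Calling both functions with the same `mean_snr` and `rate` should give values that agree up to sampling error, which shrinks as `trials` grows.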
An Agent-Based Distributed Coordination Mechanism for Wireless Visual Sensor Nodes Using Dynamic Programming
The efficient management of the limited energy resources of a wireless visual sensor network is central to its successful operation. Within this context, this article focuses on the adaptive sampling, forwarding, and routing actions of each node in order to maximise the information value of the data collected. These actions are inter-related in a multi-hop routing scenario because each node’s energy consumption must be optimally allocated between sampling and transmitting its own data, receiving and forwarding the data of other nodes, and routing any data. Thus, we develop two optimal agent-based decentralised algorithms to solve this distributed constraint optimization problem. The first assumes that the route by which data is forwarded to the base station is fixed, and then calculates the optimal sampling, transmitting, and forwarding actions that each node should perform. The second assumes flexible routing, and makes optimal decisions regarding both the integration of actions that each node should choose and the route by which the data should be forwarded to the base station. The two algorithms represent a trade-off in optimality, communication cost, and processing time. In an empirical evaluation on sensor networks (whose underlying communication networks exhibit loops), we show that the algorithm with flexible routing is able to deliver approximately twice the quantity of information to the base station compared to the algorithm using fixed routing (where an arbitrary choice of route is made). However, this gain comes at a considerable communication and computational cost (both increase by a factor of 100). Thus, while the algorithm with flexible routing is suitable for networks with a small number of nodes, it scales poorly, and as the size of the network increases, the algorithm with fixed routing is favoured.
On the Inducibility of Stackelberg Equilibrium for Security Games
Strong Stackelberg equilibrium (SSE) is the standard solution concept of
Stackelberg security games. As opposed to the weak Stackelberg equilibrium
(WSE), the SSE assumes that the follower breaks ties in favor of the leader;
this is widely acknowledged and justified by the assertion that the defender
can often induce the attacker to choose a preferred action by making an
infinitesimal adjustment to her strategy. Unfortunately, in security games with
resource assignment constraints, the assertion might not be valid; it is
possible that the defender cannot induce the desired outcome. As a result, many
results claimed in the literature may be overly optimistic. To remedy this, we first
formally define the utility guarantee of a defender strategy and provide
examples to show that the utility of SSE can be higher than its utility
guarantee. Second, inspired by the analysis of leader's payoff by Von Stengel
and Zamir (2004), we provide the solution concept called the inducible
Stackelberg equilibrium (ISE), which has the highest utility guarantee and
always exists. Third, we identify the conditions under which ISE coincides with
SSE and show that, in the general case, SSE can be far worse with respect to
utility guarantee. Moreover, introducing the ISE does not invalidate existing
algorithmic results as the problem of computing an ISE polynomially reduces to
that of computing an SSE. We also provide an algorithmic implementation for
computing ISE, with which our experiments unveil the empirical advantage of the
ISE over the SSE.
Comment: The Thirty-Third AAAI Conference on Artificial Intelligence
