Restricted Value Iteration: Theory and Algorithms
Value iteration is a popular algorithm for finding near optimal policies for
POMDPs. It is inefficient due to the need to account for the entire belief
space, which necessitates the solution of large numbers of linear programs. In
this paper, we study value iteration restricted to belief subsets. We show
that, together with properly chosen belief subsets, restricted value iteration
yields near-optimal policies and we give a condition for determining whether a
given belief subset would bring about savings in space and time. We also apply
restricted value iteration to two interesting classes of POMDPs, namely
informative POMDPs and near-discernible POMDPs.
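As background for the belief-space terminology in the abstract above, the following is a minimal sketch of the standard Bayesian belief update in a POMDP. It is not the paper's restricted value iteration algorithm. The two-state model and the names `T`, `O`, and `belief_update` are illustrative assumptions.

```python
def belief_update(belief, action, observation, T, O):
    """Return the posterior belief after taking `action` and seeing `observation`.

    Implements b'(s') proportional to O[a][s'][o] * sum_s T[a][s][s'] * b(s).
    T[a][s][s'] is a transition probability, O[a][s'][o] an observation probability.
    """
    n = len(belief)
    unnormalized = []
    for s_next in range(n):
        # Prediction step: probability of reaching s_next under the action.
        predicted = sum(T[action][s][s_next] * belief[s] for s in range(n))
        # Correction step: weight by the likelihood of the observation.
        unnormalized.append(O[action][s_next][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Hypothetical two-state, one-action, two-observation model (assumed values).
T = {0: [[0.9, 0.1], [0.2, 0.8]]}
O = {0: [[0.8, 0.2], [0.3, 0.7]]}
b0 = [0.5, 0.5]
b1 = belief_update(b0, action=0, observation=0, T=T, O=O)
```

Each call maps one point of the belief simplex to another, so the set of beliefs reachable from a starting belief is one natural example of a belief subset.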
Sequential Selection of Correlated Ads by POMDPs
Online advertising has become a key source of revenue for both web search
engines and online publishers. For them, the ability to allocate the right ads
to the right webpages is critical because mismatched ads not only harm web
users' satisfaction but also lower ad income. In this paper, we study how
online publishers could optimally select ads to maximize their ad income over
time. The conventional offline, content-based matching between webpages and ads
is a fine start but cannot solve the problem completely because good matching
does not necessarily lead to good payoff. Moreover, with limited display
impressions, we need to balance selecting ads to learn true ad payoffs
(exploration) against allocating ads to generate high immediate payoffs based
on the current belief (exploitation). We address the problem by employing
partially observable Markov decision processes (POMDPs) and discuss how to
use the correlation of ads to improve the efficiency of exploration and
increase ad income in the long run. Our
mathematical derivation shows that the belief states of correlated ads can be
naturally updated using a formula similar to collaborative filtering. To test
our model, a real world ad dataset from a major search engine is collected and
categorized. Experimenting on the data, we provide an analysis of the effect
of the underlying parameters and demonstrate that our algorithms significantly
outperform other strong baselines.
Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes are to make
optimized decisions from a set of accessible strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs.
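To illustrate the kind of solution method such a survey compares, here is a minimal sketch of value iteration on a tiny MDP. The sensor-node scenario (battery states, sleep and transmit actions) and all numbers are hypothetical assumptions for illustration, not taken from the survey.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve a finite MDP by value iteration.

    P[s][a][t] is the probability of moving from state s to state t under
    action a; R[s][a] is the immediate reward. Returns (values, greedy policy).
    """
    n_states = len(R)
    n_actions = len(R[0])
    V = [0.0] * n_states
    while True:
        # Bellman backup: one-step lookahead value of every (state, action) pair.
        Q = [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [max(q) for q in Q]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            policy = [q.index(max(q)) for q in Q]
            return V_new, policy
        V = V_new


# Hypothetical sensor node: states 0 = low battery, 1 = high battery;
# actions 0 = sleep (recharge), 1 = transmit (earn reward, may drain battery).
P = [
    [[0.2, 0.8], [1.0, 0.0]],  # from low battery
    [[0.0, 1.0], [0.5, 0.5]],  # from high battery
]
R = [
    [0.0, 1.0],  # low battery: sleep, transmit
    [0.0, 2.0],  # high battery: sleep, transmit
]
values, policy = value_iteration(P, R)
```

The returned `policy` lists, for each battery state, the action with the highest one-step lookahead value under the converged value estimates.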
Operational Decision Making under Uncertainty: Inferential, Sequential, and Adversarial Approaches
Modern security threats are characterized by a stochastic, dynamic, partially observable, and ambiguous operational environment. This dissertation addresses such complex security threats using operations research techniques for decision making under uncertainty in operations planning, analysis, and assessment. First, this research develops a new method for robust queue inference with partially observable, stochastic arrival and departure times, motivated by cybersecurity and terrorism applications. In the dynamic setting, this work develops a new variant of Markov decision processes and an algorithm for robust information collection in dynamic, partially observable, and ambiguous environments, with an application to a cybersecurity detection problem. In the adversarial setting, this work presents a new application of counterfactual regret minimization and robust optimization to a multi-domain cyber and air defense problem in a partially observable environment.