166 research outputs found

    A Projective Simulation Scheme for Partially-Observable Multi-Agent Systems

    We introduce a form of partial observability to the projective simulation (PS) learning method by adding a belief projection operator and an observability parameter to the original framework of the efficiency of the PS model. We provide theoretical formulations, network representations, and situated scenarios derived from the invasion toy problem as a starting point for some multi-agent PS models.
    Comment: 28 pages, 21 figures

    Quantum partially observable Markov decision processes

    We present quantum observable Markov decision processes (QOMDPs), the quantum analogs of partially observable Markov decision processes (POMDPs). In a QOMDP, an agent is acting in a world where the state is represented as a quantum state and the agent can choose a superoperator to apply. This is similar to the POMDP belief state, which is a probability distribution over world states and evolves via a stochastic matrix. We show that the existence of a policy of at least a certain value has the same complexity for QOMDPs and POMDPs in the polynomial and infinite horizon cases. However, we also prove that the existence of a policy that can reach a goal state is decidable for goal POMDPs and undecidable for goal QOMDPs.
    National Science Foundation (U.S.) (Grant 0844626); National Science Foundation (U.S.) (Grant 1122374); National Science Foundation (U.S.) (Waterman Award)
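    The classical half of this analogy, a belief state that evolves via a stochastic matrix, can be sketched in a few lines. This is an illustrative sketch of the standard POMDP belief update, not code from the paper; the function name `belief_update` and the two-state numbers are assumptions made up for the example.

```python
def belief_update(belief, transition, obs_likelihood):
    """One step of a classical POMDP belief update (illustrative sketch).

    belief[s]         : current probability of world state s
    transition[s][t]  : P(next state t | state s) for the chosen action;
                        each row sums to 1 (a row-stochastic matrix)
    obs_likelihood[t] : P(received observation | next state t)
    """
    n = len(belief)
    # Prediction: push the belief through the stochastic matrix.
    predicted = [sum(belief[s] * transition[s][t] for s in range(n))
                 for t in range(n)]
    # Correction: weight by the observation likelihood, then renormalize.
    unnormalized = [predicted[t] * obs_likelihood[t] for t in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


# Hypothetical two-state example; all numbers are illustrative only.
b0 = [0.5, 0.5]
T = [[0.9, 0.1],
     [0.2, 0.8]]
O = [0.75, 0.25]
b1 = belief_update(b0, T, O)
```

    In the quantum setting described by the abstract, the probability vector is replaced by a quantum state and the stochastic matrix by a superoperator; that part is not modeled here.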

    Dynamics of Social Networks: Multi-agent Information Fusion, Anticipatory Decision Making and Polling

    This paper surveys mathematical models, structural results and algorithms in controlled sensing with social learning in social networks. Part 1, namely Bayesian Social Learning with Controlled Sensing, addresses the following questions: How does risk-averse behavior in social learning affect quickest change detection? How can information fusion be priced? How is the convergence rate of state estimation affected by social learning? The aim is to develop and extend structural results in stochastic control and Bayesian estimation to answer these questions. Such structural results yield fundamental bounds on the optimal performance, give insight into what parameters affect the optimal policies, and yield computationally efficient algorithms. Part 2, namely Multi-agent Information Fusion with Behavioral Economics Constraints, generalizes Part 1. The agents exhibit sophisticated decision making in a behavioral economics sense; namely, the agents make anticipatory decisions (thus the decision strategies are time inconsistent and interpreted as subgame Bayesian Nash equilibria). Part 3, namely Interactive Sensing in Large Networks, addresses the following questions: How to track the degree distribution of an infinite random graph with dynamics (via a stochastic approximation on a Hilbert space)? How can the infected degree distribution of a Markov modulated power law network and its mean field dynamics be tracked via Bayesian filtering given incomplete information obtained by sampling the network? We also briefly discuss how the glass ceiling effect emerges in social networks. Part 4, namely Efficient Network Polling, deals with polling in large-scale social networks. In such networks, only a fraction of nodes can be polled to determine their decisions. Which nodes should be polled to achieve statistically accurate estimates?

    Improving quantum state detection with adaptive sequential observations

    For many quantum systems intended for information processing, one detects the logical state of a qubit by integrating a continuously observed quantity over time. For example, ion and atom qubits are typically measured by driving a cycling transition and counting the number of photons observed from the resulting fluorescence. Instead of recording only the total observed count in a fixed time interval, one can observe the photon arrival times and get a state detection advantage by using the temporal structure in a model such as a Hidden Markov Model. We study what further advantage may be achieved by applying pulses to adaptively transform the state during the observation. We give a three-state example where adaptively chosen transformations yield a clear advantage, and we compare performances on an ion example, where we see improvements in some regimes. We provide a software package that can be used for exploration of temporally resolved strategies with and without adaptively chosen transformations.
    Comment: Submitted for publication in Quantum Science and Technology. 26 pages, 8 figures. Corrected typos in appendix, updated acknowledgement

    Contributions on complexity bounds for Deterministic Partially Observed Markov Decision Process

    Markov Decision Processes (Mdps) form a versatile framework used to model a wide range of optimization problems. The Mdp model consists of sets of states, actions, time steps, rewards, and probability transitions. When in a given state and at a given time, the decision maker's action generates a reward and determines the state at the next time step according to the probability transition function. However, Mdps assume that the decision maker knows the state of the controlled dynamical system. Hence, when one needs to optimize controlled dynamical systems under partial observation, one often turns toward the formalism of Partially Observed Markov Decision Processes (Pomdp). Pomdps are often intractable in the general case, as Dynamic Programming suffers from the curse of dimensionality. Instead of focusing on general Pomdps, we present a subclass where the transition and observation mappings are deterministic: Deterministic Partially Observed Markov Decision Processes (Det-Pomdp). That subclass of problems has been studied by (Littman, 1996) and (Bonet, 2009). It was first considered as a limit case of Pomdps by Littman, mainly used to illustrate the complexity of Pomdps when considering as few sources of uncertainty as possible. In this paper, we improve on Littman's complexity bounds. We then introduce and study an even simpler class, Separated Det-Pomdps, and give some new complexity bounds for this class. This new class of problems uses a property of the dynamics and observation to push back the curse of dimensionality.
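    A small sketch can show what deterministic transition and observation mappings buy. It is an illustration under stated assumptions, not the paper's construction: it tracks only the set of states consistent with the history (the support of the belief, not its probabilities), it assumes the observation is emitted by the state reached after the action, and the names `update_support`, `step` and `observe` and the five-cell corridor are hypothetical.

```python
def update_support(support, action, observation, step, observe):
    """Set of states still possible after one action and one observation.

    step(s, a) and observe(s) are deterministic mappings, so each state has
    exactly one successor and the support can never grow.
    """
    successors = {step(s, action) for s in support}
    return {s for s in successors if observe(s) == observation}


# Hypothetical toy problem: a corridor with cells 0..4.
def step(state, action):
    # Move left (-1) or right (+1), clamped at the corridor ends.
    return min(4, max(0, state + action))


def observe(state):
    # The agent only senses whether it is touching the right wall.
    return state == 4


support = {0, 1, 2, 3, 4}                                     # position unknown
support = update_support(support, +1, False, step, observe)   # now {1, 2, 3}
support = update_support(support, +1, True, step, observe)    # now {4}
```

    Because every state has a single successor, the support shrinks or stays the same size at each step, which is one informal way to see why this subclass is easier to reason about than general Pomdps.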