13,338 research outputs found

    DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs

    Full text link
    A major difficulty of solving continuous POMDPs is to infer the multi-modal distribution of the unobserved true states and to make the planning algorithm dependent on the perceived uncertainty. We cast POMDP filtering and planning problems as two closely related Sequential Monte Carlo (SMC) processes, one over the real states and the other over the future optimal trajectories, and combine the merits of these two parts in a new model named the DualSMC network. In particular, we first introduce an adversarial particle filter that leverages the adversarial relationship between its internal components. Based on the filtering results, we then propose a planning algorithm that extends the previous SMC planning approach [Piche et al., 2018] to continuous POMDPs with an uncertainty-dependent policy. Crucially, not only can DualSMC handle complex observations such as image input but also it remains highly interpretable. It is shown to be effective in three continuous POMDP domains: the floor positioning domain, the 3D light-dark navigation domain, and a modified Reacher domain.Comment: IJCAI 202

    Monte Carlo optimization of decentralized estimation networks over directed acyclic graphs under communication constraints

    Get PDF
    Motivated by the vision of sensor networks, we consider decentralized estimation networks over bandwidth–limited communication links, and are particularly interested in the tradeoff between the estimation accuracy and the cost of communications due to, e.g., energy consumption. We employ a class of in–network processing strategies that admits directed acyclic graph representations and yields a tractable Bayesian risk that comprises the cost of communications and estimation error penalty. This perspective captures a broad range of possibilities for processing under network constraints and enables a rigorous design problem in the form of constrained optimization. A similar scheme and the structures exhibited by the solutions have been previously studied in the context of decentralized detection. Under reasonable assumptions, the optimization can be carried out in a message passing fashion. We adopt this framework for estimation, however, the corresponding optimization scheme involves integral operators that cannot be evaluated exactly in general. We develop an approximation framework using Monte Carlo methods and obtain particle representations and approximate computational schemes for both the in–network processing strategies and their optimization. The proposed Monte Carlo optimization procedure operates in a scalable and efficient fashion and, owing to the non-parametric nature, can produce results for any distributions provided that samples can be produced from the marginals. In addition, this approach exhibits graceful degradation of the estimation accuracy asymptotically as the communication becomes more costly, through a parameterized Bayesian risk

    Monte Carlo optimization approach for decentralized estimation networks under communication constraints

    Get PDF
    We consider designing decentralized estimation schemes over bandwidth limited communication links with a particular interest in the tradeoff between the estimation accuracy and the cost of communications due to, e.g., energy consumption. We take two classes of in–network processing strategies into account which yield graph representations through modeling the sensor platforms as the vertices and the communication links by edges as well as a tractable Bayesian risk that comprises the cost of transmissions and penalty for the estimation errors. This approach captures a broad range of possibilities for “online” processing of observations as well as the constraints imposed and enables a rigorous design setting in the form of a constrained optimization problem. Similar schemes as well as the structures exhibited by the solutions to the design problem has been studied previously in the context of decentralized detection. Under reasonable assumptions, the optimization can be carried out in a message passing fashion. We adopt this framework for estimation, however, the corresponding optimization schemes involve integral operators that cannot be evaluated exactly in general. We develop an approximation framework using Monte Carlo methods and obtain particle representations and approximate computational schemes for both classes of in–network processing strategies and their optimization. The proposed Monte Carlo optimization procedures operate in a scalable and efficient fashion and, owing to the non-parametric nature, can produce results for any distributions provided that samples can be produced from the marginals. In addition, this approach exhibits graceful degradation of the estimation accuracy asymptotically as the communication becomes more costly, through a parameterized Bayesian risk

    Belief Tree Search for Active Object Recognition

    Full text link
    Active Object Recognition (AOR) has been approached as an unsupervised learning problem, in which optimal trajectories for object inspection are not known and are to be discovered by reducing label uncertainty measures or training with reinforcement learning. Such approaches have no guarantees of the quality of their solution. In this paper, we treat AOR as a Partially Observable Markov Decision Process (POMDP) and find near-optimal policies on training data using Belief Tree Search (BTS) on the corresponding belief Markov Decision Process (MDP). AOR then reduces to the problem of knowledge transfer from near-optimal policies on training set to the test set. We train a Long Short Term Memory (LSTM) network to predict the best next action on the training set rollouts. We sho that the proposed AOR method generalizes well to novel views of familiar objects and also to novel objects. We compare this supervised scheme against guided policy search, and find that the LSTM network reaches higher recognition accuracy compared to the guided policy method. We further look into optimizing the observation function to increase the total collected reward of optimal policy. In AOR, the observation function is known only approximately. We propose a gradient-based method update to this approximate observation function to increase the total reward of any policy. We show that by optimizing the observation function and retraining the supervised LSTM network, the AOR performance on the test set improves significantly.Comment: IROS 201

    Monte Carlo optimization approach for decentralized estimation networks under communication constraints

    Get PDF
    We consider designing decentralized estimation schemes over bandwidth limited communication links with a particular interest in the tradeoff between the estimation accuracy and the cost of communications due to, e.g., energy consumption. We take two classes of in–network processing strategies into account which yield graph representations through modeling the sensor platforms as the vertices and the communication links by edges as well as a tractable Bayesian risk that comprises the cost of transmissions and penalty for the estimation errors. This approach captures a broad range of possibilities for “online” processing of observations as well as the constraints imposed and enables a rigorous design setting in the form of a constrained optimization problem. Similar schemes as well as the structures exhibited by the solutions to the design problem has been studied previously in the context of decentralized detection. Under reasonable assumptions, the optimization can be carried out in a message passing fashion. We adopt this framework for estimation, however, the corresponding optimization schemes involve integral operators that cannot be evaluated exactly in general. We develop an approximation framework using Monte Carlo methods and obtain particle representations and approximate computational schemes for both classes of in–network processing strategies and their optimization. The proposed Monte Carlo optimization procedures operate in a scalable and efficient fashion and, owing to the non-parametric nature, can produce results for any distributions provided that samples can be produced from the marginals. In addition, this approach exhibits graceful degradation of the estimation accuracy asymptotically as the communication becomes more costly, through a parameterized Bayesian risk
    • …
    corecore