    Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey

    Wireless sensor networks (WSNs) consist of autonomous, resource-limited devices that cooperate to monitor one or more physical phenomena within an area of interest. WSNs operate as stochastic systems because of randomness in the monitored environments. To achieve long service time and low maintenance cost, WSNs require adaptive and robust methods for data exchange, topology formation, resource and power optimization, sensing coverage and object detection, and security. In these problems, sensor nodes must make optimized decisions from a set of accessible strategies to achieve design goals. This survey reviews numerous applications of the Markov decision process (MDP) framework, a powerful decision-making tool for developing adaptive algorithms and protocols for WSNs. Furthermore, various solution methods are discussed and compared to serve as a guide for using MDPs in WSNs.
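    As a concrete illustration of the MDP machinery this survey builds on, the sketch below runs standard value iteration on a toy sensor-node energy-management problem. The states, actions, transition probabilities, and rewards are invented for illustration and are not taken from the survey.

    import numpy as np

    # Toy MDP: a sensor node chooses to "sense" or "sleep" given its battery level.
    # States: 0 (empty), 1 (low), 2 (full). Actions: 0 = sleep, 1 = sense.
    # All transition probabilities and rewards below are illustrative assumptions.
    n_states, gamma = 3, 0.95

    # P[a][s, s'] = transition probability; R[s, a] = expected immediate reward.
    P = np.array([
        # action 0: sleep (battery tends to recover via energy harvesting)
        [[0.7, 0.3, 0.0],
         [0.0, 0.6, 0.4],
         [0.0, 0.0, 1.0]],
        # action 1: sense (earns reward but drains the battery)
        [[1.0, 0.0, 0.0],
         [0.8, 0.2, 0.0],
         [0.0, 0.7, 0.3]],
    ])
    R = np.array([
        [0.0, -1.0],   # sensing on an empty battery is penalized
        [0.0,  1.0],
        [0.0,  2.0],
    ])

    # Standard value iteration: V <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    V = np.zeros(n_states)
    for _ in range(500):
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-8:
            V = V_new
            break
        V = V_new

    policy = Q.argmax(axis=1)
    print("optimal values:", V.round(3), " policy (0=sleep, 1=sense):", policy)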

    Learning scalable and transferable multi-robot/machine sequential assignment planning via graph embedding

    Can the success of reinforcement learning methods for simple combinatorial optimization problems be extended to multi-robot sequential assignment planning? In addition to the challenge of achieving near-optimal performance on large problems, transferability to an unseen number of robots and tasks is another key challenge for real-world applications. In this paper, we suggest a method that achieves the first success on both challenges for robot/machine scheduling problems. Our method comprises three components. First, we show that a robot scheduling problem can be expressed as a random probabilistic graphical model (PGM); we develop a mean-field inference method for random PGMs and use it for Q-function inference. Second, we show that transferability can be achieved by carefully designing a two-step sequential encoding of the problem state. Third, we resolve the computational scalability issue of fitted Q-iteration by suggesting a heuristic auction-based Q-iteration fitting method, enabled by the transferability we achieve. We apply our method to discrete-time, discrete-space problems (Multi-Robot Reward Collection, MRRC) and scalably achieve 97% optimality with transferability. This optimality is maintained under stochastic contexts. By extending our method to a continuous-time, continuous-space formulation, we claim the first learning-based method with scalable performance for multi-machine scheduling problems; our method achieves performance comparable to popular metaheuristics on identical parallel machine scheduling (IPMS) problems.
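    To make the auction-based fitting idea concrete, here is a minimal sketch that greedily assigns robots to tasks from a learned Q-score matrix. The Q matrix is random here, standing in for the paper's graph-embedding-based Q-function inference, and the greedy pairing is a simplification of the auction heuristic the abstract describes, not the authors' exact procedure.

    import numpy as np

    # Hypothetical stand-in for the learned Q-function: Q[r, t] scores assigning
    # robot r to task t. Random values are used purely for illustration.
    rng = np.random.default_rng(0)
    n_robots, n_tasks = 4, 6
    Q = rng.uniform(size=(n_robots, n_tasks))

    # Greedy auction-style assignment: repeatedly take the highest-scoring
    # (robot, task) pair among those still unassigned.
    assignment = {}
    available_r = set(range(n_robots))
    available_t = set(range(n_tasks))
    while available_r and available_t:
        r, t = max(
            ((r, t) for r in available_r for t in available_t),
            key=lambda rt: Q[rt[0], rt[1]],
        )
        assignment[r] = t
        available_r.remove(r)
        available_t.remove(t)

    print(assignment)  # e.g. {2: 5, 0: 3, ...}, one task per robot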

    Perturbation realization, potentials, and sensitivity analysis of Markov processes

    Two fundamental concepts and quantities, realization factors and performance potentials, are introduced for Markov processes. The relations among these two quantities and the group inverse of the infinitesimal generator are studied. It is shown that the sensitivity of the steady-state performance with respect to changes in the infinitesimal generator can be easily calculated using any of these three quantities, and that these quantities can be estimated by analyzing a single sample path of a Markov process. Based on these results, algorithms for estimating performance sensitivities on a single sample path of a Markov process can be developed. The potentials in this paper are defined through realization factors and are shown to be the same as those defined by Poisson equations. The results provide a uniform framework of perturbation realization for infinitesimal perturbation analysis (IPA) and non-IPA approaches to the sensitivity analysis of steady-state performance; they also provide a theoretical background for the PA algorithms developed in recent years. Index terms: perturbation analysis, Poisson equations, sample-path analysis.
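    The Poisson-equation route mentioned in the abstract is easy to reproduce numerically. The sketch below uses an illustrative three-state generator and reward vector (not from the paper) to compute the steady-state distribution, the potentials g from A g = -(f - eta*1), and the classical sensitivity d(eta)/d(delta) = pi @ dA @ g for a generator perturbation A -> A + delta*dA.

    import numpy as np

    # Illustrative continuous-time Markov chain: infinitesimal generator A
    # (non-negative off-diagonal rates, rows summing to zero) and reward f.
    A = np.array([
        [-2.0,  1.0,  1.0],
        [ 1.0, -3.0,  2.0],
        [ 2.0,  1.0, -3.0],
    ])
    f = np.array([1.0, 3.0, 2.0])

    # Steady-state distribution pi: solve pi A = 0 with pi summing to 1.
    M = np.vstack([A.T, np.ones(3)])
    b = np.array([0.0, 0.0, 0.0, 1.0])
    pi = np.linalg.lstsq(M, b, rcond=None)[0]
    eta = pi @ f                       # steady-state performance

    # Potentials g from the Poisson equation A g = -(f - eta * 1), pinned
    # down by the normalization pi @ g = 0 (g is unique up to a constant).
    g = np.linalg.lstsq(np.vstack([A, pi]),
                        np.append(-(f - eta), 0.0), rcond=None)[0]

    # Sensitivity of eta to a generator perturbation (dA rows sum to zero).
    dA = np.array([
        [-1.0, 1.0, 0.0],
        [ 0.0, 0.0, 0.0],
        [ 0.0, 0.0, 0.0],
    ])
    print("pi =", pi.round(4), " eta =", round(eta, 4))
    print("potentials g =", g.round(4))
    print("d eta / d delta =", round(pi @ dA @ g, 4))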

    REinforcement learning based Adaptive samPling: REAPing Rewards by Exploring Protein Conformational Landscapes

    One of the key limitations of molecular dynamics (MD) simulations is the computational intractability of sampling protein conformational landscapes associated with large system sizes or long timescales. To overcome this bottleneck, we present the REinforcement learning based Adaptive samPling (REAP) algorithm, which aims to sample conformational space efficiently by learning the relative importance of each reaction coordinate as it samples the landscape. To achieve this, the algorithm uses concepts from reinforcement learning, a subset of machine learning, to reward sampling along important degrees of freedom and disregard those that do not facilitate exploration or exploitation. We demonstrate the effectiveness of REAP by comparing its sampling to long continuous MD simulations and to least-counts adaptive sampling on two model landscapes (L-shaped and circular) and on realistic systems such as alanine dipeptide and Src kinase. In all four systems, the REAP algorithm consistently explores conformational space faster than the other two methods when comparing the expected extent of the landscape discovered for a given amount of simulation time. The key advantage of REAP is its on-the-fly estimation of the importance of collective variables, which makes it particularly useful for systems with limited structural information.
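    The core loop of such an approach can be sketched in a few lines. Below, a toy two-collective-variable (CV) trajectory is scored with a standardized-distance reward and the CV weights are tuned by a simple grid search over the simplex; the data, the candidate selection, and the optimizer are all illustrative stand-ins for REAP's actual clustering and weight-update steps, not the published algorithm.

    import numpy as np

    rng = np.random.default_rng(1)
    frames = rng.normal(size=(200, 2))        # toy trajectory in two CVs

    def reward(weights, frames, candidates):
        # Reward a candidate restart structure by how far it sits from the
        # mean of the explored landscape along each CV, weighted by the
        # current estimate of CV importance.
        mu, sigma = frames.mean(axis=0), frames.std(axis=0)
        z = np.abs((candidates - mu) / sigma)  # standardized distance per CV
        return (z @ weights).sum()

    # Pick the most extreme frames as restart candidates (a crude stand-in
    # for least-counts clustering), then tune the two CV weights.
    candidates = frames[np.argsort(np.linalg.norm(frames, axis=1))[-10:]]
    best_w, best_r = None, -np.inf
    for w0 in np.linspace(0.0, 1.0, 101):
        w = np.array([w0, 1.0 - w0])          # weights lie on the simplex
        r = reward(w, frames, candidates)
        if r > best_r:
            best_w, best_r = w, r

    print("adapted CV weights:", best_w, " reward:", round(best_r, 3))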

    Distributed Linear Parameter Estimation: Asymptotically Efficient Adaptive Strategies

    The paper considers the problem of distributed adaptive linear parameter estimation in multi-agent inference networks. Local sensing model information is only partially available at the agents, and inter-agent communication is assumed to be unpredictable. The paper develops a generic mixed time-scale stochastic procedure consisting of simultaneous distributed learning and estimation, in which the agents adaptively assess their relative observation quality over time and fuse the innovations accordingly. Under rather weak assumptions on the statistical model and the inter-agent communication, it is shown that, by properly tuning the consensus potential with respect to the innovation potential, the asymptotic information rate loss incurred in the learning process may be made negligible. As such, it is shown that the agent estimates are asymptotically efficient, in that their asymptotic covariance coincides with that of a centralized estimator (the inverse of the centralized Fisher information rate for Gaussian systems) with perfect global model information and access to all observations at all times. The proof techniques are mainly based on convergence arguments for non-Markovian mixed time-scale stochastic approximation procedures. Several approximation results developed in the process are of independent interest.
    Comment: Submitted to the SIAM Journal on Control and Optimization. Initial submission: Sept. 2011. Revised: Aug. 201
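    A minimal consensus-plus-innovations iteration of the kind the abstract describes can be sketched as follows. The network, the local sensing matrices, and the step-size exponents are illustrative choices, not the paper's; the only properties preserved are that the consensus potential decays more slowly than the innovation potential and that the local models identify the parameter only jointly.

    import numpy as np

    rng = np.random.default_rng(2)
    theta = np.array([1.0, -2.0])              # unknown parameter
    n_agents = 4

    # Each agent observes only part of the parameter: the local H_i are
    # individually rank-deficient, but jointly they identify theta.
    H = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]),
         np.array([[1.0, 1.0]]), np.array([[1.0, -1.0]])]

    # Ring communication graph: each agent talks to its two neighbors.
    neighbors = {i: [(i - 1) % n_agents, (i + 1) % n_agents]
                 for i in range(n_agents)}

    x = [np.zeros(2) for _ in range(n_agents)]
    for t in range(1, 20001):
        beta = 0.3 / t**0.6                    # consensus potential (slower decay)
        alpha = 1.0 / (t + 1)                  # innovation potential (faster decay)
        y = [Hi @ theta + 0.1 * rng.normal(size=1) for Hi in H]  # noisy local obs
        x_new = []
        for i in range(n_agents):
            consensus = sum(x[i] - x[j] for j in neighbors[i])
            innovation = H[i].T @ (y[i] - H[i] @ x[i])
            x_new.append(x[i] - beta * consensus + alpha * innovation)
        x = x_new

    # All agents' estimates should be close to theta = [1, -2].
    print("agent estimates:", [xi.round(3) for xi in x])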