Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes must make
optimized decisions from a set of available strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool for developing adaptive
algorithms and protocols for WSNs. Furthermore, various solution methods are
discussed and compared to serve as a guide for using MDPs in WSNs.
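To make the decision framework concrete, below is a minimal value-iteration sketch for a hypothetical sensor node choosing between sleeping and transmitting; all states, rewards, and transition probabilities are illustrative assumptions, not values from the survey.

```python
# Hypothetical sensor-node MDP: a node with low/high battery chooses to
# sleep (recharge via harvesting) or transmit. All numbers are invented
# for illustration; the survey covers many richer formulations.
import numpy as np

states = ["low_battery", "high_battery"]
actions = ["sleep", "transmit"]

# P[a][s, s']: transition probabilities; R[a][s]: expected one-step reward.
P = {"sleep":    np.array([[0.7, 0.3],    # low  -> may recharge to high
                           [0.2, 0.8]]),  # high -> tends to stay high
     "transmit": np.array([[1.0, 0.0],    # low  -> stays drained
                           [0.6, 0.4]])}  # high -> often drops to low
R = {"sleep":    np.array([0.0, 0.0]),
     "transmit": np.array([-1.0, 2.0])}   # transmitting on low battery hurts

gamma, V = 0.95, np.zeros(len(states))
for _ in range(1000):                     # value iteration to a fixed point
    Q = np.array([R[a] + gamma * P[a] @ V for a in actions])
    V_new = Q.max(axis=0)
    if np.abs(V_new - V).max() < 1e-8:
        V = V_new
        break
    V = V_new

policy = {s: actions[i] for s, i in zip(states, Q.argmax(axis=0))}
print(policy)  # e.g. sleep when low, transmit when high
```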
Learning scalable and transferable multi-robot/machine sequential assignment planning via graph embedding
Can the success of reinforcement learning methods for simple combinatorial
optimization problems be extended to multi-robot sequential assignment
planning? In addition to the challenge of achieving near-optimal performance in
large problems, transferability to an unseen number of robots and tasks is
another key challenge for real-world applications. In this paper, we propose a
method that achieves the first success on both challenges for robot/machine
scheduling problems.
Our method comprises three components. First, we show that a robot scheduling
problem can be expressed as a random probabilistic graphical model (PGM). We
develop a mean-field inference method for random PGMs and use it for Q-function
inference. Second, we show that transferability can be achieved by carefully
designing a two-step sequential encoding of the problem state. Third, we
resolve the computational scalability issue of fitted Q-iteration with a
heuristic auction-based Q-iteration fitting method enabled by the
transferability we achieve.
We apply our method to discrete-time, discrete-space problems (Multi-Robot
Reward Collection (MRRC)) and scalably achieve 97% optimality with
transferability. This optimality is maintained under stochastic contexts. By
extending our method to a continuous-time, continuous-space formulation, we
obtain what we claim is the first learning-based method with scalable
performance for multi-machine scheduling problems; it scalably achieves
performance comparable to popular metaheuristics on identical parallel machine
scheduling (IPMS) problems.
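As an illustration of the auction-style fitting idea, the sketch below greedily assigns tasks using a learned score q(robot, task) standing in for Q-function inference; the names, scores, and one-shot greedy rule are hypothetical simplifications, not the paper's exact procedure.

```python
# Sketch of a greedy auction round over a learned score q(robot, task).
from typing import Callable, Dict, List

def auction_assign(robots: List[str], tasks: List[str],
                   q: Callable[[str, str], float]) -> Dict[str, str]:
    """Repeatedly award the single highest bid q(robot, task)."""
    assignment: Dict[str, str] = {}
    free_robots, open_tasks = set(robots), set(tasks)
    while free_robots and open_tasks:
        # Every free robot "bids" on every open task; the best bid wins.
        r, t = max(((r, t) for r in free_robots for t in open_tasks),
                   key=lambda rt: q(*rt))
        assignment[r] = t
        free_robots.remove(r)
        open_tasks.remove(t)
    return assignment

# Toy usage with a fixed score table in place of a learned Q-function.
scores = {("r1", "t1"): 0.9, ("r1", "t2"): 0.2, ("r1", "t3"): 0.1,
          ("r2", "t1"): 0.4, ("r2", "t2"): 0.8, ("r2", "t3"): 0.3}
print(auction_assign(["r1", "r2"], ["t1", "t2", "t3"],
                     lambda r, t: scores[(r, t)]))
# -> {'r1': 't1', 'r2': 't2'}
```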
Perturbation realization, potentials, and sensitivity analysis of Markov processes
Two fundamental concepts and quantities, realization factors and performance potentials, are introduced for Markov processes. The relations among these two quantities and the group inverse of the infinitesimal generator are studied. It is shown that the sensitivity of the steady-state performance with respect to changes in the infinitesimal generator can be easily calculated by using any of these three quantities, and that these quantities can be estimated by analyzing a single sample path of a Markov process. Based on these results, algorithms for estimating performance sensitivities from a single sample path of a Markov process can be developed. The potentials in this paper are defined through realization factors and are shown to be the same as those defined by Poisson equations. The results provide a uniform framework of perturbation realization for infinitesimal perturbation analysis (IPA) and non-IPA approaches to the sensitivity analysis of steady-state performance; they also provide a theoretical background for the PA algorithms developed in recent years.
Index Terms: Perturbation analysis, Poisson equations, sample-path analysis
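For orientation, the standard relations these quantities satisfy (reconstructed from the general theory of Markov potentials; the paper's notation may differ) can be sketched as follows.

```latex
% A: infinitesimal generator, \pi: stationary distribution (row vector),
% f: performance function, e: column vector of ones, \eta = \pi f.
\begin{align}
  A g &= -f + \eta e
      && \text{(Poisson equation defining the potential } g\text{)} \\
  d(i,j) &= g(j) - g(i)
      && \text{(realization factor)} \\
  \left.\frac{d\eta_\delta}{d\delta}\right|_{\delta=0}
      &= \pi \,\Delta A\, g,
      \qquad A_\delta = A + \delta\,\Delta A,\ \ \Delta A\, e = 0 .
\end{align}
```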
REinforcement learning based Adaptive samPling: REAPing Rewards by Exploring Protein Conformational Landscapes
One of the key limitations of molecular dynamics (MD) simulations is the
computational intractability of sampling protein conformational landscapes
associated with either large system size or long timescales. To overcome this
bottleneck, we present the REinforcement learning based Adaptive samPling
(REAP) algorithm that aims to efficiently sample conformational space by
learning the relative importance of each reaction coordinate as it samples the
landscape. To achieve this, the algorithm uses concepts from the field of
reinforcement learning, a subset of machine learning, which rewards sampling
along important degrees of freedom and disregards others that do not facilitate
exploration or exploitation. We demonstrate the effectiveness of REAP by
comparing its sampling to long continuous MD simulations and to least-counts
adaptive sampling on two model landscapes (L-shaped and circular) and on
realistic systems such as alanine dipeptide and Src kinase. In all four
systems, the REAP algorithm consistently explores conformational space faster
than the other two methods, as measured by the expected extent of the landscape
discovered for a given amount of simulation time. The key advantage of REAP is
its on-the-fly estimation of the importance of collective variables, which
makes it particularly useful for systems with limited structural information.
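A minimal sketch of the reward-driven weighting of collective variables (CVs) is given below; the reward form and the greedy update are simplified stand-ins for REAP's actual optimization (the paper also normalizes deviations per CV, omitted here so the toy has a clear optimum).

```python
# Simplified REAP-style idea: reward sampled frames by their weighted
# deviation along each CV, then adjust CV weights toward directions that
# keep producing high rewards. All details here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def reward(frames: np.ndarray, w: np.ndarray) -> float:
    """Weighted mean absolute deviation of frames from their mean."""
    dev = np.abs(frames - frames.mean(axis=0))  # (n_frames, n_cvs)
    return float((dev @ w).mean())

def update_weights(frames: np.ndarray, w: np.ndarray,
                   step: float = 0.05) -> np.ndarray:
    """Greedy coordinate perturbation: keep weight changes that raise reward."""
    best = reward(frames, w)
    for k in range(len(w)):
        for delta in (+step, -step):
            cand = w.copy()
            cand[k] = max(cand[k] + delta, 0.0)
            cand /= cand.sum()                  # weights stay on the simplex
            r = reward(frames, cand)
            if r > best:
                w, best = cand, r
    return w

# Toy usage: 2 CVs; weight should concentrate on the wider (first) CV.
frames = rng.normal(size=(100, 2)) * np.array([3.0, 0.5])
w = np.array([0.5, 0.5])
for _ in range(20):
    w = update_weights(frames, w)
print(w)
```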
Distributed Linear Parameter Estimation: Asymptotically Efficient Adaptive Strategies
The paper considers the problem of distributed adaptive linear parameter
estimation in multi-agent inference networks. Local sensing model information
is only partially available at the agents and inter-agent communication is
assumed to be unpredictable. The paper develops a generic mixed time-scale
stochastic procedure consisting of simultaneous distributed learning and
estimation, in which the agents adaptively assess their relative observation
quality over time and fuse the innovations accordingly. Under rather weak
assumptions on the statistical model and the inter-agent communication, it is
shown that, by properly tuning the consensus potential with respect to the
innovation potential, the asymptotic information rate loss incurred in the
learning process may be made negligible. Consequently, the agent
estimates are asymptotically efficient, in that their asymptotic covariance
coincides with that of a centralized estimator (the inverse of the centralized
Fisher information rate for Gaussian systems) with perfect global model
information and having access to all observations at all times. The proof
techniques are mainly based on convergence arguments for non-Markovian mixed
time scale stochastic approximation procedures. Several approximation results
developed in the process are of independent interest.
Comment: Submitted to the SIAM Journal on Control and Optimization. Initial
Submission: Sept. 2011. Revised: Aug. 201
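A minimal consensus-plus-innovations sketch of the mixed time-scale update is shown below; the 3-agent path graph, sensing matrices, and gain schedules are illustrative assumptions rather than the paper's exact scheme.

```python
# Sketch of a distributed consensus + innovations estimator: each agent
# mixes agreement with neighbors (faster-decaying consensus potential)
# and its own measurement residual (slower innovation potential).
import numpy as np

rng = np.random.default_rng(1)
theta = np.array([1.0, -2.0])                  # unknown parameter

H = [np.array([[1.0, 0.0]]),                   # agent 0 observes component 0
     np.array([[0.0, 1.0]]),                   # agent 1 observes component 1
     np.array([[1.0, 1.0]])]                   # agent 2 observes the sum
neighbors = {0: [1], 1: [0, 2], 2: [1]}        # path graph 0 - 1 - 2

x = [np.zeros(2) for _ in H]                   # local estimates
for t in range(1, 20001):
    beta = 0.5 / t**0.6                        # consensus potential (dominant)
    alpha = 1.0 / t                            # innovation potential (slower)
    y = [Hi @ theta + 0.5 * rng.standard_normal(1) for Hi in H]
    x = [x[i]
         - beta * sum(x[i] - x[j] for j in neighbors[i])  # agreement term
         + alpha * Hi.T @ (y[i] - Hi @ x[i])              # innovation term
         for i, Hi in enumerate(H)]
print(np.round(np.array(x), 2))  # each row should be close to [1, -2]
```

No single agent's sensing matrix is invertible here; only the fused network observes the full parameter, which is the setting where the consensus term does the work.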