TACT: A Transfer Actor-Critic Learning Framework for Energy Saving in Cellular Radio Access Networks
Recent works have validated the possibility of improving energy efficiency in
radio access networks (RANs), achieved by dynamically turning on/off some base
stations (BSs). In this paper, we extend the research over BS switching
operations, which should match up with traffic load variations. Instead of
depending on dynamic traffic loads, which remain quite challenging to
forecast precisely, we first formulate the traffic variations as a Markov
decision process. Then, to minimize the energy consumption of RANs with
foresight, we design a BS switching operation scheme based on a reinforcement
learning framework. Furthermore, to avoid the underlying curse of
dimensionality in reinforcement learning, a transfer actor-critic algorithm
(TACT), which utilizes the transferred learning expertise in historical periods
or neighboring regions, is proposed and provably converges. In the end, we
evaluate our proposed scheme by extensive simulations under various practical
configurations and show that the proposed TACT algorithm contributes to a
performance jumpstart and demonstrates the feasibility of significant energy
efficiency improvement at the expense of tolerable delay performance.
Comment: 11 figures, 30 pages, accepted in IEEE Transactions on Wireless
Communications 2014. IEEE Trans. Wireless Commun., Feb. 201
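To make the actor-critic idea in the abstract above concrete, here is a minimal sketch on a made-up two-state traffic model. It is not the TACT algorithm itself: the cost table, the transition probability, the step sizes, and the warm start through `init_prefs` (a simple stand-in for transferred knowledge) are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical toy problem: 2 traffic states (0 = low, 1 = high) and
# 2 actions (0 = BS asleep, 1 = BS active). All numbers are invented.
COST = [[0.1, 1.0],   # low load: sleeping is cheap, staying active wastes energy
        [2.0, 1.0]]   # high load: sleeping causes delay, staying active is cheaper
P_STAY = 0.8          # probability that the traffic level does not change
GAMMA = 0.9           # discount factor


def softmax(prefs):
    """Turn action preferences into a probability distribution (Gibbs policy)."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic(steps, init_prefs=None, alpha=0.1, beta=0.1, seed=0):
    """Tabular actor-critic that minimizes discounted cost.

    init_prefs optionally warm-starts the actor, e.g. with preferences
    learned on a related task (an assumed, simplified form of transfer).
    """
    rng = random.Random(seed)
    if init_prefs is not None:
        prefs = [row[:] for row in init_prefs]
    else:
        prefs = [[0.0, 0.0], [0.0, 0.0]]
    value = [0.0, 0.0]
    state = 0
    for _ in range(steps):
        probs = softmax(prefs[state])
        action = 0 if rng.random() < probs[0] else 1
        cost = COST[state][action]
        next_state = state if rng.random() < P_STAY else 1 - state
        # Critic: temporal-difference error of the observed cost.
        td_error = cost + GAMMA * value[next_state] - value[state]
        value[state] += alpha * td_error
        # Actor: lower the preference of actions that cost more than expected.
        prefs[state][action] -= beta * td_error
        state = next_state
    return prefs, value


source_prefs, _ = actor_critic(5000)                          # learn on a "source" task
warm_prefs, _ = actor_critic(1000, init_prefs=source_prefs)   # warm-started run
```

Starting the second run from `source_prefs` means its policy already favors the cheaper action in each state from the first step, which is the kind of jumpstart the abstract refers to.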
Least Squares Temporal Difference Actor-Critic Methods with Applications to Robot Motion Control
We consider the problem of finding a control policy for a Markov Decision
Process (MDP) to maximize the probability of reaching some states while
avoiding some other states. This problem is motivated by applications in
robotics, where such problems naturally arise when probabilistic models of
robot motion are required to satisfy temporal logic task specifications. We
transform this problem into a Stochastic Shortest Path (SSP) problem and
develop a new approximate dynamic programming algorithm to solve it. This
algorithm is of the actor-critic type and uses a least squares temporal
difference learning method. It operates on sample paths of the system and
optimizes the policy within a pre-specified class parameterized by a
parsimonious set of parameters. We show its convergence to a policy
corresponding to a stationary point in the parameters' space. Simulation
results confirm the effectiveness of the proposed solution.
Comment: Technical report accompanying an accepted paper to CDC 201
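As background for the least squares temporal difference idea mentioned above, the following sketch shows textbook LSTD(0) for evaluating a fixed policy with linear features. It is not the actor-critic algorithm of the paper; the function names, the two-state example, and the discount factor are assumptions made for illustration. LSTD accumulates A = sum of phi (phi - gamma phi')^T and b = sum of phi r over sampled transitions, then solves A theta = b.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def lstd(transitions, n_features, gamma):
    """LSTD(0): transitions is a list of (phi, reward, phi_next) tuples."""
    a_mat = [[0.0] * n_features for _ in range(n_features)]
    b_vec = [0.0] * n_features
    for phi, reward, phi_next in transitions:
        diff = [p - gamma * q for p, q in zip(phi, phi_next)]
        for i in range(n_features):
            b_vec[i] += phi[i] * reward
            for j in range(n_features):
                a_mat[i][j] += phi[i] * diff[j]
    return solve(a_mat, b_vec)


# Hypothetical two-state cycle with one-hot features:
# state 0 -> state 1 with reward 1, state 1 -> state 0 with reward 0.
samples = [([1.0, 0.0], 1.0, [0.0, 1.0]),
           ([0.0, 1.0], 0.0, [1.0, 0.0])]
theta = lstd(samples, n_features=2, gamma=0.5)
```

With one-hot features the weights are the state values themselves, so `theta` should match the solution of V0 = 1 + 0.5 V1 and V1 = 0.5 V0, that is V0 = 4/3 and V1 = 2/3. Unlike incremental TD, the estimate comes from a single linear solve over all samples.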
Anomaly detection and dynamic decision making for stochastic systems
Thesis (Ph.D.)--Boston University
This dissertation focuses on two types of problems, both of which are related to systems with uncertainties.
The first problem concerns network system anomaly detection. We present several stochastic and deterministic methods for anomaly detection of networks whose normal behavior is not time-varying. Our methods cover most of the common techniques in the anomaly detection field. We evaluate all methods in a simulated network that consists of nominal data, three flow-level anomalies and one packet-level attack. Through analyzing the results, we summarize the advantages and the disadvantages of each method. As a next step, we propose two robust stochastic anomaly detection methods for networks whose normal behavior is time-varying. We develop a procedure for learning the underlying family of patterns that characterize a time-varying network.
This procedure first estimates a large class of patterns from network data and then refines it to select a representative subset. The latter part formulates the refinement problem using ideas from set covering via integer programming. Then we propose two robust methods, one model-free and one model-based, to evaluate whether a sequence of observations is drawn from the learned patterns. Simulation results show that the robust methods have significant advantages over the alternative stationary methods in time-varying networks. The final anomaly detection setting we consider targets the detection of botnets before they launch an attack. Our method analyzes the social graph of the nodes in a network and consists of two stages: (i) network anomaly detection based on large deviations theory and (ii) community detection based on a refined modularity measure. We apply our method on real-world botnet traffic and compare its performance with other methods.
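One common way to build a model-free test from large deviations theory is to compare the empirical distribution of a window of observations against a reference distribution using relative entropy (Kullback-Leibler divergence) and to raise an alarm when it exceeds a threshold. The sketch below illustrates that general idea only; the flow categories, the reference probabilities, and the threshold are invented, and the dissertation's actual detectors may differ.

```python
import math
from collections import Counter


def empirical_distribution(samples):
    """Relative frequency of each symbol observed in the window."""
    if not samples:
        raise ValueError("window must contain at least one observation")
    counts = Counter(samples)
    n = len(samples)
    return {symbol: count / n for symbol, count in counts.items()}


def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats; infinite if p puts mass where q has none."""
    total = 0.0
    for symbol, p_x in p.items():
        if p_x > 0.0:
            q_x = q.get(symbol, 0.0)
            if q_x == 0.0:
                return math.inf
            total += p_x * math.log(p_x / q_x)
    return total


def is_anomalous(window, reference, threshold):
    """Flag the window if its empirical distribution is far from the reference."""
    return kl_divergence(empirical_distribution(window), reference) > threshold


# Hypothetical reference pattern over flow-size categories.
reference = {"small": 0.7, "medium": 0.2, "large": 0.1}
normal_window = ["small"] * 7 + ["medium"] * 2 + ["large"]
attack_window = ["large"] * 10

normal_flag = is_anomalous(normal_window, reference, threshold=0.5)
attack_flag = is_anomalous(attack_window, reference, threshold=0.5)
```

The first window matches the reference frequencies and is not flagged, while the second has divergence log(10), well above the assumed threshold. For a time-varying network, one could run the same test against each learned pattern and flag a window only if it is far from all of them, though this extension is likewise an assumption.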
The second problem considered by this dissertation concerns sequential decision making under uncertainty, which can be modeled by Markov Decision Processes (MDPs). We focus on methods with an actor-critic structure, where the critic part estimates the gradient of the overall objective with respect to tunable policy parameters and the actor part optimizes a policy with respect to these parameters. Most existing actor-critic methods use Temporal Difference (TD) learning to estimate the gradient and steepest gradient ascent to update the policies. Our first contribution is to propose an actor-critic method that uses a Least Squares Temporal Difference (LSTD) method, which is known to converge faster than the TD methods. Our second contribution is to develop a new Newton-like actor-critic method that performs better, especially for ill-conditioned problems. We evaluate our methods in problems motivated by robot motion control.