Cost Efficient Distributed Load Frequency Control in Power Systems
The introduction of new technologies and the increased penetration of renewable resources are altering the power distribution landscape, which now includes a larger number of micro-generators. The centralized strategies currently employed for performing frequency control in a cost-efficient way need to be revisited and decentralized to conform with the increase of distributed generation in the grid. In this paper, the use of Multi-Agent and Multi-Objective Reinforcement Learning techniques to train models to perform cost-efficient frequency control through decentralized decision making is proposed. More specifically, we cast the frequency control problem as a Markov Decision Process, propose the use of reward composition and action composition multi-objective techniques, and compare the results of the two. Reward composition is achieved by increasing the dimensionality of the reward function, while action composition is achieved through a linear combination of actions produced by multiple single-objective models. The proposed framework is validated by comparing the observed dynamics with the acceptable limits enforced in the industry and with the cost-optimal setups.
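As a rough illustration of the two composition styles named in the abstract above, the following Python sketch contrasts them. The policy functions, weights, and state layout are hypothetical and are not taken from the paper; in practice the single-objective models would be trained reinforcement learning policies.

```python
from typing import Callable, List, Sequence

State = Sequence[float]
Policy = Callable[[State], float]


def compose_actions(
    policies: Sequence[Policy], weights: Sequence[float], state: State
) -> float:
    """Action composition: weighted linear combination of single-objective actions."""
    if len(policies) != len(weights):
        raise ValueError("need exactly one weight per policy")
    return sum(w * policy(state) for policy, w in zip(policies, weights))


def composed_reward(frequency_deviation: float, generation_cost: float) -> List[float]:
    """Reward composition: one reward component per objective (a vector reward)."""
    return [-abs(frequency_deviation), -generation_cost]


# Hypothetical stand-ins for trained single-objective models.
def frequency_policy(state: State) -> float:
    return -0.5 * state[0]  # counteract the frequency deviation


def cost_policy(state: State) -> float:
    return -0.1 * state[1]  # back off as the marginal cost rises


state = [0.2, 1.0]  # assumed layout: [frequency deviation, marginal cost]
action = compose_actions([frequency_policy, cost_policy], [0.7, 0.3], state)
reward = composed_reward(frequency_deviation=0.2, generation_cost=1.0)
```

The weights here are fixed by hand; how they are chosen or tuned is not stated in the abstract.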
Decentralization of Multiagent Policies by Learning What to Communicate
Effective communication is required for teams of robots to solve
sophisticated collaborative tasks. In practice it is typical for both the
encoding and semantics of communication to be manually defined by an expert;
this is true regardless of whether the behaviors themselves are bespoke,
optimization based, or learned. We present an agent architecture and training
methodology using neural networks to learn task-oriented communication
semantics based on the example of a communication-unaware expert policy. A
perimeter defense game illustrates the system's ability to handle dynamically
changing numbers of agents and its graceful degradation in performance as
communication constraints are tightened or the expert's observability
assumptions are broken. Comment: 7 pages
A Deep Reinforcement Learning Framework for Rebalancing Dockless Bike Sharing Systems
Bike sharing provides an environment-friendly way for traveling and is
booming all over the world. Yet, due to the high similarity of user travel
patterns, the bike imbalance problem constantly occurs, especially for dockless
bike sharing systems, causing significant impact on service quality and company
revenue. Thus, it has become a critical task for bike sharing systems to
resolve such imbalance efficiently. In this paper, we propose a novel deep
reinforcement learning framework for incentivizing users to rebalance such
systems. We model the problem as a Markov decision process and take both
spatial and temporal features into consideration. We develop a novel deep
reinforcement learning algorithm called Hierarchical Reinforcement Pricing
(HRP), which builds upon the Deep Deterministic Policy Gradient algorithm.
Unlike existing methods that often ignore spatial information and rely
heavily on accurate prediction, HRP captures both spatial and temporal
dependencies using a divide-and-conquer structure with an embedded localized
module. We conduct extensive experiments to evaluate HRP, based on a dataset
from Mobike, a major Chinese dockless bike sharing company. Results show that
HRP performs close to the 24-timeslot look-ahead optimization, and outperforms
state-of-the-art methods in both service level and bike distribution. It also
transfers well when applied to unseen areas.
Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks
Future wireless networks have substantial potential in terms of supporting
a broad range of complex compelling applications both in military and civilian
fields, where the users are able to enjoy high-rate, low-latency, low-cost and
reliable information services. Achieving this ambitious goal requires new radio
techniques for adaptive learning and intelligent decision making because of the
complex heterogeneous nature of the network structures and wireless services.
Machine learning (ML) algorithms have had great success in supporting big data
analytics, efficient parameter estimation and interactive decision making.
Hence, in this article, we review the thirty-year history of ML by elaborating
on supervised learning, unsupervised learning, reinforcement learning and deep
learning. Furthermore, we investigate their employment in the compelling
applications of wireless networks, including heterogeneous networks (HetNets),
cognitive radios (CR), Internet of things (IoT), machine to machine networks
(M2M), and so on. This article aims to help readers understand the
motivation and methodology of the various ML algorithms, so that they can
apply them to hitherto unexplored services as well as scenarios of future
wireless networks. Comment: 46 pages, 22 figures
A survey of QoS-aware web service composition techniques
Web service composition can be briefly described as the process of aggregating services with disparate functionalities into a new composite service in order to meet the increasingly complex needs of users. The service composition process has been accurate in dealing with services having disparate functionalities; however, over the years the number of web services that exhibit similar functionalities and varying Quality of Service (QoS) has significantly increased. As such, the problem becomes how to select appropriate web services such that the QoS of the resulting composite service is maximized or, in some cases, minimized. This constitutes an NP-hard problem, which makes it complicated and difficult to solve. In this paper, a discussion of the concepts of web service composition and a holistic review of current service composition techniques proposed in the literature are presented. Our review spans several publications in the field and can serve as a road map for future research.
Human-Machine Collaborative Optimization via Apprenticeship Scheduling
Coordinating agents to complete a set of tasks with intercoupled temporal and
resource constraints is computationally challenging, yet human domain experts
can solve these difficult scheduling problems using paradigms learned through
years of apprenticeship. A process for manually codifying this domain knowledge
within a computational framework is necessary to scale beyond the
"single-expert, single-trainee" apprenticeship model. However, human domain
experts often have difficulty describing their decision-making processes,
causing the codification of this knowledge to become laborious. We propose a
new approach for capturing domain-expert heuristics through a pairwise ranking
formulation. Our approach is model-free and does not require enumerating or
iterating through a large state space. We empirically demonstrate that this
approach accurately learns multifaceted heuristics on a synthetic data set
incorporating job-shop scheduling and vehicle routing problems, as well as on
two real-world data sets consisting of demonstrations of experts solving a
weapon-to-target assignment problem and a hospital resource allocation problem.
We also demonstrate that policies learned from human scheduling demonstration
via apprenticeship learning can substantially improve the efficiency of a
branch-and-bound search for an optimal schedule. We employ this human-machine
collaborative optimization technique on a variant of the weapon-to-target
assignment problem. We demonstrate that this technique generates solutions
substantially superior to those produced by human domain experts at a rate up
to 9.5 times faster than an optimization approach and can be applied to
optimally solve problems twice as complex as those solved by a human
demonstrator. Comment: Portions of this paper were published in the Proceedings of the
International Joint Conference on Artificial Intelligence (IJCAI) in 2016 and
in the Proceedings of Robotics: Science and Systems (RSS) in 2016. The paper
consists of 50 pages with 11 figures and 4 tables
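To make the idea of a pairwise ranking formulation more concrete, here is a minimal Python sketch of one common way to set it up: each expert decision is turned into labeled feature-difference examples that a binary classifier could then be trained on. The function name, the feature layout, and the labeling scheme are assumptions for illustration; whether they match the paper's exact construction is not stated in the abstract.

```python
from typing import List, Sequence, Tuple

Features = Sequence[float]
Example = Tuple[List[float], int]


def pairwise_examples(
    chosen: Features, alternatives: Sequence[Features]
) -> List[Example]:
    """Turn one expert decision into labeled feature-difference examples."""
    examples: List[Example] = []
    for other in alternatives:
        diff = [c - o for c, o in zip(chosen, other)]
        examples.append((diff, 1))                # chosen action ranked above the other
        examples.append(([-d for d in diff], 0))  # the other action ranked below the chosen one
    return examples


# Hypothetical features per candidate task: [deadline slack, travel distance].
examples = pairwise_examples([1.0, 2.0], [[3.0, 2.5], [4.0, 1.0]])
```

Because every example comes from an observed decision and its rejected alternatives, this construction never needs to enumerate the full state space.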