1,379 research outputs found
LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning
Cooperative multi-agent reinforcement learning (MARL) has made prominent
progress in recent years. For training efficiency and scalability, most of the
MARL algorithms make all agents share the same policy or value network.
However, in many complex multi-agent tasks, different agents are expected to
possess specific abilities to handle different subtasks. In those scenarios,
sharing parameters indiscriminately may lead to similar behavior across all
agents, which will limit the exploration efficiency and degrade the final
performance. To balance the training complexity and the diversity of agent
behavior, we propose a novel framework to learn dynamic subtask assignment
(LDSA) in cooperative MARL. Specifically, we first introduce a subtask encoder
to construct a vector representation for each subtask according to its
identity. To reasonably assign agents to different subtasks, we propose an
ability-based subtask selection strategy, which can dynamically group agents
with similar abilities into the same subtask. In this way, agents dealing with
the same subtask share their learning of specific abilities and different
subtasks correspond to different specific abilities. We further introduce two
regularizers to increase the representation difference between subtasks and
stabilize the training by discouraging agents from frequently changing
subtasks, respectively. Empirical results show that LDSA learns reasonable and
effective subtask assignment for better collaboration and significantly
improves the learning performance on the challenging StarCraft II
micromanagement benchmark and Google Research Football.
Multispectral persistent surveillance
To achieve persistence, a successful surveillance system must track everything that moves, all of the time, over the entire area of interest. The thrust of this thesis is to identify and improve upon the motion detection and object association aspects of this challenge by adding spectral information to the equation. Traditional motion detection and tracking systems rely primarily on single-band grayscale video, while more current research has focused on sensor fusion, specifically combining visible and IR data sources. A further challenge in covering an entire area of responsibility (AOR) is a limited sensor field of view, which can be overcome either by adding more sensors or by multi-tasking a single sensor over multiple areas at a reduced frame rate. As an essential tool for sensor design and mission development, a trade study was conducted to measure the potential advantages of adding spectral bands of information in a single sensor with the intention of reducing sensor frame rates. Thus, traditional motion detection and object association algorithms were modified to evaluate system performance using five spectral bands (visible through thermal IR), while adjusting frame rate as a second variable. The goal of this research was to produce an evaluation of system performance as a function of the number of bands and frame rate. As such, performance surfaces were generated to assess relative performance as a function of the number of bands and frame rate.
Load curve data cleansing and imputation via sparsity and low rank
The smart grid vision is to build an intelligent power network with an
unprecedented level of situational awareness and controllability over its
services and infrastructure. This paper advocates statistical inference methods
to robustify power monitoring tasks against the outlier effects owing to faulty
readings and malicious attacks, as well as against missing data due to privacy
concerns and communication errors. In this context, a novel load cleansing and
imputation scheme is developed leveraging the low intrinsic-dimensionality of
spatiotemporal load profiles and the sparse nature of "bad data." A robust
estimator based on principal components pursuit (PCP) is adopted, which effects
a twofold sparsity-promoting regularization through an ℓ1-norm of the
outliers, and the nuclear norm of the nominal load profiles. Upon recasting the
non-separable nuclear norm into a form amenable to decentralized optimization,
a distributed (D-) PCP algorithm is developed to carry out the imputation and
cleansing tasks using networked devices comprising the so-termed advanced
metering infrastructure. If D-PCP converges and a qualification inequality is
satisfied, the novel distributed estimator provably attains the performance of
its centralized PCP counterpart, which has access to all networkwide data.
Computer simulations and tests with real load curve data corroborate the
convergence and effectiveness of the novel D-PCP algorithm.
Comment: 8 figures, submitted to IEEE Transactions on Smart Grid - Special issue on "Optimization methods and algorithms applied to smart grid
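The twofold regularization described in this abstract (an ℓ1-norm on the outliers plus a nuclear norm on the nominal load profiles) can be illustrated with a minimal centralized sketch. This is not the paper's distributed D-PCP algorithm and it ignores missing data; it assumes a fully observed matrix Y and a simple alternating minimization of 0.5*||Y - L - S||_F^2 + lam_low_rank*||L||_* + lam_sparse*||S||_1. All function and parameter names below are hypothetical.

```python
import numpy as np

def soft_threshold(X, tau):
    """Entrywise shrinkage: proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def singular_value_threshold(X, tau):
    """Shrink singular values: proximal operator of tau * ||X||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def pcp_objective(Y, L, S, lam_low_rank, lam_sparse):
    """Least-squares fit plus nuclear-norm and l1-norm penalties."""
    fit = 0.5 * np.sum((Y - L - S) ** 2)
    return fit + lam_low_rank * np.linalg.norm(L, "nuc") + lam_sparse * np.abs(S).sum()

def pcp_alternating(Y, lam_low_rank, lam_sparse, n_iters=200):
    """Alternate exact minimization over L (low rank) and S (sparse outliers).

    Assumption: Y is fully observed; this is a centralized illustration only.
    """
    L = np.zeros_like(Y)
    S = np.zeros_like(Y)
    for _ in range(n_iters):
        L = singular_value_threshold(Y - S, lam_low_rank)  # nominal profiles
        S = soft_threshold(Y - L, lam_sparse)              # "bad data"
    return L, S
```

In such a sketch, L plays the role of the cleansed load profiles and the nonzero entries of S flag suspected bad readings; the two penalty weights would need tuning for any real data set.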
INVESTIGATING AGENT AND TASK OPENNESS IN ADHOC TEAM FORMATION
When deciding which ad hoc team to join, agents are often required to consider rewards from accomplishing tasks as well as the potential benefits of learning from others while solving tasks. We argue that, in order to decide when to learn or when to solve tasks, agents have to consider the existing agents’ capabilities and the tasks available in the environment, and thus agents have to consider agent and task openness—the rate at which new, previously unknown agents (and tasks) are introduced into the environment. We further assume that agents evolve their capabilities intrinsically through learning by observation or learning by doing when working in a team. Thus, an agent needs to consider which task or which team would provide the best situation for such learning to occur. In this thesis, we develop an auction-based multiagent simulation framework, a mechanism to simulate openness in our environment, and conduct comprehensive experiments to investigate the impact of agent and task openness. We propose several agent task selection strategies to leverage the environmental openness. Furthermore, we present a multiagent solution for agent-based collaborative human task assignment in settings where finding suitable tasks for users in complex environments is made especially challenging by agent openness and task openness. Using an auction-based protocol to fairly assign tasks, software agents model uncertainty in the outcomes of bids caused by openness, then acquire tasks for people that maximize both the users’ utility gain and their learning opportunities (human users improve their abilities to accomplish future tasks through learning by experience and by observing more capable humans). Experimental results demonstrate the effects of agent and task openness on collaborative task assignment, the benefits of reasoning about openness, and the value of non-myopically choosing tasks to help people improve their abilities for uncertain future tasks.
Towards Scalable Design of Future Wireless Networks
Wireless operators face an ever-growing challenge to meet the throughput and processing requirements of billions of devices that are getting connected. In current wireless networks, such as LTE and WiFi, these requirements are addressed by provisioning more resources: spectrum, transmitters, and baseband processors. However, this simple add-on approach to scale system performance is expensive and often results in resource underutilization. What are, then, the ways to efficiently scale the throughput and operational efficiency of these wireless networks? To answer this question, this thesis explores several potential designs: utilizing unlicensed spectrum to augment the bandwidth of a licensed network; coordinating transmitters to increase system throughput; and finally, centralizing wireless processing to reduce computing costs.
First, we propose a solution that allows LTE, a licensed wireless standard, to co-exist with WiFi in the unlicensed spectrum. The proposed solution bridges the incompatibility between the fixed access of LTE, and the random access of WiFi, through channel reservation. It achieves a fair LTE-WiFi co-existence despite the transmission gaps and unequal frame durations. Second, we consider a system where different MIMO transmitters coordinate to transmit data of multiple users.
We present an adaptive design of the channel feedback protocol that mitigates interference resulting from imperfect channel information. Finally, we consider a Cloud-RAN architecture where a datacenter or a cloud resource processes wireless frames. We introduce a tree-based design for real-time transport of baseband samples and provide its end-to-end schedulability and capacity analysis. We also present a processing framework that combines real-time scheduling with fine-grained parallelism. The framework reduces processing times by migrating parallelizable tasks to idle compute resources, and thus decreases processing deadline misses at no additional cost.
We implement and evaluate the above solutions using software-radio platforms and off-the-shelf radios, and confirm their applicability in real-world settings.
PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/133358/1/gkchai_1.pd
Towards Continual Reinforcement Learning: A Review and Perspectives
In this article, we aim to provide a literature review of different
formulations and approaches to continual reinforcement learning (RL), also
known as lifelong or non-stationary RL. We begin by discussing our perspective
on why RL is a natural fit for studying continual learning. We then provide a
taxonomy of different continual RL formulations and mathematically characterize
the non-stationary dynamics of each setting. We go on to discuss evaluation of
continual RL agents, providing an overview of benchmarks used in the literature
and important metrics for understanding agent performance. Finally, we
highlight open problems and challenges in bridging the gap between the current
state of continual RL and findings in neuroscience. While still in its early
days, the study of continual RL has the promise to develop better incremental
reinforcement learners that can function in increasingly realistic applications
where non-stationarity plays a vital role. These include applications such as
those in the fields of healthcare, education, logistics, and robotics.
Comment: Preprint, 52 pages, 8 figure