11,284 research outputs found
Understanding the Evolution of Linear Regions in Deep Reinforcement Learning
Policies produced by deep reinforcement learning are typically characterised
by their learning curves, but they remain poorly understood in many other
respects. ReLU-based policies result in a partitioning of the input space into
piecewise linear regions. We seek to understand how observed region counts and
their densities evolve during deep reinforcement learning using empirical
results that span a range of continuous control tasks and policy network
dimensions. Intuitively, we may expect that during training, the region density
increases in the areas that are frequently visited by the policy, thereby
affording fine-grained control. We draw on recent theoretical and empirical
results on the linear regions induced by neural networks in supervised
learning settings to ground and compare our findings. Empirically, we find that
the region density increases only moderately throughout training, as measured
along fixed trajectories coming from the final policy. However, the
trajectories themselves also increase in length during training, and thus the
region densities decrease as seen from the perspective of the current
trajectory. Our findings suggest that the complexity of deep reinforcement
learning policies does not principally emerge from a significant growth in the
complexity of functions observed on-and-around trajectories of the policy.Comment: NeurIPS 2022 camera read
Learning and Testing Variable Partitions
Let F be a multivariate function from a product set Σ^n to an Abelian group
G. A k-partition of F with cost δ is a partition of the set of variables into
k non-empty subsets X_1, ..., X_k such that F is δ-close to
F_1(X_1) + ... + F_k(X_k) for some F_1, ..., F_k with respect to a given
error metric. We study algorithms for agnostically learning k-partitions and
testing k-partitionability over various groups and error metrics given query
access to F. In particular we show that:
1. Given a function that has a k-partition of cost δ, a partition of cost
O(k n^2)(δ + ε) can be learned in time O~(n^2 poly(1/ε)) for any ε > 0. In
contrast, for k = 2 and n = 3, learning a partition of cost δ + ε is NP-hard.
2. When F is real-valued and the error metric is the 2-norm, a 2-partition of
cost sqrt(δ^2 + ε) can be learned in time O~(n^2/ε^2).
3. When F is Z_q-valued and the error metric is Hamming weight,
k-partitionability is testable with one-sided error and O(k n^3/ε)
non-adaptive queries. We also show that even two-sided testers require Ω(n)
queries when k = 2.
This work was motivated by reinforcement learning control tasks in which the
set of control variables can be partitioned. The partitioning reduces the task
into multiple lower-dimensional ones that are relatively easier to learn. Our
second algorithm empirically improves on the scores attained by previous
heuristic partitioning methods applied in this context.
Comment: Innovations in Theoretical Computer Science (ITCS) 202
Learning a Partitioning Advisor with Deep Reinforcement Learning
Commercial data analytics products such as Microsoft Azure SQL Data Warehouse
or Amazon Redshift provide ready-to-use scale-out database solutions for
OLAP-style workloads in the cloud. While the provisioning of a database cluster
is usually fully automated by cloud providers, customers typically still have
to make important design decisions that were traditionally made by the
database administrator, such as selecting partitioning schemes.
In this paper we introduce a learned partitioning advisor for analytical
OLAP-style workloads based on Deep Reinforcement Learning (DRL). The main idea
is that a DRL agent learns its decisions based on experience by monitoring the
rewards for different workloads and partitioning schemes. We evaluate our
learned partitioning advisor experimentally with different database schemata
and workloads of varying complexity. The evaluation shows that our advisor
not only finds partitionings that outperform existing approaches for
automated partitioning design but also adjusts easily to different
deployments. This is especially important in cloud setups, where customers
can easily migrate their cluster to a new set of (virtual) machines.
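The core loop of such an advisor can be sketched as a simple value-learning agent choosing among candidate partitioning schemes. The scheme names, per-scheme costs, and simulated noisy reward below are hypothetical stand-ins for executing real workloads against a deployment, and the paper's advisor is a full DRL agent rather than this bandit-style sketch:

```python
import random

random.seed(1)

# Hypothetical candidate partitioning schemes and a simulated workload cost
# (e.g. average query latency); in a real system the reward would come from
# monitoring workloads run against the actual database cluster.
SCHEMES = ["hash(customer_id)", "hash(order_id)", "round_robin", "replicate_dims"]
TRUE_COST = {"hash(customer_id)": 1.0, "hash(order_id)": 2.5,
             "round_robin": 3.0, "replicate_dims": 1.8}

def run_workload(scheme):
    """Simulated noisy reward: negative latency of the workload."""
    return -(TRUE_COST[scheme] + random.gauss(0, 0.2))

def train_advisor(episodes=2000, eps=0.1, lr=0.1):
    q = {s: 0.0 for s in SCHEMES}  # estimated value of each scheme
    for _ in range(episodes):
        # Epsilon-greedy: mostly exploit the best-looking scheme, sometimes explore.
        s = random.choice(SCHEMES) if random.random() < eps else max(q, key=q.get)
        q[s] += lr * (run_workload(s) - q[s])  # incremental value update
    return q

q = train_advisor()
print(max(q, key=q.get))  # the advised partitioning scheme
```

After training, greedy selection over the learned values yields the advised scheme; monitoring rewards for new workloads lets the same loop re-adapt after a migration to different hardware.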
Competitive function approximation for reinforcement learning
The application of reinforcement learning to problems with continuous domains requires representing the value function by means of function approximation. We identify two aspects of reinforcement learning that make the function approximation process hard: non-stationarity of the target function and biased sampling. Non-stationarity results from the bootstrapping nature of dynamic programming, where the value function is estimated using its own current approximation. Biased sampling occurs when some regions of the state space are visited too often, causing repeated updates with similar values that drown out the occasional updates of infrequently sampled regions.
We propose a competitive approach to function approximation in which many different local approximators are available at a given input and the one expected to give the best approximation is selected by means of a relevance function. The local nature of the approximators allows fast adaptation to non-stationary changes and mitigates the biased sampling problem. The coexistence of multiple approximators, updated and tried in parallel, permits obtaining a good estimate much faster than would be possible with a single approximator. Experiments on different benchmark problems show that the competitive strategy provides faster and more stable learning than non-competitive approaches.
Preprint
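The competitive scheme can be sketched with a bank of local linear models and a Gaussian relevance function that selects, at each input, the approximator expected to fit best. The centers, width, and learning rate below are illustrative choices, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)

class CompetitiveApproximator:
    """A bank of local linear models; a relevance function picks which one
    answers (and learns) at each input. A minimal 1-D sketch."""

    def __init__(self, centers, width=0.5, lr=0.2):
        self.centers = np.asarray(centers, dtype=float)
        self.w = np.zeros(len(self.centers))  # slope of each local model
        self.b = np.zeros(len(self.centers))  # offset of each local model
        self.width, self.lr = width, lr

    def relevance(self, x):
        # Gaussian relevance: approximators centered near x are trusted more.
        return np.exp(-((x - self.centers) ** 2) / (2 * self.width ** 2))

    def predict(self, x):
        k = int(np.argmax(self.relevance(x)))  # competition: the winner answers
        return self.w[k] * (x - self.centers[k]) + self.b[k]

    def update(self, x, target):
        k = int(np.argmax(self.relevance(x)))
        err = target - self.predict(x)
        # Only the winning local model adapts, so a non-stationary change in
        # one region does not disturb approximators elsewhere.
        self.b[k] += self.lr * err
        self.w[k] += self.lr * err * (x - self.centers[k])

approx = CompetitiveApproximator(centers=np.linspace(-1, 1, 9))
for _ in range(3000):
    x = rng.uniform(-1, 1)
    approx.update(x, np.sin(3 * x))  # toy stationary target function
xs = np.linspace(-1, 1, 50)
mse = np.mean([(approx.predict(x) - np.sin(3 * x)) ** 2 for x in xs])
print(mse)
```

Because only the winning local model adapts, frequent samples in one region leave the approximators elsewhere untouched, which is how the scheme mitigates biased sampling.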