
    Understanding the Evolution of Linear Regions in Deep Reinforcement Learning

    Policies produced by deep reinforcement learning are typically characterised by their learning curves, but they remain poorly understood in many other respects. ReLU-based policies result in a partitioning of the input space into piecewise linear regions. We seek to understand how observed region counts and their densities evolve during deep reinforcement learning, using empirical results that span a range of continuous control tasks and policy network dimensions. Intuitively, we may expect that during training, the region density increases in the areas that are frequently visited by the policy, thereby affording fine-grained control. We use recent theoretical and empirical results for the linear regions induced by neural networks in supervised learning settings for grounding and comparison of our results. Empirically, we find that the region density increases only moderately throughout training, as measured along fixed trajectories coming from the final policy. However, the trajectories themselves also increase in length during training, and thus the region densities decrease as seen from the perspective of the current trajectory. Our findings suggest that the complexity of deep reinforcement learning policies does not principally emerge from a significant growth in the complexity of functions observed on and around trajectories of the policy. Comment: NeurIPS 2022 camera ready.
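
To make the region-counting idea above concrete, here is a minimal sketch, assuming a small randomly initialised two-hidden-layer ReLU policy and a synthetic stand-in trajectory (both placeholders rather than the paper's setup): each distinct pattern of active ReLU units identifies one linear region, so the number of unique patterns over the visited states is an empirical region count, and dividing by the trajectory length gives a region density.

```python
# Sketch: counting linear regions visited along a trajectory of a ReLU policy.
# Each distinct ReLU activation pattern corresponds to one linear region, so the
# number of unique patterns over the visited states is an empirical region count.
# Network sizes and the random "trajectory" are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
state_dim, hidden, action_dim = 8, 64, 2

# A small ReLU policy network with random weights (stand-in for a trained policy).
W1 = rng.normal(size=(hidden, state_dim));   b1 = rng.normal(size=hidden)
W2 = rng.normal(size=(hidden, hidden));      b2 = rng.normal(size=hidden)
W3 = rng.normal(size=(action_dim, hidden));  b3 = rng.normal(size=action_dim)

def activation_pattern(s):
    """Return the tuple of ReLU on/off bits for state s (one bit per hidden unit)."""
    h1 = W1 @ s + b1
    h2 = W2 @ np.maximum(h1, 0.0) + b2
    return tuple((h1 > 0).astype(int)) + tuple((h2 > 0).astype(int))

# A stand-in trajectory: in practice these would be states visited by the policy.
trajectory = rng.normal(size=(500, state_dim))

patterns = {activation_pattern(s) for s in trajectory}
traj_length = np.sum(np.linalg.norm(np.diff(trajectory, axis=0), axis=1))
print(f"regions visited: {len(patterns)}, density: {len(patterns) / traj_length:.3f}")
```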

    Learning and Testing Variable Partitions

    Let $F$ be a multivariate function from a product set $\Sigma^n$ to an Abelian group $G$. A $k$-partition of $F$ with cost $\delta$ is a partition of the set of variables $\mathbf{V}$ into $k$ non-empty subsets $(\mathbf{X}_1, \dots, \mathbf{X}_k)$ such that $F(\mathbf{V})$ is $\delta$-close to $F_1(\mathbf{X}_1) + \dots + F_k(\mathbf{X}_k)$ for some $F_1, \dots, F_k$ with respect to a given error metric. We study algorithms for agnostically learning $k$-partitions and testing $k$-partitionability over various groups and error metrics given query access to $F$. In particular we show that: 1. Given a function that has a $k$-partition of cost $\delta$, a partition of cost $\mathcal{O}(k n^2)(\delta + \epsilon)$ can be learned in time $\tilde{\mathcal{O}}(n^2 \mathrm{poly}(1/\epsilon))$ for any $\epsilon > 0$. In contrast, for $k = 2$ and $n = 3$, learning a partition of cost $\delta + \epsilon$ is NP-hard. 2. When $F$ is real-valued and the error metric is the 2-norm, a 2-partition of cost $\sqrt{\delta^2 + \epsilon}$ can be learned in time $\tilde{\mathcal{O}}(n^5/\epsilon^2)$. 3. When $F$ is $\mathbb{Z}_q$-valued and the error metric is Hamming weight, $k$-partitionability is testable with one-sided error and $\mathcal{O}(kn^3/\epsilon)$ non-adaptive queries. We also show that even two-sided testers require $\Omega(n)$ queries when $k = 2$. This work was motivated by reinforcement learning control tasks in which the set of control variables can be partitioned. The partitioning reduces the task into multiple lower-dimensional ones that are relatively easier to learn. Our second algorithm empirically increases the scores attained over previous heuristic partitioning methods applied in this context. Comment: Innovations in Theoretical Computer Science (ITCS) 202
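
As an illustration of the decomposition being learned and tested above (not the paper's algorithm), the following sketch checks whether a real-valued function over binary variables is consistent with an additive split across a given 2-partition: if $F(\mathbf{V}) = F_1(\mathbf{X}_1) + F_2(\mathbf{X}_2)$ exactly, then $F(a,b) + F(a',b') = F(a,b') + F(a',b)$ for any mix-and-match of the two blocks, so checking random quadruples gives a simple one-sided test. The helper `separability_check` and the toy function are hypothetical.

```python
# Sketch: a randomized check that a real-valued F decomposes additively across a
# *given* 2-partition (X1, X2). If F(V) = F1(X1) + F2(X2) exactly, then
# F(a,b) + F(a',b') = F(a,b') + F(a',b) for all a,a' on X1 and b,b' on X2, so
# sampling random quadruples and checking this identity gives a one-sided test.
# This only illustrates the decomposition property; it is not the paper's tester.
import random

def separability_check(f, n, block1, trials=200, tol=1e-9):
    """f: callable on a tuple of n binary variables; block1: indices of X1."""
    block2 = [i for i in range(n) if i not in block1]
    def mix(a, b):
        x = [0] * n
        for i in block1: x[i] = a[i]
        for i in block2: x[i] = b[i]
        return tuple(x)
    for _ in range(trials):
        a, a2 = (tuple(random.randint(0, 1) for _ in range(n)) for _ in range(2))
        b, b2 = (tuple(random.randint(0, 1) for _ in range(n)) for _ in range(2))
        lhs = f(mix(a, b)) + f(mix(a2, b2))
        rhs = f(mix(a, b2)) + f(mix(a2, b))
        if abs(lhs - rhs) > tol:
            return False  # witness that F is not additive over this partition
    return True  # consistent with additivity on all sampled quadruples

# Toy example: F(x) = (x0 XOR x1) + 3*x2 is additive over ({0,1}, {2}) but not ({0,2}, {1}).
F = lambda x: (x[0] ^ x[1]) + 3 * x[2]
print(separability_check(F, 3, block1=[0, 1]))  # expected True
print(separability_check(F, 3, block1=[0, 2]))  # expected False (with high probability)
```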

    Learning a Partitioning Advisor with Deep Reinforcement Learning

    Commercial data analytics products such as Microsoft Azure SQL Data Warehouse or Amazon Redshift provide ready-to-use scale-out database solutions for OLAP-style workloads in the cloud. While the provisioning of a database cluster is usually fully automated by cloud providers, customers typically still have to make important design decisions that were traditionally made by the database administrator, such as selecting the partitioning schemes. In this paper we introduce a learned partitioning advisor for analytical OLAP-style workloads based on Deep Reinforcement Learning (DRL). The main idea is that a DRL agent learns its decisions based on experience by monitoring the rewards for different workloads and partitioning schemes. We evaluate our learned partitioning advisor in an experimental evaluation with different database schemata and workloads of varying complexity. In the evaluation, we show that our advisor is not only able to find partitionings that outperform existing approaches for automated partitioning design but that it can also easily adjust to different deployments. This is especially important in cloud setups where customers can easily migrate their cluster to a new set of (virtual) machines.
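
A deliberately simplified sketch of the interaction loop described above: an agent proposes a partitioning column per table, the workload is executed on that layout, and the negative runtime serves as the reward. The paper's advisor uses deep RL; the epsilon-greedy value estimates, the `run_workload` stub, and the table and column names below are illustrative assumptions only.

```python
# Sketch of the advisor's feedback loop: propose a partitioning column per table,
# run the workload on that layout, and use the (negative) runtime as the reward.
# The paper uses a deep RL agent; this epsilon-greedy version only illustrates the
# loop. `run_workload`, the tables, and the candidate columns are placeholders.
import random
from collections import defaultdict

TABLES = {"lineorder": ["orderkey", "custkey", "suppkey"],
          "customer": ["custkey", "region"]}

def run_workload(layout):
    """Placeholder: deploy `layout` (table -> partitioning column), return runtime."""
    # In a real system this would repartition the cluster and time the query workload.
    return sum(hash((t, c)) % 100 for t, c in layout.items()) / 100.0

q_value = defaultdict(float)   # estimated reward per (table, column) choice
counts = defaultdict(int)
epsilon = 0.2

for episode in range(500):
    layout = {}
    for table, columns in TABLES.items():
        if random.random() < epsilon:                      # explore
            layout[table] = random.choice(columns)
        else:                                              # exploit best estimate
            layout[table] = max(columns, key=lambda c: q_value[(table, c)])
    reward = -run_workload(layout)                         # faster workload => higher reward
    for table, column in layout.items():                   # incremental mean update
        counts[(table, column)] += 1
        q_value[(table, column)] += (reward - q_value[(table, column)]) / counts[(table, column)]

best = {t: max(cols, key=lambda c: q_value[(t, c)]) for t, cols in TABLES.items()}
print("advised partitioning:", best)
```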

    Competitive function approximation for reinforcement learning

    The application of reinforcement learning to problems with continuous domains requires representing the value function by means of function approximation. We identify two aspects of reinforcement learning that make the function approximation process hard: non-stationarity of the target function and biased sampling. Non-stationarity is the result of the bootstrapping nature of dynamic programming, where the value function is estimated using its current approximation. Biased sampling occurs when some regions of the state space are visited too often, causing repeated updates with similar values that fade out the occasional updates of infrequently sampled regions. We propose a competitive approach to function approximation in which many different local approximators are available at a given input and the one expected to give the best approximation is selected by means of a relevance function. The local nature of the approximators allows their fast adaptation to non-stationary changes and mitigates the biased sampling problem. The coexistence of multiple approximators updated and tried in parallel permits obtaining a good estimate much faster than would be possible with a single approximator. Experiments on different benchmark problems show that the competitive strategy provides faster and more stable learning than non-competitive approaches. Preprint.
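
A minimal sketch of the competitive selection idea, assuming Gaussian relevance functions, local linear models, and a toy non-stationary target (none of which are claimed to match the paper's formulation): at each input the most relevant local approximator both answers and receives the update, so learning stays local and frequently sampled regions do not overwrite the rest.

```python
# Sketch of the competitive idea: several local approximators cover the input space,
# a relevance score picks which one answers (and is updated) at a given input, and
# only that local model moves, so updates elsewhere do not disturb it. The Gaussian
# relevance, local linear models, and the toy target are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_models, dim, width, lr = 20, 1, 0.15, 0.2

centers = rng.uniform(0.0, 1.0, size=(n_models, dim))   # where each local model lives
weights = np.zeros((n_models, dim + 1))                  # local linear models: w @ [x, 1]

def relevance(x):
    """Higher for approximators whose center is close to x (Gaussian kernel)."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-d2 / (2 * width ** 2))

def predict(x):
    best = np.argmax(relevance(x))                       # competition: most relevant wins
    return best, weights[best] @ np.append(x, 1.0)

def update(x, target):
    best, pred = predict(x)
    weights[best] += lr * (target - pred) * np.append(x, 1.0)   # local gradient step

# Toy non-stationary target: the function shifts halfway through training.
for step in range(5000):
    x = rng.uniform(0.0, 1.0, size=dim)
    target = np.sin(6 * x[0]) if step < 2500 else np.sin(6 * x[0]) + 1.0
    update(x, target)

xs = np.linspace(0, 1, 5)
print([round(float(predict(np.array([v]))[1]), 2) for v in xs])
```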