572 research outputs found
Learning and Testing Variable Partitions
Let be a multivariate function from a product set to an
Abelian group . A -partition of with cost is a partition of
the set of variables into non-empty subsets such that is -close to
for some with
respect to a given error metric. We study algorithms for agnostically learning
partitions and testing -partitionability over various groups and error
metrics given query access to . In particular we show that
Given a function that has a -partition of cost , a partition
of cost can be learned in time
for any .
In contrast, for and learning a partition of cost is NP-hard.
When is real-valued and the error metric is the 2-norm, a
2-partition of cost can be learned in time
.
When is -valued and the error metric is Hamming
weight, -partitionability is testable with one-sided error and
non-adaptive queries. We also show that even
two-sided testers require queries when .
This work was motivated by reinforcement learning control tasks in which the
set of control variables can be partitioned. The partitioning reduces the task
into multiple lower-dimensional ones that are relatively easier to learn. Our
second algorithm empirically increases the scores attained over previous
heuristic partitioning methods applied in this context.Comment: Innovations in Theoretical Computer Science (ITCS) 202
Efficient Exploration in Continuous-time Model-based Reinforcement Learning
Reinforcement learning algorithms typically consider discrete-time dynamics,
even though the underlying systems are often continuous in time. In this paper,
we introduce a model-based reinforcement learning algorithm that represents
continuous-time dynamics using nonlinear ordinary differential equations
(ODEs). We capture epistemic uncertainty using well-calibrated probabilistic
models, and use the optimistic principle for exploration. Our regret bounds
surface the importance of the measurement selection strategy(MSS), since in
continuous time we not only must decide how to explore, but also when to
observe the underlying system. Our analysis demonstrates that the regret is
sublinear when modeling ODEs with Gaussian Processes (GP) for common choices of
MSS, such as equidistant sampling. Additionally, we propose an adaptive,
data-dependent, practical MSS that, when combined with GP dynamics, also
achieves sublinear regret with significantly fewer samples. We showcase the
benefits of continuous-time modeling over its discrete-time counterpart, as
well as our proposed adaptive MSS over standard baselines, on several
applications
- …