250 research outputs found
A Gentle Introduction to Epistemic Planning: The DEL Approach
Epistemic planning can be used for decision making in multi-agent situations
with distributed knowledge and capabilities. Dynamic Epistemic Logic (DEL) has
been shown to provide a very natural and expressive framework for epistemic
planning. In this paper, we aim to give an accessible introduction to DEL-based
epistemic planning. The paper starts with the most classical framework for
planning, STRIPS, and then moves towards epistemic planning in a number of
smaller steps, where each step is motivated by the need to be able to model
more complex planning scenarios. Comment: In Proceedings M4M9 2017, arXiv:1703.0173
Causal Discovery from Temporal Data: An Overview and New Perspectives
Temporal data, representing chronological observations of complex systems,
has always been a typical data structure that can be widely generated by many
domains, such as industry, medicine and finance. Analyzing this type of data is
extremely valuable for various applications. Thus, different temporal data
analysis tasks, e.g., classification, clustering and prediction, have been
proposed in the past decades. Among them, causal discovery, learning the causal
relations from temporal data, is considered an interesting yet critical task
and has attracted much research attention. Existing causal discovery works can
be divided into two highly correlated categories according to whether the
temporal data is calibrated, i.e., multivariate time series causal discovery and
event sequence causal discovery. However, most previous surveys focus only
on time series causal discovery and ignore the second category. In
this paper, we specify the correlation between the two categories and provide a
systematic overview of existing solutions. Furthermore, we provide public
datasets, evaluation metrics and new perspectives for temporal data causal
discovery. Comment: 52 pages, 6 figure
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
In this work, we study low-rank MDPs with adversarially changing losses in
the full-information feedback setting. In particular, the unknown transition
probability kernel admits a low-rank matrix decomposition \citep{REPUCB22}, and
the loss functions may change adversarially but are revealed to the learner at
the end of each episode. We propose a policy optimization-based algorithm, POLO,
and we prove that it attains a sublinear regret guarantee that depends on the
rank of the transition kernel (and hence the dimension of the unknown
representations), the cardinality of the action space, the cardinality of the
model class, and the discount factor. Notably, our algorithm is oracle-efficient and has a regret guarantee
with no dependence on the size of potentially arbitrarily large state space.
Furthermore, we also prove a
regret lower bound for this problem, showing that low-rank MDPs are
statistically more difficult to learn than linear MDPs in the regret
minimization setting. To the best of our knowledge, we present the first
algorithm that interleaves representation learning, exploration, and
exploitation to achieve the sublinear regret guarantee for RL with nonlinear
function approximation and adversarial losses.