Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL)
framework to synthesise policies for unknown, continuous-state Markov
Decision Processes (MDPs), such that a given linear temporal property is
satisfied. We convert the given property into a Limit Deterministic Büchi
Automaton (LDBA), namely a finite-state machine expressing the property.
Exploiting the structure of the LDBA, we shape a synchronous reward function
on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces
that probabilistically satisfy the linear temporal property. This probability
(certificate) is also calculated in parallel with policy learning when the
state space of the MDP is finite: as such, the RL algorithm produces a policy
that is certified with respect to the property. Under the assumption of finite
state space, theoretical guarantees are provided on the convergence of the RL
algorithm to an optimal policy, maximising the above probability. We also show
that our method produces "best available" control policies when the logical
property cannot be satisfied. In the general case of a continuous state space,
we propose a neural network architecture for RL and we empirically show that
the algorithm finds satisfying policies whenever such policies exist. The
performance of the proposed framework is evaluated via a set of numerical
examples and benchmarks, where we observe an improvement of one order of
magnitude in the number of iterations required for the policy synthesis,
compared to existing approaches whenever available.
Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
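The idea of shaping a reward synchronously from the automaton can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the three-state automaton (for the property "eventually reach goal and never touch unsafe"), the one-dimensional corridor MDP, and the simplified rule that pays a reward of 1 whenever the automaton is in an accepting state. The paper's actual reward construction on the LDBA may differ.

```python
from typing import List, Set, Tuple

# Hypothetical three-state automaton for "eventually reach 'goal' and never
# touch 'unsafe'": 0 = waiting, 1 = goal seen (accepting), 2 = rejecting sink.
ACCEPTING: Set[int] = {1}


def automaton_step(q: int, labels: Set[str]) -> int:
    """Advance the automaton on the labels of the newly reached MDP state."""
    if q == 2 or "unsafe" in labels:
        return 2
    if q == 1 or "goal" in labels:
        return 1
    return 0


def corridor_step(state: int, action: int) -> int:
    """Toy deterministic MDP (an assumption): positions 0..4, action -1 or +1."""
    return max(0, min(4, state + action))


def corridor_labels(state: int) -> Set[str]:
    """Labelling function: position 0 is unsafe, position 4 is the goal."""
    if state == 0:
        return {"unsafe"}
    if state == 4:
        return {"goal"}
    return set()


def product_step(state: int, q: int, action: int) -> Tuple[int, int, float]:
    """Move the MDP and the automaton together and emit the shaped reward."""
    next_state = corridor_step(state, action)
    next_q = automaton_step(q, corridor_labels(next_state))
    # Simplified shaping rule: reward 1 while the automaton is accepting.
    reward = 1.0 if next_q in ACCEPTING else 0.0
    return next_state, next_q, reward


def rollout(state: int, actions: List[int]) -> Tuple[int, int, List[float]]:
    """Run a fixed action sequence from automaton state 0 and collect rewards."""
    q, rewards = 0, []
    for action in actions:
        state, q, reward = product_step(state, q, action)
        rewards.append(reward)
    return state, q, rewards
```

A model-free learner would treat the pair (state, q) as its state and learn from the rewards returned by `product_step`, so the reward is generated on-the-fly without building the full product in advance.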
Dynamics of Internal Models in Game Players
A new approach for the study of social games and communications is proposed.
Games are simulated between cognitive players who build the opponent's internal
model and decide their next strategy from predictions based on the model. In
this paper, internal models are constructed with a recurrent neural network
(RNN), and the players compete in the iterated prisoner's dilemma. The RNN
allows us to express the internal model as a geometrical shape. Complicated
transients of actions are observed before the stable mutually defecting
equilibrium is reached. During the transients, the model shape also becomes
complicated and often experiences chaotic changes. These new chaotic dynamics
of internal models reflect the dynamical and high-dimensional rugged landscape
of the internal model space.
Comment: 19 pages, 6 figures
Number Sequence Prediction Problems for Evaluating Computational Powers of Neural Networks
Inspired by number series tests to measure human intelligence, we suggest
number sequence prediction tasks to assess neural network models' computational
powers for solving algorithmic problems. We define the complexity and
difficulty of a number sequence prediction task with the structure of the
smallest automaton that can generate the sequence. We suggest two types of
number sequence prediction problems: the number-level and the digit-level
problems. The number-level problems format sequences as 2-dimensional grids of
digits, and the digit-level problems provide a single digit input per time
step. The complexity of a number-level sequence prediction can be defined with
the depth of an equivalent combinatorial logic, and the complexity of a
digit-level sequence prediction can be defined with an equivalent state
automaton for the generation rule. Experiments with number-level sequences
suggest that CNN models are capable of learning the compound operations of
sequence generation rules, but the depths of the compound operations are
limited. For the digit-level problems, simple GRU and LSTM models can solve
some problems with the complexity of finite state automata. Memory augmented
models such as Stack-RNN, Attention, and Neural Turing Machines can solve the
reverse-order task, which has the complexity of a simple pushdown automaton.
However, none of the above can solve general Fibonacci, arithmetic, or geometric
sequence generation problems that represent the complexity of queue automata or
Turing machines. The results show that our number sequence prediction problems
effectively evaluate machine learning models' computational capabilities.
Comment: Accepted to 2019 AAAI Conference on Artificial Intelligence
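A digit-level task of the kind described above can be sketched as follows. The encoding is an assumption made for illustration, not the paper's data format: the delimiter token `SEP`, the helper names, and the layout of the reverse-order input and target are all hypothetical.

```python
from typing import List, Tuple

SEP = 10  # hypothetical delimiter token; digits 0-9 use their own values


def fibonacci(a: int, b: int, length: int) -> List[int]:
    """General Fibonacci sequence seeded with a and b."""
    seq = [a, b]
    while len(seq) < length:
        seq.append(seq[-1] + seq[-2])
    return seq[:length]


def to_digit_stream(numbers: List[int]) -> List[int]:
    """Flatten numbers into one digit per time step, SEP after each number."""
    stream: List[int] = []
    for n in numbers:
        stream.extend(int(ch) for ch in str(n))
        stream.append(SEP)
    return stream


def reverse_order_pair(digits: List[int]) -> Tuple[List[int], List[int]]:
    """Reverse-order task: read the digits, then emit them back to front."""
    return digits + [SEP], digits[::-1]
```

For example, `to_digit_stream(fibonacci(1, 1, 7))` yields a stream in which multi-digit terms such as 13 span several time steps, which is what makes the digit-level variant depend on memory across steps.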