Constructing Deterministic Finite-State Automata in Recurrent Neural Networks
Recurrent neural networks that are {\it trained} to behave like
deterministic finite-state automata (DFAs) can show deteriorating
performance when tested on long strings. This degradation
can be attributed to the instability of the internal representation of the
learned DFA states. The use of a sigmoidal discriminant function together
with the recurrent structure contributes to this instability. We prove
that a simple algorithm can {\it construct} second-order recurrent neural
networks with a sparse interconnection topology and sigmoidal discriminant
function such that the internal DFA state representations are stable, i.e.
the constructed network correctly classifies strings of {\it arbitrary
length}. The algorithm is based on encoding strengths of weights directly
into the neural network. We derive a relationship between the weight
strength and the number of DFA states for robust string classification.
For a DFA with $n$ states and $m$ input alphabet symbols, the constructive
algorithm generates a ``programmed" neural network with $O(n)$ neurons and
$O(mn)$ weights. We compare our algorithm to other methods proposed in the
literature.
Revised in February 1996
(Also cross-referenced as UMIACS-TR-95-50)
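A minimal Python sketch of the kind of construction the abstract describes (the two-state parity DFA, the weight strength H = 12, and all names are illustrative choices, not taken from the paper): transition strengths are programmed directly into sparse second-order weights, and a bias of -H/2 keeps the one-hot state representation near-saturated on arbitrarily long strings.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode_dfa(delta, n_states, n_symbols, H):
    # Second-order weight W[i][j][k] = +H iff reading symbol k in state j
    # moves the DFA to state i; all other weights are zero (sparse topology).
    return [[[H if delta[(j, k)] == i else 0.0
              for k in range(n_symbols)]
             for j in range(n_states)]
            for i in range(n_states)]

def run(W, H, n_states, start, symbols):
    # One-hot state neurons; the -H/2 bias drives non-target neurons low
    # and the target neuron high at every step, so the encoding is stable.
    s = [1.0 if i == start else 0.0 for i in range(n_states)]
    for k in symbols:
        s = [sigmoid(sum(W[i][j][k] * s[j] for j in range(n_states)) - H / 2)
             for i in range(n_states)]
    return s

# Two-state parity DFA over {0, 1}: state 0 = even number of 1s.
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
H = 12.0  # weight strength; large enough to keep two states separated
W = encode_dfa(delta, 2, 2, H)

# 1001 ones -> odd parity: the network should end near state 1 even on a
# string far longer than anything a trained network is tested on.
final = run(W, H, 2, 0, [1] * 1001)
```

The stability claim is visible numerically: the active state neuron settles near sigmoid(H/2) ≈ 1 and the inactive one near sigmoid(-H/2) ≈ 0, and these values are fixed points of the update, so they do not drift with string length.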
Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL)
framework to synthesise policies for unknown, continuous-state Markov
Decision Processes (MDPs), such that a given linear temporal property is
satisfied. We convert the given property into a Limit-Deterministic Büchi
Automaton (LDBA), a finite-state machine that expresses the property.
Exploiting the structure of the LDBA, we shape a synchronous reward function
on the fly, so that an RL algorithm can synthesise a policy resulting in traces
that probabilistically satisfy the linear temporal property. This probability
(certificate) is also calculated in parallel with policy learning when the
state space of the MDP is finite: as such, the RL algorithm produces a policy
that is certified with respect to the property. Under the assumption of finite
state space, theoretical guarantees are provided on the convergence of the RL
algorithm to an optimal policy, maximising the above probability. We also show
that our method produces ``best available" control policies when the logical
property cannot be satisfied. In the general case of a continuous state space,
we propose a neural network architecture for RL and we empirically show that
the algorithm finds satisfying policies, if there exist such policies. The
performance of the proposed framework is evaluated via a set of numerical
examples and benchmarks, where we observe an improvement of one order of
magnitude in the number of iterations required for the policy synthesis,
compared to existing approaches whenever available.
Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
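A toy tabular sketch of the reward scheme the abstract describes (the line MDP, the two-state automaton for "eventually b", and all names are my own illustrative choices; the paper's actual LDBA reward shaping and certification are more careful than the constant reward used here): Q-learning runs on the product of the MDP state and the automaton state, and reward is emitted on the fly whenever the automaton enters an accepting state.

```python
import random

random.seed(0)

# Hypothetical MDP: states 0..3 on a line; actions 0 = left, 1 = right;
# state 3 carries the atomic proposition 'b'.
def step(s, a):
    return max(0, s - 1) if a == 0 else min(3, s + 1)

def label(s):
    return 'b' if s == 3 else None

# Automaton for "eventually b": q0 --b--> q1; q1 is accepting and absorbing.
def automaton(qa, lab):
    return 1 if (qa == 1 or lab == 'b') else 0

ACCEPTING = {1}

Q = {}
def qval(s, qa, a):
    return Q.get((s, qa, a), 0.0)

alpha, gamma, eps = 0.5, 0.9, 0.3
for _ in range(2000):
    s, qa = random.randrange(4), 0
    for _ in range(20):
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda x: qval(s, qa, x))
        s2 = step(s, a)
        qa2 = automaton(qa, label(s2))
        r = 1.0 if qa2 in ACCEPTING else 0.0  # reward shaped by the automaton
        target = r + gamma * max(qval(s2, qa2, 0), qval(s2, qa2, 1))
        Q[(s, qa, a)] = qval(s, qa, a) + alpha * (target - qval(s, qa, a))
        s, qa = s2, qa2

# Greedy rollout from the initial product state (0, q0): the learned policy
# should drive the automaton into its accepting state.
s, qa = 0, 0
for _ in range(10):
    a = max((0, 1), key=lambda x: qval(s, qa, x))
    s = step(s, a)
    qa = automaton(qa, label(s))
    if qa in ACCEPTING:
        break
```

The design point this illustrates is that the RL agent never sees the temporal formula directly: the automaton tracks progress toward satisfaction, and its accepting states define the reward signal.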
Dynamics of Internal Models in Game Players
A new approach to the study of social games and communications is proposed.
Games are simulated between cognitive players who build an internal model of
the opponent and decide their next strategy from predictions based on it. In
this paper, internal models are constructed by the recurrent neural network
(RNN), and the iterated prisoner's dilemma game is performed. The RNN allows us
to express the internal model in a geometrical shape. The complicated
transients of actions are observed before the stable mutually defecting
equilibrium is reached. During the transients, the model shape also becomes
complicated and often experiences chaotic changes. These new chaotic dynamics
of internal models reflect the dynamical and high-dimensional rugged landscape
of the internal model space.
Comment: 19 pages, 6 figures
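A minimal sketch of the "internal model" idea in the abstract (this is not the paper's RNN or training rule; the one-unit recurrent predictor, the one-step gradient update, and the tit-for-tat opponent are illustrative assumptions): a player fits a tiny recurrent predictor online to forecast the opponent's next move from the history of play.

```python
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Opponent plays tit-for-tat: it repeats our previous move
# (1 = cooperate, 0 = defect).
w_x, w_h, b = 0.0, 0.0, 0.0   # internal-model parameters, learned online
h = 0.0                        # recurrent hidden state
lr, my_prev = 0.5, 1
late_correct = 0

for t in range(2000):
    my_move = random.randrange(2)   # we play randomly, for illustration
    opp_move = my_prev              # tit-for-tat response to our last move
    z = w_x * my_prev + w_h * h + b # predict the opponent's next move
    p = sigmoid(z)
    if t >= 1500 and (p > 0.5) == (opp_move == 1):
        late_correct += 1           # score predictions after learning
    err = opp_move - p              # one-step logistic-loss gradient
    w_x += lr * err * my_prev
    w_h += lr * err * h
    b += lr * err
    h = math.tanh(z)                # carry recurrent state forward
    my_prev = my_move

accuracy = late_correct / 500
```

Against a fixed strategy like tit-for-tat the model converges to a stable shape; the abstract's point is that against another adaptive modeller, both models keep deforming each other, which is where the chaotic transients arise.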