1,782 research outputs found

    Constructing Deterministic Finite-State Automata in Recurrent Neural Networks

    Recurrent neural networks that are trained to behave like deterministic finite-state automata (DFAs) can show deteriorating performance when tested on long strings. This deteriorating performance can be attributed to the instability of the internal representation of the learned DFA states. The use of a sigmoidal discriminant function together with the recurrent structure contributes to this instability. We prove that a simple algorithm can construct second-order recurrent neural networks with a sparse interconnection topology and sigmoidal discriminant function such that the internal DFA state representations are stable, i.e., the constructed network correctly classifies strings of arbitrary length. The algorithm is based on encoding the strengths of weights directly into the neural network. We derive a relationship between the weight strength and the number of DFA states for robust string classification. For a DFA with n states and m input alphabet symbols, the constructive algorithm generates a "programmed" neural network with O(n) neurons and O(mn) weights. We compare our algorithm to other methods proposed in the literature. Revised in February 1996. (Also cross-referenced as UMIACS-TR-95-50.)
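    The flavour of such a construction can be shown in a few lines. Below is a minimal sketch of programming a second-order RNN from a DFA transition table: for each transition delta(i, k) = j the weight W[j, i, k] is set to +H and all others to -H, with bias -H/2, so a near-one-hot state vector stays saturated for arbitrarily long strings. The function names, the toy DFA, and this exact weight scheme are illustrative assumptions, not the paper's precise algorithm.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def program_rnn(delta, n_states, n_symbols, H=10.0):
    """Program second-order weights from a DFA transition function.

    delta[(i, k)] = j: on symbol k, state i moves to state j.
    H is the weight strength; larger H gives a more saturated,
    more stable state encoding (a sketch, not the paper's scheme).
    """
    W = -H * np.ones((n_states, n_states, n_symbols))
    for (i, k), j in delta.items():
        W[j, i, k] = H
    b = -(H / 2.0) * np.ones(n_states)
    return W, b

def classify(W, b, start, accept, string, n_symbols):
    """Run the programmed network: each update mixes the one-hot-ish
    state vector with the one-hot input symbol via second-order weights."""
    s = np.zeros(W.shape[0])
    s[start] = 1.0
    for k in string:
        x = np.eye(n_symbols)[k]
        s = sigmoid(np.einsum('jik,i,k->j', W, s, x) + b)
    return any(s[q] > 0.5 for q in accept)

# Toy DFA over {0, 1}: accept strings with an even number of 1s.
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
W, b = program_rnn(delta, n_states=2, n_symbols=2)
print(classify(W, b, start=0, accept=[0], string=[1, 0, 1, 1], n_symbols=2))  # False: odd number of 1s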

    Certified Reinforcement Learning with Logic Guidance

    This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown, continuous-state Markov Decision Processes (MDPs) such that a given linear temporal property is satisfied. We convert the given property into a Limit-Deterministic Büchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on-the-fly, so that an RL algorithm can synthesise a policy whose traces probabilistically satisfy the linear temporal property. This probability (certificate) is calculated in parallel with policy learning when the state space of the MDP is finite: as such, the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of a finite state space, theoretical guarantees are provided on the convergence of the RL algorithm to an optimal policy maximising the above probability. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and empirically show that the algorithm finds satisfying policies, if such policies exist. The performance of the proposed framework is evaluated via a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for policy synthesis, compared to existing approaches whenever available. Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
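    The core product-automaton idea can be sketched with tabular Q-learning: track the automaton state alongside the MDP state and pay reward whenever the automaton enters an accepting state. This is a simplified illustration only; the paper's LDBA construction handles epsilon-transitions and accepting components more carefully, and every function name below is a hypothetical placeholder, not the paper's API.

```python
import random
from collections import defaultdict

def q_learn_with_automaton(mdp_step, mdp_reset, label, aut_delta, aut_accept,
                           actions, episodes=5000, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning on the on-the-fly product of an MDP and an automaton.

    mdp_reset() -> (initial MDP state, initial automaton state);
    mdp_step(s, a) -> next MDP state; label(s) -> set of atomic propositions;
    aut_delta(q, props) -> next automaton state; aut_accept -> accepting states.
    The agent gets a synchronous reward when the automaton hits an accepting
    state, so maximising return pushes traces toward satisfying the property.
    """
    Q = defaultdict(float)
    for _ in range(episodes):
        s, q = mdp_reset()
        for _ in range(200):                      # fixed episode horizon
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda a_: Q[(s, q, a_)]))
            s2 = mdp_step(s, a)
            q2 = aut_delta(q, label(s2))          # automaton tracks the property
            r = 1.0 if q2 in aut_accept else 0.0  # synchronous shaped reward
            best = max(Q[(s2, q2, a_)] for a_ in actions)
            Q[(s, q, a)] += alpha * (r + gamma * best - Q[(s, q, a)])
            s, q = s2, q2
    return Q  # the greedy policy is read off as argmax over actions
```

    Reading the policy off the learned Q-table couples the control choice to the automaton state, which is what lets the same MDP state be treated differently depending on how much of the temporal property has already been satisfied.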

    Dynamics of Internal Models in Game Players

    A new approach to the study of social games and communications is proposed. Games are simulated between cognitive players who build an internal model of the opponent and decide their next strategy from predictions based on that model. In this paper, internal models are constructed by a recurrent neural network (RNN), and the iterated prisoner's dilemma game is played. The RNN allows us to express the internal model as a geometrical shape. Complicated transients of actions are observed before the stable mutually defecting equilibrium is reached. During the transients, the model shape also becomes complicated and often undergoes chaotic changes. These chaotic dynamics of internal models reflect the dynamical, high-dimensional rugged landscape of the internal model space. Comment: 19 pages, 6 figures
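    For flavour, a toy version of such a player might look like the sketch below: a small recurrent network maintains a hidden state over the joint move history and outputs a probability that the opponent defects next, and the player replies to that prediction. The architecture, sizes, and learning rule here are illustrative guesses, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

class OpponentModel:
    """Tiny Elman-style RNN predicting the opponent's next move (C=0, D=1)
    from the joint move history. Purely illustrative; the paper's network
    and training procedure may differ."""
    def __init__(self, hidden=8, lr=0.1):
        self.Wx = rng.normal(0, 0.5, (hidden, 2))     # input: [my move, their move]
        self.Wh = rng.normal(0, 0.5, (hidden, hidden))
        self.Wo = rng.normal(0, 0.5, (1, hidden))
        self.h = np.zeros(hidden)
        self.lr = lr

    def predict(self, my_move, their_move):
        x = np.array([my_move, their_move], dtype=float)
        self.h = np.tanh(self.Wx @ x + self.Wh @ self.h)
        return 1.0 / (1.0 + np.exp(-(self.Wo @ self.h)[0]))  # P(opponent defects)

    def update(self, p_pred, actual):
        # Crude one-step gradient on the output layer only.
        err = p_pred - actual
        self.Wo -= self.lr * err * self.h[None, :]

# Example round: predict, choose a reply (here mirroring the prediction,
# a tit-for-tat-like rule), then learn from the observed move.
model = OpponentModel()
p = model.predict(my_move=0, their_move=0)
my_next = 1 if p > 0.5 else 0
# ...after observing the opponent's actual move `obs`:
# model.update(p, actual=obs)
```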