
3,010 research outputs found

Feed-Forward Neural Networks Need Inductive Bias to Learn Equality Relations

Authors: Kopparti R. M., Weyde T.
Publication date: 01/01/2018

Basic binary relations such as equality and inequality are fundamental to relational data structures. Neural networks should learn such relations and generalise to new, unseen data. We show in this study, however, that this generalisation fails with standard feed-forward networks on binary vectors. Even when trained with maximal training data, standard networks do not reliably detect equality. We introduce differential rectifier (DR) units that we add to the network in different configurations. The DR units create an inductive bias in the networks so that they do learn to generalise, even from small numbers of examples, and we have not found any negative effect of their inclusion in the network. Given the fundamental nature of these relations, we hypothesize that feed-forward neural network learning benefits from inductive bias in other relations as well. Consequently, the further development of suitable inductive biases will be beneficial to many tasks in relational learning with neural networks.

arXiv.org e-Print Archive

City Research Online
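
The abstract of this record does not define the DR units, so the following is only a minimal sketch under a stated assumption: each unit receives one pair of corresponding input values and outputs their absolute difference, built from two rectifiers. The function names `relu`, `dr_units` and `vectors_equal` are hypothetical and are not taken from the paper.

```python
def relu(v):
    """Standard rectifier: pass positive values, clamp negatives to zero."""
    return max(v, 0.0)


def dr_units(x, y):
    """Assumed DR behaviour: one unit per dimension, output |x_i - y_i|.

    Two rectifiers fed with (x_i - y_i) and (y_i - x_i) sum to the
    absolute difference, so the unit is zero exactly when x_i == y_i.
    """
    return [relu(a - b) + relu(b - a) for a, b in zip(x, y)]


def vectors_equal(x, y):
    """Two vectors are equal when every assumed DR output is zero."""
    return len(x) == len(y) and all(d == 0 for d in dr_units(x, y))


print(dr_units([1, 0, 1], [1, 1, 1]))
print(vectors_equal([1, 0, 1], [1, 0, 1]))
```

Under this assumption the comparison is wired into the unit rather than learned, which is one way to read the abstract's claim that the units supply an inductive bias for equality.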

Modelling Identity Rules with Neural Networks

Authors: Kopparti R. M., Weyde T.
Publication venue: IfCoLog
Publication date: 20/05/2019

In this paper, we show that standard feed-forward and recurrent neural networks fail to learn abstract patterns based on identity rules. We propose Repetition Based Pattern (RBP) extensions to neural network structures that solve this problem and answer, as well as raise, questions about integrating structures for inductive bias into neural networks. Examples of abstract patterns are the sequence patterns ABA and ABB, where A or B can be any object. These were introduced by Marcus et al. (1999), who also found that 7-month-old infants recognise these patterns in sequences that use an unfamiliar vocabulary, while simple recurrent neural networks do not. This result has been contested in the literature, but it is confirmed by our experiments. We also show that the inability to generalise extends to different, previously untested, settings. We propose a new approach to modifying standard neural network architectures, called Repetition Based Patterns (RBP), with different variants for classification and prediction. Our experiments show that neural networks with the appropriate RBP structure achieve perfect classification and prediction performance on synthetic data, including mixed concrete and abstract patterns. RBP also improves neural network performance in experiments with real-world sequence prediction tasks. We discuss these findings in terms of challenges for neural network models and identify consequences of this result for developing inductive biases for neural network learning.

arXiv.org e-Print Archive

City Research Online
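
To make the ABA and ABB patterns in the abstract above concrete, here is a small illustrative sketch. It is a hand-written rule, not the RBP method, and the function name `pattern_of` and the example syllables are assumptions made for illustration. The point is that the label depends only on which positions hold identical items, so it carries over to vocabulary that was never seen before.

```python
def pattern_of(triple):
    """Label a three-item sequence by its identity structure.

    Only equality between positions matters; the items themselves
    (syllables, notes, characters) are irrelevant to the label.
    """
    a, b, c = triple
    if a == b:
        return "other"  # AAB or AAA: neither ABA nor ABB
    if c == a:
        return "ABA"
    if c == b:
        return "ABB"
    return "other"      # ABC: three distinct items


print(pattern_of(("ga", "ti", "ga")))
print(pattern_of(("wo", "fe", "fe")))
```

A network that has only memorised which syllables occurred in which position has no access to this position-to-position comparison, which is the generalisation gap the abstract describes.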

Factors for the Generalisation of Identity Relations by Neural Networks

Authors: Kopparti R. M., Weyde T.
Publication date: 01/01/2019

Many researchers implicitly assume that neural networks learn relations and generalise them to new, unseen data. It has been shown recently, however, that the generalisation of feed-forward networks fails for identity relations. The proposed solution for this problem is to create an inductive bias with Differential Rectifier (DR) units. In this work, we explore whether various factors in the neural network architecture and learning process make a difference to the generalisation of equality detection by neural networks without and with DR units in early and mid fusion architectures. In experiments with synthetic data, we find effects of the number of hidden layers, the activation function and the data representation. The training set size in relation to the total possible set of vectors also makes a difference. However, the accuracy never exceeds 61% without DR units, at a 50% chance level. DR units improve generalisation in all tasks and lead to almost perfect test accuracy in the Mid Fusion setting. Thus, DR units seem to be a promising approach for creating generalisation abilities that standard networks lack.

arXiv.org e-Print Archive

City Research Online

Weight Priors for Learning Identity Relations

Authors: Kopparti R. M., Weyde T.
Publication date: 14/12/2019

Learning abstract and systematic relations has been an open issue in neural network learning for over 30 years. It has been shown recently that neural networks do not learn relations based on identity and are unable to generalize well to unseen data. The Relation Based Pattern (RBP) approach has been proposed as a solution for this problem. In this work, we extend RBP by realizing it as a Bayesian prior on network weights to model the identity relations. This weight prior leads to a modified regularization term in otherwise standard network learning. In our experiments, we show that the Bayesian weight priors lead to perfect generalization when learning identity-based relations and do not impede general neural network learning. We believe that the approach of creating an inductive bias with weight priors can be extended easily to other forms of relations and will be beneficial for many other learning tasks.

City Research Online
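
The abstract above says the weight prior appears as a modified regularization term but does not give its form. The sketch below assumes a Gaussian prior centred on a non-zero mean, whose negative log-density is a squared penalty on the distance between the weights and that mean. The names `prior_penalty` and `regularised_loss`, the example prior mean `[+1, -1]` (a difference-computing weight pair) and the `strength` values are all hypothetical.

```python
def prior_penalty(weights, prior_mean, strength):
    """Squared penalty pulling each weight toward its prior mean.

    With prior_mean all zeros this reduces to ordinary L2 weight decay;
    a non-zero mean instead favours a specific weight pattern.
    """
    return strength * sum((w - m) ** 2 for w, m in zip(weights, prior_mean))


def regularised_loss(data_loss, weights, prior_mean, strength=0.01):
    """Training objective: data loss plus the assumed prior penalty."""
    return data_loss + prior_penalty(weights, prior_mean, strength)


# Assumed prior mean for one comparison unit: weights (+1, -1) compute x - y.
comparison_prior = [1.0, -1.0]
print(prior_penalty([1.0, -1.0], comparison_prior, 0.5))
print(regularised_loss(2.0, [0.0, 0.0], comparison_prior, 0.5))
```

Because the penalty only biases the weights rather than fixing them, training can still move away from the prior when the data call for it, which is consistent with the abstract's statement that the prior does not impede general learning.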

Abstract Rule Based Pattern Learning with Neural Networks

Author: Kopparti R. M.
Publication date: 01/01/2020

The ability to learn abstractions and generalise is seen as the essence of human intelligence [7]. Since the 1950s, there have been efforts to build systems that learn and think like humans [16]. It is observed that humans, including infants, tend to generalise well compared to machine learning models, in which the hypothesis is usually approximated and may be prone to errors. The examples proposed by Marcus [17, 18, 19], such as the failure to generalise equality, to distinguish between even and odd numbers, or to recognise ABA or ABB patterns of syllables, have attracted a significant amount of attention in psychology, particularly in the study of human language learning, but they have not been addressed systematically as problems of machine learning and neural networks. In this article, the problem of learning abstract rules using neural networks is explained, and a solution called ‘Relation Based Patterns’ (RBP), which models abstract relationships based on equality, is proposed. RBP creates an inductive bias in the neural networks that leads to the learning of generalisable solutions. It is observed that integration of RBP leads to almost perfect generalisation in abstract rule learning tasks with synthetic data and to improvements in neural language modelling on real-world data. The article is organised as follows: a brief introduction to the problem is followed by a section on what abstract pattern (rule) learning is, the need for inductive bias, and various ways of adding inductive bias to neural networks. The RBP method and its integration are then summarised, along with the experiments on the tasks of abstract rule learning, character prediction and melody prediction, followed by conclusions and future work.

City Research Online

Association for the Advancement of Artificial Intelligence: AAAI Publications

Abstract pattern learning with neural networks for improved sequential modeling

Author: Kopparti R. M.

Deep neural networks have been widely used for various applications and have produced state-of-the-art results in domains like speech recognition, machine translation, and image recognition. Despite the impressive successes achieved with deep neural networks, there has been an increasing awareness that there are tasks that still elude neural network learning, specifically the learning of abstract grammatical patterns and the generalisation of abstract patterns beyond the training data. In this thesis, the problem of learning abstract patterns based on equality (also called identity) with neural networks is addressed. It was found in this study that feed-forward neural networks do not learn equality. This leads to the inability of feed-forward and recurrent neural networks to learn abstract patterns. This problem is studied empirically and constructive solutions are developed in this thesis. A solution called ‘Relation Based Patterns’ (RBP) is proposed, which models abstract relationships based on equality by using fixed weights and a special type of neuron. An extension of RBP called ‘Embedded Relation Based Patterns’ (ERBP) is also proposed, which models RBP as a Bayesian prior on network weights, implemented as a regularisation term in otherwise standard neural network learning. Both RBP and particularly ERBP are very easy to integrate into standard neural network models. It is observed in experiments that integration of (E)RBP structures leads to almost perfect generalisation in abstract pattern learning tasks with synthetic data and also to improvements in neural language and music modelling. (E)RBP has been successfully applied to various neural network models such as feed-forward neural networks (FFNNs), RNNs and their gated variants (GRUs and LSTMs), Transformers and Graph Neural Networks. It leads to improvements on real-world tasks such as melody prediction, character and word prediction, abstract compositionality and graph edit distance.

City Research Online

Neural Diagrammatic Reasoning

Author: Wang Duo
Publication venue: University of Cambridge
Publication date: 01/08/2020

Diagrams have been shown to be effective tools for humans to represent and reason about complex concepts. They have been widely used to represent concepts in science teaching, to communicate workflow in industries and to measure human fluid intelligence. Mechanised reasoning systems typically encode diagrams into symbolic representations that can be easily processed with rule-based expert systems. This relies on human experts to define the framework of diagram-to-symbol mapping and the set of rules to reason with the symbols. This means the reasoning systems cannot be easily adapted to other diagrams without a new set of human-defined representation mappings and reasoning rules. Moreover, such systems are not able to cope with diagram inputs as raw and possibly noisy images. The need for human input and the lack of robustness to noise significantly limit the applications of mechanised diagrammatic reasoning systems. A key research question then arises: can we develop human-like reasoning systems that learn to reason robustly without predefined reasoning rules? To answer this question, I propose Neural Diagrammatic Reasoning, a new family of diagrammatic reasoning systems which does not have the drawbacks of mechanised reasoning systems. The new systems are based on deep neural networks, a recently popular machine learning method that has achieved human-level performance on a range of perception tasks such as object detection, speech recognition and natural language processing. The proposed systems are able to learn both diagram-to-symbol mapping and implicit reasoning rules from data alone, with no prior human input about symbols and rules in the reasoning tasks. Specifically, I developed EulerNet, a novel neural network model that solves Euler diagram syllogism tasks with 99.5% accuracy. Experiments show that EulerNet learns useful representations of the diagrams and tasks, and is robust to noise and deformation in the input data.
I also developed MXGNet, a novel multiplex graph neural architecture that solves Raven Progressive Matrices (RPM) tasks. MXGNet achieves state-of-the-art accuracies on two popular RPM datasets. In addition, I developed Discrete-AIR, an unsupervised learning architecture that learns semi-symbolic representations of diagrams without any labels. Lastly, I designed a novel inductive bias module that can be readily used in today’s deep neural networks to improve their generalisation capability on relational reasoning tasks.
EPSRC Studentship and Cambridge Trust Scholarship

Apollo (Cambridge)

Inductive biases for efficient information transfer in artificial networks

Author: Kerg Giancarlo
Publication date: 01/09/2022

Despite remarkable advances in a wide variety of subjects, neural networks still struggle with simple tasks that humans excel at. As outlined in recent work, we hypothesize that the qualitative gap between current deep learning and human-level artificial intelligence is the result of missing essential inductive biases. In other words, by identifying some of these key inductive biases, we will improve information transfer in artificial networks, as well as address some of their most important current limitations on a wide range of tasks. The limitations we focus on in this thesis are out-of-distribution systematic generalization and the ability to learn over extremely long time scales. In the First Article, we focus on extending spectrally constrained Recurrent Neural Networks (RNNs) and propose a novel connectivity structure based on the Schur decomposition, retaining the stability advantages and training speed of orthogonal RNNs while enhancing expressivity for short-term complex computations via transient dynamics. This serves as a first step in mitigating the Exploding Vanishing Gradient Problem (EVGP). In the Second Article, we focus on memory-augmented self-attention RNNs as an alternative way of tackling the EVGP.
Here the main contribution is a formal analysis of asymptotic gradient stability, and we identify event relevancy as a key ingredient for scaling attention systems. We then leverage these theoretical results to provide a novel relevancy screening mechanism, which makes self-attention sparse and scalable while maintaining good gradient propagation over long sequences. Finally, in the Third Article, we distill a minimal set of inductive biases for purely relational cognitive tasks and identify that separating relational information from sensory input is a key inductive ingredient for out-of-distribution (OoD) generalization on unseen inputs. We further discuss extensions to unseen relations as well as settings with spurious features.

Dépôt Institutionnel Numérique