15 research outputs found
Localist representation can improve efficiency for detection and counting
Almost all representations have both distributed and localist aspects, depending upon what properties of the data are being considered. With noisy data, features represented in a localist way can be detected very efficiently, and in binary representations they can be counted more efficiently than those represented in a distributed way. Brains operate in noisy environments, so the localist representation of behaviourally important events is advantageous, and fits what has been found experimentally. Distributed representations require more neurons to perform as efficiently, but they do have greater versatility.
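As an illustrative sketch (our own, not from the paper): in a binary localist code each feature owns one dedicated unit, so detection is a single-unit test and counting is a popcount, whereas detecting a feature in a distributed binary code requires matching the input against every stored pattern. All names and sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_features = 256, 8
present = [1, 3, 5]                      # features active in the input

# Localist code: one dedicated unit per feature.
localist = np.zeros(n_units, dtype=int)
localist[present] = 1
assert localist[3] == 1                  # detection: test a single unit
assert localist.sum() == 3               # counting: a simple popcount

# Distributed code: each feature is a random binary pattern over all
# units, and the input is their superposition; detection now needs a
# similarity match against every stored pattern.
patterns = rng.integers(0, 2, size=(n_features, n_units))
superposed = np.clip(patterns[present].sum(axis=0), 0, 1)
scores = (patterns @ superposed) / patterns.sum(axis=1)
assert scores[1] == 1.0                  # present feature: perfect match
assert scores[0] < 1.0                   # absent feature: partial overlap
```

The point of the sketch is only the asymmetry in work required: the localist checks are O(1) per feature, while the distributed check is a matrix match over all stored patterns.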
ToyArchitecture: Unsupervised Learning of Interpretable Models of the World
Research in Artificial Intelligence (AI) has focused mostly on two extremes:
either on small improvements in narrow AI domains, or on universal theoretical
frameworks which are usually uncomputable, incompatible with theories of
biological intelligence, or lack practical implementations. The goal of this
work is to combine the main advantages of the two: to follow a big picture
view, while providing a particular theory and its implementation. In contrast
with purely theoretical approaches, the resulting architecture should be usable
in realistic settings, but also form the core of a framework containing all the
basic mechanisms, into which it should be easier to integrate additional
required functionality.
In this paper, we present a novel, purposely simple, and interpretable
hierarchical architecture which combines multiple different mechanisms into one
system: unsupervised learning of a model of the world, learning the influence
of one's own actions on the world, model-based reinforcement learning,
hierarchical planning and plan execution, and symbolic/sub-symbolic integration
in general. The learned model is stored in the form of hierarchical
representations with the following properties: 1) they are increasingly more
abstract, but can retain details when needed, and 2) they are easy to
manipulate in their local and symbolic-like form, thus also allowing one to
observe the learning process at each level of abstraction. On all levels of the
system, the representation of the data can be interpreted in both a symbolic
and a sub-symbolic manner. This enables the architecture to learn efficiently
using sub-symbolic methods and to employ symbolic inference.
Combinatorial Generalisation in Machine Vision
The human capacity for generalisation, i.e. the fact that we are able to successfully perform a familiar task in novel contexts, is one of the hallmarks of our intelligent behaviour. But what mechanisms enable this capacity, at once so impressive and yet so natural to us? This question has driven copious amounts of research in both Cognitive Science and Artificial Intelligence for almost a century, with some advocating the need for symbolic systems and others the benefits of distributed representations. In this thesis we will explore which principles help AI systems to generalise to novel combinations of previously observed elements (such as color and shape) in the context of machine vision. We will show that while approaches such as disentangled representation learning showed initial promise, they are fundamentally unable to solve this generalisation problem. In doing so we will illustrate the need to perform severe tests of models in order to properly assess their limitations. We will also see how such failures are robust across different datasets and training modalities, and in the internal representations of the models. We then show that a different type of system, one that attempts to learn object-centric representations, is capable of solving the generalisation challenges that previous models could not. We conclude by discussing the implications of these results for long-standing questions regarding the kinds of cognitive systems that are required to solve generalisation problems.
Holistic processing of hierarchical structures in connectionist networks
Despite the success of connectionist systems in modelling some aspects of cognition, critics argue that the lack of symbol processing makes them inadequate for modelling high-level cognitive tasks which require the representation and processing of hierarchical structures. In this thesis we investigate four mechanisms for encoding hierarchical structures in distributed representations that are suitable for processing in connectionist systems: Tensor Product Representation, Recursive Auto-Associative Memory (RAAM), Holographic Reduced Representation (HRR), and Binary Spatter Code (BSC). In these four schemes, representations of hierarchical structures are either learned in a connectionist network or constructed by means of various mathematical operations from binary or real-valued vectors. It is argued that the resulting representations carry structural information without being themselves syntactically structured: the structural information about a represented object is encoded in the position of its representation in a high-dimensional representational space. We use Principal Component Analysis and constructivist networks to show that well-separated clusters consisting of representations for structurally similar hierarchical objects are formed in the representational spaces of RAAMs and HRRs.
The spatial structure of HRRs and RAAM representations supports holistic yet structure-sensitive processing of them. Holistic operations on RAAM representations can be learned by backpropagation networks. However, holistic operators over HRRs, Tensor Products, and BSCs have to be constructed by hand, which is not a desirable situation. We propose two new algorithms for learning holistic transformations of HRRs from examples. These algorithms are able to generalise the acquired knowledge to hierarchical objects of higher complexity than the training examples. Such generalisations exhibit systematicity of a degree which, to the best of our knowledge, has not yet been achieved by any other comparable learning method.
Finally, we outline how a number of holistic transformations can be learned in parallel and applied to representations of structurally different objects. The ability to distinguish and perform a number of different structure-sensitive operations is one step towards a connectionist architecture that is capable of modelling complex high-level cognitive tasks such as natural language processing and logical inference.
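The HRR scheme discussed above binds roles to fillers by circular convolution and decodes them with an approximate inverse. A minimal sketch of that mechanism, following Plate's construction (our own illustration; the role and filler names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)
d = 2048  # HRRs live in high-dimensional spaces

def cconv(a, b):
    # Circular convolution, the HRR binding operation, computed via FFT.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def approx_inverse(a):
    # Involution a*[i] = a[-i mod d], the standard approximate inverse.
    return np.concatenate(([a[0]], a[:0:-1]))

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Random role and filler vectors with elements drawn from N(0, 1/d).
role_agent, role_patient, john, mary = rng.normal(0, 1 / np.sqrt(d), (4, d))

# Encode the structure {agent: john, patient: mary} as a superposition of
# role-filler bindings; the trace keeps the same dimensionality d.
trace = cconv(role_agent, john) + cconv(role_patient, mary)

# Decode the agent filler by convolving with the role's approximate inverse.
decoded = cconv(approx_inverse(role_agent), trace)
assert cosine(decoded, john) > 0.4   # noisy but recognisable copy of john
assert cosine(decoded, mary) < 0.2   # and dissimilar to the other filler
```

Note that the decoded vector is only a noisy copy of the original filler; a clean-up memory (nearest-neighbour lookup over known fillers) is assumed for exact recovery.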
Statistical language learning
Theoretical arguments based on the "poverty of the stimulus" have denied a priori the possibility that abstract linguistic representations can be learned inductively from exposure to the environment, given that the linguistic input available to the child is both underdetermined and degenerate. I reassess such learnability arguments by exploring a) the type and amount of statistical information implicitly available in the input in the form of distributional and phonological cues; b) psychologically plausible inductive mechanisms for constraining the search space; c) the nature of linguistic representations, algebraic or statistical. To do so I use three methodologies: experimental procedures, linguistic analyses based on large corpora of naturally occurring speech and text, and computational models implemented in computer simulations.
In Chapters 1, 2, and 5, I argue that long-distance structural dependencies, traditionally hard to explain with simple distributional analyses based on n-gram statistics, can indeed be learned associatively provided the amount of intervening material is highly variable or invariant (the Variability effect). In Chapter 3, I show that simple associative mechanisms instantiated in Simple Recurrent Networks can replicate the experimental findings under the same conditions of variability. Chapter 4 presents successes and limits of such results across perceptual modalities (visual vs. auditory) and perceptual presentation (temporal vs. sequential), as well as the impact of long and short training procedures. In Chapter 5, I show that generalisation to abstract categories from stimuli framed in non-adjacent dependencies is also modulated by the Variability effect. In Chapter 6, I show that the putative separation of algebraic and statistical styles of computation based on successful speech segmentation versus unsuccessful generalisation experiments (as published in a recent Science paper) is premature and is the effect of a preference for phonological properties of the input. In Chapter 7, computer simulations of learning irregular constructions suggest that it is possible to learn from positive evidence alone, despite Gold's celebrated arguments on the unlearnability of natural languages. Evolutionary simulations in Chapter 8 show that irregularities in natural languages can emerge from full regularity and remain stable across generations of simulated agents. In Chapter 9, I conclude that the brain may be endowed with a powerful statistical device for detecting structure, generalising, segmenting speech, and recovering from overgeneralisations. The experimental and computational evidence gathered here suggests that statistical language learning is more powerful than heretofore acknowledged by the current literature.
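The contrast drawn above between n-gram statistics and non-adjacent dependencies can be shown with a toy example (our own, not from the thesis): in a miniature language with dependencies a…b and c…d, a bigram model cannot tell which final element an intervening token predicts, whereas tracking the non-adjacent pair directly recovers the dependency.

```python
from collections import Counter

# Toy language with non-adjacent dependencies a...b and c...d,
# with variable intervening material X1/X2/X3.
sequences = [(first, mid, last)
             for first, last in [("a", "b"), ("c", "d")]
             for mid in ["X1", "X2", "X3"]]

# Bigram (adjacent-pair) statistics over the corpus.
bigrams = Counter()
for first, mid, last in sequences:
    bigrams[(first, mid)] += 1
    bigrams[(mid, last)] += 1

# A bigram model sees every middle token followed equally often by b and d,
# so it cannot learn the dependency:
assert bigrams[("X1", "b")] == bigrams[("X1", "d")]

# Tracking the non-adjacent pair directly recovers it:
pairs = Counter((first, last) for first, _, last in sequences)
assert pairs[("a", "b")] == 3 and pairs[("a", "d")] == 0
```

The Variability effect reported in the thesis concerns when human learners track such non-adjacent pairs; the sketch only shows why adjacent statistics alone are insufficient.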
Understanding Semantic Implicit Learning through distributional linguistic patterns: A computational perspective
The research presented in this PhD dissertation provides a computational perspective on Semantic Implicit Learning (SIL). It puts forward the idea that SIL does not depend on semantic knowledge as classically conceived but upon semantic-like knowledge gained through distributional analysis of massive linguistic input. Using methods borrowed from the machine learning and artificial intelligence literature, we construct computational models, which can simulate the performance observed during behavioural tasks of semantic implicit learning in a human-like way. We link this methodology to the current literature on implicit learning, arguing that this behaviour is a necessary by-product of efficient language processing.
Chapter 1 introduces the computational problem posed by implicit learning in general, and semantic implicit learning, in particular, as well as the computational framework, used to tackle them.
Chapter 2 introduces distributional semantics models as a way to learn semantic-like representations from exposure to linguistic input.
Chapter 3 reports two studies on large datasets of semantic priming which seek to identify the computational model of semantic knowledge that best fits the data under conditions that resemble SIL tasks. We find that a model which acquires semantic-like knowledge gained through distributional analysis of massive linguistic input provides the best fit to the data.
Chapter 4 generalises the results of the previous two studies by looking at the performance of the same models in languages other than English.
Chapter 5 applies the results of the two previous Chapters on eight datasets of semantic implicit learning. Crucially, these datasets use various semantic manipulations and speakers of different L1s enabling us to test the predictions of different models of semantics.
Chapter 6 examines more closely two assumptions which we have taken for granted throughout this thesis. Firstly, we test whether a simpler model based on phonological information can explain the generalisation patterns observed in the tasks. Secondly, we examine whether our definition of the computational problem in Chapter 5 is reasonable.
Chapter 7 summarises and discusses the implications for implicit language learning and computational models of cognition. Furthermore, we offer one more study that seeks to bridge the literature on distributional models of semantics to `deeper' models of semantics by learning semantic relations.
There are two main contributions of this dissertation to the general field of implicit learning research. Firstly, we highlight the superiority of distributional models of semantics in modelling unconscious semantic knowledge. Secondly, we question whether `deep' semantic knowledge is needed to achieve above-chance performance in SIL tasks. We show how a simple model that learns through distributional analysis of the patterns found in the linguistic input can match the behavioural results in different languages. Furthermore, we link these models to more general problems faced in psycholinguistics such as language processing and learning of semantic relations.
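A minimal sketch of the kind of distributional analysis discussed above (our own toy example, not a model from the dissertation): simple co-occurrence counting over a tiny corpus already yields semantic-like similarity, with no hand-coded meaning.

```python
import math
from collections import Counter

corpus = [
    "the cat drinks milk", "the dog drinks water",
    "the cat chases the mouse", "the dog chases the cat",
    "milk is in the bowl", "water is in the bowl",
]

window = 2  # symmetric co-occurrence window
vectors: dict[str, Counter] = {}
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        context = vectors.setdefault(word, Counter())
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                context[tokens[j]] += 1   # count each neighbouring token

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b))

# Words that occur in similar contexts end up with similar vectors:
assert cosine(vectors["cat"], vectors["dog"]) > cosine(vectors["cat"], vectors["bowl"])
```

Real distributional semantics models apply the same idea at the scale of massive corpora, typically with weighting and dimensionality reduction on top of the raw counts.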
A Connectionist Defence of the Inscrutability Thesis and the Elimination of the Mental
This work consists of two parts. In Part I (chapters 1-5), I shall produce a Connectionist Defence of Quine's Thesis of the Inscrutability of Reference, according to which there is no objective fact of the matter as to what the ontological commitments of the speakers of a language are. I shall start by reviewing Quine's project in his original behaviouristic setting. Chapters 1 and 2 will be devoted to addressing several criticisms that Gareth Evans and Crispin Wright have put forward on behalf of the friend of semantic realism. Evans (1981) and, more recently, Wright (1997) have argued on different grounds that, under certain conditions, structural simplicity may become alethic, i.e. truth-conducive, for semantic theories. Being structurally more complex than the standard semantic theory, Quine's perverse semantic route (see chapter 1) is an easy prey for Evans' and Wright's considerations. I shall argue that both Evans' and Wright's criticisms are unmotivated, and do not jeopardize Quine's overall enterprise. I shall then propose a perverse theory of reference (chapter 3) which differs substantially from the ones advanced in the previous literature on the issue. The motivation for pursuing a different perverse semantic proposal resides in the fact that the route I shall be offering is as simple, structurally speaking, as our sanctioned theory of reference is meant to be. Thanks to this feature, my strategy is not subject to certain criticisms which may put perverse proposals a la Quine in jeopardy, thereby becoming an overall better candidate for the Quinean to fulfil her goal. In chapter 4, I shall introduce and develop a criterion recently produced by Wright (1997) in terms of 'psychological simplicity' which threatens the perverse semantic proposal I offered in chapter 3. I shall argue that a Language-of-Thought (LOT) model of human cognition could motivate Wright's criterion.
I shall then introduce the reader to some basic aspects of connectionist theory, and elaborate on a particularly promising neurocomputational approach to language processing put forward by Jeff Elman (1992; 1998). I shall argue that if, instead of endorsing a LOT hypothesis, we model human cognition by a recurrent neural network a la Elman, then Wright's criterion is unmotivated. In particular, I shall argue that considerations regarding 'psychological simplicity' are neutral, favouring neither a standard theory of reference nor a perverse one. In the remainder of Part I, I shall focus upon certain problems for the defender of the Inscrutability Thesis highlighted by the friend of connectionist theory. In chapter 5 I shall introduce a mathematical technique for measuring conceptual similarity across networks that Aarre Laakso and Gary Cottrell (1998; 2000) have recently developed. I shall show how Paul Churchland makes use of Laakso and Cottrell's results to argue that connectionism can furnish us with all we need to construct a robust theory of semantics and a robust theory of translation, a robustness that may potentially be exploited by a connectionist foe of Quine to argue against the Inscrutability Thesis. The bulk of the chapter will be devoted to showing that the notion of conceptual similarity available to the connectionist leaves room for a "connectionist Quinean" to kick in with a one-to-many translational mapping across networks. In Part II (chapters 6 and 7), I shall produce a Connectionist Defence of the Thesis of Eliminative Materialism, according to which propositional attitudes don't exist (see chapter 7). I shall start by responding to two arguments that Stephen Stich has recently put forward against the thesis of eliminative materialism. In a nutshell, Stich (1990; 1991) argues that (i) the thesis of eliminative materialism is neither true nor false, and that (ii) even if it were true, that would be philosophically uninteresting.
To support (i) and (ii) Stich relies on two premises: (a) that the job of a theory of reference is to make explicit the tacit theory of reference which underlies our intuitions about the notion of reference itself; and (b) that our intuitive notion of reference is a highly idiosyncratic one. In chapter 6 I shall address Stich's anti-eliminativist claims (i) and (ii). I shall argue that even if we agreed with premises (a) and (b), that would lend no support whatsoever for (i) and (ii). Finally, in chapter 7, I shall introduce a connectionist-inspired conditional argument for the elimination of the posits of folk psychology put forward by William Ramsey, Stephen Stich, and Joseph Garon. I shall consider an objection to the eliminativist argument raised by Andy Clark. I shall then review a counter that Stephen Stich and Ted Warfield produce on behalf of the eliminativist. The discussion in chapter 5 on 'state space semantics and conceptual similarity' will be used to show that Clark's argument is not threatened by Stich and Warfield's considerations. Then, in the remainder of Part II, I shall offer a different line of argument to counter Clark, one that focuses on the notion of causal efficacy. I hope to show that the thesis of eliminative materialism is correct. Conclusions, and directions for future research, will follow.
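Laakso and Cottrell's measure of conceptual similarity, mentioned above, compares networks by the relative positions of their representations (their pairwise distance structure) rather than by raw activations, so it is invariant to rotations of a network's state space. A minimal sketch under that reading (our own code with hypothetical sizes, not their implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

def distance_profile(points):
    # Condensed vector of pairwise Euclidean distances between items.
    n = len(points)
    return np.array([np.linalg.norm(points[i] - points[j])
                     for i in range(n) for j in range(i + 1, n)])

def conceptual_similarity(a, b):
    # Correlate the two networks' inter-point distance structures.
    return float(np.corrcoef(distance_profile(a), distance_profile(b))[0, 1])

# Hidden-layer representations of 10 items in a 5-unit network.
net_a = rng.normal(size=(10, 5))

# A second network whose state space is a rigid rotation of the first:
# different activations, identical relational structure.
q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
net_b = net_a @ q

sim_rotated = conceptual_similarity(net_a, net_b)
sim_random = conceptual_similarity(net_a, rng.normal(size=(10, 5)))
assert sim_rotated > 0.999   # rotation preserves all pairwise distances
assert sim_random < sim_rotated
```

Because the measure depends only on relative positions, two networks with entirely different unit-level activations can still count as representing the same conceptual structure, which is what opens the door to the one-to-many translational mappings discussed in the text.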
Revisiting lexical ambiguity effects in visual word recognition
2012 - 2013. The aim of this work is to focus on how lexically ambiguous words are represented in the mental lexicon of speakers. The existence of words with multiple meanings/senses (e.g., credenza, mora, etc. in Italian) is a pervasive feature of natural language. Speakers of almost all languages routinely encounter ambiguous words, whose correct interpretation depends on the linguistic context in which these forms appear... [edited by author]