
    Connectionist natural language parsing

    The key developments of two decades of connectionist parsing are reviewed. Connectionist parsers are assessed according to their ability to learn to represent syntactic structures automatically from examples, without being presented with symbolic grammar rules. This review also considers the extent to which connectionist parsers offer computational models of human sentence processing and provide plausible accounts of psycholinguistic data. In considering these issues, special attention is paid to the level of realism, the nature of the modularity, and the type of processing found in a wide range of parsers.

    Empirical Lessons for Philosophical Theories of Mental Content

    This thesis concerns the content of mental representations. It draws lessons for philosophical theories of content from empirical findings about brains and behaviour in experimental psychology (cognitive, developmental, comparative), cognitive neuroscience and cognitive science (computational modelling). Chapter 1 motivates a naturalist and realist approach to mental representation. Chapter 2 sets out and defends a theory of content for static feedforward connectionist networks, and explains how the theory can be extended to other supervised networks. The theory takes forward Churchland’s state space semantics by making a new and clearer proposal about the syntax of connectionist networks, one which nicely accounts for representational development. Chapter 3 argues that the same theoretical approach can be extended to unsupervised connectionist networks, and to some of the representational systems found in real brains. The approach can also show why connectionist systems sometimes show typicality effects, explaining them without relying upon prototype structure. That is discussed in chapter 4, which also argues that prototype structure, where it does exist, does not determine content. The thesis goes on to defend some unorthodox features of the foregoing theory: that a role is assigned to external samples in specifying syntax, that both inputs to and outputs from the system have a role in determining content, and that the content of a representation is partly determined by the circumstances in which it developed. Each, it is argued, may also be a fruitful way of thinking about mental content more generally. Reliance on developmental factors prompts a swampman-type objection. This is rebutted by reference to three possible reasons why content is attributed at all. Two of these motivations support the idea that content is partly determined by historical factors, and the third is consistent with it. The result: some empirical lessons for philosophical theories of mental content.

    Generalization and Systematicity in Echo State Networks

    Echo state networks (ESNs) are recurrent neural networks that can be trained efficiently because the weights of the recurrent connections remain fixed at random values. Investigations of these networks' ability to generalize in sentence-processing tasks have produced mixed outcomes. Here, we argue that ESNs do generalize but that they are not systematic, which we define as the ability to generally outperform Markov models on test sentences that violate the grammar of the training sentences. Moreover, we show that systematicity in ESNs can easily be obtained by switching from arbitrary to informative representations of words, suggesting that the information provided by such representations facilitates connectionist systematicity.
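    As a rough illustration of the architecture described above (not the authors' implementation), the sketch below builds a minimal echo state network in plain NumPy: the recurrent reservoir weights are set randomly, rescaled to a target spectral radius, and never updated, while only a ridge-regression readout is trained. The reservoir size, spectral radius and ridge penalty are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)

        class ESN:
            def __init__(self, n_in, n_res=100, spectral_radius=0.9, ridge=1e-4):
                self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
                W = rng.uniform(-0.5, 0.5, (n_res, n_res))
                # Rescale so the largest eigenvalue magnitude equals the target
                # spectral radius; after this, the recurrent weights are never updated.
                self.W = W * (spectral_radius / np.max(np.abs(np.linalg.eigvals(W))))
                self.ridge = ridge
                self.W_out = None

            def _states(self, inputs):
                x = np.zeros(self.W.shape[0])
                states = []
                for u in inputs:                      # one word vector per time step
                    x = np.tanh(self.W_in @ u + self.W @ x)
                    states.append(x.copy())
                return np.array(states)

            def fit(self, inputs, targets):
                # Only the linear readout is trained, via ridge regression.
                X = self._states(inputs)
                A = X.T @ X + self.ridge * np.eye(X.shape[1])
                self.W_out = np.linalg.solve(A, X.T @ targets)

            def predict(self, inputs):
                return self._states(inputs) @ self.W_out

    Under this kind of setup, the switch from arbitrary to informative word representations discussed above would only change the rows of inputs (for example, one-hot vectors versus vectors encoding lexical or grammatical features); the fixed reservoir itself is left untouched.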

    Connectionist language production: distributed representations and the uniform information density hypothesis

    This dissertation approaches the task of modeling human sentence production from a connectionist point of view, using distributed semantic representations. The main questions it addresses are: (i) whether the distributed semantic representations defined by Frank et al. (2009) are suitable for modeling sentence production with artificial neural networks, (ii) what the behavior and internal mechanism of a model that uses these representations and recurrent neural networks look like, and (iii) how to give a mechanistic account of the Uniform Information Density Hypothesis (UID; Jaeger, 2006; Levy and Jaeger, 2007). Regarding the first point, the semantic representations of Frank et al. (2009), called situation vectors, are points in a vector space where each vector contains information about the observations in which an event and a corresponding sentence are true. These representations have been successfully used to model language comprehension (e.g., Frank et al., 2009; Venhuizen et al., 2018). During the construction of these vectors, however, a dimensionality reduction process introduces some loss of information, which makes some aspects no longer recognizable and reduces the performance of a model that utilizes them. To address this issue, belief vectors are introduced as an alternative way to obtain semantic representations of manageable dimensionality. These two types of representations (situation and belief vectors) are evaluated by using them as input to a sentence production model that implements an extension of a Simple Recurrent Network (SRN; Elman, 1990). The model was tested under different conditions corresponding to different levels of systematicity, i.e., the ability of a model to generalize from a set of known items to a set of novel ones. Systematicity is an essential attribute for a model of sentence processing, considering that the number of sentences that can be generated for a given language is infinite, so it is not feasible to memorize all possible message-sentence pairs. The results showed that the model generalized with very high performance in all test conditions, demonstrating systematic behavior. Furthermore, the errors it elicited involved very similar semantic representations, in line with the speech error literature, which holds that speech errors involve elements with semantic or phonological similarity. This further demonstrates the systematic behavior of the model, as it processes similar semantic representations in a similar way, even when they are new to it. Regarding the second point, the sentence production model was analyzed in two ways. First, by looking at the sentences it produces, including the errors elicited, highlighting the difficulties and preferences of the model. The results revealed that the model learns the syntactic patterns of the language, reflecting its statistical nature, and that its main difficulty lies with very similar semantic representations, for which it sometimes produces unintended sentences that are nevertheless semantically very close to the intended ones. Second, the connection weights and activation patterns of the model were analyzed, yielding an algorithmic account of its internal processing.
According to this account, the input semantic representation activates the words related to its content and gives an indication of their order by providing relatively more activation to words that are likely to appear early in the sentence. Then, at each time step, the previously produced word activates syntactic and semantic constraints on the next word to be produced, while the context units of the recurrence preserve information through time, allowing the model to enforce long-distance dependencies. We propose that these results can inform our understanding of the internal processing of models with similar architectures. Regarding the third point, an extension of the model is proposed with the goal of modeling UID. According to UID, language production is an efficient process shaped by a tendency to produce linguistic units that distribute information as uniformly as possible and close to the capacity of the communication channel, given the encoding possibilities of the language, thus optimizing the amount of information transmitted per unit of time. The extension approaches UID by balancing two production strategies: one in which the model produces the word with the highest probability given the semantics and the previously produced words, and one in which it produces the word that would minimize sentence length given the semantic representation and the previously produced words. By combining these two strategies, the model was able to produce sentences with different levels of information density and uniformity, providing a first step toward modeling UID at the algorithmic level of analysis. In sum, the results show that the distributed semantic representations of Frank et al. (2009) can be used to model sentence production while exhibiting systematicity. Moreover, an algorithmic account of the internal behavior of the model was reached, with the potential to generalize to other models with similar architecture. Finally, a model of UID is presented, highlighting some important aspects of UID that need to be addressed in order to go from its formulation at the computational level of analysis to a mechanistic account at the algorithmic level.
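    As a purely illustrative sketch of the two mechanisms described in this abstract, the fragment below shows an Elman-style recurrent step in which context units carry information across time steps, and a decoding rule that balances the most probable next word against the word expected to shorten the remaining sentence. The weight matrices, the expected_remaining_length estimate and the lambda_uid weighting are assumptions introduced here for illustration, not the dissertation's implementation.

        import numpy as np

        def srn_step(semantics, prev_word_vec, context, W_sem, W_in, W_rec, W_out):
            """One production step: semantics + previous word + context -> word probabilities."""
            hidden = np.tanh(W_sem @ semantics + W_in @ prev_word_vec + W_rec @ context)
            logits = W_out @ hidden
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            return probs, hidden          # 'hidden' is fed back as the next step's context

        def choose_next_word(probs, expected_remaining_length, lambda_uid=0.5):
            """Blend word probability with a preference for shorter continuations."""
            # Turn (positive) length estimates into a score where shorter is better.
            shorter_is_better = 1.0 - expected_remaining_length / expected_remaining_length.max()
            score = (1.0 - lambda_uid) * probs + lambda_uid * shorter_is_better
            return int(np.argmax(score))

    Setting lambda_uid to 0 recovers the pure probability strategy, while values closer to 1 favor shorter continuations; varying this weight is one simple way to trade off the two production strategies described above.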

    Compositional Linguistic Generalization in Artificial Neural Networks

    Compositionality---the principle that the meaning of a complex expression is built from the meanings of its parts---is considered a central property of human language. This dissertation focuses on compositional generalization, a key benefit of compositionality that enables the production and comprehension of novel expressions. Specifically, this dissertation develops a test of compositional generalization for sequence-to-sequence artificial neural networks (ANNs). Before doing so, I start by developing a test of grammatical category abstraction, an important precondition to compositional generalization, because category membership determines the applicability of compositional rules. I then construct a test of compositional generalization based on human generalization patterns discussed in existing linguistic and developmental studies. The test takes the form of semantic parsing (translation from natural language expressions to semantic representations) in which the training and generalization sets have systematic gaps that can be filled by composing known parts. The generalization cases fall into two broad categories, lexical and structural, depending on whether generalization to novel combinations of known lexical items and known structures is required, or generalization to novel structures is required. The ANNs evaluated on this test exhibit limited degrees of compositional generalization, implying that the inductive biases of ANNs and human learners differ substantially. An error analysis reveals that all ANNs tested frequently make generalizations that violate faithfulness constraints (e.g., Emma saw Lina ↝ see'(Emma', Audrey') instead of see'(Emma', Lina')). Adding a glossing task (word-by-word translation)---a task that requires maximally faithful input-output mappings---as an auxiliary objective to the Transformer model (Vaswani et al. 2017) greatly improves generalization, demonstrating that a faithfulness bias can be injected through auxiliary training. However, the improvement is limited to lexical generalization; all models struggle to assign appropriate semantic representations to novel structures regardless of auxiliary training. This difficulty with structural generalization leaves open questions for both ANN and human learners. I discuss promising directions for improving structural generalization in ANNs, and furthermore propose an artificial language learning study for human subjects, analogous to the tests posed to ANNs, which will lead to a more detailed characterization of the patterns of structural generalization in human learners.
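    The faithfulness violations mentioned in the error analysis can be made concrete with a small, hypothetical check: flag any constant in the predicted logical form that has no counterpart among the input words. The regular expression for constants and the toy lemma map below are assumptions for illustration; the dissertation's semantic representations are richer than this.

        import re

        def unfaithful_constants(sentence, predicted_lf, lemma_map):
            """Return logical-form constants with no counterpart among the input words."""
            input_lemmas = {lemma_map.get(w.lower(), w.lower()) for w in sentence.split()}
            constants = re.findall(r"([A-Za-z]+)'", predicted_lf)
            return {c for c in constants if c.lower() not in input_lemmas}

        # The abstract's example of an unfaithful generalization:
        # "Emma saw Lina" parsed as see'(Emma', Audrey') instead of see'(Emma', Lina').
        print(unfaithful_constants("Emma saw Lina", "see'(Emma', Audrey')",
                                   lemma_map={"saw": "see"}))    # prints {'Audrey'}

    A faithfulness bias of the kind injected by the glossing auxiliary task would be expected to reduce the number of constants flagged by a check like this one.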

    Linguistic Competence and New Empiricism in Philosophy and Science

    The topic of this dissertation is the nature of linguistic competence, the capacity to understand and produce sentences of natural language. I defend the empiricist account of linguistic competence embedded in connectionist cognitive science. This strand of cognitive science has been opposed to traditional symbolic cognitive science which, coupled with transformational-generative grammar, was committed to nativism due to the view that human cognition, including the language capacity, should be construed in terms of symbolic representations and hardwired rules. Similarly, linguistic competence in this framework was regarded as innate, rule-governed, domain-specific, and fundamentally different from performance, i.e., the idiosyncrasies and factors governing linguistic behavior. I analyze state-of-the-art connectionist, deep learning models of natural language processing, most notably large language models, to see what they can tell us about linguistic competence. Deep learning is a statistical technique for the classification of patterns through which artificial intelligence researchers train artificial neural networks containing multiple layers on gargantuan amounts of textual and/or visual data. I argue that these models suggest that linguistic competence should be construed as stochastic, pattern-based, and stemming from domain-general mechanisms. Moreover, I distinguish syntactic from semantic competence, and for each I show the ramifications of endorsing a connectionist research program as opposed to traditional symbolic cognitive science and transformational-generative grammar. I present a unifying front, consisting of usage-based theories, a construction grammar approach, and an embodied approach to cognition, to show that the more multimodal and diverse models are in terms of architectural features and training data, the stronger the case for connectionist linguistic competence. I also propose to discard the competence vs. performance distinction as theoretically inferior, so that the novel and integrative account of linguistic competence originating in connectionism and empiricism that I propose and defend in the dissertation can be put forward in the scientific and philosophical literature.

    Integrative (Synchronization) Mechanisms of (Neuro-)Cognition against the Background of (Neo-)Connectionism, the Theory of Nonlinear Dynamical Systems, Information Theory, and the Self-Organization Paradigm

    Building on its main theme, namely the presentation and examination of a solution to the binding problem by means of temporal integrative (synchronization) mechanisms within the cognitive (neuro-)architectures of (neo-)connectionism, with reference to perceptual and language cognition and in particular to the problems of compositionality and systematicity that arise there, the aim of the present work is to sketch the construction of a yet-to-be-developed integrative theory of (neuro-)cognition based on the representational format of a so-called "vectorial form", against the background of (neo-)connectionism, the theory of nonlinear dynamical systems, information theory, and the self-organization paradigm.

    The Boltzmann Machine: a Connectionist Model for Supra-Classical Logic

    This thesis moves towards a reconciliation of two of the major paradigms of artificial intelligence by exploring the representation of symbolic logic in an artificial neural network. Previous attempts at the machine representation of classical logic are reviewed. We, however, consider the requirements of inference in the broader realm of supra-classical, non-monotonic logic. This logic is concerned with the tolerance of exceptions and is thought to be associated with common-sense reasoning. Biological plausibility extends these requirements in the context of human cognition. The thesis identifies the requirements of supra-classical, non-monotonic logic in relation to the properties of candidate neural networks. Previous research has theoretically identified the Boltzmann machine as a potential candidate. We provide experimental evidence supporting a version of the Boltzmann machine as a practical representation of this logic. The theme is pursued by looking at the benefits of utilising the relationship between the logic and the Boltzmann machine in two areas. We report adaptations to the machine architecture that select for different information distributions. These distributions correspond to state preference in traditional logic versus the concept of atomic typicality in contemporary approaches to logic. We also show that the learning algorithm of the Boltzmann machine can be adapted to implement pseudo-rehearsal during retraining. The results of machine retraining are then used to assess the plausibility of some current theories of belief revision in logic. Furthermore, we propose an alternative approach to belief revision based on the experimental results of retraining the Boltzmann machine.
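    As a hedged sketch of how pseudo-rehearsal during retraining might look for a small, fully visible binary Boltzmann machine: pseudo-patterns are sampled from the current machine by Gibbs sampling and mixed with the new training patterns, so that retraining revises earlier knowledge rather than simply overwriting it. The function names, the fully visible architecture and the hyperparameters are assumptions for illustration, not the thesis's implementation.

        import numpy as np

        rng = np.random.default_rng(1)

        def gibbs_sample(W, b, n_steps=50):
            """Draw one approximate sample from a fully visible binary Boltzmann machine."""
            n = len(b)
            s = rng.integers(0, 2, n).astype(float)
            for _ in range(n_steps):
                for i in range(n):
                    # Probability that unit i switches on given all the other units.
                    net = W[i] @ s - W[i, i] * s[i] + b[i]
                    s[i] = float(rng.random() < 1.0 / (1.0 + np.exp(-net)))
            return s

        def retraining_set_with_pseudo_rehearsal(W, b, new_patterns, n_pseudo=20):
            """Mix pseudo-patterns sampled from the current machine with the new data,
            so that retraining revises old knowledge instead of simply overwriting it."""
            pseudo = [gibbs_sample(W, b) for _ in range(n_pseudo)]
            return np.vstack([new_patterns] + pseudo)

    The sketch deliberately omits the Boltzmann learning rule itself (the positive and negative phases of the weight update); it only shows how a rehearsal set could be assembled before the next round of training.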