54 research outputs found

    Neural-Symbolic Recursive Machine for Systematic Generalization

    Despite their tremendous success, existing machine learning models still fall short of human-like systematic generalization, that is, learning compositional rules from limited data and applying them to unseen combinations in various domains. We propose the Neural-Symbolic Recursive Machine (NSR) to tackle this deficiency. The core representation of NSR is a Grounded Symbol System (GSS) with combinatorial syntax and semantics, which emerges entirely from training data. Akin to the neuroscience studies suggesting separate brain systems for perceptual, syntactic, and semantic processing, NSR implements analogous separate modules of neural perception, syntactic parsing, and semantic reasoning, which are jointly learned by a deduction-abduction algorithm. We prove that NSR is expressive enough to model various sequence-to-sequence tasks. Superior systematic generalization is achieved via the inductive biases of equivariance and recursiveness embedded in NSR. In experiments, NSR achieves state-of-the-art performance on three benchmarks from different domains: SCAN for semantic parsing, PCFG for string manipulation, and HINT for arithmetic reasoning. Specifically, NSR achieves 100% generalization accuracy on SCAN and PCFG and outperforms state-of-the-art models on HINT by about 23%. NSR demonstrates stronger generalization than pure neural networks due to its symbolic representation and inductive biases. NSR also demonstrates better transferability than existing neural-symbolic approaches because it requires less domain-specific knowledge.

    Towards Artificial Language Learning in a Potts Attractor Network

    It remains a mystery how children acquire natural languages; languages far beyond the few symbols that a young chimp struggles to learn, and with complex rules that incomparably surpass the repetitive structure of bird songs. How should one explain the emergence of such a capacity from the basic elements of the nervous system, namely neuronal networks? To understand the brain mechanisms underlying the language phenomenon, specifically sentence construction, different approaches have been attempted to implement an artificial neural network that encodes words and constructs sentences (see e.g. Hummel and Holyoak, 1997; Huyck, 2009; Velde and de Kamps, 2006; Stewart and Eliasmith, 2009). These attempts differ on how the sentence constituents (parts) are represented, either individually and locally or in a distributed fashion, and on how these constituents are bound together. In LISA (Hummel and Holyoak, 1997), each sentence constituent (either a word, a phrase, or even a proposition) is represented individually by a unit, intended to be a population of neurons (Hummel and Holyoak, 2003), and relevant constituents synchronously get activated in the construction of a sentence (or the inference of a proposition). Considering the productivity of language, that is, the ability of humans to create many possible sentences out of a limited vocabulary, this representation results in an exponential growth in the number of units needed for structure representation. In order to avoid this problem, Neural Blackboard Architectures (Velde and de Kamps, 2006) were proposed as systems endowed with dynamic bindings between assemblies of words, roles (e.g. theme or agent), and word categories (e.g. nouns or verbs). A neural blackboard architecture resembles a switchboard (a blackboard) that wires sentence constituents together via circuits, using highly complex and meticulously (and unrealistically) organized connections.
As opposed to localized approaches, in a Vector Symbolic Architecture (Gayler, 2003; Plate, 1991), words are represented in a fully distributed fashion on a vector. The words are bound (and merged) together by algebraic operations, e.g. tensor products (Smolensky, 1990) or circular convolution (Plate, 1991), in the vector space. In order to give a biological account, some steps have been attempted towards the neural implementation of such operations (Stewart and Eliasmith, 2009). Another distributed approach implemented a simple recurrent neural network that predicts the next word in a sentence (Elman, 1991). Apart from the limited language size that the network could deal with (Elman, 1993), this system lacked an explicit representation of syntactic constituents, thus resulting in a lack of grammatical knowledge in the network (Borensztajn, 2011; Velde and de Kamps, 2006). Despite all these attempts, a neural model that addresses the challenges of language size, semantic and syntactic distinction, word binding, and word implementation in a neurally plausible manner is still lacking. We are exploring a novel approach to address these challenges, which involves first constructing an artificial language of intermediate complexity and then implementing a neural network, as a simplified cortical model of sentence production, which stores the vocabulary and the grammar of the artificial language in a neurally inspired manner on two components: one semantic and one syntactic. As the training language of the network, we have constructed BLISS (Pirmoradian and Treves, 2011), a scaled-down synthetic language of intermediate complexity, with about 150 words, 40 production rules, and a definition of semantics that is reduced to statistical dependence between words. In Chapter 2, we will explain the details of the implementation of BLISS.
As a sentence production model, we have implemented a Potts attractor neural network, whose units hypothetically represent patches of cortex. The choice of the Potts network, for sentence production, has been mainly motivated by the latching dynamics it exhibits (Kropff and Treves, 2006); that is, an ability to spontaneously hop, or latch, across memory patterns, which have been stored as dynamical attractors, thus producing a long or even infinite sequence of patterns, at least in some regimes (Russo and Treves, 2012). The goal is to train the Potts network with a corpus of sentences in BLISS. This involves setting first the structure of the network, then the generating algorithm for word representations, and finally the protocol to train the network with the specific transitions present in the BLISS corpus, using both auto- and hetero-associative learning rules. In Chapter 3, we will explain the details of the procedure we have adapted for word representation in the network. The last step involves utilizing the spontaneous latching dynamics exhibited by the Potts network, the word representation we have developed, and crucially hetero-associative weights favouring specific transitions, to generate, with a suitable associative training procedure, sentences "uttered" by the network. This last stage of spontaneous sentence production by the network is explained in Chapter 4.
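The circular-convolution binding that the abstract attributes to Vector Symbolic Architectures (Plate, 1991) can be illustrated with a short sketch. This is not the thesis's Potts network; the vector dimension, the role and filler names, and the helper functions are all illustrative assumptions.

```python
import numpy as np


def bind(a, b):
    """Circular convolution of two equal-length vectors, computed via FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))


def approx_inverse(a):
    """Involution a*[i] = a[-i mod n]: an approximate inverse under binding."""
    return np.roll(a[::-1], 1)


def unbind(trace, cue):
    """Recover a noisy copy of whatever was bound to `cue` in `trace`."""
    return bind(trace, approx_inverse(cue))


def cosine(x, y):
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


# Hypothetical roles and fillers: random vectors with variance 1/n.
rng = np.random.default_rng(0)
n = 2048
agent, theme, dog, cat = rng.normal(0.0, 1.0 / np.sqrt(n), size=(4, n))

# One vector holds both role-filler pairs ("merged" by addition).
sentence = bind(agent, dog) + bind(theme, cat)

# Querying with the agent role yields a vector closer to "dog" than to "cat".
decoded = unbind(sentence, agent)
print(cosine(decoded, dog), cosine(decoded, cat))
```

The decoded vector is only an approximation of the filler, so in practice it would be compared against a vocabulary of known word vectors and the best match taken.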

    The integration of syntax and semantic plausibility in a wide-coverage model of human sentence processing

    Models of human sentence processing have paid much attention to three key characteristics of the sentence processor: its robust and accurate processing of unseen input (wide coverage), its immediate, incremental interpretation of partial input, and its sensitivity to structural frequencies in previous language experience. In this thesis, we propose a model of human sentence processing that accounts for these three characteristics and also models a fourth key characteristic, namely the influence of semantic plausibility on sentence processing. The precondition for such a sentence processing model is a general model of human plausibility intuitions. We therefore begin by presenting a probabilistic model of the plausibility of verb-argument relations, which we estimate as the probability of encountering a verb-argument pair in the relation specified by a thematic role in a role-annotated training corpus. This model faces a significant sparse data problem, which we alleviate by combining two orthogonal smoothing methods. We show that the smoothed model's predictions are significantly correlated with human plausibility judgements for a range of test sets. We also demonstrate that our semantic plausibility model outperforms selectional preference models and a standard role labeller, which solve tasks from computational linguistics that are related to the prediction of human judgements. We then integrate this semantic plausibility model with an incremental, wide-coverage, probabilistic model of syntactic processing to form the Syntax/Semantics (SynSem) Integration model of sentence processing. The SynSem-Integration model combines preferences for candidate syntactic structures from two sources: syntactic probability estimates from a probabilistic parser and our semantic plausibility model's estimates of the verb-argument relations in each syntactic analysis.
The model uses these preferences to determine a globally preferred structure and predicts difficulty in human sentence processing either if syntactic and semantic preferences conflict, or if the interpretation of the preferred analysis changes non-monotonically. In a thorough evaluation against the patterns of processing difficulty found for four ambiguity phenomena in eight reading-time studies, we demonstrate that the SynSem-Integration model reliably predicts human reading time behaviour.
[Translated from the German abstract:] This dissertation addresses the modelling of human language comprehension at the level of individual sentences. While existing models deal mainly with syntactic processes, our focus is on integrating a model of the semantic plausibility of utterances into a sentence processing model. Four important properties of language comprehension determine the construction of our model: incremental processing, an experience-based architecture, wide coverage of utterances, and the integration of semantic plausibility. While the first three properties are addressed by many models, until now there has been no model that also incorporates plausibility. We first present a general plausibility model and then integrate it with an incremental, probabilistic, wide-coverage sentence processing model to obtain a model with all four desired properties. Our plausibility model predicts human plausibility ratings for verb-argument pairs in various relations (e.g. agent or patient). The model estimates the plausibility of a verb-argument pair in a specific relation, given by a thematic role, as the probability of encountering the triple of verb, argument, and role in a training corpus annotated with semantic roles. The predictions of the plausibility model correlate significantly with human plausibility ratings for a range of different test data sets. A comparison with two computational-linguistic approaches that each perform a related task, namely thematic role assignment and the computation of selectional preferences, shows that our model predicts plausibility judgements more reliably. Our sentence comprehension model, the Syntax/Semantics Integration model, is a combination of this plausibility model and an incremental, probabilistic sentence processing model based on a wide-coverage syntactic parser. The Syntax/Semantics Integration model interpolates syntactic probability estimates for analyses of an utterance with the semantic plausibility estimates for the verb-argument pairs in each analysis. The result is a globally preferred analysis. The model predicts processing difficulty when either the syntactically and semantically preferred analyses conflict or when the semantic interpretation of the globally preferred analysis changes non-monotonically in a processing step. The final evaluation, based on findings about human processing difficulty established experimentally in eight studies for four ambiguity phenomena, shows that the Syntax/Semantics Integration model correctly predicts the experimental data.
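The plausibility estimate described above, the probability of encountering a (verb, role, argument) triple in a role-annotated corpus, can be sketched as a smoothed relative frequency. The abstract does not name its two smoothing methods, so add-alpha smoothing is swapped in here purely as a stand-in; the toy corpus and the `plausibility` function are likewise hypothetical.

```python
from collections import Counter

# Hypothetical role-annotated corpus of (verb, role, argument) triples.
CORPUS = [
    ("eat", "agent", "child"),
    ("eat", "patient", "apple"),
    ("eat", "patient", "apple"),
    ("eat", "agent", "dog"),
    ("chase", "agent", "dog"),
    ("chase", "patient", "cat"),
]

COUNTS = Counter(CORPUS)
VERBS = {v for v, _, _ in CORPUS}
ROLES = {r for _, r, _ in CORPUS}
ARGS = {a for _, _, a in CORPUS}
# Number of distinct triples that could in principle occur.
NUM_TRIPLES = len(VERBS) * len(ROLES) * len(ARGS)


def plausibility(verb, role, arg, alpha=0.1):
    """Add-alpha estimate of P(verb, role, arg); a stand-in smoothing choice."""
    return (COUNTS[(verb, role, arg)] + alpha) / (len(CORPUS) + alpha * NUM_TRIPLES)


# An apple is a more plausible patient than agent of "eat" in this toy corpus.
print(plausibility("eat", "patient", "apple"))
print(plausibility("eat", "agent", "apple"))
```

Smoothing matters because unseen triples, such as an apple as the agent of "eat", would otherwise receive probability zero instead of a small positive value.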

    Semantic Entropy in Language Comprehension

    Language is processed on a more or less word-by-word basis, and the processing difficulty induced by each word is affected by our prior linguistic experience as well as our general knowledge about the world. Surprisal and entropy reduction have been independently proposed as linking theories between word processing difficulty and probabilistic language models. Extant models, however, are typically limited to capturing linguistic experience and hence cannot account for the influence of world knowledge. A recent comprehension model by Venhuizen, Crocker, and Brouwer (2019, Discourse Processes) improves upon this situation by instantiating a comprehension-centric metric of surprisal that integrates linguistic experience and world knowledge at the level of interpretation and combines them in determining online expectations. Here, we extend this work by deriving a comprehension-centric metric of entropy reduction from this model. In contrast to previous work, which has found that surprisal and entropy reduction are not easily dissociated, we do find a clear dissociation in our model. While both surprisal and entropy reduction derive from the same cognitive process—the word-by-word updating of the unfolding interpretation—they reflect different aspects of this process: state-by-state expectation (surprisal) versus end-state confirmation (entropy reduction)
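The two linking quantities named above can be made concrete with a toy computation. The sketch uses the standard textbook definitions over a hypothetical three-sentence language, not the comprehension-centric metrics derived in the paper; the sentence probabilities and function names are assumptions chosen for illustration.

```python
import math

# Hypothetical toy language: complete sentences and their probabilities.
SENTENCES = {
    ("the", "dog", "barks"): 0.5,
    ("the", "dog", "sleeps"): 0.25,
    ("the", "cat", "sleeps"): 0.25,
}


def prefix_prob(prefix):
    """Total probability of all sentences that start with `prefix`."""
    prefix = tuple(prefix)
    return sum(p for s, p in SENTENCES.items() if s[:len(prefix)] == prefix)


def surprisal(prefix, word):
    """-log2 P(word | prefix), in bits."""
    prefix = tuple(prefix)
    return -math.log2(prefix_prob(prefix + (word,)) / prefix_prob(prefix))


def entropy(prefix):
    """Entropy (bits) over complete sentences consistent with `prefix`."""
    prefix = tuple(prefix)
    total = prefix_prob(prefix)
    h = 0.0
    for s, p in SENTENCES.items():
        if s[:len(prefix)] == prefix:
            q = p / total
            h -= q * math.log2(q)
    return h


def entropy_reduction(prefix, word):
    """Drop in uncertainty about the final sentence after reading `word`."""
    prefix = tuple(prefix)
    return entropy(prefix) - entropy(prefix + (word,))


for word in ("dog", "cat"):
    print(word, surprisal(("the",), word), entropy_reduction(("the",), word))
```

In this toy language, "cat" after "the" is both surprising and fully disambiguating, while "dog" is less surprising and leaves some uncertainty about how the sentence ends.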
