54 research outputs found
Neural-Symbolic Recursive Machine for Systematic Generalization
Despite their tremendous success, existing machine learning models still fall
short of human-like systematic generalization -- learning compositional rules
from limited data and applying them to unseen combinations across domains.
We propose the Neural-Symbolic Recursive Machine (NSR) to tackle this deficiency.
The core representation of NSR is a Grounded Symbol System (GSS) with
combinatorial syntax and semantics, which entirely emerges from training data.
Akin to the neuroscience studies suggesting separate brain systems for
perceptual, syntactic, and semantic processing, NSR implements analogous
separate modules of neural perception, syntactic parsing, and semantic
reasoning, which are jointly learned by a deduction-abduction algorithm. We
prove that NSR is expressive enough to model various sequence-to-sequence
tasks. Superior systematic generalization is achieved via the inductive biases
of equivariance and recursiveness embedded in NSR. In experiments, NSR achieves
state-of-the-art performance in three benchmarks from different domains: SCAN
for semantic parsing, PCFG for string manipulation, and HINT for arithmetic
reasoning. Specifically, NSR achieves 100% generalization accuracy on SCAN and
PCFG and outperforms state-of-the-art models on HINT by about 23%. Our NSR
demonstrates stronger generalization than pure neural networks due to its
symbolic representation and inductive biases. NSR also demonstrates better
transferability than existing neural-symbolic approaches because it requires
less domain-specific knowledge.
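The compositional rules at stake in benchmarks like SCAN can be made concrete with a toy recursive interpreter for a small SCAN-like command fragment. This sketch illustrates the task itself, not the NSR architecture; the exact rule set is a simplified excerpt.

```python
# Toy interpreter for a small SCAN-like command fragment, illustrating the
# kind of compositional rules a model must learn (e.g. "twice" duplicates a
# subcommand, "after" reverses execution order). This is an illustration of
# the benchmark task, not of the NSR model.

PRIMITIVES = {"jump": ["JUMP"], "walk": ["WALK"], "run": ["RUN"], "look": ["LOOK"]}

def interpret(command):
    words = command.split()
    # "X after Y" executes Y first, then X
    if "after" in words:
        i = words.index("after")
        return interpret(" ".join(words[i + 1:])) + interpret(" ".join(words[:i]))
    # "X and Y" executes X, then Y
    if "and" in words:
        i = words.index("and")
        return interpret(" ".join(words[:i])) + interpret(" ".join(words[i + 1:]))
    # "X twice" / "X thrice" repeat the subcommand
    if words[-1] == "twice":
        return interpret(" ".join(words[:-1])) * 2
    if words[-1] == "thrice":
        return interpret(" ".join(words[:-1])) * 3
    return PRIMITIVES[" ".join(words)]

print(interpret("jump twice"))       # ['JUMP', 'JUMP']
print(interpret("walk after run"))   # ['RUN', 'WALK']
```

Systematic generalization means recovering such rules from limited examples and applying them to novel combinations (e.g. a primitive never seen with "twice" in training).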
Towards Artificial Language Learning in a Potts Attractor Network
It remains a mystery how children acquire natural languages, which go far beyond the
few symbols that a young chimp struggles to learn, and whose complex rules incomparably
surpass the repetitive structure of bird songs. How should one explain the emergence of
such a capacity from the basic elements of the nervous system, namely neuronal networks?
To understand the brain mechanisms underlying the language phenomenon, specifically
sentence construction, different approaches have been attempted to implement an artificial
neural network that encodes words and constructs sentences (see e.g. Hummel and
Holyoak, 1997; Huyck, 2009; Velde and de Kamps, 2006; Stewart and Eliasmith, 2009).
These attempts differ in how the sentence constituents (parts) are represented (either
individually and locally, or in a distributed fashion) and in how these constituents are
bound together.
In LISA (Hummel and Holyoak, 1997), each sentence constituent (a word, a phrase, or
even a proposition) is represented individually by a unit, intended to be a population
of neurons (Hummel and Holyoak, 2003), and the relevant constituents are activated
synchronously during the construction of a sentence (or the inference of a proposition).
Given the productivity of language, that is, the ability of humans to create many possible
sentences out of a limited vocabulary, this representation results in an exponential growth in the number of units needed for structure representation.
To avoid this problem, Neural Blackboard Architectures (Velde and de Kamps,
2006) were proposed as systems endowed with dynamic bindings between assemblies of
words, roles (e.g. theme or agent), and word categories (e.g. nouns or verbs). A neural
blackboard architecture resembles a switchboard (a blackboard) that wires sentence
constituents together via circuits, using highly complex and meticulously organized,
arguably unrealistic, connections.
As opposed to these localist approaches, in a Vector Symbolic Architecture (Gayler, 2003;
Plate, 1991) words are represented in a fully distributed fashion as vectors. The words are
bound (and merged) together by algebraic operations in the vector space, e.g. tensor
products (Smolensky, 1990) or circular convolution (Plate, 1991). To give these operations
a biological account, some steps have been taken towards their neural implementation
(Stewart and Eliasmith, 2009).
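Circular-convolution binding can be sketched in a few lines of NumPy. In Plate's Holographic Reduced Representations, words are random vectors, binding is circular convolution, and unbinding convolves with an approximate inverse; the dimension and normalisation choices below are illustrative.

```python
import numpy as np

# Sketch of binding/unbinding in a Holographic Reduced Representation
# (Plate, 1991): role and filler are bound by circular convolution, and
# convolving with the role's approximate inverse recovers a noisy copy
# of the filler. Vector dimension and scaling are illustrative choices.

rng = np.random.default_rng(0)
d = 2048

def circ_conv(a, b):
    # Circular convolution via FFT: elementwise product in frequency space.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def approx_inverse(a):
    # Plate's involution a*[i] = a[(d - i) mod d]; convolving with it
    # approximately undoes binding with a.
    return np.concatenate(([a[0]], a[1:][::-1]))

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

agent = rng.normal(0, 1 / np.sqrt(d), d)   # role vector
john = rng.normal(0, 1 / np.sqrt(d), d)    # filler vector
mary = rng.normal(0, 1 / np.sqrt(d), d)    # unrelated distractor

bound = circ_conv(agent, john)                        # bind role and filler
recovered = circ_conv(bound, approx_inverse(agent))   # noisy copy of john

print(cosine(recovered, john))  # clearly positive: filler is recoverable
print(cosine(recovered, mary))  # near zero: distractor is not
```

The recovered vector is noisy, which is why a "clean-up memory" of stored vectors is usually assumed alongside the algebra.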
Another distributed approach implemented a simple recurrent neural network
that predicts the next word in a sentence (Elman, 1991). Apart from the limited language
size that the network could handle (Elman, 1993), this system lacked an explicit
representation of syntactic constituents, and therefore lacked grammatical knowledge
(Borensztajn, 2011; Velde and de Kamps, 2006).
Despite all these attempts, we still lack a neural model that addresses
the challenges of language size, the distinction between semantics and syntax,
word binding, and word representation in a neurally plausible manner.
We explore a novel approach to these challenges: first constructing
an artificial language of intermediate complexity, and then implementing a neural
network, as a simplified cortical model of sentence production, that stores the vocabulary
and the grammar of the artificial language in a neurally inspired manner in two
components: one semantic and one syntactic.
As the training language of the network, we have constructed BLISS (Pirmoradian and
Treves, 2011), a scaled-down synthetic language of intermediate complexity, with about
150 words, 40 production rules, and a definition of semantics that is reduced to statistical
dependence between words. In Chapter 2, we will explain the details of the implementation of BLISS.
As a sentence production model, we have implemented a Potts attractor neural network,
whose units hypothetically represent patches of cortex. The choice of the Potts network,
for sentence production, has been mainly motivated by the latching dynamics it exhibits
(Kropff and Treves, 2006); that is, an ability to spontaneously hop, or latch, across memory
patterns, which have been stored as dynamical attractors, thus producing a long or even
infinite sequence of patterns, at least in some regimes (Russo and Treves, 2012). The goal
is to train the Potts network with a corpus of sentences in BLISS. This involves first
setting the structure of the network, then the algorithm for generating word
representations, and finally the protocol for training the network on the specific
transitions present in the BLISS corpus, using both auto- and hetero-associative
learning rules. In Chapter 3, we will explain the details of the procedure we have
adopted for word representation in the network.
The last step involves utilizing the spontaneous latching dynamics exhibited by the
Potts network, the word representations we have developed, and, crucially,
hetero-associative weights favouring specific transitions, to generate, with a suitable
associative training procedure, sentences "uttered" by the network. This last stage of
spontaneous sentence production by the network is explained in Chapter 4.
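The interplay of auto- and hetero-associative weights described above can be illustrated with a binary Hopfield-style sketch. The actual model uses Potts units with several local states per patch; this binary simplification is ours and only shows the mechanism by which hetero-associative terms drive transitions between stored patterns.

```python
import numpy as np

# Minimal binary (Hopfield-style) illustration of auto- plus hetero-
# associative storage: auto-associative weights stabilise each stored
# pattern, while hetero-associative weights push the network state from
# one pattern toward the next in a sequence, the ingredient behind
# latching transitions. The real model uses multi-state Potts units;
# this binary version is a deliberate simplification.

rng = np.random.default_rng(1)
N, P = 400, 3
xi = rng.choice([-1, 1], size=(P, N))      # stored patterns ("words")

# Hebbian auto-associative weights: each pattern becomes an attractor.
J_auto = sum(np.outer(x, x) for x in xi) / N
# Hetero-associative weights: map pattern mu onto pattern mu+1.
J_hetero = sum(np.outer(xi[m + 1], xi[m]) for m in range(P - 1)) / N

def overlap(s, x):
    return float(s @ x) / N

s = xi[0].copy()
s = np.sign(J_hetero @ s)   # hetero term alone maps pattern 0 -> pattern 1
print(overlap(s, xi[1]))    # close to 1
```

In the full model, the relative strengths of the two terms (and the adaptation of active units) determine whether the state stays in an attractor or latches onward, producing a sequence of patterns.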
The integration of syntax and semantic plausibility in a wide-coverage model of human sentence processing
Models of human sentence processing have paid much attention to three key characteristics of the sentence processor: its robust and accurate processing of unseen input (wide coverage); its immediate, incremental interpretation of partial input; and its sensitivity to structural frequencies in previous language experience. In this thesis, we propose a model of human sentence processing that accounts for these three characteristics and also models a fourth key characteristic, namely the influence of semantic plausibility on sentence processing.
The precondition for such a sentence processing model is a general model of human plausibility intuitions. We therefore begin by presenting a probabilistic model of the plausibility of verb-argument relations, which we estimate as the probability of encountering a verb-argument pair in the relation specified by a thematic role in a role-annotated training corpus. This model faces a significant sparse data problem, which we alleviate by combining two orthogonal smoothing methods. We show that the smoothed model's predictions are significantly correlated with human plausibility judgements for a range of test sets. We also demonstrate that our semantic plausibility model outperforms selectional preference models and a standard role labeller, which solve tasks from computational linguistics that are related to the prediction of human judgements.
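A count-based estimate over (verb, role, argument) triples with backoff smoothing can be sketched as follows. The thesis combines two specific, orthogonal smoothing methods; the linear-interpolation backoff below is a generic stand-in, and the toy corpus is invented for illustration.

```python
from collections import Counter

# Sketch of a count-based plausibility estimate for (verb, role, argument)
# triples. Sparse triple counts are interpolated with a verb-independent
# backoff over (role, argument) pairs. The thesis uses two more
# sophisticated smoothing methods; this scheme is a generic stand-in.

triples = [  # toy role-annotated corpus: (verb, role, argument)
    ("eat", "patient", "apple"), ("eat", "patient", "bread"),
    ("eat", "agent", "child"), ("drink", "patient", "water"),
]

joint = Counter(triples)
role_arg = Counter((r, a) for _, r, a in triples)
n = len(triples)

def plausibility(verb, role, arg, lam=0.7):
    # Interpolate the sparse triple estimate with the backoff distribution,
    # so unseen triples still receive non-zero plausibility.
    p_triple = joint[(verb, role, arg)] / n
    p_backoff = role_arg[(role, arg)] / n
    return lam * p_triple + (1 - lam) * p_backoff

print(plausibility("eat", "patient", "apple"))   # seen triple: higher score
print(plausibility("drink", "patient", "apple")) # unseen triple: backoff > 0
```

The point of smoothing is visible in the second call: the triple never occurs in the corpus, yet the estimate stays above zero because "apple" is attested as a patient.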
We then integrate this semantic plausibility model with an incremental, wide-coverage, probabilistic model of syntactic processing to form the Syntax/Semantics (SynSem) Integration model of sentence processing. The SynSem-Integration model combines preferences for candidate syntactic structures from two sources: syntactic probability estimates from a probabilistic parser and our semantic plausibility model's estimates of the verb-argument relations in each syntactic analysis. The model uses these preferences to determine a globally preferred structure and predicts difficulty in human sentence processing either if syntactic and semantic preferences conflict, or if the interpretation of the preferred analysis changes non-monotonically. In a thorough evaluation against the patterns of processing difficulty found for four ambiguity phenomena in eight reading-time studies, we demonstrate that the SynSem-Integration model reliably predicts human reading-time behaviour.
Semantic Entropy in Language Comprehension
Language is processed on a more or less word-by-word basis, and the processing difficulty
induced by each word is affected by our prior linguistic experience as well as our general knowledge
about the world. Surprisal and entropy reduction have been independently proposed as linking
theories between word processing difficulty and probabilistic language models. Extant models, however,
are typically limited to capturing linguistic experience and hence cannot account for the influence of
world knowledge. A recent comprehension model by Venhuizen, Crocker, and Brouwer (2019, Discourse
Processes) improves upon this situation by instantiating a comprehension-centric metric of surprisal that
integrates linguistic experience and world knowledge at the level of interpretation and combines them in
determining online expectations. Here, we extend this work by deriving a comprehension-centric metric
of entropy reduction from this model. In contrast to previous work, which has found that surprisal and
entropy reduction are not easily dissociated, we do find a clear dissociation in our model. While both
surprisal and entropy reduction derive from the same cognitive process, the word-by-word updating
of the unfolding interpretation, they reflect different aspects of this process: state-by-state expectation
(surprisal) versus end-state confirmation (entropy reduction).
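The two linking theories can be stated in a few lines: surprisal is the negative log probability of a word given its context, and entropy reduction is the decrease in uncertainty over the remaining interpretations. The sketch below uses invented toy distributions, not the Venhuizen et al. model itself.

```python
import math

# Surprisal and entropy reduction as linking theories (Hale and subsequent
# work): surprisal(w) = -log2 P(w | context); entropy reduction =
# max(0, H_before - H_after), where H is the entropy of the distribution
# over candidate interpretations. The distributions below are toy examples.

def surprisal(p_word):
    return -math.log2(p_word)

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def entropy_reduction(before, after):
    # Hale's formulation counts only decreases in uncertainty as effort.
    return max(0.0, entropy(before) - entropy(after))

# Distribution over four candidate interpretations before and after a word
# that rules out two of them.
before = {"i1": 0.25, "i2": 0.25, "i3": 0.25, "i4": 0.25}
after = {"i1": 0.5, "i2": 0.5, "i3": 0.0, "i4": 0.0}

print(surprisal(0.25))                    # 2.0 bits
print(entropy_reduction(before, after))   # 2.0 - 1.0 = 1.0 bit
```

The dissociation discussed above arises because the same update step can yield a high surprisal with little entropy change, or vice versa, depending on how probability mass is redistributed over interpretations.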
- …