A Defense of Pure Connectionism
Connectionism is an approach to neural-networks-based cognitive modeling that encompasses the recent deep learning movement in artificial intelligence. It came of age in the 1980s, with its roots in cybernetics and earlier attempts to model the brain as a system of simple parallel processors. Connectionist models center on statistical inference within neural networks with empirically learnable parameters, which can be represented as graphical models. More recent approaches focus on learning and inference within hierarchical generative models. Contra influential and ongoing critiques, I argue in this dissertation that the connectionist approach to cognitive science possesses in principle (and, as is becoming increasingly clear, in practice) the resources to model even the most rich and distinctly human cognitive capacities, such as abstract, conceptual thought and natural language comprehension and production.
Consonant with much previous philosophical work on connectionism, I argue that a core principle—that proximal representations in a vector space have similar semantic values—is the key to a successful connectionist account of the systematicity and productivity of thought, language, and other core cognitive phenomena. My work here differs from preceding work in philosophy in several respects: (1) I compare a wide variety of connectionist responses to the systematicity challenge and isolate two main strands that are both historically important and reflected in ongoing work today: (a) vector symbolic architectures and (b) (compositional) vector space semantic models; (2) I consider very recent applications of these approaches, including their deployment on large-scale machine learning tasks such as machine translation; (3) I argue, again on the basis mostly of recent developments, for a continuity in representation and processing across natural language, image processing and other domains; (4) I explicitly link broad, abstract features of connectionist representation to recent proposals in cognitive science similar in spirit, such as hierarchical Bayesian and free energy minimization approaches, and offer a single rebuttal of criticisms of these related paradigms; (5) I critique recent alternative proposals that argue for a hybrid Classical (i.e., serial symbolic)/statistical model of mind; (6) I argue that defending the most plausible form of a connectionist cognitive architecture requires rethinking certain distinctions that have figured prominently in the history of the philosophy of mind and language, such as that between word- and phrase-level semantic content, and between inference and association.
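As a rough illustration of two ideas named in this abstract — the principle that nearby vectors carry similar semantic values, and (a) vector symbolic architectures — here is a minimal sketch, not taken from the dissertation, using random vectors, cosine similarity, and circular-convolution binding in the style of holographic reduced representations. All names, values, and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024  # dimensionality of the (hypothetical) semantic space

def cosine(u, v):
    """Similarity of two vectors; proximal vectors get values near 1."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy random word/role vectors; in practice these would be learned embeddings.
agent, patient, alice, bob = (rng.standard_normal(d) / np.sqrt(d) for _ in range(4))

def bind(a, b):
    """Circular convolution: the classic holographic-reduced-representation binding."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=d)

def unbind(a, b):
    """Approximate inverse of bind: correlate with the involution of b."""
    return bind(a, np.concatenate(([b[0]], b[:0:-1])))

# "alice chases bob" vs "bob chases alice": same fillers, different role bindings.
s1 = bind(agent, alice) + bind(patient, bob)
s2 = bind(agent, bob) + bind(patient, alice)

# Unbinding the agent role from s1 recovers a vector close to `alice`,
# illustrating how role/filler structure can live in a flat vector space.
print(cosine(unbind(s1, agent), alice))  # relatively high
print(cosine(unbind(s1, agent), bob))    # near zero
print(cosine(s1, s2))                    # distinct structures despite shared fillers
```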
Online Versus Offline NMT Quality: An In-depth Analysis on English-German and German-English
In this work we conduct an evaluation study comparing offline and online neural machine translation architectures. Two sequence-to-sequence models are considered: the convolutional Pervasive Attention model (Elbayad et al., 2018) and the attention-based Transformer (Vaswani et al., 2017). For both architectures, we investigate the impact of online decoding constraints on translation quality through a carefully designed human evaluation on the English-German and German-English language pairs, the latter being particularly sensitive to latency constraints. The evaluation results allow us to identify the strengths and shortcomings of each model when we shift to the online setup.
Comment: Accepted at COLING 2020.
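To make the offline/online contrast concrete, here is a hedged sketch (not the authors' code) of the structural difference: in offline decoding every target token may condition on the complete source sentence, while in online decoding a read/write policy determines how much of the source prefix is visible at each step. The `NextToken` and `Policy` callables are hypothetical stand-ins for a trained model and its decision rule.

```python
from typing import Callable, List

# Hypothetical next-token function: (visible source tokens, target prefix) -> next target token.
NextToken = Callable[[List[str], List[str]], str]
# Hypothetical read/write policy: (visible source tokens, target prefix) -> "READ" or "WRITE".
Policy = Callable[[List[str], List[str]], str]

def offline_decode(step: NextToken, source: List[str], max_len: int = 100) -> List[str]:
    """Offline decoding: every target token conditions on the complete source sentence."""
    target: List[str] = []
    while len(target) < max_len:
        tok = step(source, target)              # full source always visible
        if tok == "</s>":
            break
        target.append(tok)
    return target

def online_decode(step: NextToken, policy: Policy, source: List[str],
                  max_len: int = 100) -> List[str]:
    """Online decoding: each target token conditions only on the source prefix
    that the read/write policy has consumed so far."""
    target: List[str] = []
    read = 1                                    # at least one source token is read first
    while len(target) < max_len:
        if read < len(source) and policy(source[:read], target) == "READ":
            read += 1                           # consume one more source token
            continue
        tok = step(source[:read], target)       # only the visible prefix is used
        if tok == "</s>":
            break
        target.append(tok)
    return target
```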
Efficient Wait-k Models for Simultaneous Machine Translation
Simultaneous machine translation consists in starting output generation before the entire input sequence is available. Wait-k decoders offer a simple but efficient approach to this problem: they first read k source tokens, after which they alternate between producing a target token and reading another source token. We investigate the behavior of wait-k decoding in low-resource settings for spoken corpora using IWSLT datasets. We improve the training of these models by using unidirectional encoders and by training across multiple values of k. Experiments with Transformer and 2D-convolutional architectures show that our wait-k models generalize well across a wide range of latency levels. We also show that the 2D-convolution architecture is competitive with Transformers for simultaneous translation of spoken language.
Comment: Accepted at INTERSPEECH 2020.
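The wait-k schedule described in the abstract can be written down directly. The sketch below is an illustration under stated assumptions, not the paper's implementation: it reads k source tokens, then alternates writing one target token and reading one more source token until the source is exhausted. The `step` callable is a hypothetical stand-in for the trained model. A plausible reason unidirectional encoders are attractive in this setting is that previously computed encoder states need not be recomputed as new source tokens arrive.

```python
from typing import Callable, List

# Hypothetical next-token function: (visible source tokens, target prefix) -> next target token.
NextToken = Callable[[List[str], List[str]], str]

def wait_k_decode(step: NextToken, source: List[str], k: int,
                  max_len: int = 100) -> List[str]:
    """Wait-k schedule: read k source tokens, then alternate between writing one
    target token and reading one more source token (reads stop at end of source)."""
    target: List[str] = []
    read = min(k, len(source))                # initial wait: k source tokens
    while len(target) < max_len:
        tok = step(source[:read], target)     # condition on the visible prefix only
        if tok == "</s>":
            break
        target.append(tok)
        read = min(read + 1, len(source))     # then read one more token per write
    return target

# Example with a dummy "model" that merely echoes the last visible source token,
# just to show how the read/write schedule interleaves.
if __name__ == "__main__":
    dummy = lambda src, tgt: src[-1] if len(tgt) < len(src) else "</s>"
    print(wait_k_decode(dummy, "wir haben ein Modell trainiert".split(), k=2))
```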
Incremental Prediction of Sentence-Final Verbs with Attentive Recurrent Neural Networks
Sentence-final verb prediction has garnered attention both in computational linguistics and in psycholinguistics. It is indispensable for understanding human processing of verb-final languages. More recently, it has been used in computational approaches to simultaneous interpretation, i.e., real-time translation from verb-final to verb-medial languages. While previous approaches use classical statistical methods, including pattern-matching rules, n-gram language models, or logistic regression with linguistic features, we introduce an attention-based neural model, Attentive Neural Verb Inference for Incremental Language (ANVIIL), to incrementally predict final verbs from incomplete sentences. Our approach better predicts the final verbs in Japanese and German and provides more interpretable explanations of why those verbs are selected.
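As a rough picture of how such a model can be structured (the abstract does not give enough detail to reproduce ANVIIL itself), the following sketch encodes the sentence prefix with a unidirectional GRU, applies additive attention over the hidden states, and scores candidate final verbs from the attended summary. All layer sizes, names, and the dummy input are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentiveVerbPredictor(nn.Module):
    """Hypothetical sketch (not the paper's code): encode the incomplete sentence
    with a unidirectional RNN, attend over its hidden states, and score candidate
    sentence-final verbs from the attended summary."""

    def __init__(self, vocab_size: int, n_verbs: int, emb: int = 128, hid: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.rnn = nn.GRU(emb, hid, batch_first=True)    # unidirectional: prefixes only
        self.attn_score = nn.Linear(hid, 1)              # additive attention scorer
        self.verb_out = nn.Linear(hid, n_verbs)          # logits over candidate verbs

    def forward(self, prefix_ids: torch.Tensor) -> torch.Tensor:
        # prefix_ids: (batch, t) token ids of the sentence read so far
        states, _ = self.rnn(self.embed(prefix_ids))                          # (batch, t, hid)
        weights = torch.softmax(self.attn_score(torch.tanh(states)), dim=1)   # (batch, t, 1)
        context = (weights * states).sum(dim=1)                               # (batch, hid)
        return self.verb_out(context)

# Incremental use: re-score candidate verbs each time a new token arrives.
model = AttentiveVerbPredictor(vocab_size=10_000, n_verbs=500)
sentence = torch.randint(0, 10_000, (1, 12))              # dummy token ids
for t in range(1, sentence.size(1) + 1):
    logits = model(sentence[:, :t])                        # predict from the prefix
    print(t, int(logits.argmax(dim=-1)))                   # current best verb guess
```

The attention weights over prefix positions are also what would make the model's choices inspectable, in the spirit of the interpretability claim above.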
Layer-wise Representation Fusion for Compositional Generalization
Despite successes across a broad range of applications, the solutions constructed by sequence-to-sequence models are argued to be less compositional than human-like generalization. There is mounting evidence that one of the reasons hindering compositional generalization is that the representations in the uppermost encoder and decoder layers are entangled; in other words, the syntactic and semantic representations of sequences are twisted together inappropriately. However, most previous studies concentrate mainly on enhancing token-level semantic information to alleviate this representation entanglement problem, rather than on composing and using the syntactic and semantic representations of sequences appropriately, as humans do. In addition, we explain why the entanglement problem arises, drawing on recent studies of training deeper Transformers: it is mainly due to the "shallow" residual connections and their simple, one-step operations, which fail to fuse previous layers' information effectively. Starting from this finding and inspired by humans' strategies, we propose FuSion (Fusing Syntactic and Semantic Representations), an extension to sequence-to-sequence models that learns to fuse previous layers' information back into the encoding and decoding process by introducing a fuse-attention module at each encoder and decoder layer. FuSion achieves competitive and even state-of-the-art results on two realistic benchmarks, which empirically demonstrates the effectiveness of our proposal.
Comment: work in progress. arXiv admin note: substantial text overlap with arXiv:2305.1216
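To indicate what "fusing previous layers' information back into the encoding and decoding process" could look like mechanically, here is a hedged sketch of a fuse-attention-style layer in which the current representation attends over the concatenated outputs of all earlier layers. This illustrates the general idea under assumed shapes and names; it is not the paper's FuSion implementation.

```python
from typing import List

import torch
import torch.nn as nn

class FuseAttention(nn.Module):
    """Illustrative fuse-attention layer: the current layer's representation attends
    back over all previous layers' outputs, so lower-layer and higher-layer
    information can be recombined rather than carried only through the shallow,
    one-step residual connection. Names and details are assumptions."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.fuse = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, layer_outputs: List[torch.Tensor]) -> torch.Tensor:
        # x: (batch, seq, d_model) current layer's representation
        # layer_outputs: outputs of all previous layers, each (batch, seq, d_model)
        memory = torch.cat(layer_outputs, dim=1)          # stack layers along a memory axis
        fused, _ = self.fuse(query=x, key=memory, value=memory)
        return self.norm(x + fused)                       # residual + norm around the fusion
```

In a full model, one such module would sit inside each encoder and decoder layer, with `layer_outputs` growing as the stack is traversed.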
Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages
The Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages contain 17 papers that were presented at the conference, organised in Dubrovnik, Croatia, 4-6 October 2010.