A Neurocomputational Model of Grounded Language Comprehension and Production at the Sentence Level
While symbolic and statistical approaches to natural language processing have become undeniably impressive in recent years, such systems still display a tendency to make errors that are inscrutable to human onlookers. This disconnect with human processing may stem from the vast differences in the substrates that underlie natural language processing in artificial systems versus biological systems.
To create a more relatable system, this dissertation turns to the more biologically inspired substrate of neural networks, describing the design and implementation of a model that learns to comprehend and produce language at the sentence level. The model's task is to ground simulated speech streams, representing a simple subset of English, in terms of a virtual environment. The model learns to understand and answer full-sentence questions about the environment by mimicking the speech stream of another speaker, much as a human language learner would. It is the only known neural model to date that can learn to map natural language questions to full-sentence natural language answers, where both question and answer are represented sublexically as phoneme sequences.
The model addresses important points for which most other models, neural and otherwise, fail to account. First, the model learns to ground its linguistic knowledge using human-like sensory representations, gaining language understanding at a deeper level than that of syntactic structure. Second, analysis provides evidence that the model learns combinatorial internal representations, thus gaining the compositionality of symbolic approaches to cognition, which is vital for computationally efficient encoding and decoding of meaning. The model does this while retaining the fully distributed representations characteristic of neural networks, providing the resistance to damage and graceful degradation that are generally lacking in symbolic and statistical approaches. Finally, the model learns via direct imitation of another speaker, allowing it to emulate human processing with greater fidelity, thus increasing the relatability of its behavior.
Along the way, this dissertation develops a novel training algorithm that, for the first time, requires only local computations to train arbitrary second-order recurrent neural networks. This algorithm is evaluated on its overall efficacy, biological feasibility, and ability to reproduce peculiarities of human learning such as age-correlated effects in second language acquisition.
Word sense discovery and disambiguation
The work is based on the assumption that words with similar syntactic usage have similar meaning, which was proposed by Zellig S. Harris (1954, 1968). We study his assumption from two aspects: firstly, different meanings (word senses) of a word should manifest themselves in different usages (contexts), and secondly, similar usages (contexts) should lead to similar meanings (word senses).
If we start with the different meanings of a word, we should be able to find distinct contexts for the meanings in text corpora. We separate the meanings by grouping and labeling contexts in an unsupervised or weakly supervised manner (Publications 1, 2 and 3). We are confronted with the question of how best to represent contexts in order to induce effective classifiers of contexts, because differences in context are the only means we have to separate word senses.
If we start with words in similar contexts, we should be able to discover similarities in meaning. We can do this monolingually or multilingually. In the monolingual material, we find synonyms and other related words in an unsupervised way (Publication 4). In the multilingual material, we find translations by supervised learning of transliterations (Publication 5). In both the monolingual and multilingual case, we first discover words with similar contexts, i.e., synonym or translation lists. In the monolingual case we also aim at finding structure in the lists by discovering groups of similar words, e.g., synonym sets.
In this introduction to the publications of the thesis, we consider the larger background issues of how meaning arises, how it is quantized into word senses, and how it is modeled. We also consider how to define, collect and represent contexts. We discuss how to evaluate the trained context classifiers and discovered word sense classifications, and finally we present the word sense discovery and disambiguation methods of the publications.
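The idea that words with similar contexts tend to have similar meanings can be illustrated with a minimal sketch. This is not the method of any of the publications: the window-based context counts, the cosine measure, the function names and the three-sentence toy corpus are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt

def context_vectors(sentences, window=2):
    """Count the words that occur within `window` positions of each word."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Toy corpus (invented for illustration only).
corpus = [
    "the cat drinks milk",
    "the kitten drinks milk",
    "the car needs fuel",
]
vecs = context_vectors(corpus)
sim_cat_kitten = cosine(vecs["cat"], vecs["kitten"])  # identical contexts
sim_cat_car = cosine(vecs["cat"], vecs["car"])        # only "the" is shared
```

In this toy setting "cat" and "kitten" share all of their context words and so score higher than "cat" and "car", which is the kind of signal an unsupervised synonym discovery method could build on.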
This work supports Harris' hypothesis by implementing three new methods modeled on his hypothesis. The methods have practical consequences for creating thesauruses and translation dictionaries, e.g., for information retrieval and machine translation purposes.
Keywords: Word senses, Context, Evaluation, Word sense disambiguation, Word sense discovery
On-device mobile speech recognition
Despite many years of research, speech recognition remains an active area of research in Artificial Intelligence. Currently, the most common commercial application of this technology on mobile devices uses a wireless client-server approach to meet the computational and memory demands of the speech recognition process. Unfortunately, such an approach is unlikely to remain viable when fully applied over the approximately 7.22 billion mobile phones currently in circulation. In this thesis we present an on-device speech recognition system. Such a system has the potential to completely eliminate the wireless client-server bottleneck. For the Voice Activity Detection (VAD) part of this work, this thesis presents two novel algorithms used to detect speech activity within an audio signal. The first algorithm is based on the Log Linear Predictive Cepstral Coefficients Residual signal (LLPCCRS). These LLPCCRS feature vectors are then classified into voice signal and non-voice signal segments using a modified K-means clustering algorithm. This VAD algorithm is shown to perform better than a conventional approach based on frame energy analysis. The second algorithm developed is based on the Linear Predictive Cepstral Coefficients. This algorithm uses the frames within the speech signal with the minimum and maximum standard deviation as candidates for a linear cross correlation against the rest of the frames within the audio signal. The cross-correlated frames are then classified using the same modified K-means clustering algorithm. The resulting output provides one cluster for speech frames and another cluster for non-speech frames. This novel application of the linear cross correlation technique to linear predictive cepstral coefficient feature vectors provides a fast computation method for use on the mobile platform, as shown by the results presented in this thesis.
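The general shape of clustering-based voice activity detection can be sketched as follows. This is only an illustration under stated assumptions: it uses plain frame log-energy as the feature and ordinary two-cluster K-means, not the LLPCCRS or LPCC features or the modified K-means algorithm of the thesis, and the function names and the synthetic signal are hypothetical.

```python
import math

def frame_log_energy(signal, frame_len):
    """Split the signal into non-overlapping frames and return log-energy per frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        energies.append(math.log(energy + 1e-12))  # small offset avoids log(0)
    return energies

def two_means(values, iterations=20):
    """Plain 1-D K-means with k=2; True marks the cluster with the higher centre."""
    low, high = min(values), max(values)
    labels = [False] * len(values)
    for _ in range(iterations):
        # Assign each value to the nearer of the two centres.
        labels = [abs(v - high) < abs(v - low) for v in values]
        high_members = [v for v, is_high in zip(values, labels) if is_high]
        low_members = [v for v, is_high in zip(values, labels) if not is_high]
        # Move each centre to the mean of its members.
        if high_members:
            high = sum(high_members) / len(high_members)
        if low_members:
            low = sum(low_members) / len(low_members)
    return labels

# Synthetic signal (illustration only): quiet, then a louder tone, then quiet.
frame_len = 160
quiet = [0.001 * math.sin(0.3 * n) for n in range(4 * frame_len)]
tone = [0.5 * math.sin(0.3 * n) for n in range(4 * frame_len)]
signal = quiet + tone + quiet
is_speech = two_means(frame_log_energy(signal, frame_len))
```

Here the higher-energy cluster is treated as speech and the lower-energy cluster as non-speech. Swapping the energy feature for cepstral features would change only `frame_log_energy`, which is why the clustering step is kept separate.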
The speech recognition part of this thesis presents two novel neural network approaches to mobile speech recognition. Firstly, a recurrent neural network architecture is developed to accommodate the output of the VAD stage. Specifically, an Echo State Network (ESN) is used for phoneme level recognition. The drawbacks and advantages of this method are explained further within the thesis. Secondly, a dynamic Multi-Layer Perceptron (MLP) approach is developed. This addresses the drawbacks of the ESN and provides a dynamic way of handling speech signal length variabilities within its architecture. This novel dynamic Multi-Layer Perceptron uses both the Linear Predictive Cepstral Coefficients (LPC) and the Mel Frequency Cepstral Coefficients (MFCC) as input features. A speaker dependent approach is presented using the Centre for spoken Language and Understanding (CSLU) database. The results show a very distinct behaviour from conventional speech recognition approaches because the LPC shows performance figures very close to the MFCC. A speaker independent system, using the standard TIMIT dataset, is then implemented on the dynamic MLP for further confirmation of this. In this mode of operation the MFCC outperforms the LPC. Finally, all the results, with emphasis on the computation time of both these novel neural network approaches, are compared directly to a conventional hidden Markov model on the CSLU and TIMIT standard datasets.
Modeling the effects of entrenchment and memory development on second language acquisition
The observation that language learning outcomes are less consistent the older one becomes has motivated a large portion of second language acquisition research (e.g., Hartshorne, Tenenbaum, & Pinker, 2018; DeKeyser, 2012). Hypotheses about the underlying mechanisms which lead to age-related declines are traditionally tested with human subjects; however, many hypotheses cannot be fully evaluated in the natural world due to maturational and environmental constraints. In these scenarios, computational simulations provide a convenient way to test these hypotheses.
In the present work, recurrent neural networks are used to study the effects of linguistic entrenchment and memory development on second language acquisition. Previous computational studies have found mixed results regarding these factors. Three computational experiments using a range of languages were conducted to understand better the role of entrenchment and memory development in learning several linguistic sub-tasks: grammatical gender assignment, grammatical gender agreement, and word boundary identification.
Linguistic entrenchment consistently had a negative, but marginal, influence on second language learning outcomes in the gender assignment experiment. In the gender agreement and word boundary experiments, entrenchment rarely affected learning outcomes. Starting with fewer memory resources consistently led to poorer outcomes across learning tasks and languages. The complexity of the learning task and the regularity of the formal cues present in the linguistic input affected outcomes. In the gender assignment experiment, the first language influenced second language outcomes, especially when the second language had fewer gender classes than the first language. These results suggest that the effects of entrenchment and memory development on second language learning may be dependent upon the language pairs and the difficulty of the modeling task.
Knowledge Modelling and Learning through Cognitive Networks
One of the most promising developments in modelling knowledge is cognitive network science, which aims to investigate cognitive phenomena driven by the networked, associative organization of knowledge. For example, investigating the structure of semantic memory via semantic networks has illuminated how memory recall patterns influence phenomena such as creativity, memory search, learning, and more generally, knowledge acquisition, exploration, and exploitation. In parallel, neural network models for artificial intelligence (AI) are also becoming more widespread as inferential models for understanding which features drive language-related phenomena such as meaning reconstruction, stance detection, and emotional profiling. Whereas cognitive networks map explicitly which entities engage in associative relationships, neural networks perform an implicit mapping of correlations in cognitive data as weights, obtained after training over labelled data and whose interpretation is not immediately evident to the experimenter. This book aims to bring together quantitative, innovative research that focuses on modelling knowledge through cognitive and neural networks to gain insight into mechanisms driving cognitive processes related to knowledge structuring, exploration, and learning. The book comprises a variety of publication types, including reviews and theoretical papers, empirical research, computational modelling, and big data analysis. All papers here share a commonality: they demonstrate how the application of network science and AI can extend and broaden cognitive science in ways that traditional approaches cannot.
Neural plasticity and the limits of scientific knowledge
Western science claims to provide unique, objective information about the world. This is supported by the observation that peoples across cultures will agree upon a common description of the physical world. Further, the use of scientific instruments and mathematics is claimed to enable the objectification of science.
In this work, carried out by reviewing the scientific literature, the above claims are disputed systematically by evaluating the definition of physical reality and the scientific method, showing that empiricism relies ultimately upon the human senses for the evaluation of scientific theories and that measuring instruments cannot replace the human sensory system.
Nativist and constructivist theories of human sensory development are reviewed, and it is shown that nativist claims of core conceptual knowledge cannot be supported by the findings in the literature, which shows that perception does not simply arise from a process of maturation. Instead, sensory function requires a long process of learning through interactions with the environment.
To more rigorously define physical reality and systematically evaluate the stability of perception, and thus the basis of empiricism, the development of the method of dimension analysis is reviewed. It is shown that this methodology, relied upon for the mathematical analysis of physical quantities, is itself based upon empiricism, and that all of physical reality can be described in terms of the three fundamental dimensions of mass, length and time.
Hereafter the sensory modalities that inform us about these three dimensions are systematically evaluated. The following careful analysis of neuronal plasticity in these modalities shows that all the relevant senses acquire from the environment the capacity to apprehend physical reality. It is concluded that physical reality is acquired rather than given innately, which leads to the position that science cannot provide unique results. Rather, those it can provide are sufficient for a particular environmental setting.
Prototype modeling of vowel perception and production in a quantity language
Vowel prototypes refer to the psychological memory representations of the best exemplars of a vowel category. This thesis examines the role of prototypes in the perception and production of Finnish short and long vowels. A comparison with German as a linguistically different language with a similar vowel system is also made. The thesis reports on a series of four experiments in which prototypes are examined by means of behavioral psychoacoustic measurements and compared with vowel productions in quiet and in noise. In the perception experiments, Finnish and German listeners were asked to identify and evaluate the goodness of synthesized vowels representing either the entire vowel space or selected subareas of the space. In the production experiments, only Finnish speakers were recruited, but earlier reported production data were used for the comparison of Finnish and German. The new concept of the weighted prototype (Pω) is introduced in Study I, and its usability in contrast to absolute prototypes (Pa) and category centroids (Pc) is examined in Study IV.
Generally, the results support the finding that vowel categories are not homogenous in quality, but have an internal structure, and that there are significant quality differences between category members in terms of goodness ratings. The results of Studies I, II and III support the identity group interpretation of the Finnish quantity opposition by showing that the differences in the perceived quality and in the produced short and long vowels are not demonstrably dependent on the physical duration of the stimuli, although the production experiments in Studies I and III indicated that the short peripheral vowels, especially /u/ in Study III, are more centralized in the vowel space than the long vowels. On the basis of the results of Study II, the spectral and durational local effective vowel indicators of the initial auditory theory of vowel perception appear to be independent of each other, thus suggesting that the auditory vowel space (AVS) is orthogonal in terms of the measures used in the experiment. Furthermore, the reaction time results of Study II indicate that stimulus typicality in terms of vowel quantity affects the categorization process of quality but not its end result. The noise masking of production in Study III indicated that both of the noise types applied in the experiment, pink noise and babble noise, resulted in a prolongation of all vowel durations as reported earlier on the Lombard effect. However, the noise masking did not affect the Euclidean distances between the short and long vowels, but caused a minor systematic drift on F1–F2 space in both vowel types. The minor differences suggest that prototypes act as articulatory targets in a fire-and-forget manner without the auditory feedback affecting the immediate articulation.
The results concerning the different prototype measures indicated that the Pa and Pω differ significantly from the Pc, with the Pa being most peripheral. This gives some support to the adaptive dispersion effect in perception. The individual variations of the measures were normally distributed, with some exceptions for Pa in Finnish, and were, in terms of the coefficient of variation (CV), of the order of the difference limen (DL) of frequency. These results suggest that, for normally distributed prototypes, and especially for Pω, which showed the least variation, two thirds of the subjects detected the best category representatives from a subset of stimuli that lie within the limits of DL of frequency from each other in the F1–F2 space. This finding can be regarded as strong evidence for prototype theories; in other words, the best category representatives play a role by acting as templates in vowel perception. The listeners were able to recognize quality differences between and within vowel categories, but the majority of them ranked the best category exemplars from a subset of stimuli that were hardly distinguishable from each other.
There were some minor differences in the vowel systems of Finnish and German as indicated by the different prototype measures: the absolute prototypes showed the largest differences between the languages in /e/, /ø/ and /u/. This is in line with the earlier investigations on produced vowels in Finnish and German. Generally, the vowel systems of these two linguistically unrelated languages were strikingly similar, especially in the light of the Pω measure.
As presented in this thesis, the prototype approach provides a feasible tool for research and the results lend support to the idea that speech comprehension on the auditory, phonetic, and even on phonological processing levels is based on the memory representations of typical speech sounds of one’s native tongue, formed during the early language acquisition phase, and these representations may be similar for the speakers and listeners of two different languages with comparable vowel systems.
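The coefficient of variation mentioned above is the standard deviation divided by the mean, which makes the spread of listeners' choices comparable across formant frequencies. A minimal sketch follows; the F1 values are invented for illustration and are not data from the thesis, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return stdev(values) / mean(values)

# Hypothetical F1 values (Hz) of the stimulus each listener rated as best.
f1_choices_hz = [290.0, 300.0, 310.0, 300.0, 295.0, 305.0]
cv = coefficient_of_variation(f1_choices_hz)
print(f"CV = {cv:.4f}")  # about 0.0236, i.e. roughly 2.4 % of the mean
```

Expressed as a percentage of the mean, a CV of this kind can be set against a relative difference limen of frequency, which is the comparison the abstract describes.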
Data mining and modelling for sign language
Sign languages have received significantly less attention than spoken languages in the research areas of corpus analysis, machine translation, recognition, synthesis and social signal processing, amongst others. This is mainly due to signers being in a clear minority and there being a strong prior belief that sign languages are simply arbitrary gestures. To date, this manifests in the insufficiency of sign language resources available for computational modelling and analysis, with no agreed standards and relatively stagnated advancements compared to spoken language interaction research. Fortunately, the machine learning community has developed methods, such as transfer learning, for dealing with sparse resources, while data mining techniques, such as clustering, can provide insights into the data. The work described here utilises such transfer learning techniques to apply a neural language model to signed utterances and to compare sign language phonemes, which allows for clustering of similar signs, leading to automated annotation of sign language resources. This thesis promotes the idea that sign language research in computing should rely less on hand-annotated data, thus opening up the prospect of using readily available online data (e.g. signed song videos) through the computational modelling and automated annotation techniques presented in this thesis.