Inferring Concept Prerequisite Relations from Online Educational Resources
The Internet has rich and rapidly increasing sources of high quality
educational content. Inferring prerequisite relations between educational
concepts is required for modern large-scale online educational technology
applications such as personalized recommendations and automatic curriculum
creation. We present PREREQ, a new supervised learning method for inferring
concept prerequisite relations. PREREQ is designed using latent representations
of concepts obtained from the Pairwise Latent Dirichlet Allocation model, and a
neural network based on the Siamese network architecture. PREREQ can learn
unknown concept prerequisites from course prerequisites and labeled concept
prerequisite data. It outperforms state-of-the-art approaches on benchmark
datasets and can learn effectively from very little training data. PREREQ can
also use unlabeled video playlists, a steadily growing source of training data,
to learn concept prerequisites, thus obviating the need for manual annotation
of course prerequisites. Comment: Accepted at the AAAI Conference on Innovative Applications of Artificial Intelligence (IAAI-19).
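The core architectural idea, a Siamese network in which both concept representations pass through shared weights before an order-sensitive comparison, can be sketched as follows. This is a minimal illustrative model, not the actual PREREQ implementation; the class name, layer sizes, and comparison head are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SiamesePrereqScorer:
    """Illustrative Siamese scorer (hypothetical, not the PREREQ model):
    both concept vectors go through the same shared projection, then an
    asymmetric comparison head scores whether concept a is a prerequisite
    of concept b."""

    def __init__(self, dim, hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(hidden, dim))  # shared branch weights
        self.v = rng.normal(scale=0.1, size=2 * hidden)     # comparison head

    def score(self, a, b):
        ha = np.tanh(self.W @ a)  # same W for both inputs -> Siamese
        hb = np.tanh(self.W @ b)
        # concatenation preserves order, so score(a, b) != score(b, a) in general
        return sigmoid(self.v @ np.concatenate([ha, hb]))

scorer = SiamesePrereqScorer(dim=8, hidden=4)
a, b = np.ones(8), -np.ones(8)
p = scorer.score(a, b)  # probability-like score in (0, 1)
```

The shared weights are what makes the network Siamese: both concepts are embedded by the same function, while the concatenated comparison keeps the prerequisite direction.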
Exploiting word embeddings for modeling bilexical relations
There has been an exponential surge of text data in recent years. As a consequence, unsupervised methods that make use of this data have been growing steadily in the field of natural language processing (NLP). Word embeddings are low-dimensional vectors obtained by applying unsupervised techniques to large unlabelled corpora, mapping words from the vocabulary to vectors of real numbers. Word embeddings aim to capture syntactic and semantic properties of words.
In NLP, many tasks involve computing the compatibility between lexical items under some linguistic relation. We call this type of relation a bilexical relation. Our thesis defines statistical models for bilexical relations
that centrally make use of word embeddings. Our principal aim is for the word embeddings to favor generalization to words not seen during training of the model.
The thesis is structured in four parts. In the first part of this thesis, we present a bilinear model over word embeddings that leverages a small supervised dataset for a binary linguistic relation. Our learning algorithm exploits low-rank bilinear forms and induces a low-dimensional embedding tailored for a target linguistic relation. This results in compressed task-specific embeddings.
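A low-rank bilinear form over embeddings can be sketched in a few lines: the score of a word pair (x, y) under the relation is x^T U V^T y, where U and V have rank k much smaller than the embedding dimension, and U^T x is exactly the compressed task-specific embedding. The matrices and sizes below are illustrative, not the thesis's trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 50, 5  # embedding dimension and rank (illustrative sizes)
U = rng.normal(size=(d, k))
V = rng.normal(size=(d, k))

def bilinear_score(x, y):
    # Low-rank form x^T U V^T y: project both words down to k dimensions,
    # then take a dot product. U^T x is the compressed task-specific embedding.
    return (x @ U) @ (V.T @ y)

x, y = rng.normal(size=d), rng.normal(size=d)
full = x @ (U @ V.T) @ y  # equivalent full-matrix bilinear form
```

The low-rank factorization costs O(dk) per word instead of O(d^2) for a full bilinear matrix, which is what enables learning from a small supervised dataset.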
In the second part of our thesis, we extend our bilinear model to a ternary
setting and propose a framework for resolving prepositional phrase attachment ambiguity using word embeddings. Our models perform competitively with state-of-the-art models. In addition, our method obtains significant improvements on out-of-domain tests by simply using word-embeddings induced from source and target domains.
In the third part of this thesis, we further extend the bilinear models for expanding vocabulary in the context of statistical phrase-based machine translation. Our model obtains a probabilistic list of possible translations of target language words, given a word in the source language. We do this by projecting pre-trained embeddings into a common subspace using a log-bilinear model. We empirically notice a significant improvement on an out-of-domain test set.
In the final part of our thesis, we propose a non-linear model that maps initial word embeddings to task-tuned word embeddings, in the context of a neural network dependency parser. We demonstrate its use for improved dependency parsing, especially for sentences with unseen words. We also show downstream improvements on a sentiment analysis task.
On Model Stability as a Function of Random Seed
In this paper, we focus on quantifying model stability as a function of
random seed by investigating the effects of the induced randomness on model
performance and the robustness of the model in general. We specifically perform
a controlled study on the effect of random seeds on the behaviour of attention,
gradient-based and surrogate model based (LIME) interpretations. Our analysis
suggests that random seeds can adversely affect the consistency of models, resulting in counterfactual interpretations. We propose a technique called Aggressive Stochastic Weight Averaging (ASWA) and an extension called Norm-filtered Aggressive Stochastic Weight Averaging (NASWA), which improve the
stability of models over random seeds. With our ASWA and NASWA based
optimization, we are able to improve the robustness of the original model, on
average reducing the standard deviation of the model's performance by 72%. Comment: v1; Accepted for publication at CoNLL 201
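The underlying mechanism of weight averaging is simple to sketch: keep a running mean of the parameter vector across training iterates, which smooths out seed-induced fluctuations. The snippet below is a minimal sketch in the spirit of ASWA, not the paper's exact algorithm (the norm-filtering step of NASWA is omitted, and the class name is hypothetical).

```python
import numpy as np

class AveragedWeights:
    """Minimal sketch of aggressive weight averaging: maintain a running
    mean of the parameter vector over every update (hypothetical API)."""

    def __init__(self, w0):
        self.avg = np.array(w0, dtype=float)
        self.n = 1

    def update(self, w):
        # Incremental mean: avg_n = avg_{n-1} + (w - avg_{n-1}) / n
        self.n += 1
        self.avg += (np.asarray(w, dtype=float) - self.avg) / self.n

aw = AveragedWeights([0.0, 0.0])
for step in range(1, 5):  # pretend these are successive SGD iterates
    aw.update([step, -step])
# running mean of the iterates 0, 1, 2, 3, 4 is 2.0 per coordinate
```

At evaluation time, the averaged vector replaces the final iterate; because it pools many noisy iterates, its position depends far less on the particular random seed.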
Deciding when, how and for whom to simplify
Current Automatic Text Simplification (TS) work relies on sequence-to-sequence neural models that learn simplification operations from parallel complex-simple corpora. In this paper, we address three open challenges in these approaches: (i) avoiding unnecessary transformations, (ii) determining which operations to perform, and (iii) generating simplifications that are suitable for a given target audience. For (i), we propose joint and two-stage approaches where instances are marked or classified as simple or complex. For (ii) and (iii), we propose fusion-based approaches to incorporate information on the target grade level as well as the types of operation to perform in the models. While grade-level information is provided as metadata, we devise predictors for the type of operation. We study different representations for this information as well as different ways in which it is used in the models. Our approach outperforms previous work on neural TS, with our best model following the two-stage approach and using the information about grade level and type of operation to initialise the encoder and the decoder, respectively.
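The two-stage idea for challenge (i) can be sketched as a simple gate: a classifier first decides whether a sentence needs simplification, and simple sentences pass through unchanged, avoiding unnecessary transformations. Both callables below are toy stand-ins, not the paper's trained classifier or neural simplifier.

```python
def two_stage_simplify(sentence, is_complex, simplify):
    """Sketch of the two-stage pipeline: simplify only sentences the
    first-stage classifier marks as complex (both callables are
    hypothetical stand-ins for trained models)."""
    return simplify(sentence) if is_complex(sentence) else sentence

# Toy stand-ins: "complex" means longer than 5 tokens; "simplify" truncates.
is_complex = lambda s: len(s.split()) > 5
simplify = lambda s: " ".join(s.split()[:5])

short = two_stage_simplify("short sentence here", is_complex, simplify)
long_ = two_stage_simplify("this rather long sentence clearly needs work",
                           is_complex, simplify)
```

The joint alternative mentioned in the abstract would instead let a single model learn to copy simple inputs verbatim, rather than gating them out explicitly.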
Resolving Out-of-Vocabulary Words with Bilingual Embeddings in Machine Translation
Out-of-vocabulary words account for a large proportion of errors in machine translation systems, especially when the system is used on a different domain than the one where it was trained. To alleviate the problem, we propose a log-bilinear softmax-based model for vocabulary expansion, such that given an out-of-vocabulary source word, the model generates a probabilistic list of possible translations in the target language. Our model uses only word embeddings trained on large unlabelled monolingual corpora and trains on a fairly small word-to-word bilingual dictionary. We feed this probabilistic list into a standard phrase-based statistical machine translation system and obtain consistent improvements in translation quality on the English-Spanish language pair. In particular, we obtain an improvement of 3.9 BLEU points when tested on an out-of-domain test set.
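The log-bilinear scoring can be sketched as: project the out-of-vocabulary source embedding into the target space with a learned matrix, score it against every target word embedding by dot product, and normalize with a softmax. The projection, vocabulary size, and dimensions below are illustrative assumptions, not the paper's trained model.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def translation_distribution(src_vec, proj, tgt_matrix):
    """Hypothetical sketch of log-bilinear vocabulary expansion: map a
    source embedding into the target space, then softmax over dot
    products with every target word embedding."""
    scores = tgt_matrix @ (proj @ src_vec)
    return softmax(scores)

rng = np.random.default_rng(0)
d = 16
tgt = rng.normal(size=(100, d))  # 100 toy target word embeddings
P = np.eye(d)                    # identity projection, for illustration only
p = translation_distribution(rng.normal(size=d), P, tgt)
```

The resulting distribution over the target vocabulary is exactly the "probabilistic list of possible translations" the abstract describes, ready to be fed to the decoder as additional phrase-table entries.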
Deep copycat networks for text-to-text generation
Most text-to-text generation tasks, for example text summarisation and text simplification, require copying words from the input to the output. We introduce Copycat, a transformer-based pointer network for such tasks which obtains competitive results in abstractive text summarisation and generates more abstractive summaries. We propose a further extension of this architecture for automatic post-editing, where generation is conditioned on two inputs (the source language and the machine translation), and the model is capable of deciding where to copy information from. This approach achieves competitive performance when compared to state-of-the-art automatic post-editing systems. More importantly, we show that it addresses a well-known limitation of automatic post-editing, overcorrecting translations, and that our novel mechanism for copying source-language words improves the results.
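The copy decision at the heart of a pointer network can be sketched as a mixture: with probability p_gen the model generates from its vocabulary distribution, and otherwise it copies a source token weighted by attention. This is the generic pointer/copy mixture, not Copycat's exact formulation; all values below are toy numbers.

```python
import numpy as np

def mix_distributions(p_gen, vocab_dist, attn, src_ids, vocab_size):
    """Generic pointer/copy mixture (in the spirit of pointer networks,
    not Copycat's exact equations): blend a vocabulary distribution with
    attention mass scattered onto the source token ids."""
    copy_dist = np.zeros(vocab_size)
    np.add.at(copy_dist, src_ids, attn)  # accumulates repeated source ids
    return p_gen * vocab_dist + (1.0 - p_gen) * copy_dist

vocab_dist = np.full(6, 1 / 6)       # toy uniform vocabulary distribution
attn = np.array([0.5, 0.3, 0.2])     # attention over a 3-token source
final = mix_distributions(0.7, vocab_dist, attn, [1, 4, 1], 6)
```

Because both components are proper distributions, the mixture is too; a second copy gate over the two inputs (source vs. machine translation) is what the post-editing extension adds on top of this scheme.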
Are words equally surprising in audio and audio-visual comprehension?
We report a controlled study investigating the effect of visual information
(i.e., seeing the speaker) on spoken language comprehension. We compare the ERP
signature (N400) associated with each word in audio-only and audio-visual
presentations of the same verbal stimuli. We assess the extent to which
surprisal measures (which quantify the predictability of words in their lexical
context) are generated on the basis of different types of language models
(specifically n-gram and Transformer models) that predict N400 responses for
each word. Our results indicate that cognitive effort differs significantly
between multimodal and unimodal settings. In addition, our findings suggest
that while Transformer-based models, which have access to a larger lexical
context, provide a better fit in the audio-only setting, 2-gram language models
are more effective in the multimodal setting. This highlights the significant
impact of local lexical context on cognitive processing in a multimodal
environment. Comment: In CogSci 202
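The surprisal measure used in such studies is just the negative log-probability of each word given its context. A toy 2-gram version with add-one smoothing can be sketched as follows; the corpus and smoothing choice are illustrative, not the paper's actual language models.

```python
import math
from collections import Counter

def bigram_surprisal(corpus, sentence):
    """Surprisal -log2 P(w | previous word) from bigram counts with
    add-one smoothing over the joint vocabulary (a toy 2-gram model,
    not the models evaluated in the paper)."""
    tokens = corpus.split()
    vocab = set(tokens) | set(sentence.split())
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens)
    words = sentence.split()
    out = []
    for prev, w in zip(words, words[1:]):
        p = (bigrams[(prev, w)] + 1) / (unigrams[prev] + len(vocab))
        out.append(-math.log2(p))  # rarer continuation -> higher surprisal
    return out

corpus = "the dog barks the dog sleeps the cat sleeps"
s = bigram_surprisal(corpus, "the dog barks")
```

Here "the dog" occurs twice in the corpus while "dog barks" occurs once, so the second word carries lower surprisal than the third; these per-word values are what get regressed against the N400 amplitude.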