Why Do Masked Neural Language Models Still Need Semantic Knowledge in Question Answering?
Department of Computer Science and Engineering
Pre-trained language models have been widely used to solve various natural language processing tasks. In particular, masked neural language models, which are huge neural networks trained to restore masked tokens, have shown outstanding performance in many tasks, including text classification and question answering. However, it is challenging to identify what knowledge is learned inside them due to the "black box" nature of deep neural networks with numerous, intermingled parameters. Recent studies have tried to approximate how much knowledge is learned in masked neural language models. However, a recent study reveals that, while the models show superhuman performance, they do not precisely understand semantic knowledge. In this work, we empirically verify that questions that require semantic knowledge remain challenging for masked neural language models to solve in question answering. We therefore suggest a possible solution that injects semantic knowledge from external repositories into masked neural language models.
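The masked-token training setup this abstract refers to can be sketched as follows. This is a minimal, generic illustration of random token masking; the [MASK] symbol, the 15% masking rate, and the function name are assumptions for illustration, not details taken from the paper:

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """Randomly replace a fraction of tokens with a mask symbol.

    Returns the corrupted sequence and a dict of positions -> original
    tokens; a masked language model is trained to restore those tokens.
    """
    rng = random.Random(seed)
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(mask_token)
            targets[i] = tok  # the model must predict this token back
        else:
            corrupted.append(tok)
    return corrupted, targets
```

The model never sees the targets directly; it must infer them from the surrounding context, which is why purely distributional training can fall short on questions that require genuine semantic knowledge.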
Jointly Modeling Embedding and Translation to Bridge Video and Language
Automatically describing video content with natural language is a fundamental
challenge of multimedia. Recurrent Neural Networks (RNNs), which model sequence
dynamics, have attracted increasing attention for visual interpretation. However,
most existing approaches generate a word locally from the previous words and
the visual content, while the relationship between sentence semantics and
visual content is not holistically exploited. As a result, the generated
sentences may be contextually correct, yet their semantics (e.g., subjects,
verbs or objects) may be wrong.
This paper presents a novel unified framework, named Long Short-Term Memory
with visual-semantic Embedding (LSTM-E), which simultaneously explores the
learning of an LSTM and a visual-semantic embedding. The former locally
maximizes the probability of generating the next word given the previous words
and the visual content, while the latter creates a visual-semantic embedding
space that enforces the relationship between the semantics of the entire
sentence and the visual content. Our proposed LSTM-E consists of three
components: 2-D and/or 3-D deep convolutional neural networks for learning a
powerful video representation, a deep RNN for generating sentences, and a
joint embedding model for exploring the relationships between visual content
and sentence semantics. Experiments on the YouTube2Text dataset show that our
proposed LSTM-E achieves the best performance reported to date in generating
natural sentences: 45.3% and 31.0% in terms of BLEU@4 and METEOR,
respectively. We also demonstrate that LSTM-E is superior to several
state-of-the-art techniques at predicting Subject-Verb-Object (SVO) triplets.
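The two objectives described above, local word generation and global visual-semantic agreement, can be sketched as a single weighted loss. This is a hypothetical illustration of the idea, not the paper's actual formulation: the symbols, the weighting scheme, and the squared-distance relevance term are assumptions.

```python
import numpy as np

def joint_loss(word_logprobs, video_emb, sent_emb, alpha=0.5):
    """Sketch of an LSTM-E-style joint objective.

    Generation term: negative log-likelihood of the ground-truth words,
    encouraging locally coherent sentences.
    Relevance term: squared distance between the video and the sentence
    in a shared visual-semantic embedding space, encouraging the whole
    sentence to match the visual content.
    """
    generation = -np.sum(word_logprobs)               # word-level coherence
    relevance = np.sum((video_emb - sent_emb) ** 2)   # sentence-level semantics
    return alpha * generation + (1.0 - alpha) * relevance
```

Training on the sum (rather than on generation alone) is what lets the framework penalize sentences that read fluently but describe the wrong subjects, verbs, or objects.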
Deep Neural Models for Medical Concept Normalization in User-Generated Texts
In this work, we consider the medical concept normalization problem, i.e.,
the problem of mapping a health-related entity mention in a free-form text to a
concept in a controlled vocabulary, usually to the standard thesaurus in the
Unified Medical Language System (UMLS). This is a challenging task, since the
terminology used by health care professionals differs greatly from that of the
general public in social media texts. We approach it as a sequence learning
problem with powerful neural networks, such as recurrent neural networks and
contextualized word representation models, trained to obtain semantic
representations of social media expressions. Our
experimental evaluation over three different benchmarks shows that neural
architectures leverage the semantic meaning of the entity mention and
significantly outperform existing state-of-the-art models.
Comment: This is a preprint of the paper "Deep Neural Models for Medical Concept
Normalization in User-Generated Texts" to be published at ACL 2019 - 57th
Annual Meeting of the Association for Computational Linguistics, Proceedings
of the Student Research Workshop
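The normalization step described above, mapping a mention representation to the closest concept in a controlled vocabulary, can be sketched generically. The cosine nearest-neighbor lookup and the example concept labels below are illustrative assumptions; the paper's models learn the mention representations with RNNs and contextualized encoders rather than taking them as given:

```python
import numpy as np

def normalize_mention(mention_vec, concept_vecs, concept_ids):
    """Map a mention embedding to the nearest vocabulary concept.

    Normalizes both sides so the dot product equals cosine similarity,
    then returns the id of the best-matching concept row.
    """
    m = mention_vec / np.linalg.norm(mention_vec)
    c = concept_vecs / np.linalg.norm(concept_vecs, axis=1, keepdims=True)
    return concept_ids[int(np.argmax(c @ m))]
```

The hard part is producing mention embeddings such that a colloquial phrase like "my head is pounding" lands near the clinical concept for headache, which is what the learned encoders are for.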
Towards Robust Named Entity Recognition for Historic German
Recent advances in language modeling using deep neural networks have shown
that these models learn representations that vary with the network depth, from
morphology to semantic relationships such as co-reference. We apply pre-trained
language models to low-resource named entity recognition for Historic German.
We show in a series of experiments that character-based pre-trained language
models do not run into trouble when faced with low-resource datasets. Our
pre-trained character-based language models improve upon classical CRF-based
methods and previous work on Bi-LSTMs, boosting F1 score by up to 6%. Our
pre-trained language and NER models are publicly available at
https://github.com/stefan-it/historic-ner .
Comment: 8 pages, 5 figures, accepted at the 4th Workshop on Representation
Learning for NLP (RepL4NLP), held in conjunction with ACL 2019
Effective Spoken Language Labeling with Deep Recurrent Neural Networks
Understanding spoken language is a highly complex problem, which can be
decomposed into several simpler tasks. In this paper, we focus on Spoken
Language Understanding (SLU), the module of spoken dialog systems responsible
for extracting a semantic interpretation from the user utterance. The task is
treated as a labeling problem. In the past, SLU has been performed with a wide
variety of probabilistic models. The rise of neural networks, in the last
couple of years, has opened new interesting research directions in this domain.
Recurrent Neural Networks (RNNs) in particular are able not only to represent
several pieces of information as embeddings but also, thanks to their recurrent
architecture, to encode relatively long contexts as embeddings. Such long
contexts are in general out of reach for models previously used for SLU. In
this paper we propose novel RNN architectures for SLU which outperform
previous ones. Starting from a published idea as a base block, we design new
deep RNNs achieving state-of-the-art results on two widely used corpora for
SLU: ATIS (Air Travel Information System), in English, and MEDIA (hotel
information and reservation in France), in French.
Comment: 8 pages. Rejected from IJCAI 2017, good remarks overall, but slightly
off-topic as from global meta-reviews. Recommendations: 8, 6, 6, 4. arXiv
admin note: text overlap with arXiv:1706.0174
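The labeling formulation described above, in which each word of an utterance receives a semantic slot tag, can be illustrated with a toy IOB-tagged example. The tags below are invented for illustration and do not reproduce the actual ATIS or MEDIA annotation schemes:

```python
def spans(tokens, tags):
    """Collect (slot, value) pairs from an IOB-tagged token sequence.

    B- opens a new slot, I- extends the most recent one, O is outside
    any slot; this is the decoding step after a tagger labels each word.
    """
    out = []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            out.append((tag[2:], tok))
        elif tag.startswith("I-") and out:
            out[-1] = (out[-1][0], out[-1][1] + " " + tok)
    return out

# Illustrative ATIS-style utterance: SLU reduces to predicting one tag
# per word, then reading the slots back out.
utterance = ["flights", "from", "Boston", "to", "Denver"]
labels    = ["O",       "O",    "B-fromloc", "O", "B-toloc"]
```

Framing SLU this way is what allows sequence models such as the RNNs above to be trained directly on word-level supervision.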
Semantic Representation and Inference for NLP
Semantic representation and inference is essential for Natural Language
Processing (NLP). The state of the art for semantic representation and
inference is deep learning, particularly Recurrent Neural Networks (RNNs),
Convolutional Neural Networks (CNNs), and Transformer self-attention models.
This thesis investigates the use of deep learning for novel semantic
representation and inference, and makes contributions in the following three
areas: creating training data, improving semantic representations and extending
inference learning. In terms of creating training data, we contribute the
largest publicly available dataset of real-life factual claims for the purpose
of automatic claim verification (MultiFC), and we present a novel inference
model composed of multi-scale CNNs with different kernel sizes that learn from
external sources to infer fact checking labels. In terms of improving semantic
representations, we contribute a novel model that captures non-compositional
semantic indicators. By definition, the meaning of a non-compositional phrase
cannot be inferred from the individual meanings of its composing words (e.g.,
hot dog). Motivated by this, we operationalize the compositionality of a phrase
contextually by enriching the phrase representation with external word
embeddings and knowledge graphs. Finally, in terms of inference learning, we
propose a series of novel deep learning architectures that improve inference
by using syntactic dependencies, ensembling role-guided attention heads,
incorporating gating layers, and concatenating multiple heads in novel and
effective ways. This thesis consists of seven publications (five published and
two under review).
Comment: PhD thesis, the University of Copenhagen
A Deep Relevance Matching Model for Ad-hoc Retrieval
In recent years, deep neural networks have led to exciting breakthroughs in
speech recognition, computer vision, and natural language processing (NLP)
tasks. However, there have been few positive results of deep models on ad-hoc
retrieval tasks. This is partially due to the fact that many important
characteristics of the ad-hoc retrieval task have not been well addressed in
deep models yet. Typically, in existing work using deep models, the ad-hoc
retrieval task is formalized as a matching problem between two pieces of text
and treated as equivalent to many NLP tasks such as paraphrase identification,
question answering and automatic conversation. However, we argue that the
ad-hoc retrieval task is mainly about relevance matching while most NLP
matching tasks concern semantic matching, and there are some fundamental
differences between these two matching tasks. Successful relevance matching
requires proper handling of the exact matching signals, query term importance,
and diverse matching requirements. In this paper, we propose a novel deep
relevance matching model (DRMM) for ad-hoc retrieval. Specifically, our model
employs a joint deep architecture at the query term level for relevance
matching. By using matching histogram mapping, a feed forward matching network,
and a term gating network, we can effectively deal with the three relevance
matching factors mentioned above. Experimental results on two representative
benchmark collections show that our model can significantly outperform some
well-known retrieval models as well as state-of-the-art deep matching models.
Comment: CIKM 2016, long paper
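The matching histogram mapping mentioned in the abstract can be sketched as follows. The count-based binning over cosine similarities in [-1, 1] with a dedicated exact-match bin follows the description of DRMM's histogram input, while the function signature and bin count are assumptions:

```python
import numpy as np

def matching_histogram(sims, n_bins=5):
    """Count-based matching histogram for one query term.

    Buckets the local query-document cosine similarities into fixed-size
    bins over [-1, 1); exact matches (similarity 1.0) get their own bin,
    preserving the exact-matching signal that relevance matching needs.
    """
    hist = np.zeros(n_bins + 1)
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    for s in sims:
        if s >= 1.0:
            hist[n_bins] += 1.0  # dedicated exact-match bin
        else:
            hist[int(np.searchsorted(edges, s, side="right")) - 1] += 1.0
    return hist
```

One such histogram per query term is then fed to the feed-forward matching network, and the term gating network weights the per-term outputs by query term importance.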
Spoken Language Processing and Modeling for Aviation Communications
With recent advances in machine learning and deep learning technologies and the creation of larger aviation-specific corpora, applying natural language processing technologies, especially those based on transformer neural networks, to aviation communications is becoming increasingly feasible. Previous work has focused on machine learning applications to natural language processing, such as N-grams and word lattices. This thesis experiments with a process for pretraining transformer-based language models on aviation English corpora and compares the effectiveness and performance of language models transfer-learned from pretrained checkpoints with those trained from their base weight initializations (trained from scratch). The results suggest that transformer language models trained from scratch outperform models fine-tuned from pretrained checkpoints. The work concludes by recommending future work to improve pretraining performance and suggesting downstream, in-domain tasks such as semantic extraction, named entity recognition (callsign identification), speaker role identification, and speech recognition.