Augmenting Neural Machine Translation with Knowledge Graphs
While neural networks have been used extensively to make substantial progress
in the machine translation task, they are known for being heavily dependent on
the availability of large amounts of training data. Recent efforts have tried
to alleviate the data sparsity problem by augmenting the training data using
different strategies, such as back-translation. Along with the data scarcity,
the out-of-vocabulary words, mostly entities and terminological expressions,
pose a difficult challenge to Neural Machine Translation systems. In this
paper, we hypothesize that knowledge graphs enhance the semantic feature
extraction of neural models, thus optimizing the translation of entities and
terminological expressions in texts and consequently leading to a better
translation quality. We hence investigate two different strategies for
incorporating knowledge graphs into neural models without modifying the neural
network architectures. We also examine the effectiveness of our augmentation
method on recurrent and non-recurrent (self-attentional) neural architectures.
Our knowledge-graph-augmented neural translation model, dubbed KG-NMT, achieves
significant and consistent improvements of +3 BLEU, METEOR and chrF3 on average
on the newstest datasets between 2014 and 2018 for the WMT English-German
translation task.
POLYGLOT-NER: Massive Multilingual Named Entity Recognition
The increasing diversity of languages used on the web introduces a new level
of complexity to Information Retrieval (IR) systems. We can no longer assume
that textual content is written in one language or even the same language
family. In this paper, we demonstrate how to build massive multilingual
annotators with minimal human expertise and intervention. We describe a system
that builds Named Entity Recognition (NER) annotators for 40 major languages
using Wikipedia and Freebase. Our approach does not require NER human annotated
datasets or language specific resources like treebanks, parallel corpora, and
orthographic rules. The novelty of our approach lies in using only
language-agnostic techniques while achieving competitive performance.
Our method learns distributed word representations (word embeddings) which
encode semantic and syntactic features of words in each language. Then, we
automatically generate datasets from Wikipedia link structure and Freebase
attributes. Finally, we apply two preprocessing stages (oversampling and exact
surface form matching) which do not require any linguistic expertise.
Our evaluation is twofold. First, we demonstrate the system performance on
human annotated datasets. Second, for languages where no gold-standard
benchmarks are available, we propose a new method, distant evaluation, based on
statistical machine translation.
Comment: 9 pages, 4 figures, 5 table
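The "exact surface form matching" stage named in the POLYGLOT-NER abstract can be illustrated with a small sketch. The abstract does not give the procedure, so everything below is an assumption: the function name `match_surface_forms`, the flat tag scheme (`PER`, `LOC`, `O`), and the idea that mentions left untagged by the Wikipedia link structure are labeled when their text exactly matches a known entity surface form.

```python
def match_surface_forms(tokens, tags, surface_forms):
    """Label untagged spans whose text exactly matches a known surface form.

    Hypothetical helper: `surface_forms` maps token tuples to entity tags.
    Tokens that already carry a tag (anything other than "O") are left alone.
    """
    tags = list(tags)
    max_len = max((len(form) for form in surface_forms), default=0)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first so multi-word names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in surface_forms and all(t == "O" for t in tags[i:i + n]):
                for j in range(i, i + n):
                    tags[j] = surface_forms[span]
                matched = n
                break
        i += matched or 1
    return tags


# Toy data (invented for illustration): only the first mention was linked.
surface_forms = {("Barack", "Obama"): "PER", ("Hawaii",): "LOC"}
tokens = "Barack Obama was born in Hawaii . Barack Obama later moved .".split()
tags = ["PER", "PER"] + ["O"] * 10
new_tags = match_surface_forms(tokens, tags, surface_forms)
```

In this toy example the unlinked second mention of "Barack Obama" and the mention of "Hawaii" receive tags, while the already linked first mention is unchanged.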
Exploring the importance of context and embeddings in neural NER models for task-oriented dialogue systems
Named Entity Recognition (NER), a classic sequence labelling task, is an
essential component of natural language understanding (NLU) systems in
task-oriented dialog systems for slot filling. For well over a decade,
different methods, from lookup using gazetteers and domain ontologies, through
classifiers over handcrafted features, to end-to-end systems involving neural
network architectures, have been evaluated mostly in language-independent
non-conversational settings. In this paper, we evaluate a modified version of
the recent state of the art neural architecture in a conversational setting
where messages are often short and noisy. We perform an array of experiments
that combine two options: including the previous utterance in the dialogue
as a source of additional features, and using word- and character-level
embeddings trained on a larger external corpus. All methods are evaluated on a
combined dataset formed from two public English task-oriented conversational
datasets belonging to travel and restaurant domains respectively. For
additional evaluation, we also repeat some of our experiments after adding
automatically translated and transliterated (from translated) versions to the
English-only dataset.
Comment: 6 pages. Accepted at International Conference on Natural Language
Processing (2018) - (ACL
The ARIEL-CMU Systems for LoReHLT18
This paper describes the ARIEL-CMU submissions to the Low Resource Human
Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine
Translation (MT), Entity Discovery and Linking (EDL), and detection of
Situation Frames in Text and Speech (SF Text and Speech).
Code-Switching for Enhancing NMT with Pre-Specified Translation
Leveraging user-provided translation to constrain NMT has practical
significance. Existing methods can be classified into two main categories,
namely the use of placeholder tags for lexicon words and the use of hard
constraints during decoding. Both methods can hurt translation fidelity for
various reasons. We investigate a data augmentation method, making
code-switched training data by replacing source phrases with their target
translations. Our method does not change the NMT model or decoding algorithm,
allowing the model to learn lexicon translations by copying source-side target
words. Extensive experiments show that our method achieves consistent
improvements over existing approaches, improving translation of constrained
words without hurting unconstrained words.
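The data augmentation described in the abstract above, replacing source phrases with their target translations to build code-switched training data, can be sketched as follows. This is a minimal illustration under assumptions and not the authors' implementation: the function name `code_switch`, the `replace_prob` parameter, the longest-match-first policy, and the toy English-German lexicon are all invented here.

```python
import random


def code_switch(source_tokens, lexicon, replace_prob=1.0, rng=None):
    """Replace lexicon phrases in the source with their target translations.

    Hypothetical helper: `lexicon` maps source token tuples to lists of
    target tokens. Each match is replaced with probability `replace_prob`.
    """
    rng = rng or random.Random(0)
    max_len = max((len(phrase) for phrase in lexicon), default=0)
    out = []
    i = 0
    while i < len(source_tokens):
        replaced = False
        # Longest match first, so multi-word entries take precedence.
        for n in range(min(max_len, len(source_tokens) - i), 0, -1):
            phrase = tuple(source_tokens[i:i + n])
            if phrase in lexicon and rng.random() < replace_prob:
                out.extend(lexicon[phrase])
                i += n
                replaced = True
                break
        if not replaced:
            out.append(source_tokens[i])
            i += 1
    return out


# Toy lexicon (assumed for illustration only).
lexicon = {("knowledge", "graph"): ["Wissensgraph"], ("paper",): ["Papier"]}
src = "this paper uses a knowledge graph".split()
augmented = code_switch(src, lexicon)
```

The augmented source sentence would be paired with the unchanged target sentence, so that a model trained on such pairs can learn to copy the source-side target words into its output.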
Attentive Neural Network for Named Entity Recognition in Vietnamese
We propose an attentive neural network for the task of named entity
recognition in Vietnamese. The proposed attentive neural model makes use of
character-based language models and word embeddings to encode words as vector
representations. A neural network architecture of encoder, attention, and
decoder layers is then utilized to encode knowledge of input sentences and to
label entity tags. The experimental results show that the proposed attentive
neural network achieves state-of-the-art results on the benchmark named
entity recognition datasets in Vietnamese in comparison to both models based
on hand-crafted features and neural models.
Unsupervised Neural Machine Translation
In spite of the recent success of neural machine translation (NMT) in
standard benchmarks, the lack of large parallel corpora poses a major practical
problem for many language pairs. There have been several proposals to alleviate
this issue with, for instance, triangulation and semi-supervised learning
techniques, but they still require a strong cross-lingual signal. In this work,
we completely remove the need for parallel data and propose a novel method to
train an NMT system in a completely unsupervised manner, relying on nothing but
monolingual corpora. Our model builds upon the recent work on unsupervised
embedding mappings, and consists of a slightly modified attentional
encoder-decoder model that can be trained on monolingual corpora alone using a
combination of denoising and backtranslation. Despite the simplicity of the
approach, our system obtains 15.56 and 10.21 BLEU points in WMT 2014
French-to-English and German-to-English translation. The model can also profit
from small parallel corpora, and attains 21.81 and 15.24 points when combined
with 100,000 parallel sentences, respectively. Our implementation is released
as an open source project.
Comment: Published as a conference paper at ICLR 201
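The denoising half of the training signal mentioned in the abstract above can be illustrated with a small noise function. The abstract does not say what kind of noise is applied, so the choice below (random swaps of adjacent words) and the names `add_swap_noise` and `num_swaps` are assumptions made for this sketch. The back-translation half needs a trained model and is not shown.

```python
import random


def add_swap_noise(tokens, num_swaps=None, rng=None):
    """Return a copy of `tokens` with random adjacent pairs swapped.

    Assumed noise model for illustration: by default len(tokens) // 2 swaps.
    """
    rng = rng or random.Random()
    noisy = list(tokens)
    if len(noisy) < 2:
        return noisy
    if num_swaps is None:
        num_swaps = len(noisy) // 2
    for _ in range(num_swaps):
        i = rng.randrange(len(noisy) - 1)  # index of the left element
        noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy


sentence = "the model is trained on monolingual corpora alone".split()
noisy = add_swap_noise(sentence, rng=random.Random(13))
# A denoising example pairs the corrupted input with the clean target.
training_pair = (noisy, sentence)
```

A denoising objective would train the encoder-decoder to reconstruct the clean sentence from its corrupted version, which requires only monolingual text.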
A Neural Language Model for Dynamically Representing the Meanings of Unknown Words and Entities in a Discourse
This study addresses the problem of identifying the meaning of unknown words
or entities in a discourse with respect to the word embedding approaches used
in neural language models. We propose a method for on-the-fly construction and
exploitation of word embeddings in both the input and output layers of a neural
model by tracking contexts. This extends the dynamic entity representation used
in Kobayashi et al. (2016) and incorporates a copy mechanism proposed
independently by Gu et al. (2016) and Gulcehre et al. (2016). In addition, we
construct a new task and dataset called Anonymized Language Modeling for
evaluating the ability to capture word meanings while reading. Experiments
conducted using our novel dataset show that the proposed variant of RNN
language model outperformed the baseline model. Furthermore, the experiments
also demonstrate that dynamic updates of an output layer help a model predict
reappearing entities, whereas those of an input layer are effective for
predicting words following reappearing entities.
Comment: 11 pages. To appear in IJCNLP 201
Machine Translation using Semantic Web Technologies: A Survey
A large number of machine translation approaches have recently been developed
to facilitate the fluid migration of content across languages. However, the
literature suggests that many obstacles must still be dealt with to achieve
better automatic translations. One of these obstacles is lexical and syntactic
ambiguity. A promising way of overcoming this problem is using Semantic Web
technologies. This article presents the results of a systematic review of
machine translation approaches that rely on Semantic Web technologies for
translating texts. Overall, our survey suggests that while Semantic Web
technologies can enhance the quality of machine translation outputs for various
problems, the combination of both is still in its infancy.
Comment: 23 pages, 2 figures, 4 table
Detecting Cybersecurity Events from Noisy Short Text
It is critical to analyze messages shared over social networks for cyber
threat intelligence and cyber-crime prevention. In this study, we propose a
method that leverages both domain-specific word embeddings and task-specific
features to detect cyber security events from tweets. Our model employs a
convolutional neural network (CNN) and a long short-term memory (LSTM)
recurrent neural network which takes word level meta-embeddings as inputs and
incorporates contextual embeddings to classify noisy short text. We collected a
new dataset of cyber security related tweets from Twitter and manually
annotated a subset of 2K of them. We experimented with this dataset and
concluded that the proposed model outperforms both traditional and neural
baselines. The results suggest that our method works well for detecting cyber
security events from noisy short text.
Comment: Accepted February 2019 to North American Chapter of the Association
for Computational Linguistics (NAACL) 201