Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation
Traditional neural machine translation (NMT) uses a fixed training
procedure in which every sentence is sampled once per epoch. In practice,
some sentences are well learned within the first few epochs; under this
approach, however, the well-learned sentences continue to be trained
alongside those that are not yet well learned for 10-30 epochs, which
wastes training time. Here, we propose an efficient method that
dynamically samples sentences in order to accelerate NMT training. In this
approach, a weight is assigned to each sentence based on the measured
difference between its training costs in two iterations, and in each epoch
a certain percentage of sentences is dynamically sampled according to
these weights. Empirical results on the NIST Chinese-to-English and WMT
English-to-German tasks show that the proposed method significantly
accelerates NMT training and improves NMT performance.
Comment: Revised version of ACL-201
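The core of the method — weighting each sentence by how much its training cost
changed between two iterations, then sampling a fraction of the corpus per
epoch by those weights — can be sketched as follows. This is a minimal
illustration in Python/NumPy; the exact weighting criterion and the
keep_ratio parameter are assumptions for clarity, not the paper's precise
settings.

    import numpy as np

    def dynamic_sample(costs_prev, costs_curr, keep_ratio=0.8, eps=1e-8):
        """Pick the sentence indices to train on in the next epoch."""
        costs_prev = np.asarray(costs_prev, dtype=float)
        costs_curr = np.asarray(costs_curr, dtype=float)
        # Sentences whose cost is still changing sharply are treated as
        # "not yet learned" and receive larger weights; the small eps keeps
        # every sentence sampleable.
        weights = np.abs(costs_prev - costs_curr) / (costs_prev + eps) + eps
        probs = weights / weights.sum()
        k = int(keep_ratio * len(probs))
        # Weighted sampling without replacement over sentence indices.
        return np.random.choice(len(probs), size=k, replace=False, p=probs)

Sampling by weight rather than hard top-k selection still occasionally
revisits well-learned sentences, which guards against forgetting them
entirely.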
Towards Robust Named Entity Recognition for Historic German
Recent advances in language modeling with deep neural networks have shown
that these models learn representations that vary with network depth, from
morphology to semantic relationships such as coreference. We apply
pre-trained language models to low-resource named entity recognition for
Historic German. We show in a series of experiments that character-based
pre-trained language models do not run into trouble when faced with
low-resource datasets. Our pre-trained character-based language models
improve upon classical CRF-based methods and previous work on Bi-LSTMs,
boosting F1 score by up to 6%. Our pre-trained language and NER models are
publicly available at https://github.com/stefan-it/historic-ner .
Comment: 8 pages, 5 figures, accepted at the 4th Workshop on Representation
Learning for NLP (RepL4NLP), held in conjunction with ACL 201
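As a usage illustration, tagging with one of the released models would look
roughly like this under the Flair framework, which we assume the linked
repository builds on; the model identifier below is hypothetical, and the
actual released models are documented in the repository.

    from flair.data import Sentence
    from flair.models import SequenceTagger

    # Hypothetical identifier; see the repository for the released models.
    tagger = SequenceTagger.load("de-historic-ner")

    sentence = Sentence("Der Gemeinderath zu Berlin versammelte sich am Montag .")
    tagger.predict(sentence)

    # Print the recognized named-entity spans with their labels.
    for entity in sentence.get_spans("ner"):
        print(entity)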
What Matters for Neural Cross-Lingual Named Entity Recognition: An Empirical Analysis
Building named entity recognition (NER) models for languages that do not have
much training data is a challenging task. While recent work has shown promising
results on cross-lingual transfer from high-resource languages to low-resource
languages, it is unclear what knowledge is transferred. In this paper, we first
propose a simple and efficient neural architecture for cross-lingual NER.
Experiments show that our model achieves competitive performance with the
state-of-the-art. We further analyze how transfer learning works for
cross-lingual NER with respect to two transferable factors, sequential
order and multilingual embeddings, and investigate how model performance
varies with entity length. Finally, we conduct a case study on a
non-Latin-script language, Bengali, which suggests that leveraging
knowledge from Wikipedia is a promising direction for further improving
model performance. Our results can shed light on future research on
improving cross-lingual NER.
Comment: 7 pages
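The transfer setup analyzed here — a shared multilingual embedding space
feeding a sequence encoder that is trained on a high-resource language and
applied unchanged to a low-resource one — can be sketched as below. This is
an illustrative simplification, not the authors' exact architecture; the
frozen embedding matrix, tag inventory, and hidden size are placeholders.

    import torch.nn as nn

    class CrossLingualTagger(nn.Module):
        def __init__(self, multilingual_embeddings, num_tags, hidden=256):
            super().__init__()
            # Frozen multilingual embeddings: the shared space is what lets
            # a model trained on one language score sentences in another.
            self.embed = nn.Embedding.from_pretrained(
                multilingual_embeddings, freeze=True)
            self.lstm = nn.LSTM(multilingual_embeddings.size(1), hidden,
                                batch_first=True, bidirectional=True)
            self.out = nn.Linear(2 * hidden, num_tags)

        def forward(self, token_ids):
            h, _ = self.lstm(self.embed(token_ids))
            return self.out(h)  # per-token tag logits

Training runs on the high-resource language only; at test time the same
weights consume low-resource token ids, since both languages index into the
one shared embedding table.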