Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation
Traditional neural machine translation (NMT) uses a fixed training
procedure in which every sentence is sampled once per epoch. In practice,
some sentences are well learned within the first few epochs; under this
approach, however, the well-learned sentences continue to be trained
alongside those that are not yet well learned for 10-30 epochs, which
wastes training time. Here, we propose an efficient method that
dynamically samples sentences in order to accelerate NMT training. In this
approach, a weight is assigned to each sentence based on the measured
difference between its training costs in two iterations, and in each epoch
a certain percentage of sentences is dynamically sampled according to
these weights. Empirical results on the NIST Chinese-to-English and WMT
English-to-German tasks show that the proposed method significantly
accelerates NMT training and improves NMT performance.
Comment: Revised version of ACL-201
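The core of the method — weighting each sentence by how much its training cost
changed between two iterations, then sampling a fraction of the corpus per
epoch by those weights — can be sketched as follows. This is a minimal
illustration in Python/NumPy; the exact weighting criterion and the
keep_ratio parameter are assumptions for clarity, not the paper's precise
settings.

    import numpy as np

    def dynamic_sample(costs_prev, costs_curr, keep_ratio=0.8, eps=1e-8):
        """Pick the sentence indices to train on in the next epoch."""
        costs_prev = np.asarray(costs_prev, dtype=float)
        costs_curr = np.asarray(costs_curr, dtype=float)
        # Sentences whose cost is still changing sharply are treated as
        # "not yet learned" and receive larger weights; the small eps keeps
        # every sentence sampleable.
        weights = np.abs(costs_prev - costs_curr) / (costs_prev + eps) + eps
        probs = weights / weights.sum()
        k = int(keep_ratio * len(probs))
        # Weighted sampling without replacement over sentence indices.
        return np.random.choice(len(probs), size=k, replace=False, p=probs)

Sampling by weight rather than hard top-k selection still occasionally
revisits well-learned sentences, which guards against forgetting them
entirely.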
Towards Robust Named Entity Recognition for Historic German
Recent advances in language modeling with deep neural networks have shown
that these models learn representations that vary with network depth, from
morphology to semantic relationships such as coreference. We apply
pre-trained language models to low-resource named entity recognition for
Historic German. We show in a series of experiments that character-based
pre-trained language models do not run into trouble when faced with
low-resource datasets. Our pre-trained character-based language models
improve upon classical CRF-based methods and previous work on Bi-LSTMs,
boosting F1 score by up to 6%. Our pre-trained language and NER models are
publicly available at https://github.com/stefan-it/historic-ner .
Comment: 8 pages, 5 figures, accepted at the 4th Workshop on Representation
Learning for NLP (RepL4NLP), held in conjunction with ACL 201
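As a usage illustration, tagging with one of the released models would look
roughly like this under the Flair framework, which we assume the linked
repository builds on; the model identifier below is hypothetical, and the
actual released models are documented in the repository.

    from flair.data import Sentence
    from flair.models import SequenceTagger

    # Hypothetical identifier; see the repository for the released models.
    tagger = SequenceTagger.load("de-historic-ner")

    sentence = Sentence("Der Gemeinderath zu Berlin versammelte sich am Montag .")
    tagger.predict(sentence)

    # Print the recognized named-entity spans with their labels.
    for entity in sentence.get_spans("ner"):
        print(entity)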
What Matters for Neural Cross-Lingual Named Entity Recognition: An Empirical Analysis
Building named entity recognition (NER) models for languages that do not have
much training data is a challenging task. While recent work has shown promising
results on cross-lingual transfer from high-resource languages to low-resource
languages, it is unclear what knowledge is transferred. In this paper, we first
propose a simple and efficient neural architecture for cross-lingual NER.
Experiments show that our model achieves competitive performance with the
state-of-the-art. We further analyze how transfer learning works for
cross-lingual NER with respect to two transferable factors, sequential
order and multilingual embeddings, and investigate how model performance
varies with entity length. Finally, we conduct a case study on a
non-Latin-script language, Bengali, which suggests that leveraging
knowledge from Wikipedia is a promising direction for further improving
model performance. Our results can shed light on future research on
improving cross-lingual NER.
Comment: 7 pages
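The transfer setup analyzed here — a shared multilingual embedding space
feeding a sequence encoder that is trained on a high-resource language and
applied unchanged to a low-resource one — can be sketched as below. This is
an illustrative simplification, not the authors' exact architecture; the
frozen embedding matrix, tag inventory, and hidden size are placeholders.

    import torch.nn as nn

    class CrossLingualTagger(nn.Module):
        def __init__(self, multilingual_embeddings, num_tags, hidden=256):
            super().__init__()
            # Frozen multilingual embeddings: the shared space is what lets
            # a model trained on one language score sentences in another.
            self.embed = nn.Embedding.from_pretrained(
                multilingual_embeddings, freeze=True)
            self.lstm = nn.LSTM(multilingual_embeddings.size(1), hidden,
                                batch_first=True, bidirectional=True)
            self.out = nn.Linear(2 * hidden, num_tags)

        def forward(self, token_ids):
            h, _ = self.lstm(self.embed(token_ids))
            return self.out(h)  # per-token tag logits

Training runs on the high-resource language only; at test time the same
weights consume low-resource token ids, since both languages index into the
one shared embedding table.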