Myanmar named entity corpus and its use in syllable-based neural named entity recognition
Myanmar is a low-resource language, which is one of the main reasons why Myanmar Natural Language Processing has lagged behind that of other languages. Currently, there is no publicly available named entity corpus for the Myanmar language. As part of this work, the first manually annotated named-entity-tagged corpus for Myanmar was developed and proposed to support the evaluation of named entity extraction. At present, our named entity corpus contains approximately 170,000 named entities and 60,000 sentences. This work also contributes the first evaluation of various deep neural network architectures on Myanmar Named Entity Recognition. Experimental results of 10-fold cross-validation revealed that syllable-based neural sequence models without additional feature engineering give better results than a baseline CRF model. This work also aims to discover the effectiveness of neural network approaches to textual processing for Myanmar, as well as to promote future research on this understudied language.
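The 10-fold cross-validation protocol used to compare the syllable-based neural taggers with the CRF baseline can be sketched as follows (a minimal illustration; the helper name, seed, and the use of the reported 60,000-sentence corpus size are assumptions, not details from the paper):

```python
import random

def kfold_splits(n_sentences, k=10, seed=0):
    """Partition sentence indices into k folds and yield
    (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_sentences))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        yield train, test

# Per fold, one would train both the syllable-based tagger and the CRF
# baseline on `train` sentences and compare entity F1 on `test`.
splits = list(kfold_splits(60000, k=10))
print(len(splits))  # 10
```

Averaging the per-fold scores gives the comparison the abstract reports; every sentence appears in exactly one test fold.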
Semi-supervised sequence tagging with bidirectional language models
Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for NLP tasks. However, in most cases, the recurrent network that operates on word-level representations to produce context-sensitive representations is trained on relatively little labeled data. In this paper, we demonstrate a general semi-supervised approach for adding pre-trained context embeddings from bidirectional language models to NLP systems and apply it to sequence labeling tasks. We evaluate our model on two standard datasets for named entity recognition (NER) and chunking, and in both cases achieve state-of-the-art results, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers.
Comment: To appear in ACL 201
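The core idea, concatenating pre-trained bidirectional language-model states onto each token's task representation before the sequence labeling layers, can be sketched as follows (a toy illustration with plain lists; the function name and dimensions are assumptions, not the paper's implementation):

```python
def augment_with_lm(word_vecs, lm_forward, lm_backward):
    """For each token position, concatenate the task-trained word
    embedding with the forward and backward language-model hidden
    states. Vectors are plain lists, so `+` is concatenation."""
    return [w + f + b
            for w, f, b in zip(word_vecs, lm_forward, lm_backward)]

# Toy 3-token sentence: 4-dim word embeddings, 2-dim LM states
# per direction (real biLM states are much larger).
word_vecs   = [[0.1] * 4, [0.2] * 4, [0.3] * 4]
lm_forward  = [[0.5] * 2, [0.6] * 2, [0.7] * 2]
lm_backward = [[0.9] * 2, [0.8] * 2, [0.7] * 2]

inputs = augment_with_lm(word_vecs, lm_forward, lm_backward)
print(len(inputs), len(inputs[0]))  # 3 tokens, 4 + 2 + 2 = 8 dims each
```

The augmented vectors then feed the sequence tagging layers; the language model itself stays frozen, which is what makes the approach semi-supervised.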
Neural Cross-Lingual Transfer and Limited Annotated Data for Named Entity Recognition in Danish
Named Entity Recognition (NER) has greatly advanced with the introduction of deep neural architectures. However, the success of these methods depends on large amounts of training data. The scarcity of publicly available human-labeled datasets has resulted in limited evaluation of existing NER systems, as is the case for Danish. This paper studies the effectiveness of cross-lingual transfer for Danish, evaluates its complementarity to limited gold data, and sheds light on the performance of Danish NER.
Comment: Published at NoDaLiDa 2019; updated (system, data and repository details)