An Empirical Study on Language Model Adaptation Using a Metric of Domain Similarity
Abstract. This paper presents an empirical study on four techniques of language model adaptation, including a maximum a posteriori (MAP) method and three discriminative training models, in the application of Japanese Kana-Kanji conversion. We compare the performance of these methods from various angles by adapting the baseline model to four adaptation domains. In particular, we attempt to interpret the results given in terms of the character error rate (CER) by correlating them with the characteristics of the adaptation domain measured using the information-theoretic notion of cross entropy. We show that such a metric correlates well with the CER performance of the adaptation methods, and also show that the discriminative methods are not only superior to a MAP-based method in terms of achieving larger CER reduction, but are also more robust against the similarity of background and adaptation domains.
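As a rough illustration of how cross entropy can serve as a domain-similarity metric, the sketch below trains a model on background text and scores text from two other domains against it. It assumes a character unigram model with add-alpha smoothing, which is a simplification chosen here for brevity and not the model used in the paper; the function names and the toy strings are hypothetical.

```python
import math
from collections import Counter


def train_unigram(text, vocab, alpha=1.0):
    """Character unigram model with add-alpha smoothing (an assumed, simplified model)."""
    counts = Counter(text)
    total = len(text) + alpha * len(vocab)
    return {ch: (counts[ch] + alpha) / total for ch in vocab}


def cross_entropy(model, text):
    """Average negative log2 probability (bits per character) of text under the model."""
    return -sum(math.log2(model[ch]) for ch in text) / len(text)


# Toy data: one background domain and two candidate adaptation domains.
background = "the cat sat on the mat"
similar = "the cat ate the rat"
distant = "zzz qqq xxx jjj"

vocab = set(background) | set(similar) | set(distant)
model = train_unigram(background, vocab)

h_similar = cross_entropy(model, similar)
h_distant = cross_entropy(model, distant)
print(f"similar domain: {h_similar:.2f} bits/char")
print(f"distant domain: {h_distant:.2f} bits/char")
```

A lower value means the background model predicts the adaptation text better, so the two domains can be read as more similar under this measure.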
Domain Adaptation for Statistical Classifiers
The most basic assumption used in statistical learning theory is that
training data and test data are drawn from the same underlying distribution.
Unfortunately, in many applications, the "in-domain" test data is drawn from a
distribution that is related, but not identical, to the "out-of-domain"
distribution of the training data. We consider the common case in which labeled
out-of-domain data is plentiful, but labeled in-domain data is scarce. We
introduce a statistical formulation of this problem in terms of a simple
mixture model and present an instantiation of this framework to maximum entropy
classifiers and their linear chain counterparts. We present efficient inference
algorithms for this special case based on the technique of conditional
expectation maximization. Our experimental results show that our approach leads
to improved performance on three real world tasks on four different data sets
from the natural language processing domain.
Topic-based mixture language modelling
This paper describes an approach for constructing a mixture of language models based on simple statistical notions of semantics using probabilistic models developed for information retrieval. The approach encapsulates corpus-derived semantic information and is able to model varying styles of text. Using such information, the corpus texts are clustered in an unsupervised manner and a mixture of topic-specific language models is automatically created. The principal contribution of this work is to characterise the document space resulting from information retrieval techniques and to demonstrate the approach for mixture language modelling.
A comparison is made between manual and automatic clustering in order to elucidate how the global content information is expressed in the space. We also compare (in terms of association with manual clustering and language modelling accuracy) alternative term-weighting schemes and the effect of singular value decomposition dimension reduction (latent semantic analysis). Test set perplexity results using the British National Corpus indicate that the approach can improve the potential of statistical language modelling. Using an adaptive procedure, the conventional model may be tuned to track text data with a slight increase in computational cost.
Unsupervised Domain Adaptation on Reading Comprehension
Reading comprehension (RC) has been studied in a variety of datasets with the
boosted performance brought by deep neural networks. However, the
generalization capability of these models across different domains remains
unclear. To alleviate this issue, we investigate unsupervised
domain adaptation on RC, wherein a model is trained on a labeled source domain
and applied to a target domain with only unlabeled samples. We first
show that even with the powerful BERT contextual representation, the
performance is still unsatisfactory when the model trained on one dataset is
directly applied to another target dataset. To solve this, we provide a novel
conditional adversarial self-training method (CASe). Specifically, our approach
leverages a BERT model fine-tuned on the source dataset along with the
confidence filtering to generate reliable pseudo-labeled samples in the target
domain for self-training. On the other hand, it further reduces domain
distribution discrepancy through conditional adversarial learning across
domains. Extensive experiments show our approach achieves comparable accuracy
to supervised models on multiple large-scale benchmark datasets.
Comment: 8 pages, 6 figures, 5 tables, Accepted by AAAI 202
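The confidence-filtering step mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the CASe implementation: it assumes each unlabeled target sample comes with a list of class probabilities from the source-trained model, whereas reading comprehension models typically predict answer spans. The function name, the 0.9 threshold, and the sample data are all hypothetical.

```python
def filter_pseudo_labels(predictions, threshold=0.9):
    """Keep (sample_id, label) pairs whose top predicted probability meets the threshold.

    predictions: iterable of (sample_id, probs), where probs is a list of
    class probabilities (an assumed format chosen for illustration).
    """
    kept = []
    for sample_id, probs in predictions:
        # Index of the most probable class.
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            kept.append((sample_id, label))
    return kept


# Hypothetical model outputs on unlabeled target-domain samples.
predictions = [
    ("q1", [0.05, 0.95]),
    ("q2", [0.55, 0.45]),
    ("q3", [0.92, 0.08]),
]
pseudo_labeled = filter_pseudo_labels(predictions)
print(pseudo_labeled)
```

Only the confidently predicted samples would be passed on to self-training; the low-confidence sample `q2` is dropped.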