Neural Unsupervised Domain Adaptation in NLP—A Survey
Deep neural networks excel at learning from labeled data and achieve
state-of-the-art results on a wide array of Natural Language Processing tasks.
In contrast, learning from unlabeled data, especially under domain shift,
remains a challenge. Motivated by the latest advances, in this survey we review
neural unsupervised domain adaptation techniques which do not require labeled
target domain data. This is a more challenging yet more widely applicable
setup. We outline methods, from early traditional non-neural approaches to
pre-trained model transfer. We also revisit the notion of domain, and we
uncover a bias in the types of Natural Language Processing tasks which have
received the most attention. Lastly, we outline future directions, particularly
the broader need for out-of-distribution generalization of future intelligent
NLP systems.
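One family of techniques reviewed in this line of work is domain-adversarial training, where a shared encoder is trained so that a task classifier succeeds on labeled source data while a domain discriminator, fed through a gradient-reversal layer, cannot distinguish source from unlabeled target data. The sketch below is a generic illustration of that idea (model sizes, data, and hyperparameters are placeholder assumptions), not code from the survey itself.

```python
# Illustrative sketch (not from the survey): domain-adversarial training, a common
# neural unsupervised domain adaptation technique. A shared encoder learns features
# that support the source-label classifier while fooling a domain discriminator
# connected through a gradient-reversal layer.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing into the encoder.
        return -ctx.lambd * grad_output, None


encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU())   # shared feature extractor
label_clf = nn.Linear(128, 2)                             # task head (source labels only)
domain_clf = nn.Linear(128, 2)                            # adversarial domain head
opt = torch.optim.Adam([*encoder.parameters(), *label_clf.parameters(),
                        *domain_clf.parameters()], lr=1e-3)
ce = nn.CrossEntropyLoss()

# Toy batches standing in for bag-of-embeddings document features.
src_x, src_y = torch.randn(32, 300), torch.randint(0, 2, (32,))
tgt_x = torch.randn(32, 300)                               # unlabeled target-domain data

for step in range(100):
    h_src, h_tgt = encoder(src_x), encoder(tgt_x)
    task_loss = ce(label_clf(h_src), src_y)                # supervised loss on source
    h_all = torch.cat([h_src, h_tgt])
    d_true = torch.cat([torch.zeros(32, dtype=torch.long),
                        torch.ones(32, dtype=torch.long)])
    dom_loss = ce(domain_clf(GradReverse.apply(h_all, 1.0)), d_true)
    opt.zero_grad()
    (task_loss + dom_loss).backward()
    opt.step()
```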
LCCT: a semisupervised model for sentiment classification
Analyzing public opinions towards products, services and social events is an important but challenging task. An accurate sentiment analyzer should take both lexicon-level information and corpus-level information into account. It also needs to exploit domain-specific knowledge and utilize the common knowledge shared across domains. In addition, we want the algorithm to be able to deal with missing labels and to learn from incomplete sentiment lexicons. This paper presents an LCCT (Lexicon-based and Corpus-based, Co-Training) model for semi-supervised sentiment classification. The proposed method combines the ideas of lexicon-based learning and corpus-based learning in a unified co-training framework. It is capable of incorporating both domain-specific and domain-independent knowledge. Extensive experiments show that it achieves very competitive classification accuracy, even with a small portion of labeled data. Compared to state-of-the-art sentiment classification methods, the LCCT approach exhibits significantly better performance on a variety of datasets in both English and Chinese. © 2015 Association for Computational Linguistics
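The abstract describes a co-training framework over a lexicon-based view and a corpus-based view. The following sketch shows generic two-view co-training in that spirit, with a toy lexicon scorer and a naive Bayes text classifier pseudo-labeling unlabeled documents; the lexicon, thresholds, and data are illustrative assumptions, not the actual LCCT model.

```python
# Illustrative sketch of generic two-view co-training (a simplification, not the
# LCCT model): a lexicon-based scorer and a corpus-based classifier assign
# pseudo-labels, and confidently labeled documents are moved into the training set.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

POS, NEG = {"good", "great", "love"}, {"bad", "awful", "hate"}   # toy sentiment lexicon

def lexicon_scores(docs):
    """Return P(positive) per document from lexicon hit counts."""
    out = []
    for d in docs:
        toks = d.lower().split()
        p, n = sum(t in POS for t in toks), sum(t in NEG for t in toks)
        out.append(0.5 if p == n else p / (p + n))
    return np.array(out)

labeled = ["great phone , love it", "awful battery , hate it"]
labels = np.array([1, 0])
unlabeled = ["good screen and good price", "bad service , awful support",
             "love the camera", "hate the keyboard"]

vec = CountVectorizer()
for _ in range(3):                                   # a few co-training rounds
    clf = MultinomialNB().fit(vec.fit_transform(labeled), labels)
    if not unlabeled:
        break
    corpus_p = clf.predict_proba(vec.transform(unlabeled))[:, 1]
    lex_p = lexicon_scores(unlabeled)
    remaining = []
    for i, doc in enumerate(unlabeled):
        # Pseudo-label documents that either view labels confidently
        # (a simplification of the co-training exchange between views).
        if lex_p[i] > 0.9 or lex_p[i] < 0.1 or corpus_p[i] > 0.9 or corpus_p[i] < 0.1:
            labeled.append(doc)
            labels = np.append(labels, int((lex_p[i] + corpus_p[i]) / 2 > 0.5))
        else:
            remaining.append(doc)
    unlabeled = remaining

print(clf.predict(vec.transform(["good good good", "awful awful"])))
```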
Adapting Language Models for Non-Parallel Author-Stylized Rewriting
Given the recent progress in language modeling using Transformer-based neural
models and an active interest in generating stylized text, we present an
approach to leverage the generalization capabilities of a language model to
rewrite an input text in a target author's style. Our proposed approach adapts
a pre-trained language model to generate author-stylized text by fine-tuning on
the author-specific corpus using a denoising autoencoder (DAE) loss in a
cascaded encoder-decoder framework. Optimizing over DAE loss allows our model
to learn the nuances of an author's style without relying on parallel data,
which has been a severe limitation of previous related work in this space.
To evaluate the efficacy of our approach, we propose a linguistically-motivated
framework to quantify stylistic alignment of the generated text to the target
author at lexical, syntactic and surface levels. The evaluation framework is
both interpretable, as it leads to several insights about the model, and
self-contained, as it does not rely on external classifiers, e.g. sentiment or
formality classifiers. Qualitative and quantitative assessment indicates that
the proposed approach rewrites the input text with better alignment to the
target style while preserving the original content better than state-of-the-art
baselines.
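The core training signal here is a denoising autoencoder (DAE) loss: corrupt a sentence from the author-specific corpus, then fine-tune a pre-trained sequence-to-sequence model to reconstruct the clean sentence. The sketch below illustrates that general objective using Hugging Face Transformers with BART as a stand-in model; the noising scheme and model choice are assumptions and do not reproduce the paper's cascaded encoder-decoder setup.

```python
# Minimal sketch of DAE-style fine-tuning (a generic illustration, not the paper's
# cascaded encoder-decoder): corrupt each author sentence with word dropping and
# local shuffling, then train a pre-trained seq2seq model to reconstruct it.
import random
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

def add_noise(sentence, drop_prob=0.15, shuffle_window=3):
    """Drop words at random and locally shuffle the rest (a simple noise model)."""
    words = [w for w in sentence.split() if random.random() > drop_prob]
    noisy = [(i + random.uniform(0, shuffle_window), w) for i, w in enumerate(words)]
    return " ".join(w for _, w in sorted(noisy))

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
opt = torch.optim.AdamW(model.parameters(), lr=3e-5)

author_corpus = ["It is a truth universally acknowledged that ..."]  # author-specific text

model.train()
for epoch in range(3):
    for clean in author_corpus:
        enc = tok(add_noise(clean), return_tensors="pt", truncation=True)
        tgt = tok(clean, return_tensors="pt", truncation=True)
        out = model(input_ids=enc.input_ids,
                    attention_mask=enc.attention_mask,
                    labels=tgt.input_ids)          # reconstruction (DAE) loss
        opt.zero_grad()
        out.loss.backward()
        opt.step()
```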
Distributed Representations for Compositional Semantics
The mathematical representation of semantics is a key issue for Natural
Language Processing (NLP). A lot of research has been devoted to finding ways
of representing the semantics of individual words in vector spaces.
Distributional approaches --- meaning distributed representations that exploit
co-occurrence statistics of large corpora --- have proved popular and
successful across a number of tasks. However, natural language usually comes in
structures beyond the word level, with meaning arising not only from the
individual words but also the structure they are contained in at the phrasal or
sentential level. Modelling the compositional process by which the meaning of
an utterance arises from the meaning of its parts is an equally fundamental
task of NLP.
This dissertation explores methods for learning distributed semantic
representations and models for composing these into representations for larger
linguistic units. Our underlying hypothesis is that neural models are a
suitable vehicle for learning semantically rich representations and that such
representations in turn are suitable vehicles for solving important tasks in
natural language processing. The contribution of this thesis is a thorough
evaluation of our hypothesis, as part of which we introduce several new
approaches to representation learning and compositional semantics, as well as
multiple state-of-the-art models which apply distributed semantic
representations to various tasks in NLP.
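As a concrete, minimal example of composing word-level distributed representations into a representation for a larger linguistic unit, the sketch below uses simple additive composition over toy word vectors; the vectors and phrases are placeholder assumptions for illustration only.

```python
# Tiny sketch of the simplest composition model over distributed word
# representations: a phrase vector is the normalized sum of its word vectors.
# The embeddings here are random toy vectors, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "black", "cat", "sat", "quietly"]
emb = {w: rng.normal(size=50) for w in vocab}          # stand-in for trained vectors

def compose_additive(phrase):
    """Additive composition: sum word vectors, then L2-normalize."""
    v = np.sum([emb[w] for w in phrase.split()], axis=0)
    return v / np.linalg.norm(v)

def cosine(a, b):
    return float(a @ b)                                # both inputs are unit vectors

v1 = compose_additive("the black cat")
v2 = compose_additive("the cat sat quietly")
print("similarity:", cosine(v1, v2))                   # similarity of composed phrases
```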
Transfer Learning for Speech and Language Processing
Transfer learning is a vital technique that generalizes models trained for
one setting or task to other settings or tasks. For example, in speech
recognition, an acoustic model trained for one language can be used to
recognize speech in another language, with little or no re-training data.
Transfer learning is closely related to multi-task learning (cross-lingual vs.
multilingual), and is traditionally studied under the name of `model adaptation'.
Recent advances in deep learning show that transfer learning becomes much
easier and more effective with high-level abstract features learned by deep
models, and the `transfer' can be conducted not only between data distributions
and data types, but also between model structures (e.g., shallow nets and deep
nets) or even model types (e.g., Bayesian models and neural models). This
review paper summarizes some recent prominent research in this direction,
particularly for speech and language processing. We also report some results
from our group and highlight the potential of this very interesting research
field.
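A common concrete form of this kind of transfer is to reuse the lower layers of a source-trained network as fixed high-level feature extractors and retrain only a small head on limited target data. The sketch below illustrates that pattern with placeholder dimensions and random toy batches; it is a generic illustration, not an experiment from the paper.

```python
# Illustrative sketch (not from the paper) of feature transfer: keep the layers of
# a source-trained model as frozen feature extractors and train only a new output
# head on the (possibly tiny) target-domain data.
import torch
import torch.nn as nn

# Stand-in for a network pre-trained on the source language/task.
pretrained = nn.Sequential(
    nn.Linear(40, 256), nn.ReLU(),      # low-level feature layers
    nn.Linear(256, 256), nn.ReLU(),     # high-level abstract features
    nn.Linear(256, 30),                 # source-task output layer (discarded)
)

feature_extractor = pretrained[:4]                 # keep everything below the output layer
for p in feature_extractor.parameters():
    p.requires_grad = False                        # freeze the transferred features

target_head = nn.Linear(256, 10)                   # new output layer for the target task
opt = torch.optim.Adam(target_head.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

x, y = torch.randn(64, 40), torch.randint(0, 10, (64,))   # small target-domain batch
for step in range(50):
    with torch.no_grad():
        feats = feature_extractor(x)               # reused high-level features
    loss = ce(target_head(feats), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```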
Cross-Domain Labeled LDA for Cross-Domain Text Classification
Cross-domain text classification aims at building a classifier for a target
domain that leverages data from both the source and target domains. One promising
idea is to minimize the feature distribution differences of the two domains.
Most existing studies explicitly minimize such differences by an exact
alignment mechanism (e.g., one-to-one feature alignment or a projection
matrix). Such exact alignment, however, will restrict the model's learning
ability and will further impair its performance on classification tasks when
the semantic distributions of different domains are very different.
To address this problem, we propose a novel group alignment which aligns the
semantics at group level. In addition, to help the model learn better semantic
groups and semantics within these groups, we also propose a partial supervision
for the model's learning in the source domain. To this end, we embed the group
alignment and a partial supervision into a cross-domain topic model, and
propose a Cross-Domain Labeled LDA (CDL-LDA). On the standard 20Newsgroups and
Reuters datasets, extensive quantitative (classification, perplexity, etc.) and
qualitative (topic detection) experiments are conducted to show the
effectiveness of the proposed group alignment and partial supervision.
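To make the contrast between exact and group-level alignment concrete, the toy sketch below compares matching per-feature statistics across domains with matching only the aggregate statistics of assumed feature groups; it is a conceptual illustration, not the CDL-LDA model.

```python
# Illustrative contrast (not CDL-LDA itself): exact one-to-one alignment matches
# every individual feature's statistics across domains, whereas group alignment
# only matches the aggregate statistics of semantic groups of features.
import numpy as np

rng = np.random.default_rng(0)
n_features, groups = 12, [range(0, 4), range(4, 8), range(8, 12)]  # assumed groups

source = rng.normal(loc=1.0, scale=1.0, size=(500, n_features))
target = rng.normal(loc=0.3, scale=1.5, size=(400, n_features))

def exact_alignment_gap(src, tgt):
    """Mean absolute difference of per-feature means (what exact alignment minimizes)."""
    return float(np.abs(src.mean(axis=0) - tgt.mean(axis=0)).mean())

def group_alignment_gap(src, tgt, groups):
    """Mean absolute difference of per-group means (a looser, group-level constraint)."""
    diffs = [abs(src[:, list(g)].mean() - tgt[:, list(g)].mean()) for g in groups]
    return float(np.mean(diffs))

print("exact-alignment gap:", exact_alignment_gap(source, target))
print("group-alignment gap:", group_alignment_gap(source, target, groups))
```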
Decoding sentiment from distributed representations of sentences
Distributed representations of sentences have been developed recently to represent their meaning as real-valued vectors. However, it is not clear how much information such representations retain about the polarity of sentences. To study this question, we decode sentiment from unsupervised sentence representations learned with different architectures (sensitive to the order of words, the order of sentences, or none) in 9 typologically diverse languages. Sentiment results from the (recursive) composition of lexical items and grammatical strategies such as negation and concession. The results are manifold: we show that there is no `one-size-fits-all' representation architecture outperforming the others across the board. Rather, the top-ranking architectures depend on the language and data at hand. Moreover, we find that in several cases the additive composition model based on skip-gram word vectors may surpass supervised state-of-the-art architectures such as bidirectional LSTMs. Finally, we provide a possible explanation of the observed variation based on the type of negative constructions in each language.
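A minimal version of this decoding setup, under toy assumptions, is sketched below: sentence vectors are built by additive composition of (here randomly initialized, stand-in) skip-gram-style word vectors, and a simple logistic-regression probe is trained to decode sentiment from those fixed representations.

```python
# Sketch of the decoding setup under toy assumptions: fixed sentence vectors from
# additive composition of stand-in word vectors, with a linear probe trained on
# top to predict sentiment polarity.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
dim = 100
emb = {}                                            # stand-in for skip-gram vectors

def sent_vec(sentence):
    """Additive composition: average the word vectors of a sentence."""
    vs = []
    for w in sentence.lower().split():
        if w not in emb:
            emb[w] = rng.normal(size=dim)           # toy vector; real use: load word2vec
        vs.append(emb[w])
    return np.mean(vs, axis=0)

train = [("i love this film", 1), ("a great and moving story", 1),
         ("i hate this film", 0), ("a boring and awful story", 0)]
X = np.stack([sent_vec(s) for s, _ in train])
y = np.array([label for _, label in train])

probe = LogisticRegression(max_iter=1000).fit(X, y)  # linear "decoder" on top
test = ["love this great story", "awful boring film"]
print(probe.predict(np.stack([sent_vec(s) for s in test])))
```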