1,553 research outputs found
Unsupervised Domain Adaptation for Neural Machine Translation with Domain-Aware Feature Embeddings
The recent success of neural machine translation models relies on the
availability of high quality, in-domain data. Domain adaptation is required
when domain-specific data is scarce or nonexistent. Previous unsupervised
domain adaptation strategies include training the model with in-domain copied
monolingual or back-translated data. However, these methods use generic
representations for text regardless of domain shift, which makes it infeasible
for translation models to control outputs conditioned on a specific domain. In
this work, we propose an approach that adapts models with domain-aware feature
embeddings, which are learned via an auxiliary language modeling task. Our
approach allows the model to assign domain-specific representations to words
and output sentences in the desired domain. Our empirical results demonstrate
the effectiveness of the proposed strategy, achieving consistent improvements
in multiple experimental settings. In addition, we show that combining our
method with back translation can further improve the performance of the model.
Comment: EMNLP 201
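The core idea of a domain-aware embedding can be sketched minimally. This is a toy numpy illustration, not the paper's implementation: the sizes, the additive word-plus-domain combination, and the `embed` helper are all assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
vocab_size, n_domains, dim = 100, 2, 8

word_emb = rng.normal(size=(vocab_size, dim))    # shared word embeddings
domain_emb = rng.normal(size=(n_domains, dim))   # learned per-domain embeddings

def embed(token_ids, domain_id):
    """Domain-aware embedding: each word vector is combined with the
    embedding of the current domain (here by simple addition), so the
    same word gets a different representation in each domain."""
    return word_emb[token_ids] + domain_emb[domain_id]

tokens = np.array([3, 17, 42])
in_domain = embed(tokens, domain_id=0)
out_domain = embed(tokens, domain_id=1)
# Same words, different domain -> different representations.
assert not np.allclose(in_domain, out_domain)
```

In the paper the domain embeddings are learned via an auxiliary language modeling objective; the sketch only shows how such embeddings condition the word representations at lookup time.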
Zero-Resource Cross-Domain Named Entity Recognition
Existing models for cross-domain named entity recognition (NER) rely on large
unlabeled corpora or labeled NER training data in target domains.
However, collecting data for low-resource target domains is not only expensive
but also time-consuming. Hence, we propose a cross-domain NER model that does
not use any external resources. We first introduce multi-task learning (MTL)
by adding an auxiliary objective that detects whether tokens are named entities
or not. We then introduce a framework called Mixture of Entity Experts (MoEE)
to improve robustness under zero-resource domain adaptation. Finally,
experimental results show that our model outperforms strong unsupervised
cross-domain sequence labeling models, and the performance of our model is
close to that of the state-of-the-art model which leverages extensive
resources.
Comment: RepL4NLP 202
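A mixture-of-experts combination like MoEE can be sketched as a gated weighted sum of expert views of a token representation. This is a toy numpy sketch under assumed shapes; the linear experts and the single-layer gate are illustrative stand-ins, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical setup: one expert per entity type plus a non-entity expert.
n_experts, dim = 4, 8
rng = np.random.default_rng(1)
expert_w = rng.normal(size=(n_experts, dim, dim))  # one linear expert each
gate_w = rng.normal(size=(dim, n_experts))         # gating network

def moee(h):
    """Mixture of Entity Experts: combine expert views of a token
    representation h, weighted by a learned gate over experts."""
    expert_out = np.stack([h @ w for w in expert_w])  # (n_experts, dim)
    gate = softmax(h @ gate_w)                        # (n_experts,)
    return gate @ expert_out                          # gated weighted sum

h = rng.normal(size=dim)
mixed = moee(h)
```

The gating weights sum to one, so each token's final representation is a convex combination of the expert outputs.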
Universal Semi-Supervised Semantic Segmentation
In recent years, the need for semantic segmentation has arisen across several
different applications and environments. However, the expense and redundancy of
annotation often limits the quantity of labels available for training in any
domain, while deployment is easier if a single model works well across domains.
In this paper, we pose the novel problem of universal semi-supervised semantic
segmentation and propose a solution framework, to meet the dual needs of lower
annotation and deployment costs. In contrast to alternatives such as
fine-tuning, joint training or unsupervised domain adaptation, universal
semi-supervised segmentation ensures that across all domains: (i) a single
model is deployed, (ii) unlabeled data is used, (iii) performance is improved,
(iv) only a few labels are needed and (v) label spaces may differ. To address
this, we minimize supervised as well as within and cross-domain unsupervised
losses, introducing a novel feature alignment objective based on pixel-aware
entropy regularization for the latter. We demonstrate quantitative advantages
over other approaches on several combinations of segmentation datasets across
different geographies (Germany, England, India) and environments (outdoors,
indoors), as well as qualitative insights on the aligned representations.
Comment: Accepted as poster presentation at ICCV 201
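The entropy-regularization ingredient can be illustrated in isolation: compute the mean per-pixel entropy of the predicted class distribution, which an unsupervised loss can then minimize on unlabeled pixels. A minimal numpy sketch, with assumed shapes and without the paper's pixel-aware feature alignment:

```python
import numpy as np

def softmax(logits, axis=-1):
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pixel_entropy(logits):
    """Mean per-pixel entropy of the predicted class distribution for a
    (H, W, C) logit map. Minimizing this on unlabeled pixels pushes the
    model toward confident predictions."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
confident = rng.normal(size=(4, 4, 5)) * 10.0  # sharply peaked logits
uncertain = np.zeros((4, 4, 5))                # uniform logits
# Uniform predictions attain the maximal entropy, log(5).
assert pixel_entropy(confident) < pixel_entropy(uncertain)
```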
Beneath the Tip of the Iceberg: Current Challenges and New Directions in Sentiment Analysis Research
Sentiment analysis as a field has come a long way since it was first
introduced as a task nearly 20 years ago. It has widespread commercial
applications in various domains like marketing, risk management, market
research, and politics, to name a few. Given its saturation in specific
subtasks -- such as sentiment polarity classification -- and datasets, there is
an underlying perception that this field has reached its maturity. In this
article, we challenge this perception by pointing out the shortcomings of the
field and the under-explored yet key aspects that are necessary to attain true
sentiment understanding. We analyze the significant leaps responsible for its
current relevance. Further, we attempt to chart a possible course for this
field that covers many overlooked and unanswered questions.
Comment: Published in the IEEE Transactions on Affective Computing (TAFFC)
Cross-language Learning with Adversarial Neural Networks: Application to Community Question Answering
We address the problem of cross-language adaptation for question-question
similarity reranking in community question answering, with the objective to
port a system trained on one input language to another input language given
labeled training data for the first language and only unlabeled data for the
second language. In particular, we propose to use adversarial training of
neural networks to learn high-level features that are discriminative for the
main learning task, and at the same time are invariant across the input
languages. The evaluation results show sizable improvements for our
cross-language adversarial neural network (CLANN) model over a strong
non-adversarial system.
Comment: CoNLL-2017: The SIGNLL Conference on Computational Natural Language
Learning; cross-language adversarial neural network (CLANN) model;
adversarial training; cross-language adaptation; community question
answering; question-question similarity
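Adversarial training for language-invariant features is commonly realized with a gradient reversal layer: an identity map in the forward pass whose gradient is negated in the backward pass, so the feature extractor learns to fool a language discriminator. A minimal sketch of that layer alone, assuming a manual forward/backward interface (not the paper's full model):

```python
import numpy as np

class GradReverse:
    """Gradient reversal layer, a standard building block for learning
    features invariant across domains or languages: identity in the
    forward pass, negated (scaled) gradient in the backward pass."""
    def __init__(self, lam=1.0):
        self.lam = lam  # trade-off between task loss and invariance

    def forward(self, x):
        return x                  # features pass through unchanged

    def backward(self, grad):
        return -self.lam * grad   # discriminator's gradient is flipped

grl = GradReverse(lam=0.5)
x = np.array([1.0, -2.0])
g = np.array([0.3, 0.3])
assert np.allclose(grl.forward(x), x)
assert np.allclose(grl.backward(g), [-0.15, -0.15])
```

Placed between the shared encoder and the language classifier, the flipped gradient drives the encoder to make the two languages indistinguishable while the main task loss keeps the features discriminative.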
Generating and Exploiting Large-scale Pseudo Training Data for Zero Pronoun Resolution
Most existing approaches to zero pronoun resolution rely heavily on
annotated data, which is often released by shared task organizers. The lack of
annotated data therefore becomes a major obstacle to progress on the zero
pronoun resolution task, and manually labeling data for better performance is
expensive. To alleviate this problem, in this paper,
we propose a simple but novel approach to automatically generate large-scale
pseudo training data for zero pronoun resolution. Furthermore, we successfully
transfer a cloze-style reading comprehension neural network model to the zero
pronoun resolution task and propose a two-step training mechanism to bridge
the gap between the pseudo training data and the real data. Experimental results
show that the proposed approach significantly outperforms the state-of-the-art
systems, with an absolute improvement of 3.1% in F-score on OntoNotes 5.0 data.
Comment: 8+2 pages, published as a conference paper at ACL2017 (long paper)
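The cloze-style generation idea can be illustrated with a toy example: blank out one mention in a sentence and keep it as the answer, mimicking a dropped (zero) pronoun. The helper name and the `<blank>` token are illustrative assumptions; the real pipeline operates on parsed Chinese data.

```python
def make_pseudo_example(tokens, blank_idx):
    """Build a cloze-style pseudo training example: remove one mention
    and keep it as the answer, so a reading comprehension model can be
    trained to recover it from context."""
    context = tokens[:blank_idx] + ["<blank>"] + tokens[blank_idx + 1:]
    return {"context": " ".join(context), "answer": tokens[blank_idx]}

ex = make_pseudo_example(["Mary", "said", "Mary", "was", "tired"], blank_idx=2)
assert ex["answer"] == "Mary"
assert ex["context"] == "Mary said <blank> was tired"
```

Because the blanked mention is known, such pairs can be generated at scale without manual annotation, which is the point of the paper's pseudo data.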
Investigating the Working of Text Classifiers
Text classification is one of the most widely studied tasks in natural
language processing. Motivated by the principle of compositionality, large
multilayer neural network models have been employed for this task in an attempt
to effectively utilize the constituent expressions. Almost all of the reported
work train large networks using discriminative approaches, which come with a
caveat of no proper capacity control, as they tend to latch on to any signal
that may not generalize. Using various recent state-of-the-art approaches for
text classification, we explore whether these models actually learn to compose
the meaning of the sentences or still just focus on some keywords or lexicons
for classifying the document. To test our hypothesis, we carefully construct
datasets where the training and test splits have no direct overlap of such
lexicons, but overall language structure would be similar. We study various
text classifiers and observe that there is a big performance drop on these
datasets. Finally, we show that even simple models with our proposed
regularization techniques, which disincentivize focusing on key lexicons, can
substantially improve classification accuracy.
Comment: Proceedings of COLING 2018, the 27th International Conference on
Computational Linguistics: Technical Papers (COLING 2018), NIPS 2017 Workshop
on Deep Learning: Bridging Theory and Practice
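The dataset-construction idea, splitting so that a chosen set of key lexicon words never appears in training, can be sketched simply. This is a toy approximation under assumed inputs (the function, the example documents, and the lexicon are all hypothetical, not the paper's construction):

```python
def lexicon_disjoint_split(docs, labels, lexicon):
    """Route every document containing a key lexicon word to the test
    split, so the training split has no direct overlap with those
    lexicons while the overall language stays similar."""
    train, test = [], []
    for doc, y in zip(docs, labels):
        bucket = test if set(doc.lower().split()) & lexicon else train
        bucket.append((doc, y))
    return train, test

docs = ["great movie", "the plot was dull", "fine acting overall"]
labels = [1, 0, 1]
lexicon = {"great", "dull"}  # hypothetical key sentiment words
train, test = lexicon_disjoint_split(docs, labels, lexicon)
assert all(not (set(d.split()) & lexicon) for d, _ in train)
```

A classifier that merely memorized `great` and `dull` would have nothing to latch on to at training time, exposing whether it actually composes meaning.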
Sliced Wasserstein Discrepancy for Unsupervised Domain Adaptation
In this work, we connect two distinct concepts for unsupervised domain
adaptation: feature distribution alignment between domains by utilizing the
task-specific decision boundary and the Wasserstein metric. Our proposed sliced
Wasserstein discrepancy (SWD) is designed to capture the natural notion of
dissimilarity between the outputs of task-specific classifiers. It provides a
geometrically meaningful guidance to detect target samples that are far from
the support of the source and enables efficient distribution alignment in an
end-to-end trainable fashion. In the experiments, we validate the effectiveness
and genericness of our method on digit and sign recognition, image
classification, semantic segmentation, and object detection.
Comment: Accepted at CVPR 201
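A sliced Wasserstein discrepancy can be computed by projecting the two classifiers' outputs onto random 1-D directions, sorting each projection, and comparing the sorted values. A minimal numpy sketch of that computation (shapes, the number of projections, and the squared-difference aggregation are assumptions for illustration):

```python
import numpy as np

def swd(p1, p2, n_proj=128, seed=0):
    """Sliced Wasserstein discrepancy between two sets of classifier
    outputs (n_samples, n_classes): project onto random unit directions,
    sort each 1-D projection, and average the squared differences."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(p1.shape[1], n_proj))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)  # unit directions
    proj1 = np.sort(p1 @ theta, axis=0)  # sorted 1-D projections
    proj2 = np.sort(p2 @ theta, axis=0)
    return float(np.mean((proj1 - proj2) ** 2))

rng = np.random.default_rng(1)
a = rng.normal(size=(64, 10))
b = a.copy()
c = rng.normal(loc=3.0, size=(64, 10))
assert swd(a, b) < 1e-12      # identical outputs -> zero discrepancy
assert swd(a, c) > swd(a, b)  # shifted outputs -> larger discrepancy
```

Sorting is what makes each 1-D comparison a closed-form Wasserstein distance, and the whole expression stays differentiable almost everywhere, which is why it can drive end-to-end alignment.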
When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey
With widespread applications of artificial intelligence (AI), the
capabilities of the perception, understanding, decision-making and control for
autonomous systems have improved significantly in the past years. When
autonomous systems consider the performance of accuracy and transferability,
several AI methods, like adversarial learning, reinforcement learning (RL) and
meta-learning, show their powerful performance. Here, we review the
learning-based approaches in autonomous systems from the perspectives of
accuracy and transferability. Accuracy means that a well-trained model shows
good results during the testing phase, in which the testing set shares the same
task or data distribution with the training set. Transferability means that
when a well-trained model is transferred to other testing domains, the accuracy
is still good. Firstly, we introduce some basic concepts of transfer learning
and then present some preliminaries of adversarial learning, RL and
meta-learning. Secondly, we focus on reviewing the accuracy or transferability
or both of them to show the advantages of adversarial learning, like generative
adversarial networks (GANs), in typical computer vision tasks in autonomous
systems, including image style transfer, image super-resolution, image
deblurring/dehazing/rain removal, semantic segmentation, depth estimation,
pedestrian detection and person re-identification (re-ID). Then, we further
review the performance of RL and meta-learning from the aspects of accuracy or
transferability or both of them in autonomous systems, involving pedestrian
tracking, robot navigation and robotic manipulation. Finally, we discuss
several challenges and future topics for using adversarial learning, RL and
meta-learning in autonomous systems.
Integrating Semantic Knowledge to Tackle Zero-shot Text Classification
Insufficient or even unavailable training data of emerging classes is a big
challenge of many classification tasks, including text classification.
Recognising text documents of classes that have never been seen in the learning
stage, so-called zero-shot text classification, is therefore difficult, and only
a few previous works have tackled this problem. In this paper, we propose a
two-phase framework together with data augmentation and feature augmentation to
solve this problem. Four kinds of semantic knowledge (word embeddings, class
descriptions, class hierarchy, and a general knowledge graph) are incorporated
into the proposed framework to deal with instances of unseen classes
effectively. Experimental results show that each of the two phases, as well as
their combination, achieves the best overall accuracy compared with baselines
and recent approaches in classifying real-world texts under the zero-shot
scenario.
Comment: Accepted NAACL-HLT 201
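The simplest of the four knowledge sources, word embeddings, already supports a zero-shot decision rule: assign the unseen class whose name embedding is closest to the document's mean word embedding. A toy numpy sketch with hypothetical hand-made vectors standing in for pretrained embeddings (not the paper's two-phase framework):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors standing in for pretrained word embeddings (hypothetical).
emb = {
    "goal":     np.array([1.0, 0.2, 0.0]),
    "match":    np.array([0.9, 0.1, 0.1]),
    "sports":   np.array([1.0, 0.0, 0.0]),
    "politics": np.array([0.0, 1.0, 0.0]),
}

def zero_shot(doc_words, classes):
    """Zero-shot classification via semantic knowledge: pick the unseen
    class whose name embedding is most similar to the mean embedding
    of the document's words."""
    doc_vec = np.mean([emb[w] for w in doc_words], axis=0)
    return max(classes, key=lambda c: cosine(doc_vec, emb[c]))

pred = zero_shot(["goal", "match"], ["sports", "politics"])
assert pred == "sports"
```

No training example of either class is needed; the semantic space alone links documents to unseen labels, which the paper's framework then strengthens with class descriptions, a class hierarchy, and a knowledge graph.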