GIRNet: Interleaved Multi-Task Recurrent State Sequence Models
In several natural language tasks, labeled sequences are available in
separate domains (say, languages), but the goal is to label sequences with
mixed domain (such as code-switched text). Or, we may have available models for
labeling whole passages (say, with sentiments), which we would like to exploit
toward better position-specific label inference (say, target-dependent
sentiment annotation). A key characteristic shared across such tasks is that
different positions in a primary instance can benefit from different `experts'
trained from auxiliary data, but labeled primary instances are scarce, and
labeling the best expert for each position entails unacceptable cognitive
burden. We propose GIRNet, a unified position-sensitive multi-task recurrent
neural network (RNN) architecture for such applications. Auxiliary and primary
tasks need not share training instances. Auxiliary RNNs are trained over
auxiliary instances. A primary instance is also submitted to each auxiliary
RNN, but their state sequences are gated and merged into a novel composite
state sequence tailored to the primary inference task. Our approach is in sharp
contrast to recent multi-task networks like the cross-stitch and sluice
network, which do not control state transfer at such fine granularity. We
demonstrate the superiority of GIRNet using three applications: sentiment
classification of code-switched passages, part-of-speech tagging of
code-switched text, and target position-sensitive annotation of sentiment in
monolingual passages. In all cases, we establish new state-of-the-art
performance beyond recent competitive baselines.
Comment: Accepted at AAAI 201
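The core mechanism described above, feeding the primary instance through each auxiliary RNN and merging the resulting state sequences position-by-position through gates, can be sketched minimally. This is an illustrative simplification, not the paper's exact parameterization: the gates here are given directly rather than produced by a learned gating network, and plain weighted sums stand in for the RNN machinery.

```python
# Hedged sketch of GIRNet-style gated merging of auxiliary state sequences.
# Each auxiliary RNN ("expert") yields one state vector per position of the
# primary instance; position-specific gates mix them into a composite sequence.

def gated_merge(aux_states, gates):
    """aux_states: list of K state sequences, each a list of T d-dim vectors.
    gates: per-position mixture weights, T rows of K values (each row sums to 1).
    Returns the composite state sequence (T vectors of dimension d)."""
    T = len(aux_states[0])
    dim = len(aux_states[0][0])
    composite = []
    for t in range(T):
        h = [0.0] * dim
        for k, seq in enumerate(aux_states):
            w = gates[t][k]
            for d in range(dim):
                h[d] += w * seq[t][d]
        composite.append(h)
    return composite

# Toy example: two auxiliary "experts", three positions, 2-dim states.
aux = [
    [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],   # expert 1's state sequence
    [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],   # expert 2's state sequence
]
gates = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # per-position expert weights
print(gated_merge(aux, gates))
# position 0 follows expert 1, position 2 follows expert 2
```

The point of the fine granularity is visible in the toy gates: unlike cross-stitch or sluice networks, which mix whole layers with instance-independent weights, each position can lean on a different expert.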
Learning Sparse Sharing Architectures for Multiple Tasks
Most existing deep multi-task learning models are based on parameter sharing,
such as hard sharing, hierarchical sharing, and soft sharing. Choosing a
suitable sharing mechanism depends on the relations among the tasks, which is
not easy to determine since the underlying shared factors
among these tasks. In this paper, we propose a novel parameter sharing
mechanism, named \emph{Sparse Sharing}. Given multiple tasks, our approach
automatically finds a sparse sharing structure. We start with an
over-parameterized base network, from which each task extracts a subnetwork.
The subnetworks of multiple tasks are partially overlapped and trained in
parallel. We show that both hard sharing and hierarchical sharing can be
formulated as particular instances of the sparse sharing framework. We conduct
extensive experiments on three sequence labeling tasks. Compared with
single-task models and three typical multi-task learning baselines, our
proposed approach achieves consistent improvement while requiring fewer
parameters.
Comment: Accepted by AAAI 202
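The subnetwork-extraction idea above can be sketched with binary masks over a shared weight vector. This is a toy illustration under stated assumptions: the paper learns the masks (e.g. via pruning), whereas here they are hand-set, and a masked dot product stands in for a full network.

```python
# Illustrative sketch of sparse sharing: each task keeps a binary mask over a
# shared over-parameterized base network; the masked subnetworks may partially
# overlap, so overlapping weights are trained by (and shared across) both tasks.

def task_forward(base_weights, mask, x):
    """Apply only the task's subnetwork: a masked dot product."""
    return sum(w * m * xi for w, m, xi in zip(base_weights, mask, x))

base = [0.5, -1.0, 2.0, 0.25]   # shared over-parameterized weights
mask_a = [1, 1, 0, 1]           # task A's subnetwork
mask_b = [0, 1, 1, 0]           # task B's subnetwork (overlaps A on index 1)
x = [1.0, 1.0, 1.0, 1.0]
print(task_forward(base, mask_a, x))  # 0.5 - 1.0 + 0.25 = -0.25
print(task_forward(base, mask_b, x))  # -1.0 + 2.0 = 1.0
```

Hard sharing corresponds to identical masks for all tasks, and hierarchical sharing to nested masks over successive layers, which is why both are special cases of this framework.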
Multi-Task Learning with Multi-View Attention for Answer Selection and Knowledge Base Question Answering
Answer selection and knowledge base question answering (KBQA) are two
important tasks of question answering (QA) systems. Existing methods solve
these two tasks separately, which requires a large amount of repetitive work and
neglects the rich correlation information between tasks. In this paper, we
tackle answer selection and KBQA tasks simultaneously via multi-task learning
(MTL), motivated by the following observations. First, both answer selection
and KBQA can be regarded as ranking problems, one at the text level and the
other at the knowledge level. Second, these two tasks can benefit each other:
answer selection can incorporate the external knowledge from knowledge base
(KB), while KBQA can be improved by learning contextual information from answer
selection. To fulfill the goal of jointly learning these two tasks, we propose
a novel multi-task learning scheme that utilizes multi-view attention learned
from various perspectives to enable these tasks to interact with each other as
well as learn more comprehensive sentence representations. The experiments
conducted on several real-world datasets demonstrate the effectiveness of the
proposed method, and the performance of answer selection and KBQA is improved.
Also, the multi-view attention scheme proves effective in assembling
attentive information from different representational perspectives.
Comment: Accepted by AAAI 201
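The multi-view idea, several attention "views" scoring the same token states and their attentive summaries being combined, can be sketched as follows. The scoring functions and the simple averaging combiner here are placeholders for illustration, not the paper's exact formulation.

```python
import math

# Hedged sketch of multi-view attention pooling: each view produces its own
# attention distribution over the same token states; the per-view attentive
# summaries are then combined (here, averaged) into one sentence representation.

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attentive_summary(states, scores):
    """Weighted sum of token states under one view's attention distribution."""
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]

def multi_view(states, view_scores):
    """Combine the attentive summaries of all views by averaging."""
    summaries = [attentive_summary(states, s) for s in view_scores]
    dim = len(states[0])
    return [sum(s[d] for s in summaries) / len(summaries) for d in range(dim)]

states = [[1.0, 0.0], [0.0, 1.0]]        # two token states
views = [[10.0, -10.0], [-10.0, 10.0]]   # each view attends to a different token
print(multi_view(states, views))
```

Because each view attends to a different token, the combined representation retains information both views found salient, which is the intended benefit over a single attention pass.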