Search CORE

19 research outputs found

Temporal Convolutional Attention-based Network For Sequence Modeling

Author: Hao Hongyan
Shen Furao
Wang Yan
Xia Yudi
Zhao Jian
Publication venue
Publication date: 04/03/2020
Field of study

With the development of feed-forward models, the default model for sequence modeling has gradually evolved to replace recurrent networks. Many powerful feed-forward models based on convolutional networks and attention mechanism were proposed and show more potential to handle sequence modeling tasks. We wonder that is there an architecture that can not only achieve an approximate substitution of recurrent network, but also absorb the advantages of feed-forward models. So we propose an exploratory architecture referred to Temporal Convolutional Attention-based Network (TCAN) which combines temporal convolutional network and attention mechanism. TCAN includes two parts, one is Temporal Attention (TA) which captures relevant features inside the sequence, the other is Enhanced Residual (ER) which extracts shallow layer's important information and transfers to deep layers. We improve the state-of-the-art results of bpc/perplexity to 26.92 on word-level PTB, 1.043 on character-level PTB, and 6.66 on WikiText-2.Comment: 7 pages, 4 figure

arXiv.org e-Print Archive

Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models

Author: Haffari Gholamreza
Phung Dinh
Vu Thuy-Trang
Publication venue
Publication date: 01/01/2020
Field of study

Recent work has shown the importance of adaptation of broad-coverage contextualised embedding models on the domain of the target task of interest. Current self-supervised adaptation methods are simplistic, as the training signal comes from a small percentage of \emph{randomly} masked-out tokens. In this paper, we show that careful masking strategies can bridge the knowledge gap of masked language models (MLMs) about the domains more effectively by allocating self-supervision where it is needed. Furthermore, we propose an effective training strategy by adversarially masking out those tokens which are harder to reconstruct by the underlying MLM. The adversarial objective leads to a challenging combinatorial optimisation problem over \emph{subsets} of tokens, which we tackle efficiently through relaxation to a variational lowerbound and dynamic programming. On six unsupervised domain adaptation tasks involving named entity recognition, our method strongly outperforms the random masking strategy and achieves up to +1.64 F1 score improvements.Comment: EMNLP202

arXiv.org e-Print Archive

Crossref

Monash University Research Portal