Segatron: Segment-Aware Transformer for Language Modeling and Understanding
Transformers are powerful for sequence modeling. Nearly all state-of-the-art
language models and pre-trained language models are based on the Transformer
architecture. However, the Transformer distinguishes sequential tokens only by the token
position index. We hypothesize that better contextual representations can be
generated from the Transformer with richer positional information. To verify
this, we propose a segment-aware Transformer (Segatron), by replacing the
original token position encoding with a combined position encoding of
paragraph, sentence, and token. We first introduce the segment-aware mechanism
to Transformer-XL, which is a popular Transformer-based language model with
memory extension and relative position encoding. We find that our method can
further improve the Transformer-XL base model and large model, achieving 17.1
perplexity on the WikiText-103 dataset. We further investigate the pre-training
masked language modeling task with Segatron. Experimental results show that
BERT pre-trained with Segatron (SegaBERT) outperforms BERT with the vanilla
Transformer on various NLP tasks, and outperforms RoBERTa on zero-shot sentence
representation learning. Comment: Accepted by AAAI 202
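As a rough illustration of the combined position encoding described in the Segatron abstract above, the Python sketch below assigns each token a paragraph, sentence, and token index and then sums one vector per index. The function names, the toy lookup tables, the choice to restart the sentence index in each paragraph and the token index in each sentence, and the use of summation to combine the three vectors are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Tuple

# A document is a list of paragraphs, a paragraph is a list of sentences,
# and a sentence is a list of tokens.
Document = List[List[List[str]]]


def segment_aware_positions(doc: Document) -> List[Tuple[str, int, int, int]]:
    """Return (token, paragraph_idx, sentence_idx, token_idx) for every token.

    Assumption: the sentence index restarts in each paragraph and the token
    index restarts in each sentence.
    """
    positions = []
    for p_idx, paragraph in enumerate(doc):
        for s_idx, sentence in enumerate(paragraph):
            for t_idx, token in enumerate(sentence):
                positions.append((token, p_idx, s_idx, t_idx))
    return positions


def combined_encoding(
    p_idx: int,
    s_idx: int,
    t_idx: int,
    para_table: List[List[float]],
    sent_table: List[List[float]],
    tok_table: List[List[float]],
) -> List[float]:
    """Combine three position vectors by element-wise sum (an assumption)."""
    return [
        p + s + t
        for p, s, t in zip(para_table[p_idx], sent_table[s_idx], tok_table[t_idx])
    ]


# Hypothetical toy tables with two dimensions per position vector.
PARA_TABLE = [[1.0, 0.0], [2.0, 0.0]]
SENT_TABLE = [[0.0, 1.0], [0.0, 2.0]]
TOK_TABLE = [[0.5, 0.5], [0.25, 0.25]]

doc = [[["Transformers", "work"], ["Indeed"]], [["Next"]]]
for token, p, s, t in segment_aware_positions(doc):
    print(token, (p, s, t), combined_encoding(p, s, t, PARA_TABLE, SENT_TABLE, TOK_TABLE))
```

In a trained model the three tables would be learned parameters rather than fixed lists; the fixed values here only make the index bookkeeping easy to follow.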
A Comparison of SVM against Pre-trained Language Models (PLMs) for Text Classification Tasks
The emergence of pre-trained language models (PLMs) has shown great success
in many Natural Language Processing (NLP) tasks including text classification.
Due to the minimal to no feature engineering required when using these models,
PLMs are becoming the de facto choice for any NLP task. However, for
domain-specific corpora (e.g., financial, legal, and industrial), fine-tuning a
pre-trained model for a specific task has been shown to provide a performance
improvement. In this paper, we compare the performance of four different PLMs
on three public domain-free datasets and a real-world dataset containing
domain-specific words, against a simple SVM linear classifier with TFIDF
vectorized text. The experimental results on the four datasets show that using
PLMs, even fine-tuned, does not provide a significant gain over the linear SVM
classifier. Hence, for text classification tasks, we recommend a traditional
SVM with careful feature engineering, which can provide cheaper and superior
performance compared with PLMs.
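For readers unfamiliar with the baseline named in the abstract above, a linear SVM over TF-IDF features can be sketched in a few lines with scikit-learn. This is a minimal sketch, not the authors' setup: the helper name `build_tfidf_svm`, the default hyperparameters, and the four toy training sentences are invented here for illustration and have no connection to the datasets used in the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_tfidf_svm():
    """Build a TF-IDF vectorizer followed by a linear SVM classifier.

    Default settings are used throughout; any tuning is left out on purpose.
    """
    return make_pipeline(TfidfVectorizer(), LinearSVC())


# Hypothetical toy data, invented for this sketch only.
train_texts = [
    "stocks fell sharply today",
    "the bank raised interest rates",
    "the team won the match",
    "the striker scored a goal",
]
train_labels = ["finance", "finance", "sports", "sports"]

model = build_tfidf_svm()
model.fit(train_texts, train_labels)

# Classify two unseen toy sentences.
predictions = model.predict(["interest rates and stocks", "the team scored a goal"])
print(list(predictions))
```

A pipeline like this trains on a CPU in well under a second for small corpora, which is the cost advantage the abstract alludes to.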
- …