TCBERT: A Technical Report for Chinese Topic Classification BERT

Chen, Xinyu; Fan, Yuchen; Gan, Ruyi; Gao, Xinyu; Han, Ting; Pan, Kunhao; Song, Dingjie; Zhang, Jiaxing

Publication date: 21 November 2022

Abstract

Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019) has been one of the base models for various NLP tasks due to its remarkable performance. Variants customized for different languages and tasks have been proposed to further improve performance. In this work, we investigate supervised continued pre-training (Gururangan et al., 2020) on BERT for the Chinese topic classification task. Specifically, we incorporate prompt-based learning and contrastive learning into the pre-training. To adapt to the task of Chinese topic classification, we collect around 2.1M Chinese data samples spanning various topics. The pre-trained Chinese Topic Classification BERTs (TCBERTs) with different parameter sizes are open-sourced at https://huggingface.co/IDEA-CCNL.

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2211.11304
