Social media processing is a fundamental task in natural language processing
with numerous applications. As Vietnamese social media and information science
have grown rapidly, the necessity of information-based mining on Vietnamese
social media has become crucial. However, state-of-the-art research faces
several significant drawbacks, including imbalanced data and noisy data on
social media platforms. Imbalanced and noisy are two essential issues that need
to be addressed in Vietnamese social media texts. Graph Convolutional Networks
can address the problems of imbalanced and noisy data in text classification on
social media by taking advantage of the graph structure of the data. This study
presents a novel approach based on contextualized language model (PhoBERT) and
graph-based method (Graph Convolutional Networks). In particular, the proposed
approach, ViCGCN, jointly trained the power of Contextualized embeddings with
the ability of Graph Convolutional Networks, GCN, to capture more syntactic and
semantic dependencies to address those drawbacks. Extensive experiments on
various Vietnamese benchmark datasets were conducted to verify our approach.
The observation shows that applying GCN to BERTology models as the final layer
significantly improves performance. Moreover, the experiments demonstrate that
ViCGCN outperforms 13 powerful baseline models, including BERTology models,
fusion BERTology and GCN models, other baselines, and SOTA on three benchmark
social media datasets. Our proposed ViCGCN approach demonstrates a significant
improvement of up to 6.21%, 4.61%, and 2.63% over the best Contextualized
Language Models, including multilingual and monolingual, on three benchmark
datasets, UIT-VSMEC, UIT-ViCTSD, and UIT-VSFC, respectively. Additionally, our
integrated model ViCGCN achieves the best performance compared to other
BERTology integrated with GCN models