14 research outputs found

Macro-F1 results comparison of seven widely used deep learning models under seven combinations of preprocessing methods (TQ: tax question dataset; TC: THUCNews).

Author: Aziguli Wulamu (17137923)
Dezheng Zhang (7204568)
Jing Li (10611)
Yonghong Xie (705116)
Publication date: 12/10/2023

FigShare
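The macro-F1 scores reported in these records are, by the standard definition, the unweighted mean of the per-class F1 scores. Below is a minimal from-scratch sketch of that definition. The records do not say which implementation the authors used, and the example labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-F1).

    Classes seen in either y_true or y_pred are included; a class whose
    precision or recall is undefined contributes 0 to that quantity.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # predicted p, but it was wrong
            fn[t] += 1   # missed an instance of class t

    f1_scores = []
    for c in labels:
        precision_den = tp[c] + fp[c]
        recall_den = tp[c] + fn[c]
        precision = tp[c] / precision_den if precision_den else 0.0
        recall = tp[c] / recall_den if recall_den else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)

    # Every class carries equal weight, regardless of how many samples it has.
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Hypothetical labels for illustration only.
    y_true = ["tax", "tax", "sports", "finance"]
    y_pred = ["tax", "sports", "sports", "finance"]
    print(round(macro_f1(y_true, y_pred), 4))  # → 0.7778
```

Here "tax" and "sports" each reach F1 = 2/3 and "finance" reaches 1, so the macro average is 7/9. Because each class counts equally, a classifier that neglects a small category is penalized even when its overall accuracy stays high.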

The evaluation results based on seven deep learning models for two datasets.

Author: Aziguli Wulamu (17137923)
Dezheng Zhang (7204568)
Jing Li (10611)
Yonghong Xie (705116)
Publication date: 12/10/2023

FigShare

THUCNews dataset.

Author: Aziguli Wulamu (17137923)
Dezheng Zhang (7204568)
Jing Li (10611)
Yonghong Xie (705116)
Publication date: 12/10/2023

Text preprocessing is an important component of Chinese text classification. However, most existing studies of preprocessing examine its influence on only a few text classification algorithms, and they use English text. In this paper, we experimentally compared fifteen commonly used classifiers on two Chinese datasets using three widely used Chinese preprocessing methods: word segmentation, Chinese-specific stop word removal, and Chinese-specific symbol removal. We then examined how the preprocessing methods influence the final classification under various conditions, including the evaluation metric, the combination of preprocessing methods, and the choice of classifier. Finally, we conducted a series of additional experiments and found that most of the classifiers performed better after appropriate preprocessing was applied. Our general conclusion is that the systematic use of preprocessing methods can improve the classification of short Chinese texts, a conclusion drawn across evaluation metrics (such as macro-F1), combinations of preprocessing methods (word segmentation, Chinese-specific stop word removal, and symbol removal), and classifier types (machine learning and deep learning models). The best macro-F1 scores on the two datasets are 92.13% and 91.99%, which represent improvements of 0.3% and 2%, respectively, over the compared baselines.
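The three preprocessing steps named in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. The abstract does not name a segmentation tool, so the sketch substitutes classic forward maximum matching over a tiny dictionary. `DICTIONARY`, `STOP_WORDS`, and all helper names are hypothetical. Symbol removal here drops any token made up entirely of Unicode punctuation or symbol characters.

```python
import unicodedata

# Hypothetical toy resources; a real pipeline would load a full lexicon
# and a full Chinese stop word list.
DICTIONARY = {"增值税", "发票", "如何", "开具", "纳税人"}
STOP_WORDS = {"的", "了", "吗", "如何"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def segment(text):
    """Forward maximum matching: at each position, take the longest
    dictionary word; fall back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in DICTIONARY:
                tokens.append(candidate)
                i += size
                break
    return tokens


def is_symbol(token):
    """True if every character is punctuation (P*) or a symbol (S*),
    which covers full-width Chinese punctuation such as '？' and '，'."""
    return all(unicodedata.category(ch)[0] in "PS" for ch in token)


def remove_symbols(tokens):
    return [t for t in tokens if not is_symbol(t)]


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    tokens = segment("增值税发票如何开具？")  # "How is a VAT invoice issued?"
    print(" / ".join(tokens))  # → 增值税 / 发票 / 如何 / 开具 / ？
    print(" / ".join(remove_stop_words(remove_symbols(tokens))))  # → 增值税 / 发票 / 开具
```

Segmentation runs first because stop word removal in this sketch operates on whole tokens. Unsegmented Chinese text has no word boundaries for a stop word list to match against.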

FigShare

The evaluation results based on four simple machine learning models for two datasets.

Author: Aziguli Wulamu (17137923)
Dezheng Zhang (7204568)
Jing Li (10611)
Yonghong Xie (705116)
Publication date: 12/10/2023

FigShare

Combinations of Chinese preprocessing methods.

Author: Aziguli Wulamu (17137923)
Dezheng Zhang (7204568)
Jing Li (10611)
Yonghong Xie (705116)
Publication date: 12/10/2023

FigShare
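With three preprocessing methods, there are exactly 2^3 - 1 = 7 non-empty combinations, which matches the count of "seven combinations of preprocessing methods" in these records. The sketch below enumerates those subsets. That the paper's seven combinations are exactly these subsets is an assumption, since the records themselves do not list them. The method labels are descriptive strings, not identifiers from the paper.

```python
from itertools import combinations

# Descriptive labels for the three methods named in the abstract (assumed wording).
METHODS = ("word segmentation", "stop word removal", "symbol removal")


def preprocessing_combinations(methods=METHODS):
    """All non-empty subsets of the preprocessing methods, smallest first."""
    return [
        combo
        for r in range(1, len(methods) + 1)
        for combo in combinations(methods, r)
    ]


if __name__ == "__main__":
    for combo in preprocessing_combinations():
        print(" + ".join(combo))
```

The first three lines printed are the single methods, the next three are the pairs, and the last is all three methods applied together.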

Macro-F1 results comparison of four widely used pre-training learning models under seven combinations of preprocessing methods (TQ: tax question dataset; TC: THUCNews).

Author: Aziguli Wulamu (17137923)
Dezheng Zhang (7204568)
Jing Li (10611)
Yonghong Xie (705116)
Publication date: 12/10/2023

FigShare

Comparison of the considered condition with previous research.

Author: Aziguli Wulamu (17137923)
Dezheng Zhang (7204568)
Jing Li (10611)
Yonghong Xie (705116)
Publication date: 12/10/2023

FigShare

A case study of the proposed workflow in the field of taxation.

Author: Aziguli Wulamu (17137923)
Dezheng Zhang (7204568)
Jing Li (10611)
Yonghong Xie (705116)
Publication date: 12/10/2023

FigShare

The workflow of the proposed approach.

Author: Aziguli Wulamu (17137923)
Dezheng Zhang (7204568)
Jing Li (10611)
Yonghong Xie (705116)
Publication date: 12/10/2023

FigShare

Tax question dataset.

Author: Aziguli Wulamu (17137923)
Dezheng Zhang (7204568)
Jing Li (10611)
Yonghong Xie (705116)
Publication date: 12/10/2023

FigShare