research article review

Clinical Text Classification for Tuberculosis Diagnosis Using Natural Language Processing and Deep Learning Model with Statistical Feature Selection Technique

Abstract

In the medical field, various deep learning (DL) algorithms have been effec�tively used to extract valuable information from unstructured clinical text data, potentially leading to more effective outcomes. This study utilized clinical text data to classify clinical case reports into tuberculosis (TB) and non-tuberculosis (non-TB) groups using natural lan�guage processing (NLP), a pre-processing technique, and DL models. Methods: This study used 1743 open-source respiratory disease clinical text data, labeled via fuzzy matching with ICD-10 codes to create a labeled dataset. Two tokenization methods preprocessed the clinical text data, and three models were evaluated: the existing Text-CNN, the proposed Text-CNN with t-test, and Bio_ClinicalBERT. Performance was assessed using multiple metrics and validated on 228 baseline screening clinical case text data collected from ICMR–NIRT to demonstrate effective TB classification. Results: The proposed model achieved the best results in both the test and validation datasets. On the test dataset, it attained a precision of 88.19%, a recall of 90.71%, an F1-score of 89.44%, and an AUC of 0.91. Simi�larly, on the validation dataset, it achieved 100% precision, 98.85% recall, 99.42% F1-score, and an AUC of 0.982, demonstrating its effectiveness in TB classification. Conclusions: This study highlights the effectiveness of DL models in classifying TB cases from clinical notes. The proposed model outperformed the other two models. The TF-IDF and t-test showed statistically significant feature selection and enhanced model interpretability and efficiency, demonstrating the potential of NLP and DL in automating TB diagnosis in clinical decision settings

Similar works

This paper was published in NIRT Institutional Repository.

Having an issue?

Is data on this page outdated, violates copyrights or anything else? Report the problem now and we will take corresponding actions after reviewing your request.