Classifying Vietnamese Disease Outbreak Reports with Important Sentences and Rich Features
Text classification has been an important field of research since the mid-1990s.
It has many applications, one of which is Web-based biosurveillance systems
which identify and summarize online disease outbreak reports. In this paper we
focus on classifying Vietnamese disease outbreak reports. We investigate
important properties of disease outbreak reports, e.g., sentences containing
names of outbreak diseases and locations. Evaluation with 10 repetitions of 10-fold
cross-validation using the Support Vector Machine algorithm shows that using
sentences containing disease outbreak names, together with their preceding and following
sentences and in combination with location features, achieves the best F-score of
86.67%, an improvement of 0.38% over using all raw text. Our
results suggest that using important sentences and rich features can improve
the performance of Vietnamese disease outbreak text classification.
Comment: 5 pages, 2 tables
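The sentence selection step described in the abstract (keeping sentences that mention a disease name, plus their preceding and following sentences) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `select_important_sentences`, the `window` parameter, the punctuation-based sentence splitter, and the case-insensitive substring match are all hypothetical, and real Vietnamese text would likely need a proper sentence and word segmenter.

```python
import re


def select_important_sentences(text, disease_names, window=1):
    """Keep sentences mentioning a disease name, plus `window` neighbours on each side.

    Hypothetical helper: the splitter and matching rule are simplifying assumptions.
    """
    # Naive sentence split: break on whitespace that follows '.', '!' or '?'.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    names = [name.lower() for name in disease_names]

    keep = set()
    for i, sentence in enumerate(sentences):
        lowered = sentence.lower()
        if any(name in lowered for name in names):
            # Add the matching sentence and its preceding/following neighbours.
            start = max(0, i - window)
            end = min(len(sentences), i + window + 1)
            keep.update(range(start, end))

    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(keep)]


report = (
    "Officials met on Monday. A cholera outbreak was reported in Hanoi. "
    "Hospitals admitted 20 patients. The weather was mild. Markets stayed open."
)
selected = select_important_sentences(report, ["cholera"])
```

Here `selected` holds the first three sentences of `report`; the selected text, rather than the full raw report, would then be turned into features for the classifier.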