AGH University of Krakow, Faculty of Computer Science
Abstract
Natural language processing (NLP) of text data, which underlies much of the research in Artificial Intelligence, covers tasks such as semantic analysis and natural language generation, and above all the solution of classification problems. This study analyzes the effect of detected named entities on text classification performance, with the aim of making the text preprocessing stage more effective. To reduce analysis time and improve performance, the classical preprocessing stage was followed by word filtering based on Named Entity Recognition (NER), using thresholds set in the 5% and 10% ranges. Analyses were performed with various machine learning and deep learning algorithms and with Bidirectional Encoder Representations from Transformers (BERT), and the results are discussed in the final part of the study. On the problem of classifying 50,000 news texts, the Support Vector Machine (SVM) algorithm achieved 93% success in statistical classification with machine learning, Long Short-Term Memory (LSTM) achieved 87%, and BERT achieved 83%. In the LSTM and BERT analyses, although model performance was numerically lower, semantic integrity in text classification was observed to be stronger, and success generally increased after NER filtering. Thus, passing the dataset through the NER filter according to the threshold values can be interpreted as positively affecting the model's success in terms of both time and performance.
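The abstract does not spell out the exact filtering rule, so the following is only a minimal sketch of one possible reading of "NER filtering according to 5% and 10% thresholds". It assumes that each document has already been tagged by some upstream NER model (tokens paired with tags, "O" for non-entities) and that a named-entity token is dropped when it appears in more than `threshold` of all documents. The function names `entity_document_frequency` and `ner_threshold_filter` are hypothetical, and the paper's actual criterion may differ.

```python
from collections import Counter
from typing import List, Tuple

# A document is a list of (token, ner_tag) pairs; "O" marks a non-entity token.
# ASSUMPTION: tags come from an upstream NER model; this sketch does not run NER itself.
Document = List[Tuple[str, str]]


def entity_document_frequency(docs: List[Document]) -> Counter:
    """Count, for each named-entity token (lowercased), how many documents contain it."""
    df: Counter = Counter()
    for doc in docs:
        # A set ensures each entity is counted at most once per document.
        entities = {tok.lower() for tok, tag in doc if tag != "O"}
        df.update(entities)
    return df


def ner_threshold_filter(docs: List[Document], threshold: float) -> List[List[str]]:
    """Drop entity tokens that occur in more than `threshold` (a fraction) of all documents.

    ASSUMPTION: hypothetical interpretation of the paper's 5%/10% thresholds
    (threshold=0.05 or threshold=0.10 on a realistic corpus).
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    df = entity_document_frequency(docs)
    n_docs = len(docs)
    frequent = {ent for ent, count in df.items() if count / n_docs > threshold}
    filtered = []
    for doc in docs:
        # Non-entity tokens are always kept; entity tokens only if not too frequent.
        kept = [tok for tok, tag in doc if tag == "O" or tok.lower() not in frequent]
        filtered.append(kept)
    return filtered


if __name__ == "__main__":
    docs = [
        [("Reuters", "B-ORG"), ("reports", "O"), ("growth", "O")],
        [("Reuters", "B-ORG"), ("says", "O"), ("Paris", "B-LOC")],
        [("markets", "O"), ("rally", "O")],
    ]
    # A large threshold is used because this toy corpus has only three documents.
    print(ner_threshold_filter(docs, 0.5))
    # → [['reports', 'growth'], ['says', 'Paris'], ['markets', 'rally']]
```

In this toy run, "Reuters" appears in two of three documents (about 67%, above the 0.5 threshold) and is removed, while "Paris" (about 33%) is kept. The filtered token lists would then be vectorized and passed to classifiers such as SVM, LSTM, or BERT; that stage is not shown here.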