
    Persian Text Classification using naive Bayes algorithms and Support Vector Machine algorithm

    Automatically assigning documents to predefined categories is one of the primary steps toward knowledge extraction from raw textual data. In such tasks, the words of a document are treated as a set of features. Because traditional feature selection methods produce high-dimensional, sparse feature vectors, many proposed text classification methods suffer in performance and accuracy. Numerous algorithms have been applied to automatic text categorization, which is why we draw on methods from information extraction, natural language processing, and machine learning. This paper proposes an approach to improving the classification performance of Persian text. Naive Bayes classifiers, which are widely used for text classification in machine learning, are based on conditional probability. We compare the Gaussian, Multinomial, and Bernoulli variants of the naive Bayes algorithm with an SVM. For statistical text representation, TF, TF-IDF, and character-level 3-grams [6,9] were used. Finally, experimental results on 10 newsgroups…
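    The comparison the abstract describes can be sketched as follows. This is a minimal illustration using scikit-learn; the toy corpus, labels, and parameter choices are assumptions for demonstration, not the paper's data or exact setup.

    ```python
    # Sketch: comparing naive Bayes variants with a linear SVM on
    # character-level 3-gram TF-IDF features. Corpus and labels are
    # illustrative assumptions, not the paper's dataset.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
    from sklearn.svm import LinearSVC

    docs = ["the team won the final match", "the match ended in a draw",
            "stocks fell sharply today", "the market closed higher today"]
    labels = ["sport", "sport", "economy", "economy"]

    # character-level 3-grams with TF-IDF weighting, as in the abstract
    vec = TfidfVectorizer(analyzer="char", ngram_range=(3, 3))
    X = vec.fit_transform(docs)

    query = vec.transform(["the team won the match"])
    for clf in (MultinomialNB(), BernoulliNB(), LinearSVC()):
        clf.fit(X, labels)
        print(type(clf).__name__, clf.predict(query)[0])

    # GaussianNB models continuous features and requires a dense matrix
    gnb = GaussianNB().fit(X.toarray(), labels)
    print("GaussianNB", gnb.predict(query.toarray())[0])
    ```

    On real data the sparse TF-IDF matrix can have tens of thousands of columns, which is the dimensionality problem the abstract refers to.
    
    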

    Persian Text Classification Enhancement by Latent Semantic Space

    Heterogeneous data of all kinds are growing on the web. Because of the variety of data types in web search results, it is common to classify the results so that users can find the data they want. Many machine learning methods are used to classify textual data; the main challenges are the cost of the classifier and the classification performance. A traditional model in IR and text representation is the vector space model (VSM), in which the cost of computation depends on the dimension of the vectors. Another problem is selecting effective features and pruning unwanted terms. Latent semantic indexing (LSI) transforms the VSM into an orthogonal semantic space that takes term relations into account. Experimental results showed that the LSI semantic space achieves better computation time and classification accuracy. This suggests that the semantic topic space contains less noise, which increases accuracy; the lower vector dimension also reduces computational complexity
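    The transformation the abstract describes, from a sparse TF-IDF vector space to a low-dimensional latent semantic space, can be sketched with truncated SVD (the standard realization of LSI). The corpus and the number of components below are illustrative assumptions.

    ```python
    # Sketch of LSI: project a TF-IDF vector space model onto a
    # low-dimensional latent semantic space via truncated SVD.
    # Corpus and n_components are illustrative assumptions.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = ["web search results contain many data types",
            "classifying search results helps users find data",
            "semantic indexing reduces vector dimensions",
            "latent semantic space captures term relations"]

    vec = TfidfVectorizer()
    X = vec.fit_transform(docs)           # sparse, high-dimensional VSM
    svd = TruncatedSVD(n_components=2, random_state=0)
    Z = svd.fit_transform(X)              # dense, 2-dimensional semantic space

    print(X.shape, "->", Z.shape)
    ```

    A classifier trained on `Z` instead of `X` sees far fewer (and denser) features per document, which is the source of both the speedup and the noise reduction the abstract reports.
    
    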

    Arabic Text Classification Using Learning Vector Quantization

    Text classification aims to automatically assign documents to predefined categories. In our research, we used a neural network model called Learning Vector Quantization (LVQ) for classifying Arabic text; this model had not been addressed before in this area. The model is based on the Kohonen self-organizing map (SOM), which can organize vast document collections according to textual similarity. From past experience, the model also requires fewer training examples and is much faster than other classification methods. In this research we first selected Arabic documents from different domains. We then applied suitable pre-processing methods, such as term weighting schemes and Arabic morphological analysis (stemming and light stemming), to prepare the data set for classification with the selected algorithm. After that, we compared the results obtained from different improved LVQ versions (LVQ2.1, LVQ3, OLVQ1 and OLVQ3). Finally, we compared our work with other well-known classification algorithms: decision trees (DT), K Nearest Neighbors (KNN), and Naïve Bayes. The results show that the LVQ algorithms, especially LVQ2.1, achieved higher accuracy in less time than the other classification algorithms and other neural network algorithms
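    The core LVQ update rule underlying the variants compared above can be sketched in pure NumPy as basic LVQ1: the prototype nearest to a training sample is pulled toward it when their labels match and pushed away otherwise. The improved variants (LVQ2.1, LVQ3, OLVQ1, OLVQ3) refine this update; the synthetic 2-D data and all parameters below are illustrative assumptions.

    ```python
    # Sketch of the basic LVQ1 rule. LVQ2.1/LVQ3/OLVQ refine this update;
    # data and hyperparameters here are illustrative, not the paper's setup.
    import numpy as np

    def lvq1_train(X, y, lr=0.1, epochs=20, seed=0):
        rng = np.random.default_rng(seed)
        classes = np.unique(y)
        # initialise one prototype per class from a random class member
        W = np.array([X[rng.choice(np.flatnonzero(y == c))] for c in classes],
                     dtype=float)
        wl = classes.copy()
        for _ in range(epochs):
            for i in rng.permutation(len(X)):
                j = np.linalg.norm(W - X[i], axis=1).argmin()  # winning prototype
                step = lr * (X[i] - W[j])
                W[j] += step if wl[j] == y[i] else -step       # attract or repel
        return W, wl

    def lvq_predict(W, wl, X):
        # label of the nearest prototype for each sample
        d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
        return wl[d.argmin(axis=1)]

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(2.0, 0.3, (20, 2))])
    y = np.repeat([0, 1], 20)
    W, wl = lvq1_train(X, y)
    acc = (lvq_predict(W, wl, X) == y).mean()
    print("training accuracy:", acc)
    ```

    For text, `X` would be the term-weighted document vectors produced by the pre-processing steps described above; keeping only a handful of prototypes per class is what makes LVQ fast at prediction time.
    
    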