
7 research outputs found

Feature Selection Technique for Text Document Classification: An Alternative Approach

Author: S.W. Mohod, Dr. C.A. Dhote
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/09/2014

Text classification and feature selection play an important role in correctly assigning documents to particular categories, given the explosive growth of textual information from electronic digital documents as well as the World Wide Web. The present challenge in text mining is to select the important or relevant features from the large number of features in the data set. The aim of this paper is to improve the feature selection method for text document classification in machine learning. In machine learning, a training set is generated for testing the documents. This can be achieved by selecting important new terms, i.e. term weights in the text documents, to improve classification in terms of both accuracy and performance.
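The abstract does not say which term-weighting scheme the authors use. As a rough illustration of weighting terms and keeping only the highest-weighted ones as features, the following minimal sketch uses plain TF-IDF, which is a swapped-in, commonly used scheme and not the paper's method. The function names `tfidf` and `top_terms` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document (plain TF-IDF)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights

def top_terms(docs, k):
    """Keep the k terms with the highest TF-IDF weight in any document."""
    best = {}
    for doc_weights in tfidf(docs):
        for term, value in doc_weights.items():
            best[term] = max(best.get(term, 0.0), value)
    # Sort by descending weight, breaking ties alphabetically.
    return sorted(best, key=lambda t: (-best[t], t))[:k]

# Hypothetical toy corpus: three already-tokenized documents.
docs = [
    ["stock", "market", "the"],
    ["football", "match", "the"],
    ["market", "price", "the"],
]
selected = top_terms(docs, 4)
```

A term such as "the", which occurs in every document, receives weight zero and is never selected, which is the basic effect a weight-based feature selector relies on.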

International Journal on Recent and Innovation Trends in Computing and Communication

A Novel Approach in Feature Selection Method for Text Document Classification

Author: S.W. Mohod, Dr. C.A. Dhote
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/07/2015

In this paper, a novel approach is proposed for extracting eminent features for a classifier, instead of the traditional feature selection techniques used for text document classification. We introduce a new model based on the probability and overall class frequency of a term. We applied this new technique to extract features from training text documents and generate a training set for machine learning. This training set is used to automatically classify documents into their corresponding class labels and to improve classification accuracy. The results for the proposed feature selection method show that it performs much better than traditional methods. DOI: 10.17762/ijritcc2321-8169.15075

International Journal on Recent and Innovation Trends in Computing and Communication

Selection of informative geometric features of cell nuclei in fluorescence images of cancer cells

Author: M. M. Yatskou, P. D. Kryvasheyeu, V. V. Apanasovich, V. V. Skakun, Ya. U. Lisitsa
Publication venue: UIIP NASB
Publication date: 06/12/2018

Methods for selecting informative features are considered for identifying the geometric features that describe cell nuclei in fluorescence images of cancer cells. A review of existing geometric features is given, covering both shape features that are invariant to rotation and translation of the image and features of location in space. Six methods were used to select the most informative features: the median method, correlation with the Pearson correlation coefficient, correlation with the Spearman correlation coefficient, logistic regression, random forest with CART trees and the Gini criterion, and random forest with CART trees and the error minimization criterion. As a result of the study, the 11 most informative features were selected from 59; classification quality was analyzed using the random forest method, and time costs were calculated as a function of the number of features used to describe the objects. For the random forest method, 11 features are sufficient for classification accuracy and reduce time costs by a factor of 2.3.
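One of the selection methods named in the abstract is correlation with the Pearson coefficient. The following is a minimal sketch of that idea only: features are ranked by the absolute Pearson correlation with a binary class label and the top k are kept. The feature names, values, and labels are hypothetical toy data, and the ranking rule is an assumption, since the abstract does not describe how the coefficient is turned into a selection.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)

def select_top_k(features, labels, k):
    """Rank features by |r| with the class label and keep the k best."""
    scores = {name: abs(pearson(vals, labels)) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical geometric features of four nuclei and their class labels.
features = {
    "area": [10.0, 12.0, 30.0, 33.0],
    "perimeter": [5.0, 9.0, 6.0, 8.0],
    "eccentricity": [0.9, 0.8, 0.2, 0.1],
}
labels = [0, 0, 1, 1]
selected = select_top_k(features, labels, 2)
```

In this toy data "perimeter" is uncorrelated with the label and is dropped, while "area" and "eccentricity" are kept. Using the absolute value means strongly negative correlations count as informative too.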

Informatics (E-Journal) / Информатика

Sentiment Classification of Online Customer Reviews and Blogs Using Sentence-level Lexical Based Semantic Orientation Method

Author: KHAN AURANGZEB
Publication date: 01/01/2011

Sentiment analysis is the process of extracting knowledge from people's opinions, appraisals and emotions toward entities, events and their attributes. These opinions strongly influence customers and ease their choices when shopping online and choosing events, products and entities. With the rapid growth of online resources, a vast amount of new data in the form of customer reviews and opinions is being generated continuously. Hence, sentiment analysis methods are desirable for developing efficient and effective analysis and classification of customer reviews, blogs and comments. The main motivation for this thesis is to develop a high-performance, domain-independent sentiment classification method. This study focuses on sentiment analysis at the sentence level using a lexical-based method for different types of data, such as reviews and blogs. The proposed method is based on general lexicons, i.e. WordNet, SentiWordNet and user-defined lexical dictionaries, for sentiment orientation. The relations and glosses of these dictionaries provide a solution to the domain portability problem. The experiments are performed on various data sets, such as customer reviews and blog comments. The results show that the proposed method with sentence contextual information is effective for sentiment classification. The proposed method performs better than word-level and text-level corpus-based machine learning methods for semantic orientation. The results show that the proposed method achieves an average accuracy of 86% at the sentence level and 97% at the feedback level for customer reviews. Similarly, it achieves an average accuracy of 83% at the sentence level and 86% at the feedback level for blog comments.
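To make the idea of sentence-level, lexicon-based scoring concrete, here is a minimal sketch under stated assumptions. It is not the thesis's method: the tiny `LEXICON` is a hand-made stand-in for resources such as SentiWordNet, the negation rule is a simplification, and aggregating sentence scores into a feedback-level label by summing them is an assumption. All names are hypothetical.

```python
import re

# Hypothetical mini-lexicon; a real system would draw polarities from a
# resource such as SentiWordNet.
LEXICON = {
    "good": 1.0, "great": 1.0, "excellent": 1.0,
    "bad": -1.0, "poor": -1.0, "terrible": -1.0,
}
NEGATORS = {"not", "never", "no"}

def sentence_score(sentence):
    """Sum lexicon polarities; a negator flips the next lexicon word."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0.0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        if tok in LEXICON:
            value = LEXICON[tok]
            score += -value if negate else value
            negate = False
    return score

def label(score):
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def classify_feedback(text):
    """Return (per-sentence labels, overall feedback label)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    scores = [sentence_score(s) for s in sentences]
    return [label(s) for s in scores], label(sum(scores))

sentence_labels, feedback_label = classify_feedback(
    "The camera is great. The battery is not good. Excellent screen!"
)
```

For the sample review the second sentence is labeled negative because "not" flips "good", while the feedback as a whole is labeled positive because two of the three sentences score positively.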

UTPedia

Sentiment Classification of Online Customer Reviews and Blogs Using Sentence-level Lexical Based Semantic Orientation Method

Author: KHAN AURANGZEB
Publication date: 01/01/2012

UTPedia

Sentiment Classification of Online Customer Reviews and Blogs Using Sentence-level Lexical Based Semantic Orientation Method

Author: KHAN AURANGZEB
Publication date: 01/01/2011

Sentiment analysis is the process of extracting knowledge from people's opinions, appraisals and emotions toward entities, events and their attributes. These opinions strongly influence customers and ease their choices when shopping online and choosing events, products and entities. With the rapid growth of online resources, a vast amount of new data in the form of customer reviews and opinions is being generated continuously. Hence, sentiment analysis methods are desirable for developing efficient and effective analysis and classification of customer reviews, blogs and comments. The main motivation for this thesis is to develop a high-performance, domain-independent sentiment classification method. This study focuses on sentiment analysis at the sentence level using a lexical-based method for different types of data, such as reviews and blogs. The proposed method is based on general lexicons, i.e. WordNet, SentiWordNet and user-defined lexical dictionaries, for sentiment orientation. The relations and glosses of these dictionaries provide a solution to the domain portability problem. The experiments are performed on various data sets, such as customer reviews and blog comments. The results show that the proposed method with sentence contextual information is effective for sentiment classification. The proposed method performs better than word-level and text-level corpus-based machine learning methods for semantic orientation. The results show that the proposed method achieves an average accuracy of 86% at the sentence level and 97% at the feedback level for customer reviews. Similarly, it achieves an average accuracy of 83% at the sentence level and 86% at the feedback level for blog comments.

UTPedia