Performance evaluation of hybrid feature selection technique for sentiment classification based on food reviews
This paper evaluates the performance efficiency of sentiment classification using a hybrid feature selection technique. The technique addresses the lack of feature-importance evaluation by combining Term Frequency-Inverse Document Frequency (TF-IDF) with Support Vector Machine-Recursive Feature Elimination (SVM-RFE). Feature importance is measured, and significant features are selected recursively based on a chosen number of significant features, known as the k-top features. We tested the technique on a food reviews dataset from Kaggle, classifying each review as positive or negative. Finally, an SVM was deployed as the classifier to evaluate classification performance. Performance is reported in terms of accuracy, precision, recall and F-measure. The highest accuracy is 80%, with a precision of 82%, a recall of 76% and an F-measure of 79%. These best results were obtained with 24.5% fewer features to classify. The reduction allows computational resources to be used more efficiently while the classification performance is maintained.
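As a quick illustration of how the four reported measures relate, the following minimal Python sketch computes them from confusion-matrix counts. The function names and the example counts are hypothetical and are not taken from the paper; only the precision and recall figures (82% and 76%) come from the abstract.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 form of the F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int):
    """Return (accuracy, precision, recall, F-measure) for a binary classifier."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall, f_measure(precision, recall)


# Plugging in the precision and recall quoted in the abstract.
reported_f = f_measure(0.82, 0.76)
print(f"F-measure from P=0.82, R=0.76: {reported_f:.2f}")

# Hypothetical counts, purely for illustration.
print(metrics_from_counts(tp=8, fp=2, fn=2, tn=8))
```

With the quoted precision and recall, the harmonic mean rounds to 0.79, which agrees with the F-measure stated in the abstract.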
Benign and malignant detection and classification for small size image of breast tumor recognition system using u-net model
Breast tumor recognition is a critical task in medical imaging, aiming to differentiate between benign and malignant tumors. An efficient technique for detecting and classifying tumors is crucial to avoid misdetection and misclassification while also accelerating the process. This paper therefore proposes a deep learning technique, a modified U-Net architecture based on the Convolutional Neural Network (CNN), to detect and classify the tumors. The aim is a less complex U-Net model that is effective for small images. Data augmentation, transfer learning, and an ensemble approach are employed when deploying the technique. The proposed technique is tested on the Breast Ultrasound Images (BUSI) dataset, which is available on Kaggle. The results are promising, with an accuracy of 0.8, precision of 0.88, recall of 0.7, and F1-score of 0.8. They indicate that the technique can contribute to the advancement of breast tumor detection and classification by providing valuable insights that help clinicians make accurate and timely diagnoses. The proposed technique thus has the potential to improve the efficiency and effectiveness of breast tumor recognition, aiding the early detection and treatment of breast cancer.
Identifying PTSD symptoms using machine learning techniques on social media
Post-traumatic stress disorder (PTSD) is a mental health illness brought on by witnessing or experiencing a horrific incident. Flashbacks, nightmares, acute anxiety, and uncontrolled thoughts about the incident are possible symptoms faced by PTSD sufferers. PTSD is usually diagnosed by a mental health specialist based on the symptoms the person presents, and the task is very time-consuming. The widespread use of social media in recent years has opened up the opportunity to explore signs of PTSD in users' posts on Twitter. The content-sharing feature of this platform allows users to share personal experiences, thoughts, and feelings that could reflect their psychological status. The goal of this work is therefore to identify PTSD symptoms from text posts on Twitter. The crawled text posts are filtered and used to train selected machine learning and deep learning methods. The experimental results show that the support vector machine performed best, with 91% accuracy. The extracted model could be used to identify PTSD symptoms on social media.
Internet of Things (IoT) Based Fire Alert Monitoring System for Car Parking
Safety is an important factor to consider in parking areas, workplaces, homes and elsewhere. In the university parking area, students currently receive no information about fire smoke or an accident near their vehicle. In addition, parking safety is not assured because of a shortage of car supervision and the absence of strict parking management by security officers. A fire smoke alert monitoring system for the university parking area is therefore necessary to prevent accidents that may cause property damage and loss of life within the university. Such a system should be introduced because the existing parking arrangement is unsystematic and inefficient: it cannot respond to the problems students regularly face, since they are not informed of fire smoke or accidents near their vehicles. The new system implements several improvements to help students, using multiple distinct Arduino devices. Moreover, an Android application is developed to help the security officer identify the details of cars involved in any accident that occurs in the university parking area.
Customer churn classification in telecommunication company using rough set theory
Churn is the behaviour of a customer who leaves or terminates a service. This behaviour causes companies to lose profit, because acquiring new customers requires a higher investment in advertisements and promotions than retaining existing ones. It is therefore necessary to find an efficient classification model to reduce the rate of churn. Traditional classification modelling approaches do not produce results that are straightforward to interpret, so identifying the best classification model to reduce the rate of churn is a challenging task. The main objective of this thesis is to propose a new classification model based on Rough Set Theory to classify customer churn. This research used the Knowledge Discovery in Database (KDD) process, involving data pre-processing, data discretization, attribute reduction, rule generation, classification, and data analysis, using the Rough Set toolkit. The elements of Rough Set Theory consist of the indiscernibility relation, lower and upper approximations, and the reduction set. These elements are applied to classify customer churn from an uncertain and imprecise dataset. The results of the proposed model are compared with a few established existing approaches. The study shows that the proposed classification model outperformed the existing models and contributes a significant accuracy improvement. The model was tested on a dataset from a local telecommunication company, achieving 90.32%. In conclusion, the results show that the classification model based on Rough Set Theory is capable of classifying customer churn when compared with the existing models.
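The Rough Set elements named in this abstract (indiscernibility relation, lower and upper approximations) can be illustrated with a minimal Python sketch. The customer table, attribute names, and function names below are hypothetical and only show the textbook definitions; they do not reproduce the thesis's toolkit or data.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Group object ids that share identical values on the chosen attributes."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return (lower, upper) approximations of the target set of object ids."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attributes):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


# Hypothetical toy table; not data from the thesis.
customers = {
    "c1": {"contract": "monthly", "complaints": "high", "churn": "yes"},
    "c2": {"contract": "monthly", "complaints": "high", "churn": "no"},
    "c3": {"contract": "yearly", "complaints": "low", "churn": "no"},
    "c4": {"contract": "monthly", "complaints": "low", "churn": "yes"},
    "c5": {"contract": "yearly", "complaints": "low", "churn": "no"},
}
churners = {cid for cid, row in customers.items() if row["churn"] == "yes"}
lower, upper = approximations(customers, ["contract", "complaints"], churners)
print("certainly churn:", sorted(lower))
print("possibly churn:", sorted(upper))
print("boundary region:", sorted(upper - lower))
```

In this toy table, c1 and c2 are indiscernible on the two condition attributes yet differ in outcome, so they fall in the boundary region. This is the kind of uncertainty that the lower and upper approximations are meant to capture.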
The hybrid feature selection technique using term frequency-inverse document frequency and support vector machine-recursive feature elimination for sentiment classification
Sentiment classification is increasingly used to automatically identify positive or negative sentiment in opinionated text documents, for instance customer feedback or reviews. Feature selection has always been a critical and challenging problem in machine learning-based sentiment classification. Hybrid feature selection is an efficient technique for sentiment classification, but it has several shortcomings that can be addressed. The first is its ability to identify feature importance and to remove features from opinionated text documents. Failure to address this issue results in poor classification performance. This research therefore aims to improve classification performance by proposing term frequency-inverse document frequency (TF-IDF) and support vector machine-recursive feature elimination (SVM-RFE) as a hybrid feature selection technique. TF-IDF evaluates feature importance, and a standard deviation-based threshold is used for feature reduction. The objective is to improve on the conventional approach to reducing features from the feature matrix. SVM-RFE then re-evaluates and ranks the features remaining in the TF-IDF-based feature matrix. Only the k-top feature group from the SVM-RFE ranking is used for sentiment classification. Finally, a support vector machine (SVM) classifier is employed to classify English customer review datasets, namely the opinion-labelled and large IMDb datasets. Performance was measured using accuracy, precision, recall, F-measure, and feature size reduction. The experimental results show promising performance of up to 95.06% in the performance measurements, especially on the large IMDb dataset and an additional hotel review dataset. The proposed technique could remove 31.80% to 64.00% of the features during classification. This reduction rate is significant for making optimal use of computational resources while preserving the efficiency of the classification performance.
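The first stage described above (TF-IDF scoring followed by a standard deviation-based threshold) might look roughly like the following Python sketch. The abstract does not state the exact thresholding rule, so the rule used here (keep terms whose mean TF-IDF is at least the mean minus one standard deviation of all term scores) is an assumption, as are the function names and the toy reviews.

```python
import math
from statistics import mean, pstdev


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[d][j] is the TF-IDF of term j in document d."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    doc_freq = {t: sum(t in doc for doc in tokenized) for t in vocab}
    rows = []
    for doc in tokenized:
        row = []
        for t in vocab:
            tf = doc.count(t) / len(doc)          # relative term frequency
            idf = math.log(n_docs / doc_freq[t])  # plain, unsmoothed IDF
            row.append(tf * idf)
        rows.append(row)
    return vocab, rows


def select_by_std_threshold(vocab, rows):
    """Keep terms whose mean TF-IDF >= mean - std of all term scores.

    ASSUMPTION: this specific rule is illustrative; the paper's rule may differ.
    """
    scores = [mean(col) for col in zip(*rows)]
    threshold = mean(scores) - pstdev(scores)
    return [t for t, s in zip(vocab, scores) if s >= threshold]


# Hypothetical reviews, not taken from the paper's datasets.
reviews = [
    "the food was great",
    "the food was awful",
    "the service was great",
    "the delivery was late",
]
vocab, rows = tfidf_matrix(reviews)
kept = select_by_std_threshold(vocab, rows)
print(f"kept {len(kept)} of {len(vocab)} terms:", kept)
```

In this toy corpus the words "the" and "was" occur in every review, so their IDF is zero and they fall below the threshold, while the more distinctive terms are passed on to the second stage.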
The Evaluation of Accuracy Performance in an Enhanced Embedded Feature Selection for Unstructured Text Classification
Text documents are unstructured and high dimensional. Effective feature selection is required to select the most important and significant features from the sparse feature space. This paper therefore proposes an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured and high-dimensional text classification. This technique is able to measure feature importance in a high-dimensional text document. In addition, it aims to increase the efficiency of feature selection and hence obtain promising text classification accuracy. In the first stage, TF-IDF acts as a filter approach that measures the importance of features in the text documents. In the second stage, SVM-RFE uses a backward feature elimination scheme to recursively remove insignificant features from the filtered feature subsets. This research runs sets of experiments on text documents retrieved from a benchmark repository comprising a collection of Twitter posts. Pre-processing is applied to extract relevant features. The pre-processed features are then divided into training and testing datasets. Next, feature selection is carried out on the training dataset by calculating the TF-IDF score for each feature. SVM-RFE is applied for feature ranking as the next feature selection step. Only the top-ranked features are selected for text classification using the SVM classifier. The experiments show that the proposed technique is able to achieve 98% accuracy, outperforming other existing techniques. In conclusion, the proposed technique is able to select the significant features in unstructured and high-dimensional text documents.
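The backward elimination scheme of the second stage can be sketched in plain Python as follows. This is a toy illustration under stated assumptions: the linear SVM is trained with a simple full-batch subgradient method, features are ranked by squared weight (a criterion commonly used with linear SVM-RFE), and one feature is removed per round. The function names, hyperparameters, and data are hypothetical and are not the authors' implementation.

```python
def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, k):
    """Return the indices of the k features that survive elimination."""
    remaining = list(range(len(X[0])))
    while len(remaining) > k:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        # Drop the feature with the smallest squared weight.
        weakest = min(range(len(remaining)), key=lambda i: w[i] ** 2)
        remaining.pop(weakest)
    return remaining


# Hypothetical data: feature 0 is strongly informative, feature 1 weakly
# informative, feature 2 is constant and carries no information.
X = [
    [1.0, 0.3, 0.0],
    [1.0, 0.1, 0.0],
    [-1.0, -0.1, 0.0],
    [-1.0, -0.3, 0.0],
]
y = [1, 1, -1, -1]
print("top-2 features:", svm_rfe(X, y, k=2))
print("top-1 feature:", svm_rfe(X, y, k=1))
```

In practice the input matrix would be the TF-IDF-filtered features from the first stage, and a library SVM would replace the toy trainer. The sketch only shows the train, rank, and eliminate loop.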
An Enhanced Hybrid Feature Selection Technique Using Term Frequency-Inverse Document Frequency and Support Vector Machine-Recursive Feature Elimination for Sentiment Classification
The Impact of Pre-processing and Feature Selection on Text Classification
Text classification today deals with unstructured and high-dimensional text documents. Such textual data can be easily retrieved from social media platforms, but it is hard to manage and process for classification purposes. Pre-processing activities and feature selection are two methods for processing text documents. This paper therefore evaluates the effect of pre-processing and feature selection on text classification performance. A tweet dataset is pre-processed using several combinations of pre-processing activities (tokenization, stop-word removal and stemming). Two feature selection techniques (Bag-of-Words and Term Frequency-Inverse Document Frequency) are then applied to the pre-processed text. Finally, a Support Vector Machine classifier is used to test classification performance. The experimental results reveal that combining pre-processing with the TF-IDF approach achieves better classification performance than the BoW approach. Classification performance improves when the number of features is decreased. However, this depends on the number of features obtained from the pre-processing activities and on the feature selection technique chosen.
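To make the pre-processing combinations concrete, here is a minimal Python sketch of tokenization, stop-word removal, and stemming followed by a Bag-of-Words count matrix. The stop-word list, the crude suffix-stripping stemmer, and the example tweets are all hypothetical simplifications; the paper does not specify which tools it used, and a real stemmer such as Porter's handles many more cases.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "is", "a", "an", "and", "to", "of", "was"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def stem(token):
    """Very crude suffix stripper, used only for illustration."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, use_stop_words=True, use_stemming=True):
    """Apply the selected combination of pre-processing activities."""
    tokens = tokenize(text)
    if use_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if use_stemming:
        tokens = [stem(t) for t in tokens]
    return tokens


def bag_of_words(token_lists):
    """Return (vocab, rows) where rows[d][j] counts term j in document d."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        rows.append([counts[t] for t in vocab])
    return vocab, rows


# Hypothetical tweets, not from the paper's dataset.
tweets = [
    "The delivery was delayed and the app is crashing",
    "Enjoying the new update",
]
raw_vocab, _ = bag_of_words([tokenize(t) for t in tweets])
clean_vocab, clean_rows = bag_of_words([preprocess(t) for t in tweets])
print(len(raw_vocab), "features without pre-processing")
print(len(clean_vocab), "features with stop-word removal and stemming")
```

The vocabulary shrinks once stop words are removed and inflected forms are merged, which is the feature-count effect the abstract relates to classification performance.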
