8 research outputs found

    Analysis of data pre-processing methods for the sentiment analysis of reviews

    The aim of this study is to analyse the effects of data pre-processing methods on sentiment analysis and to determine which of these methods, and which combinations of them, are effective for English and for an agglutinative language such as Turkish. We also try to answer the research question “Is there any difference between agglutinative and non-agglutinative languages in terms of pre-processing methods for sentiment analysis?” We find that the performance results for the English reviews are generally higher than for the Turkish reviews, owing to differences between the two languages in vocabulary, writing style, and the agglutinative property of Turkish.
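
The idea of evaluating pre-processing methods "and their combinations" can be sketched as follows. This is only an illustration: the abstract does not list the individual steps, so the three steps, the tiny stop-word list, and the names `STEPS`, `step_combinations`, and `apply_steps` are all assumptions made for the example.

```python
import itertools
import string

# Tiny illustrative stop-word list (an assumption, not from the study).
STOP_WORDS = {"the", "a", "is", "and"}

def lowercase(tokens):
    return [t.lower() for t in tokens]

def strip_punctuation(tokens):
    # Drop tokens that consist only of punctuation characters.
    return [t for t in tokens if not all(ch in string.punctuation for ch in t)]

def remove_stop_words(tokens):
    return [t for t in tokens if t.lower() not in STOP_WORDS]

# Hypothetical pre-processing steps, applied in this fixed order.
STEPS = {
    "lowercase": lowercase,
    "strip_punctuation": strip_punctuation,
    "remove_stop_words": remove_stop_words,
}

def step_combinations():
    """Yield every subset of step names, from no steps to all steps."""
    for r in range(len(STEPS) + 1):
        yield from itertools.combinations(STEPS, r)

def apply_steps(tokens, names):
    """Apply the named steps to a token list, one after another."""
    for name in names:
        tokens = STEPS[name](tokens)
    return tokens

if __name__ == "__main__":
    review = ["The", "movie", "is", "great", "!"]
    for names in step_combinations():
        print(names, apply_steps(review, names))
```

In an experiment of this kind, each combination would feed the same classifier so that differences in accuracy can be attributed to the pre-processing alone.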

    Using Data Mining Techniques for Detecting the Important Features of the Bank Direct Marketing Data

    Collecting customer information is seen as necessary for developing marketing strategies. New technologies are used very effectively in bank marketing campaigns, as in many fields of life. Customer data is stored electronically, and its volume is so large that analysing it manually with a team of human analysts is impossible. In this paper, data mining techniques are used to identify and interpret the features that matter most for campaign effectiveness, i.e. whether the client subscribes to a term deposit. The bank marketing dataset from the University of California at Irvine Machine Learning Repository is used. We consider two feature selection methods, Information Gain and Chi-square, to select the important features. The methods are compared using the supervised Naive Bayes machine learning algorithm. The experimental results show that a reduced set of features improves classification performance. Keywords: Bank marketing, feature selection, machine learning methods, data mining, chi-square, information gain. JEL Classifications: C80, C50, Y10, M3
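
The two feature selection scores named in this abstract can be illustrated with a minimal sketch. It assumes a binary feature and a binary class summarised as a 2x2 contingency table; the function names and the count arguments (`n11`, `n10`, `n01`, `n00`) are chosen for the example and do not come from the paper, which may compute the scores differently (for instance on multi-valued attributes).

```python
import math

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 feature/class table.

    n11: feature present, class positive   n10: feature present, class negative
    n01: feature absent,  class positive   n00: feature absent,  class negative
    """
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return numerator / denominator if denominator else 0.0

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(n11, n10, n01, n00):
    """Class entropy minus the entropy left after observing the feature."""
    n = n11 + n10 + n01 + n00
    h_class = entropy([n11 + n01, n10 + n00])
    present, absent = n11 + n10, n01 + n00
    h_conditional = (present / n) * entropy([n11, n10]) \
                  + (absent / n) * entropy([n01, n00])
    return h_class - h_conditional

if __name__ == "__main__":
    # A feature that is unrelated to the class scores zero on both measures.
    print(chi_square(10, 10, 10, 10), information_gain(10, 10, 10, 10))
    # A feature that separates the classes perfectly scores highest.
    print(chi_square(10, 0, 0, 10), information_gain(10, 0, 0, 10))
```

Ranking features by either score and keeping only the top ones gives the kind of reduced feature set that the abstract reports as improving classification.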

    Türkçe metinlerde duygu analizi için nitelik seçimi (Feature selection for sentiment analysis in Turkish texts)

    TEZ11538. Thesis (Doctoral) -- Çukurova Üniversitesi, Adana, 2016. Includes bibliography (pp. 69-73). viii, 116 pp.; 29 cm.
    Sentiment analysis is the classification of sentiments expressed in review documents. Like other classification tasks, it involves data preprocessing, feature selection, and classification steps. One aim of this study is to determine which preprocessing combinations and feature selection methods are effective for the sentiment analysis of Turkish reviews. Another aim is to propose a new feature selection method that helps identify the most valuable features for sentiment analysis. We consider several major feature selection methods, including Chi-square, Information Gain, Document Frequency Difference, and Optimal Orthogonal Centroid, in order to improve both the accuracy and the efficiency of the sentiment analysis process and to compare them with our new proposal. Experiments are conducted using four commonly used classifiers: Naïve Bayes Multinomial, Support Vector Machines, Logistic Regression, and Decision Trees. We find that keeping certain punctuation marks and stop words as features is helpful for Turkish reviews, and that using the Chi-square, Information Gain, and Document Frequency Difference feature selection methods with the Naïve Bayes Multinomial classifier tends to give better results. Our proposed method achieves better classification performance than the other methods. We further consider four common term weighting methods and investigate their effects on sentiment analysis. We also combine these weighting methods with different feature selection methods and examine how they respond to the reduced text representation. Finally, the experiments carried out on Turkish reviews are repeated on English reviews and the differences are examined.
    This work was supported by the Ç.Ü. Scientific Research Projects Unit (Bilimsel Araştırma Projeleri Birimi). Project No: FDK-2015-3833
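
Of the feature selection methods listed in this abstract, Document Frequency Difference is the simplest to show in code. The sketch below assumes the commonly cited definition, |DF+ - DF-| / D, where DF+ and DF- are the numbers of positive and negative documents containing a term and D is the total number of documents; the thesis may use a variant, and the function name `dfd_scores` and the toy reviews are invented for the example.

```python
from collections import Counter

def dfd_scores(pos_docs, neg_docs):
    """Score each term by |DF+ - DF-| / D.

    pos_docs, neg_docs: lists of tokenised documents (lists of strings).
    Returns a dict mapping each term to its score in [0, 1].
    """
    # Count each term once per document (document frequency, not term frequency).
    df_pos = Counter(term for doc in pos_docs for term in set(doc))
    df_neg = Counter(term for doc in neg_docs for term in set(doc))
    total_docs = len(pos_docs) + len(neg_docs)
    vocabulary = set(df_pos) | set(df_neg)
    return {t: abs(df_pos[t] - df_neg[t]) / total_docs for t in vocabulary}

if __name__ == "__main__":
    # Toy data, invented for illustration only.
    positive = [["great", "film"], ["great", "acting"]]
    negative = [["bad", "film"], ["boring", "film"]]
    scores = dfd_scores(positive, negative)
    for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{term}: {score:.2f}")
```

Terms that appear about equally often in both classes receive low scores and are the first candidates for removal.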

    Analysis of data pre-processing methods for sentiment analysis of reviews

    The goals of this study are to analyze the effects of data pre-processing methods on sentiment analysis and to determine which of these methods (and their combinations) are effective for English as well as for an agglutinative language like Turkish. We also try to answer the research question of whether there are any differences between agglutinative and non-agglutinative languages in terms of pre-processing methods for sentiment analysis. We find that the performance results for the English reviews are generally higher than those for the Turkish reviews due to the differences between the two languages in terms of vocabulary, writing style, and the agglutinative property of the Turkish language.

    Comparison of Feature Selection Methods for Sentiment Analysis on Turkish Twitter Data

    25th Signal Processing and Communications Applications Conference (SIU) -- May 15-18, 2017 -- Antalya, Turkey. WOS: 000413813100251
    The Internet and social media provide a major source of information about people's opinions. Because the number of online documents is growing rapidly, obtaining and analyzing the desired opinionated information has become both time-consuming and difficult. Sentiment analysis is the classification of sentiments expressed in documents. To improve classification performance, feature selection methods, which help to identify the most valuable features, are generally applied. In this paper, we compare the performance of four feature selection methods, namely Chi-square, Information Gain, Query Expansion Ranking, and Ant Colony Optimization, using the Maximum Entropy Modeling classification algorithm on a Turkish Twitter dataset. In this way, the effects of feature selection methods on the performance of sentiment analysis of Turkish Twitter data are evaluated. Experimental results show that the Query Expansion Ranking and Ant Colony Optimization methods outperform the other, traditional feature selection methods for sentiment analysis.
    Turk Telekom, Arcelik A S, Aselsan, ARGENIT, HAVELSAN, NETAS, Adresgezgini, IEEE Turkey Sect, AVCR Informat Technologies, Cisco, i2i Syst, Integrated Syst & Syst Design, ENOVAS, FiGES Engn, MS Spektral, Istanbul Teknik Uni

    Prediction of pathological complete response to neoadjuvant chemotherapy in locally advanced breast cancer by using a deep learning model with 18F-FDG PET/CT.

    Objectives: The aim of the study is to determine whether 18F-FDG PET/CT imaging, analysed with a deep learning method, is predictive of pathological complete response (pCR) after neoadjuvant chemotherapy (NAC) in locally advanced breast cancer (LABC).
    Introduction: NAC is the standard treatment for LABC. pCR after NAC is considered a good predictor of disease-free survival (DFS) and overall survival (OS). Therefore, there is a need to develop methods that can predict pCR at the time of diagnosis.
    Methods: This article was designed as a retrospective chart study. For the convolutional neural network model, a total of 355 PET/CT images of 31 patients were used. All patients had primary breast surgery after completing NAC.
    Results: Pathological complete response was obtained in a total of 9 patients. The proposed deep convolutional neural network model achieved an accuracy of 84.79% in predicting pathological complete response.
    Conclusion: It was concluded that deep learning methods can predict breast cancer treatment