    Analyzing the drivers of customer satisfaction via social media

    Social media has become a major force of influence over the last decade. The population of active social media users has grown with each new generation, and data has accumulated in tremendous amounts as a result. This data offers an opportunity to gain valuable insights and support business decisions. The aim of this project is to understand the drivers of customer satisfaction through public sentiment on Twitter towards a financial institution. Data was extracted from Twitter, the most popular microblogging platform, and sentiment analysis was performed. The unstructured data was classified by sentiment with a lexicon-based model and a machine learning based model. The results showed that the machine learning based model overcame the language-specific problems of Turkish and made better predictions where the lexicon-based model struggled. Further analysis was performed on the days with extreme daily average sentiment scores in order to match them with prominent events. The results showed that public sentiment on Twitter is driven by three main themes: complaints related to services, advertisement campaigns, and influencers' impact.
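
The lexicon-based scoring and the search for days with extreme average sentiment described above can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the toy lexicon, the whitespace tokenizer, and the function names `lexicon_score`, `daily_averages`, and `extreme_days` are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon; a real study would use a curated sentiment lexicon.
LEXICON = {"great": 1.0, "thanks": 0.5, "slow": -0.5, "terrible": -1.0}


def lexicon_score(text):
    """Average the lexicon values of the known words in a tweet (0.0 if none)."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return mean(hits) if hits else 0.0


def daily_averages(tweets):
    """Map each day to the mean sentiment score of its (day, text) tweets."""
    by_day = defaultdict(list)
    for day, text in tweets:
        by_day[day].append(lexicon_score(text))
    return {day: mean(scores) for day, scores in by_day.items()}


def extreme_days(averages, k=1):
    """Return the k lowest-scoring and k highest-scoring days."""
    ranked = sorted(averages, key=averages.get)
    return ranked[:k], ranked[-k:]
```

The days returned by `extreme_days` would then be the ones to compare manually against campaigns, outages, or influencer posts from the same dates.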

    A study on text-score disagreement in online reviews

    In this paper, we focus on online reviews and employ artificial intelligence tools, taken from the cognitive computing field, to help understand the relationship between the textual part of a review and the assigned numerical score. We start from the intuitions that 1) a set of textual reviews expressing different sentiments may feature the same score (and vice versa); and 2) detecting and analyzing the mismatches between the review content and the actual score may benefit both service providers and consumers, by highlighting specific factors of satisfaction (and dissatisfaction) in texts. To test these intuitions, we adopt sentiment analysis techniques and concentrate on hotel reviews, to find polarity mismatches therein. In particular, we first train a text classifier with a set of annotated hotel reviews taken from the Booking website. Then, we analyze a large dataset of around 160k hotel reviews collected from Tripadvisor, with the aim of detecting polarity mismatches, that is, whether the textual content of a review is in line with the associated score. Using well-established artificial intelligence techniques and analyzing in depth the reviews featuring a mismatch between the text polarity and the score, we find that, on a scale of five stars, reviews with middle scores include a mixture of positive and negative aspects. The approach proposed here, besides acting as a polarity detector, provides an effective selection of reviews from an initially very large dataset. This may allow both consumers and providers to focus directly on the review subset featuring a text/score disagreement, which conveniently conveys to the user a summary of positive and negative features of the review target. Comment: This is the accepted version of the paper. The final version will be published in the Journal of Cognitive Computation, available at Springer via http://dx.doi.org/10.1007/s12559-017-9496-
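
The mismatch check described above, comparing the polarity predicted from the text with the polarity implied by the star score, might look roughly like this. It is only a sketch under stated assumptions: the star thresholds, the function names, and the keyword-based `toy_classifier` (a stand-in for the trained text classifier) are hypothetical and not taken from the paper.

```python
def score_polarity(stars, low=2, high=4):
    """Map a 1-5 star score to a polarity label.

    Thresholds are assumptions: <= low is negative, >= high is positive,
    anything in between has no clear polarity and yields None.
    """
    if stars <= low:
        return "negative"
    if stars >= high:
        return "positive"
    return None


def find_mismatches(reviews, classify_text):
    """Return (text, stars, text_polarity) for reviews whose text polarity
    disagrees with a clear score polarity.

    `classify_text` is any callable returning "positive" or "negative".
    """
    mismatches = []
    for text, stars in reviews:
        expected = score_polarity(stars)
        predicted = classify_text(text)
        if expected is not None and predicted != expected:
            mismatches.append((text, stars, predicted))
    return mismatches


def toy_classifier(text):
    """Hypothetical stand-in for a trained classifier: keyword lookup only."""
    positive_words = {"great", "lovely"}
    words = set(text.lower().replace(",", " ").split())
    return "positive" if words & positive_words else "negative"
```

In this sketch middle scores are skipped because they carry no clear polarity; a variant could instead collect them separately to inspect the mixed positive and negative aspects the abstract reports.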

    Perceiving University Student's Opinions from Google App Reviews

    The Google app market captures the opinions of users from every corner of the globe via ratings and text reviews, in a multilingual arena. The information in these reviews cannot be extracted manually because of its exponential growth. Sentiment analysis, using machine learning and deep learning algorithms with NLP, uncovers and interprets the emotions expressed. This study performs sentiment classification of app reviews and identifies university students' behavior towards the app market via exploratory analysis. We applied machine learning algorithms using the TP, TF, and TF-IDF text representation schemes and evaluated their performance with Bagging, an ensemble learning method. We used the GloVe word embedding with the deep learning models. Our model was trained on Google app reviews and tested on Student's App Reviews (SAR). The various combinations of these algorithms were compared with each other using F score and accuracy, and the inferences were presented graphically. Among the classifiers, SVM achieved an accuracy of 93.41% and an F score of 89% with bigrams and the TF-IDF scheme. Bagging enhanced the performance of LR and NB, with accuracies of 87.88% and 86.69% and F scores of 86% and 78% respectively. Overall, LSTM on GloVe embeddings recorded the highest accuracy (95.2%) with an F score of 88%. Comment: Accepted in Concurrency and Computation Practice and Experience
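
To make the bigram TF-IDF representation mentioned above concrete, here is a small pure-Python sketch. It uses the textbook weighting (raw term count times log(N/df)); library implementations often use smoothed variants, so the numbers will not match them exactly. The function names and the whitespace tokenizer are assumptions for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n=2):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs, n=2):
    """Return one {term: weight} dict per document.

    weight = raw term frequency * log(N / document frequency),
    computed over n-grams (bigrams by default).
    """
    term_lists = [ngrams(doc.lower().split(), n) for doc in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for terms in term_lists for term in set(terms))
    total = len(docs)
    weighted = []
    for terms in term_lists:
        tf = Counter(terms)
        weighted.append(
            {term: count * math.log(total / df[term]) for term, count in tf.items()}
        )
    return weighted
```

A bigram that occurs in every review gets weight zero under this formula, which is why it contributes nothing to separating positive from negative reviews; the resulting vectors would then be fed to a classifier such as an SVM.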

    Sentiment analysis of the Syrian conflict on Twitter

    Social media have become an important means of imposing ideas and interests in social conflicts. The Syrian conflict is analysed using sentiment analysis of tweets in order to establish how sentiment shapes the modern political landscape and influences recipient knowledge. The importance of social networks and their potential in overthrowing regimes as well as in radicalization are highlighted. The authors suggest several stages that can be used for analysing tweets and how they impact the reader through selected narration. Sentiment analysis is used on a trained data set as a way to gain insight into tweets of different factions in the Syrian conflict. The selected tweets on missile strikes were published on 14 April 2018 and the day after. The Twitter profiles of three different sides (pro-Assad, pro-West and anti-Assad) were also analysed. The results show that there is a real battle on social media with the purpose of influencing human emotions.


    Entity linking is the information extraction task of linking entity mentions in a collection of text to their corresponding entries in a knowledge base; it also assigns a unique identity to entities such as locations, individuals and companies. A knowledge base (KB) is used to optimize the collection, organization and retrieval of information. Heterogeneous information networks (HINs) comprise interlinked objects of multiple types with various types of relationships; increasingly popular examples include bibliographic networks, social media networks, and typical relational database data. In a HIN, data objects are interconnected through various relations. Entity linking determines, for mentions in unstructured web text, the corresponding entities in an existing HIN. This task is important and challenging because of ambiguity and limited existing knowledge. Some HINs can be considered domain-specific KBs. Current entity linking (EL) systems are aimed at corpora containing heterogeneous web information and perform sub-optimally on domain-specific corpora. EL systems link against one or more general or domain-specific knowledge bases such as DBpedia, Wikipedia, Freebase, IMDB, YAGO, Wordnet and MKB. This paper presents a survey of domain-specific entity linking with HINs. The survey gives a detailed account of HINs, including datasets, types and examples, together with related concepts.

    What attracts vehicle consumers’ buying: A Saaty scale-based VIKOR (SSC-VIKOR) approach from after-sales textual perspective?

    Purpose: The booming development of e-commerce has stimulated vehicle consumers to post individual reviews on online forums. The purpose of this paper is to probe vehicle consumers' consumption behavior and make recommendations for potential consumers from the viewpoint of textual comments. Design/methodology/approach: A big data analytics-based approach is designed to discover vehicle consumer consumption behavior from an online perspective. To reduce the subjectivity of expert-based approaches, a parallel Naïve Bayes approach is designed to perform the sentiment analysis, and the Saaty scale-based (SSC) scoring rule is employed to obtain a specific sentiment value for each attribute class, contributing to multi-grade sentiment classification. To achieve intelligent recommendation for potential vehicle customers, a novel SSC-VIKOR approach is developed to prioritize vehicle brand candidates from a big data analytics viewpoint. Findings: The analysis indicates that cost-effectiveness is the factor vehicle consumers care about most, and the data mining results enable automakers to better understand consumer consumption behavior. Research limitations/implications: The case study illustrates the effectiveness of the integrated method, contributing to more precise operations management in marketing strategy, quality improvement and intelligent recommendation. Originality/value: Research on consumer consumption behavior is usually based on survey methods, and most previous studies of comment analysis focus on binary analysis. The hybrid SSC-VIKOR approach is developed to fill this gap from the big data perspective.
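
For readers unfamiliar with VIKOR, the ranking step can be sketched as below. This is the standard VIKOR compromise-ranking formulation, not the paper's SSC-VIKOR variant: the Saaty-scale sentiment scoring is not reproduced, all criteria are assumed to be benefit-type (higher is better), and the function name, weights, and the parameter `v` are illustrative assumptions.

```python
def vikor(matrix, weights, v=0.5):
    """Compute the VIKOR index Q for each alternative (lower Q ranks better).

    matrix[i][j] is the score of alternative i on criterion j; all criteria
    are treated as benefit-type. `v` weights group utility against
    individual regret.
    """
    n_crit = len(weights)
    best = [max(row[j] for row in matrix) for j in range(n_crit)]
    worst = [min(row[j] for row in matrix) for j in range(n_crit)]

    group_utility, regret = [], []
    for row in matrix:
        # Weighted, normalized distance from the best value on each criterion.
        terms = [
            weights[j] * (best[j] - row[j]) / (best[j] - worst[j])
            if best[j] != worst[j] else 0.0
            for j in range(n_crit)
        ]
        group_utility.append(sum(terms))  # S_i
        regret.append(max(terms))         # R_i

    def scale(x, lo, hi):
        return (x - lo) / (hi - lo) if hi != lo else 0.0

    s_lo, s_hi = min(group_utility), max(group_utility)
    r_lo, r_hi = min(regret), max(regret)
    return [
        v * scale(s, s_lo, s_hi) + (1 - v) * scale(r, r_lo, r_hi)
        for s, r in zip(group_utility, regret)
    ]
```

With per-brand attribute scores as rows (for example, sentiment values on a 1 to 9 scale), sorting the brands by ascending Q gives the recommended priority order.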

    Role of sentiment classification in sentiment analysis: a survey

    Through a survey of the literature, the role of sentiment classification in sentiment analysis is reviewed. The review identifies the research challenges involved in tackling sentiment classification. A total of 68 articles published from 2015 to 2017 were reviewed along six dimensions, viz. sentiment classification, feature extraction, cross-lingual sentiment classification, cross-domain sentiment classification, lexica and corpora creation, and multi-label sentiment classification. The study discusses the prominence and effects of sentiment classification in sentiment evaluation and concludes that considerable further research is needed for productive results.

    Humor and offense speech classification and scoring using natural language processing

    Identifying humor and offense may prove to be an arduous task even for humans. It is, however, even more challenging to translate it into a logical process that a machine can understand. This work aims to develop machine learning models for this task. The study is based on the SemEval 2021 workshop, where participants were challenged to identify and score both humorous and offensive texts, as well as detect controversial sentences (SemEval 2021 - Task 7 - Detecting and Rating Humor and Offense), encouraging the use of current state-of-the-art algorithmic techniques in Natural Language Processing. The objective is to identify and propose the setup that achieves the highest performance on Humor Detection and related tasks, using a common dataset of eight thousand sentences labeled with a binary humor indicator and a humor rating, along with a binary controversy indicator and an offense rating. This document presents a solution for these tasks based on BERT (Bidirectional Encoder Representations from Transformers), which makes use of Transformers, a neural network architecture that interprets sentences in both directions (bidirectional) and so gives the model better context perception than other architectures. It compares the performance of three BERT variants (BERT-base, DistilBERT, and RoBERTa), each designed to better fit different tasks used by industry and academia. The study concludes that DistilBERT produced the most accurate results in the Humor Detection and Humor Rating tasks, while RoBERTa performed best in the controversy detection task. Finally, BERT-base performed best in the Offensiveness Ranking task.