121 research outputs found

    Opinion mining with the SentiWordNet lexical resource

    Sentiment classification concerns the application of automatic methods for predicting the orientation of sentiment present in text documents. It is an important subject in opinion mining research, with applications in a number of areas including recommender and advertising systems, customer intelligence and information retrieval. SentiWordNet is a lexical resource of sentiment information for terms in the English language, designed to assist in opinion mining tasks, in which each term is associated with numerical scores for positive and negative sentiment. A resource that makes term-level sentiment information readily available could be of use in building more effective sentiment classification methods. This research presents the results of an experiment that applied the SentiWordNet lexical resource to the problem of automatic sentiment classification of film reviews. First, a data set of relevant features extracted from text documents using SentiWordNet was designed and implemented. The resulting feature set was then used as input for training a support vector machine classifier to predict the sentiment orientation of the underlying film review. Several scenarios exploring variations on the parameters that generate the data set, outlier removal and feature selection were executed. The results obtained are compared to other methods documented in the literature. They were found to be in line with other experiments that propose similar approaches and use the same data set of film reviews, indicating that SentiWordNet could become an important resource for the task of sentiment classification. Considerations on future improvements are also presented, based on a detailed analysis of classification results.
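The feature-extraction step described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the lexicon below is a hypothetical toy dictionary with invented scores (the real SentiWordNet assigns scores per WordNet synset), and the feature names and the `extract_features` helper are assumptions made for illustration. The resulting feature dictionary is the kind of input that could then be passed to a support vector machine classifier, which is not shown here.

```python
import re

# Hypothetical toy lexicon: term -> (positive score, negative score).
# These values are invented for illustration and are NOT taken from SentiWordNet.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "great": (0.75, 0.0),
    "dull": (0.0, 0.625),
    "bad": (0.0, 0.75),
}


def extract_features(text, lexicon=TOY_LEXICON):
    """Turn a document into aggregate sentiment features using a term lexicon."""
    # Lowercase the text and keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Collect (positive, negative) score pairs for tokens found in the lexicon.
    scored = [lexicon[t] for t in tokens if t in lexicon]
    pos_sum = sum(p for p, _ in scored)
    neg_sum = sum(n for _, n in scored)
    matched = len(scored)
    return {
        "pos_sum": pos_sum,
        "neg_sum": neg_sum,
        "pos_mean": pos_sum / matched if matched else 0.0,
        "neg_mean": neg_sum / matched if matched else 0.0,
        # Share of tokens that carry any lexicon score.
        "matched_ratio": matched / len(tokens) if tokens else 0.0,
    }


features = extract_features("A great cast, but a dull and bad script.")
```

Aggregating term scores into a fixed-length vector is one simple design choice among many; the abstract itself mentions varying the parameters that generate the data set, so the actual feature set is likely richer than this.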

    A Computational Study of Speech Acts in Social Media

    Speech acts are utterances that humans use in daily communication to perform an action (e.g. requesting, suggesting, promising, apologizing). Modeling speech acts is important for improving natural language understanding (i.e. human-computer interaction through computers’ comprehension of human language) and for advancing other natural language processing (NLP) tasks such as question answering and machine translation. Analyzing speech acts at large scale using computational methods could help linguists and social scientists gain insights into human language and behavior. Speech acts such as suggesting, questioning and irony have attracted considerable attention in previous NLP research. However, two common speech acts, complaining and bragging, have remained underexplored. Complaints are used to express a mismatch between reality and expectations towards an entity or event. Previous research has focused only on binary complaint identification (i.e. whether a social media post contains a complaint or not) using traditional machine learning models with feature engineering. Bragging is one of the most common forms of self-presentation, which aims to create a favorable image by disclosing positive statements about speakers or their in-group. Previous studies on bragging have been limited to manual analyses of small data sets, e.g. fewer than 300 posts. The main aim of this thesis is to enrich the study of speech acts in computational linguistics. First, we introduce the task of classifying complaint severity levels and propose a method for injecting external linguistic information into novel pretrained neural language models (e.g. BERT). We show that incorporating linguistic features is beneficial to complaint severity classification. We also improve the performance of binary complaint prediction with the help of complaint severity information in multi-task learning settings (i.e. jointly modeling these two tasks). Second, we introduce the task of identifying bragging and classifying its types, together with a new annotated data set. We analyze linguistic patterns of bragging and its types and present an error analysis to identify model limitations. Finally, we examine the relationship between online bragging and a range of common socio-demographic factors including gender, age, education, income and popularity.

    Twitter Analysis to Predict the Satisfaction of Saudi Telecommunication Companies’ Customers

    The flexibility in mobile communications allows customers to quickly switch from one service provider to another, making customer churn one of the most critical challenges for the data and voice telecommunication service industry. In 2019, the percentage of post-paid telecommunication customers in Saudi Arabia decreased; this represents a great deal of customer dissatisfaction and subsequent corporate fiscal losses. Many studies correlate customer satisfaction with customer churn. Telecom companies have depended on historical customer data to measure customer churn. However, historical data does not reveal current customer satisfaction or the future likelihood of switching between telecom companies. Current methods of analysing churn rates are inadequate and face several issues, particularly in the Saudi market. This research was conducted to understand the relationship between customer satisfaction and customer churn, and how social media mining can be used to measure customer satisfaction and predict customer churn. It includes a systematic review of the problems of churn prediction models and their relation to Arabic Sentiment Analysis (ASA). The findings show that current churn models do not integrate structural data frameworks with real-time analytics to target customers in real time. In addition, the findings show that the specific issues in the existing churn prediction models in Saudi Arabia relate to the Arabic language itself, its complexity, and the lack of resources. As a result, I have constructed the first gold standard corpus of Saudi tweets related to telecom companies, comprising 20,000 manually annotated tweets. It has been generated as a dialect sentiment lexicon extracted from a larger Twitter dataset that I collected to capture text characteristics in social media. I developed a new ASA prediction model for telecommunication that fills the detected gaps in the ASA literature and fits the telecommunication field. The proposed model proved its effectiveness for Arabic sentiment analysis and churn prediction. This is the first work to use Twitter mining to predict potential customer loss (churn) in Saudi telecom companies. Other fields, such as education, have different features, which makes applying the proposed model to them an interesting prospect, because the model is based on text mining.