358 research outputs found

    Automatically Quantifying Customer Need Tweets: Towards a Supervised Machine Learning Approach

    The elicitation of customer needs is an important task for businesses in order to design customer-centric products and services. While there are different approaches available, most lack automation, scalability and monitoring capabilities. In this work, we demonstrate the feasibility of automatically identifying and quantifying customer needs by training and evaluating on previously labeled Twitter data. To achieve that, we utilize a supervised machine learning approach. Our results show that the classification performances are statistically superior, but can be further improved in the future.

    Mining and analysing social network in the oil business: Twitter sentiment analysis and prediction approaches

    Twitter is a rich source of data for opinion mining and sentiment analysis that companies can use to improve their strategy with the public and stakeholders. However, extracting and analysing information from unstructured text remains a hard task. The aim of this research is to investigate the use of Twitter by “controversial” companies and other users. In particular, it looks at the nature of positive and negative sentiment towards oil companies and shows how this relates to cultural effects and the network structure. This has required the evaluation of existing automated methods for sentiment analysis and the development of improved methods based on user classification. The research showed that tweets about oil companies were noisy enough to affect the accuracy. In this thesis, we analysed data collected from Twitter and investigated the variance that arises from using an automated sentiment analysis tool versus crowdsourced human classification. Our particular interest lay in understanding how users’ motivation to post messages affected the accuracy of sentiment polarity. The dataset comprised tweets originating from two of the world’s leading oil companies, BP America and Saudi Aramco, and from other users that follow and mention them, representing Western and Middle Eastern countries respectively. Our results show that the two methods yield significantly different positive, neutral and negative classifications depending on culture and the relationship of the poster of the tweet to the two companies. This motivated the investigation of the relationship between sentiment and user groups extracted by applying machine learning classifiers. Finally, clustering based on similarities in the network structure was used to connect user groups, and a novel technique to improve the sentiment accuracy was proposed. The analytical technique used here provided structured and valuable information for oil companies and has applications to other controversial domains.

    Stock market prediction using machine learning classifiers and social media, news

    Accurate stock market prediction is of great interest to investors; however, stock markets are driven by volatile factors such as microblogs and news that make it hard to predict a stock market index based merely on historical data. The enormous stock market volatility emphasizes the need to effectively assess the role of external factors in stock prediction. Stock markets can be predicted using machine learning algorithms on information contained in social media and financial news, as this data can change investors’ behavior. In this paper, we use algorithms on social media and financial news data to discover the impact of this data on stock market prediction accuracy for ten subsequent days. To improve the performance and quality of predictions, feature selection and spam tweet reduction are performed on the data sets. Moreover, we perform experiments to identify stock markets that are difficult to predict and those that are more influenced by social media and financial news. We compare results of different algorithms to find a consistent classifier. Finally, to achieve maximum prediction accuracy, deep learning is used and some classifiers are ensembled. Our experimental results show that the highest prediction accuracies of 80.53% and 75.16% are achieved using social media and financial news, respectively. We also show that New York and Red Hat stock markets are hard to predict, New York and IBM stocks are more influenced by social media, while London and Microsoft stocks are more influenced by financial news. The random forest classifier is found to be consistent, and the highest accuracy of 83.22% is achieved by its ensemble.

    Tax Complaints Classification on Twitter Using Text Mining

    The growth and widespread use of Twitter have produced a vast amount of textual information, and people can easily use the platform to express their complaints. This has led the Directorate General of Taxation to use Twitter to deal with tax complaints raised by the community. However, messages on Twitter can contain any kind of information, whether or not it is a tax complaint, which makes the complaint handling process difficult. It is therefore important to identify tax complaints automatically so that they can be handled effectively and efficiently. Given these problems, it is necessary to classify tax complaints on Twitter with the support of text mining. There are several classification methods, such as Naïve Bayes classifiers, Support Vector Machine (SVM) and Decision Tree. This research aims to classify tax complaints on Twitter automatically by using text mining. The experimental results show that the f-measure values of SVM, Naïve Bayes and Decision Tree are 89.3%, 85.6% and 76.9%, respectively.

    Twitter user geolocation using web country noun searches

    Several Web and social media analytics require user geolocation data. Although Twitter is a powerful source for social media analytics, its user geolocation is a nontrivial task. This paper presents a purely word distribution method for Twitter user country geolocation. In particular, we focus on the frequencies of tweet nouns and their statistical matches with Google Trends world country distributions (GTN method). Several experiments were conducted, using a recently created dataset of 744,830 tweets produced by 3298 users from 54 countries and written in 48 languages. Overall, the proposed GTN approach is competitive when compared with a state-of-the-art world distribution geolocation method. To reduce the number of Google Trends queries, we also tested a machine learning variant (GTN2) that is capable of matching the GTN responses with an 80% accuracy while being much faster than GTN. Research carried out with the support of resources of Big and Open Data Innovation Laboratory (BODaI-Lab), University of Brescia, granted by Fondazione Cariplo and Regione Lombardia. The work of P. Cortez was supported by FCT - Fundacao para a Ciencia e Tecnologia within the Project Scope UID/CEC/00319/2019. We would also like to thank the anonymous reviewers for their helpful suggestions.

    Presence of Social Presence during Disasters

    During emergencies, affected people use social media platforms for interaction and collaboration. Social media is used to ask for help, provide moral support, and help each other, without direct face-to-face interactions. From a social presence point of view, we analyzed Twitter messages to understand how people cooperated and collaborated with each other during heavy rains and subsequent floods in Chennai, India. We conducted a manual content analysis to build social presence classifiers comprising intimacy and immediacy concepts, which we used to train a machine learning approach to subsequently analyze the whole dataset of 1.65 million tweets. The results showed that the majority of the immediacy tweets convey the needs and urgencies of affected people requesting help. We argue that during disasters, online social presence creates a sense of responsibility and common identity among social media users to participate in relief activities.

    IDENTIFYING A CUSTOMER CENTERED APPROACH FOR URBAN PLANNING: DEFINING A FRAMEWORK AND EVALUATING POTENTIAL IN A LIVABILITY CONTEXT

    In transportation planning, public engagement is an essential requirement for informed decision-making. This is especially true for assessing abstract concepts such as livability, where it is challenging to define objective measures and to obtain input that can be used to gauge performance of communities. This dissertation focuses on advancing a data-driven decision-making approach for the transportation planning domain in the context of livability. First, a conceptual model for a customer-centric framework for transportation planning is designed integrating insight from multiple disciplines (chapter 1), then a data-mining approach to extracting features important for defining customer satisfaction in a livability context is described (chapter 2), and finally an appraisal of the potential of social media review mining for enhancing understanding of livability measures and increasing engagement in the planning process is undertaken (chapter 3). The results of this work also include a sentiment analysis and visualization package for interpreting an automated user-defined translation of qualitative measures of livability. The package evaluates users' satisfaction with neighborhoods through social media and enhances the traditional approaches to defining livability planning measures. This approach has the potential to capitalize on residents' interests in social media outlets and to increase public engagement in the planning process by encouraging users to participate in online neighborhood satisfaction reporting. The results inform future work for deploying a comprehensive approach to planning that draws on the marketing structure of transportation network products with residential nodes as the center of the structure.

    Twitter Analysis to Predict the Satisfaction of Saudi Telecommunication Companies’ Customers

    The flexibility in mobile communications allows customers to quickly switch from one service provider to another, making customer churn one of the most critical challenges for the data and voice telecommunication service industry. In 2019, the percentage of post-paid telecommunication customers in Saudi Arabia decreased; this represents a great deal of customer dissatisfaction and subsequent corporate fiscal losses. Many studies correlate customer satisfaction with customer churn. Telecom companies have depended on historical customer data to measure customer churn. However, historical data does not reveal current customer satisfaction or the future likelihood of switching between telecom companies. Current methods of analysing churn rates are inadequate and face some issues, particularly in the Saudi market. This research was conducted to understand the relationship between customer satisfaction and customer churn and how to use social media mining to measure customer satisfaction and predict customer churn. This research conducted a systematic review to address the problems of churn prediction models and their relation to Arabic Sentiment Analysis (ASA). The findings show that the current churn models lack integration of structural data frameworks with real-time analytics to target customers in real time. In addition, the findings show that the specific issues in the existing churn prediction models in Saudi Arabia relate to the Arabic language itself, its complexity, and lack of resources. As a result, I have constructed the first gold standard corpus of Saudi tweets related to telecom companies, comprising 20,000 manually annotated tweets. It has been generated as a dialect sentiment lexicon extracted from a larger Twitter dataset collected by me to capture text characteristics in social media. I developed a new ASA prediction model for telecommunication that fills the detected gaps in the ASA literature and fits the telecommunication field. The proposed model proved its effectiveness for Arabic sentiment analysis and churn prediction. This is the first work using Twitter mining to predict potential customer loss (churn) in Saudi telecom companies. Different fields, such as education, have different features, which makes applying the proposed model to them interesting because it is based on text mining.