Deeper customer insight from NPS-questionnaires with text mining - Comparison of Machine, Representation and Deep Learning models in Finnish language sentiment classification

Abstract

The amount of data in the world has grown significantly during the last decade. It has been estimated that around 90 percent of all this data was generated in the last two years alone, and the pace of data creation is constantly accelerating. According to a popular belief, around 80 percent of all the data in the world is in unstructured form. Unstructured data is heterogeneous and comes in formats that are difficult to analyze, such as video, audio, image and text. This kind of data generally lacks the tabular structure that computers need in order to analyze it effortlessly.
This thesis concentrates on the field of text mining, also known as text analytics. Text mining refers to techniques that can be used to extract information from unstructured textual data. It is currently one of the most popular emerging analytics methods, and it can be used in numerous ways to bring business value to companies in various fields. For instance, text mining can be used to gain deeper customer understanding by analyzing texts written by current and potential customers. Deep customer understanding is a crucial part of the foundation on which successful consumer businesses are built. This is one of the reasons Telia Finland is interested in the possibilities of text mining.
In this thesis I cover how text mining can be used to bring business value and how it can be used to gain deeper customer insight from NPS questionnaires. Telia uses the popular Net Promoter Score (NPS) metric to assess customer satisfaction and loyalty. To gain more customer insight, Telia Finland currently uses a SaaS application to analyze the textual feedback attached to NPS questionnaires. Telia Finland is, however, interested in a text mining solution that is more versatile, more agile and better performing than the current SaaS. This thesis aims to discover whether Telia could insource text analytics without notable compromises in performance.
In this thesis, sentiment classification models are programmed in Python using the modern approaches to text mining, namely machine learning, representation learning and deep learning. The models use manually labelled Finnish-language NPS feedback data from Telia Finland as the training and testing datasets. The classification performance of these models is evaluated with quantitative methods and compared to the performance of the currently used SaaS solution. All of the final sentiment classification models outperformed the current solution in overall sentiment classification of Finnish-language NPS feedback. Among the developed models, the one using the deep learning approach performed best. The tuned deep learning model, which utilizes Long Short-Term Memory (LSTM) networks, reached a classification accuracy and a class-averaged precision of around 80 percent.
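The two evaluation metrics named above can be illustrated with a minimal Python sketch. It assumes that "class-averaged precision" means the unweighted mean of per-class precision (often called macro-averaged precision). The three class labels and the toy data are hypothetical and are not taken from the Telia dataset.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of examples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision (assumed meaning of 'class-averaged')."""
    classes = sorted(set(y_true) | set(y_pred))
    predicted = Counter(y_pred)  # how often each class was predicted
    true_pos = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    # A class that is never predicted contributes a precision of 0.0.
    per_class = [
        true_pos[c] / predicted[c] if predicted[c] else 0.0 for c in classes
    ]
    return sum(per_class) / len(per_class)


# Hypothetical toy labels, for illustration only.
y_true = ["positive", "negative", "neutral", "negative", "positive", "neutral"]
y_pred = ["positive", "negative", "negative", "negative", "positive", "neutral"]

print(f"accuracy: {accuracy(y_true, y_pred):.3f}")
print(f"class-averaged precision: {macro_precision(y_true, y_pred):.3f}")
```

Averaging precision per class rather than over all examples keeps a frequent class from dominating the score, which matters when the sentiment classes are imbalanced.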
