    Review of Data Mining Techniques for Churn Prediction in Telecom

    The telecommunication sector generates a huge amount of data owing to the growing number of subscribers, rapidly changing technologies, data-based applications, and other value-added services. This data can be usefully mined for churn analysis and prediction. Researchers worldwide have undertaken significant work to understand the data mining (DM) practices that can be used for predicting customer churn. This paper reviews around 100 journal articles published since 2000 to present the various data mining techniques used in multiple customer-based churn models. It then summarizes the existing telecom literature by highlighting the sample size used, the churn variables employed, and the findings of the different DM techniques. Finally, we list the most popular techniques for churn prediction in telecom as decision trees, regression analysis, and clustering, thereby providing a roadmap for new researchers to build novel churn management models.

    A Hybrid Data Mining Method for Customer Churn Prediction

    Because of increasing competition and market saturation, attracting new customers costs much more than retaining existing ones. Customer retention is therefore one of the leading concerns in companies’ marketing. Retention requires churn management, and effective churn management requires an accurate model for churn prediction. A variety of techniques have been used for churn prediction, such as logistic regression, neural networks, genetic algorithms, and decision trees. In this article, a hybrid method is presented that predicts customer churn more accurately by using data fusion and feature extraction techniques. After data preparation and feature selection, two algorithms, LOLIMOT and C5.0, were trained with feature sets of different sizes and applied to test data. The outputs of the individual classifiers were then combined by weighted voting. Applying this method to real data from a telecommunication company demonstrated its effectiveness.
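    The weighted-voting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each classifier (for example, LOLIMOT and C5.0) outputs a churn probability, and the function name `weighted_vote`, the weights, and the 0.5 threshold are hypothetical choices that the abstract does not specify.

```python
def weighted_vote(probabilities, weights, threshold=0.5):
    """Combine per-classifier churn probabilities by a weighted average.

    probabilities: churn probability from each classifier, in [0, 1]
    weights: non-negative weight per classifier (assumed here, e.g.
             proportional to each model's validation accuracy)
    Returns (combined_score, predicted_churn).
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalizing by the total weight keeps the score in [0, 1].
    score = sum(p * w for p, w in zip(probabilities, weights)) / total_weight
    return score, score >= threshold
```

    With two classifiers, a call such as `weighted_vote([0.8, 0.4], [0.6, 0.4])` gives the more trusted model a larger say in the final decision.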

    Influence of the Social Network on User Churn in Mobile Networks (Utjecaj društvene mreže na odljev korisnika u mobilnim mrežama)

    As the telecommunications sector has reached its mature stage, retaining existing users has become crucial for service providers. By analyzing call data records, providers can observe their users in the context of a social network and gain additional insight into the spread of influence among interconnected users, which is relevant to churn. In this paper, we examine the communication patterns of mobile phone users and subscription plan logs. Our goal is to use a simple model to predict which users are most likely to churn, solely by observing each user's social network, which is formed by outgoing calls, and churn among their neighbors. To measure the importance of social network parameters with regard to churn prediction, we compare three models: spatial classification, a regression model, and artificial neural networks. For each subscriber, we observe three social network parameters: the number of neighbors that have churned, the number of calls to these neighbors, and the duration of these calls for different time periods. The results indicate that using only one or two of these parameters yields results that are comparable to or better than those of the complex models, with large numbers of individual and/or social network input parameters, that other researchers have proposed.
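    The three per-subscriber social network parameters described above can be computed from call records roughly as follows. This is a sketch under stated assumptions, not the paper's code: the record layout `(caller, callee, duration_seconds)`, the function name `churn_neighbor_features`, and the restriction to a single time period are all hypothetical.

```python
from collections import defaultdict


def churn_neighbor_features(calls, churned):
    """Compute churn-related social network features from outgoing calls.

    calls: iterable of (caller, callee, duration_seconds) records
    churned: set of subscriber ids known to have churned
    Returns {caller: (churned_neighbors, calls_to_them, total_duration)}.
    """
    neighbors = defaultdict(set)     # distinct churned callees per caller
    call_counts = defaultdict(int)   # calls placed to churned callees
    durations = defaultdict(float)   # seconds spent on those calls
    callers = set()
    for caller, callee, seconds in calls:
        callers.add(caller)
        # Only outgoing calls to churned neighbors contribute.
        if callee in churned and callee != caller:
            neighbors[caller].add(callee)
            call_counts[caller] += 1
            durations[caller] += seconds
    return {
        user: (
            len(neighbors.get(user, ())),
            call_counts.get(user, 0),
            durations.get(user, 0.0),
        )
        for user in callers
    }
```

    To cover the different time periods mentioned in the abstract, the same function could be run once per period on the correspondingly filtered call records.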

    Churn prediction using customers' implicit behavioral patterns and deep learning

    Market globalization is rapidly changing the competitive conditions of the business and financial sectors. With the emergence of new competitors and increasing investment in banking services, today's economy demands closer customer relationships. In such a scenario, churn, that is, a customer's willingness to change service provider, has become a competitive concern for organizations. In the banking sector, the need to retain valuable customers has forced management to work preemptively on customer data and devise strategies to engage customers and thereby reduce the churn rate. Valuable information can be extracted, and implicit behavior patterns derived, from customers’ transaction and demographic data. Various supervised models have been developed in the past to predict churning customers; our prediction model uses sequence features derived jointly from location and time-stamped data, and it shows a significant improvement in customer churn prediction. These sequence-based feature vectors are then used in a neural network for churn prediction. In this study, we found that time-sequenced data used in a recurrent neural network based Long Short-Term Memory (LSTM) model predicts with better precision and recall than the baseline model. The feature vector output of our LSTM model, combined with other demographic and computed behavioral features of customers, gave better prediction results. We have also proposed and developed a model, using graph convolutional networks (GCN), to find out whether connections between customers can assist churn prediction; it incorporates customer network connections defined over three dimensions.

    A Model for Churn Prediction in the Retail Sector (Um modelo para previsão de churn na área do retalho)

    Master's dissertation in Informatics Engineering. The highly competitive environment characteristic of the retail sector, together with the increasing difficulty of attracting new customers, leads firms to invest in strategies that promote the satisfaction of existing customers and so motivate their loyalty. It is in this context that the importance of combating churn, i.e., the loss of customers, is being recognized. Customers at risk of churn must be identified, and this requires a method that identifies them in advance so that they can be targeted by proactive retention campaigns. The more effective the method is at identifying customers at risk, the higher the return on the campaign. Many studies have been carried out on churn prediction in various sectors; in retail, however, research has been very limited. This dissertation therefore set out to study the loss of customers in order to define and implement a churn model for the retail sector using data mining techniques. The intention was to survey the main issues involved in predicting churn in retail, in constructing the dataset (customer signatures), and in applying data mining techniques in the prediction process. Accordingly, several models were built to predict cases of churn based on five of the classification techniques most commonly used in churn prediction: Decision Trees, Logistic Regression, Neural Networks, Random Forests, and SVM. The models were evaluated and compared according to several measures, namely accuracy, precision, sensitivity, specificity, F-measure, and AUC. In addition, the impact on model accuracy of changing the density of churn events in the training set was tested.
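    Most of the evaluation measures listed above follow directly from a binary confusion matrix in which churn is the positive class. The sketch below uses the standard definitions; the function name `classification_metrics` and the convention of returning 0.0 for an undefined ratio are assumptions for illustration, and AUC is omitted because it requires ranked scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard measures from confusion-matrix counts (churn = positive).

    tp, fp, tn, fn: true positives, false positives,
    true negatives, and false negatives.
    """
    def ratio(numerator, denominator):
        # Assumed convention: report 0.0 when a ratio is undefined.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)   # also called recall
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        # F-measure: harmonic mean of precision and sensitivity.
        "f_measure": ratio(2 * precision * sensitivity,
                           precision + sensitivity),
    }
```

    Because churners are usually a small minority, accuracy alone can look high for a model that never predicts churn, which is one reason to report sensitivity and precision alongside it.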

    Churn Prediction in Insurance Companies (Previsão de churn em companhias de seguros)

    Master's dissertation in Informatics Engineering. Customer retention is a highly important concern in every industry and deserves all possible attention. The abandonment of a product or service by a customer, a situation usually referred to as churn, is increasingly an indicator that service providers must watch. Together with Customer Relationship Management (CRM) techniques, churn prediction offers companies a strong competitive advantage, since it allows them to achieve better results in customer retention. With the constant growth and maturing of information systems, it is becoming more feasible to use data mining techniques, which can extract behavior patterns that reveal information hidden in the data. This dissertation focuses on using data mining techniques to predict customer churn in insurance companies. Its main objective is to predict cases of churn and thereby provide information that can be used to take action to avoid customer desertion. In this dissertation we develop a set of predictive churn models using different data mining techniques. We studied the following techniques: decision trees, neural networks, logistic regression, and SVM. Implementing various models with this set of techniques allowed us to conclude that the most suitable techniques for predicting churn in an insurance company are decision trees and logistic regression. In addition, we studied the most relevant churn indicators.

    Newcomer Retention and Productivity in Online Peer-Production Communities

    University of Minnesota Ph.D. dissertation. July 2018. Major: Computer Science. Advisor: Joseph Konstan. 1 computer file (PDF); x, 159 pages. Online communities are interaction spaces that break the barriers of time, space, and scale and provide opportunities for companionship and social support, information exchange, retail, and entertainment. Among them are online peer-production communities, which have a fantastic business model in which volunteers come together to produce content and drive traffic to these sites. Although these communities are successful as a class, the success of individual communities varies greatly. To become and remain successful, they must meet a number of challenges, including starting the community, retaining members, encouraging commitment and contribution from members, and regulating member behavior. This dissertation focuses on the specific challenge of newcomer retention and productivity in online peer-production communities. Exploring three communities with entirely different structures and compositions (MovieLens, GitHub, and Wikipedia) and building upon prior work in this space, this dissertation offers a number of important predictors of newcomer retention and productivity. First, it explores the value of early activity diversity, in the presence of the amount of early activity, as a predictor of newcomer retention. Second, it digs into more fundamental psychological traits of newcomers, such as personality, and presents findings on the relationships between personality and newcomer retention, preferences, and productivity. Third, it explores and presents results on the relationship between community interactions (apart from norms, policies, and rigid structures) and newcomer retention. Fourth, it studies and presents the effects of various kinds of prior experience on newcomers' retention and productivity in a new group they join. The dissertation concludes by offering a number of directions for future research.

    Applying data mining in telecommunications

    This thesis applies data mining in commercial settings in the telecommunications industry. The research for this thesis was performed at T-Mobile Netherlands B.V., and the methods described in some of the chapters have also been applied in Deutsche Telekom subsidiaries in other countries. We had a rare opportunity to work on real commercial data sets and to have the results of our research deployed in practice. Throughout this thesis we describe some of the challenges that data miners (or data scientists) meet when working on business problems, and our solutions to these problems. The complex data sets we analyzed contained, in certain cases, millions of records. In this research we used simple methods combined in innovative ways to achieve results that either improved on how the business was previously solving these problems or solved important business problems that had not been addressed before in such detail. We address the stages of CRISP-DM (CRoss Industry Standard Process for Data Mining), and our main focus is on the stages least covered in the literature.