    Customer Churn Prediction of Telecom Company Using Machine Learning Algorithms

    Telecommunications have become a significant part of everyday life, and since the Covid-19 pandemic the telecommunication industry has become crucial and now enjoys growth opportunities. This study uses six supervised machine learning algorithms, namely KNN, Random Forest (RF), AdaBoost, Logistic Regression (LR), XGBoost, and Support Vector Machine (SVM), to predict customer churn at a telecom company in California. The first goal is to identify the classifier that predicts customer churn most effectively. With an accuracy of 79.67%, precision of 64.67%, recall of 51.87%, and F1-score of 57.57%, XGBoost is the most effective classifier overall in this study. The second goal is to identify the characteristics of customers who are most likely to leave the telecom company; these characteristics were derived from customers' demographics and account information. Lastly, the study advises the company on how to retain customers: personalize the customer experience, implement a customer loyalty program, and apply AI in customer relationship management.
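
    The F1-score quoted above is the harmonic mean of precision and recall, so the reported figures can be checked against each other. The sketch below is only an illustration: the function name `f1_score` and the variable names are assumptions, and the inputs are the percentages from the abstract converted to fractions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Figures reported for XGBoost in the abstract, converted from percentages.
reported_precision = 0.6467
reported_recall = 0.5187

xgb_f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {xgb_f1:.4f}")
```

    Under these assumptions the computed value agrees with the reported 57.57% to within rounding.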

    Un-factorize non-food NPS on a food-based retailer

    Master's dissertation in Statistics for Data Science. More and more companies realise that understanding their customers can be a way to improve customer satisfaction and, consequently, customer loyalty, which in turn can result in an increase in sales. The Net Promoter Score (NPS) has been widely adopted by managers as a measure of customer loyalty and a predictor of sales growth. In this regard, this dissertation aims to create a classification model focused not only on identifying the customer's NPS class, that is, classifying the customer as Detractor, Passive or Promoter, but also on understanding which factors have the most impact on that classification. The goal is to collect business insights that identify areas where customer satisfaction can be improved. We propose a Data Mining approach to the NPS multi-class classification problem. Our approach leverages survey data as well as transactional data collected through a retailer's loyalty card, building a data set from which we can extract information such as NPS ratings, customer behaviour and store details. Initially, an exploratory analysis is done on the data. Because the classes are imbalanced, several resampling techniques are applied to the data set. Two machine learning algorithms are applied: Decision Trees and Random Forests. The models performed poorly. An error analysis of the latter model concluded that the classifier has difficulty distinguishing Detractors from Passives, but performs well when predicting Promoters. In a business sense, this methodology can be used to distinguish the Promoters from the rest of the consumers, since the Promoters are the customers most likely to provide value in the long term and can benefit the company by spreading the word and attracting new customers.
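
    As an illustration of the three NPS classes, the sketch below maps a 0-10 survey rating to Detractor, Passive or Promoter and computes the aggregate score. The cut-offs (0-6, 7-8, 9-10) are the conventional NPS definition rather than something stated in the abstract, and the function names and sample ratings are made up for the example.

```python
from collections import Counter


def nps_class(score: int) -> str:
    """Map a 0-10 rating to its NPS class using the conventional cut-offs."""
    if not 0 <= score <= 10:
        raise ValueError("NPS ratings must be between 0 and 10")
    if score <= 6:
        return "Detractor"
    if score <= 8:
        return "Passive"
    return "Promoter"


def net_promoter_score(scores: list[int]) -> float:
    """Percentage of Promoters minus percentage of Detractors."""
    if not scores:
        raise ValueError("at least one rating is required")
    counts = Counter(nps_class(s) for s in scores)
    return 100.0 * (counts["Promoter"] - counts["Detractor"]) / len(scores)


# Made-up ratings, for illustration only.
sample_scores = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]
sample_nps = net_promoter_score(sample_scores)
print(sample_nps)
```

    The class labels produced by `nps_class` are the kind of multi-class target the dissertation's classifiers would predict; the aggregate score is shown only for context.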

    Customer churn prediction in telecom using machine learning and social network analysis in big data platform

    Customer churn is a major problem and one of the most important concerns for large companies. Because of its direct effect on revenue, especially in the telecom field, companies are seeking ways to predict which customers are likely to churn. Finding the factors that increase customer churn is therefore important for taking the actions needed to reduce it. The main contribution of our work is a churn prediction model that helps telecom operators predict which customers are most likely to churn. The model uses machine learning techniques on a big data platform and introduces a new approach to feature engineering and selection. To measure the performance of the model, the standard Area Under Curve (AUC) measure is adopted, and the AUC value obtained is 93.3%. Another main contribution is the use of the customer social network in the prediction model, by extracting Social Network Analysis (SNA) features. The use of SNA improved the performance of the model from 84% to 93.3% AUC. The model was prepared and tested in a Spark environment on a large dataset created by transforming big raw data provided by the SyriaTel telecom company. The dataset contained all customers' information over 9 months, and was used to train, test, and evaluate the system at SyriaTel. Four algorithms were tested: Decision Tree, Random Forest, Gradient Boosted Machine Tree (GBM) and Extreme Gradient Boosting (XGBOOST). The best results were obtained with XGBOOST, which was used for classification in this churn prediction model. Comment: 24 pages, 14 figures. PDF https://rdcu.be/budK
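
    For readers unfamiliar with the AUC measure used above, the following sketch computes it directly from its pairwise definition: the fraction of (churner, non-churner) pairs in which the churner receives the higher score, with ties counted as half. This is a plain-Python illustration with made-up labels and scores, not the Spark pipeline described in the abstract, and the function name `roc_auc` is an assumption.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up churn labels (1 = churned) and model scores, for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)
```

    The quadratic loop is fine for a toy example; on data of the size described in the abstract a rank-based or library implementation would be the practical choice.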


    The business world is becoming more competitive over time; businesses are therefore forced to improve their strategies in every aspect. Determining the elements that contribute to clients' satisfaction is one of the critical needs of businesses that want to develop successful products. The Kano model is one of the models that help determine which features must be included in a product or service to improve customer satisfaction. The model focuses on highlighting the most relevant attributes of a product or service, along with customers' estimation of how these attributes can be used to predict satisfaction with specific services or products. This research aims to develop a method that integrates the Kano model with data mining approaches to select the attributes that drive customer satisfaction, with a specific focus on higher education. The significant contribution of this research is to improve the quality of the academic support and development services that United Arab Emirates University provides to its students. It does so by addressing the problem of selecting features that are not methodically correlated with customer satisfaction, which reduces the risk of investing in features that ultimately prove irrelevant to enhancing customer satisfaction. Questionnaire data were collected from 646 students at United Arab Emirates University. The experiment suggests that Extreme Gradient Boosting Regression can produce the best results for this kind of problem. Based on the integration of the Kano model and the feature selection method, the number of features used to predict customer satisfaction is reduced to four. Integrating either the Chi-Square or the Analysis of Variance (ANOVA) feature selection method with the Kano model gave higher values of the Pearson correlation coefficient and R². Moreover, prediction using the union of the Kano model's most important features and the most frequent features among 8 clusters also shows high performance.

    Profitable Retail Customer Identification Based on a Combined Prediction Strategy of Customer Lifetime Value

    As a fundamental concept of customer relationship management, customer lifetime value (CLV) serves as a crucial metric for identifying profitable retail customers. Various methods are available to predict CLV in different contexts. With the development of consumer big data, modern statistics and machine learning algorithms have gradually been adopted in CLV modeling. We introduce two machine learning algorithms, the gradient boosting decision tree (GBDT) and the random forest (RF), into retail customer CLV modeling and compare their predictive performance with two classical models, the Pareto/NBD (HB) and the Pareto/GGG. Using 43 weeks of customer transaction data from a large retailer in China, we predicted customer value over the following 20 weeks. The results show that the predictive performance of GBDT and RF is generally better than that of the Pareto/NBD (HB) and Pareto/GGG models. Because the predictions of the four models are not entirely consistent, we combine them to make CLV prediction and customer identification more robust and to determine which customers are the most (or least) profitable.

    Combined artificial bee colony algorithm and machine learning techniques for prediction of online consumer repurchase intention

    A novel paradigm in the service sector, i.e. services through the web, is a progressive mechanism for rendering offerings over diverse environments. The Internet provides huge opportunities for companies to offer personalized online services to their customers. However, the rapid introduction of novel web services may adversely affect quality and user satisfaction. Consequently, predicting consumer intention is of great importance when selecting web services for an application. The aim of this study is to predict online consumer repurchase intention; to achieve this objective, a hybrid approach combining machine learning techniques and the Artificial Bee Colony (ABC) algorithm is used. The study is divided into three phases. First, shopping mall and consumer characteristics relevant to repurchase intention are identified through an extensive literature review. Second, ABC is used to select features among the consumers' characteristics and shopping malls' attributes (with a threshold value > 0.1) for the prediction model. Finally, K-fold cross-validation is employed to measure the robustness of the best classification model. The classification models, viz. Decision Trees (C5.0), AdaBoost, Random Forest (RF), Support Vector Machine (SVM) and Neural Network (NN), are used to predict consumer purchase intention. Performance evaluation of these models on training-testing partitions (70-30%) of the data set shows that AdaBoost outperforms the other classification models, with sensitivity and accuracy of 0.95 and 97.58% respectively on the testing data set. This study is a revolutionary attempt that considers both shopping mall and consumer characteristics in examining consumer purchase intention.

    Customer Churn Detection and Marketing Retention Strategies in the Online Food Delivery Business

    The purpose of this thesis is to analyze customer behavior within the Online Food Delivery industry and to develop a prediction model that detects, among valuable active customers, those who will leave the services of Alpha Corporation in the near future. Valuable customers are defined as those consumers who have made at least 8 orders in the last 12 months. Using the historical behavior of these users and applying Feature Engineering techniques, a first approach is proposed based on a Random Forest algorithm, followed by a boosting algorithm, XGBoost. Once the performance of each model has been analyzed and potential churners have been identified, different marketing suggestions are proposed to retain these customers. The retention strategies are based on how Alpha Corporation works, as well as on the output of the predictive model. Other development alternatives are also discussed: a clustering model based on potential churners, or an unstructured data model to analyze the emotions of those users according to the NPS surveys. The aim of these proposals is to complement the prediction in order to design more specific retention marketing strategies.
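
    The "valuable customer" rule above (at least 8 orders in the last 12 months) can be sketched as a simple filter. Everything below is an assumption for illustration: the `(customer_id, order_date)` tuple layout, the function name, the approximation of 12 months as 365 days, and the sample orders are not taken from the thesis.

```python
from datetime import date, timedelta


def valuable_customers(orders, as_of, min_orders=8, window_days=365):
    """Return the ids of customers with at least `min_orders` orders in the window.

    `orders` is an iterable of (customer_id, order_date) tuples (hypothetical layout);
    the window covers the `window_days` days ending at `as_of`, inclusive.
    """
    window_start = as_of - timedelta(days=window_days)
    counts = {}
    for customer_id, order_date in orders:
        if window_start < order_date <= as_of:
            counts[customer_id] = counts.get(customer_id, 0) + 1
    return {cid for cid, n in counts.items() if n >= min_orders}


# Made-up orders: "a" has 8 recent orders, "b" has 3, "c" has 8 old orders.
sample_orders = (
    [("a", date(2023, m, 15)) for m in range(1, 9)]
    + [("b", date(2023, m, 15)) for m in range(1, 4)]
    + [("c", date(2021, m, 15)) for m in range(1, 9)]
)
selected = valuable_customers(sample_orders, as_of=date(2024, 1, 1))
print(selected)
```

    Only customers passing this filter would then be scored by the churn models described in the abstract.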

    Forecasting credit card attrition using machine learning models

    The objective of this work is the study, implementation and evaluation of Machine Learning models to identify which customers want to cancel one of their credit cards. The banking industry uses this technology to obtain more reliable predictions when identifying opportunities for purchase, investment, or fraud. These models can adapt independently by recognizing patterns and through algorithms based on mathematical calculations. Four models (LightGBM, XGBoost, Random Forest and Logistic Regression) were implemented and evaluated to predict, from data on the customers of a bank in Colombia and the products they hold, the likelihood of customers cancelling their credit cards. By analysing the ROC curves using the AUC metric, it is concluded that, of the selected models, the one chosen for deployment would be LightGBM, since it performed best in the experiments conducted. Furthermore, the "Score Acierta" variable, a customer rating provided by the Colombian credit rating agency, was found to be the most discriminating variable in the prediction models.

    Study about customer segmentation and application in a real case

    The hospitality industry generates a huge variety of data that grows by the day, making it increasingly difficult to analyse this data manually in order to build a good data model. A thorough understanding of current customer profiles enables better resource allocation and leads to better definition of product and market development strategies. Dividing customers into similar groups helps develop more objective and focused marketing messages for each segment. Thus, in the present dissertation, data classification and segmentation methods from the literature are reviewed. Then, a real case study is presented, using data from the Property Management Systems of eight Portuguese hotels, four city hotels and four resort hotels. This data set consists of forty-one attributes but, after selection of the most predictive variables, only a subset of attributes is used for data modeling. Next, the classification and segmentation methods studied in the literature review are applied to extract the relevant information. The results are analyzed and discussed to understand their suitability for studying the particular characteristics of hotel reservations.