
    A comparative study of tree-based models for churn prediction : a case study in the telecommunication sector

    Dissertation presented as the partial requirement for obtaining a Master's degree in Statistics and Information Management, specialization in Marketing Research and CRM.
    In recent years the topic of customer churn, the phenomenon of customers abandoning a company for a competitor, has gained increasing importance. Customer churn plays an especially important role in saturated industries such as telecommunications, where existing customers are very valuable and the cost of acquiring new customers is high. Companies want to know which of their customers are going to churn to another provider, and when, so that measures can be taken to retain the customers at risk of churning. Such measures could take the form of incentives offered to likely churners, but misclassification is costly, especially when incentives are given to customers who would not have churned. The common challenges in predicting customer churn are how to pre-process the data and which algorithm to choose, especially when the dataset is heterogeneous, as is typical of telecommunication companies' datasets. The presented thesis aims at predicting customer churn in the telecommunication sector using different decision tree algorithms and their ensemble models.
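    As an illustration of the kind of comparison the thesis describes, the minimal sketch below trains and scores several tree-based classifiers on a churn dataset. The file name, the "churn" column, and the specific models and hyperparameters are assumptions made for the example, not the thesis's actual setup.

```python
# Minimal sketch: comparing tree-based models for churn prediction.
# Assumes a CSV with a binary "churn" target column (0/1); all names are hypothetical.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

df = pd.read_csv("telecom_churn.csv")            # hypothetical telecom dataset
X = pd.get_dummies(df.drop(columns=["churn"]))   # one-hot encode categorical features
y = df["churn"]

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=6, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Cross-validated F1 penalizes both missed churners and incentives wasted on non-churners.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```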

    Advances and applications in Ensemble Learning


    Stacking Ensemble Approach for Churn Prediction: Integrating CNN and Machine Learning Models with CatBoost Meta-Learner

    In the telecom industry, predicting customer churn is crucial for improving customer retention. In the literature, the use of single classifiers predominates. Customer data is complex due to class imbalance and contains multiple factors that exhibit nonlinear dependencies. In these complex scenarios, single classifiers may be unable to fully utilize the available information to capture the underlying interactions effectively. In contrast, ensemble learning, which combines various base classifiers, enables a more thorough data analysis, leading to improved prediction performance. In this paper, a heterogeneous ensemble model is proposed for churn prediction in the telecom industry. The model involves exploratory data analysis, data pre-processing and data resampling to handle class imbalance. In the proposed model, multiple trained base classifiers with different characteristics are integrated through a stacking ensemble technique. Specifically, a convolutional neural network, logistic regression, a decision tree and a Support Vector Machine (SVM) are considered as the base classifiers in this work. The proposed stacking ensemble model utilizes the unique strengths of each base classifier and leverages their collective knowledge through a meta-learner to improve prediction performance. The efficacy of the proposed model is assessed on a real-world dataset, Cell2Cell. The empirical results demonstrate the superiority of the proposed model in churn prediction, with a 62.4% F1-score and 60.62% recall.
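    The sketch below illustrates the stacking idea described above, with logistic regression, a decision tree and an SVM as base classifiers and CatBoost as the meta-learner. The CNN branch is omitted for brevity, and the resampling step, hyperparameters and variable names are assumptions rather than the paper's actual configuration.

```python
# Minimal sketch of a stacking ensemble with a CatBoost meta-learner (CNN branch omitted).
# Hyperparameters, SMOTE resampling and variable names are illustrative assumptions.
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from catboost import CatBoostClassifier

def build_stacking_model():
    base_learners = [
        ("logreg", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=8, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ]
    # CatBoost acts as the meta-learner that combines the base classifiers' predictions.
    meta_learner = CatBoostClassifier(iterations=300, verbose=0, random_state=0)
    return StackingClassifier(estimators=base_learners,
                              final_estimator=meta_learner,
                              stack_method="predict_proba",
                              cv=5)

# Hypothetical usage with pre-processed Cell2Cell features (X_train, y_train):
# X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)  # handle class imbalance
# model = build_stacking_model().fit(X_res, y_res)
```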

    Machine learning techniques in churn rate analysis

    This work aims to provide a simplified view of the importance of the study of the churn rate by companies. It also explains different data mining techniques that serve to measure this rate and provide information on how to reduce it. The motivation for this analysis is the need to know which techniques are most used to measure churn and which are most effective in different scenarios. The work begins by defining the concept of churn rate and its importance, and then continues with a definition of data mining. Four methods for calculating this rate are then explained, with practical examples of their effectiveness. All of this is supported by a bibliographic review of different studies related to the topic. The objective of this review is to discover which methods have been most used in churn rate analysis over the last five years, and which are the most effective for this calculation. The analysis concludes that the most used data mining techniques in recent years are Support Vector Machines and Artificial Neural Networks; these two techniques are the most closely related to artificial intelligence, since the aim is to build automated learning models that obtain better results. Regressions and decision trees are less used in this field but offer more precise results, at least in the short term, perhaps due to their simplicity of application. The size of the sample used for the analysis is also important: the larger the sample, the lower the accuracy, but the greater the chance of developing an automated learning model that yields better long-term results.
    Máster en Empresa y Tecnologías de la Información
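    For reference, the central quantity the review builds on can be computed directly; the sketch below shows the basic churn-rate calculation, using made-up example figures (the function name and numbers are purely illustrative, not data from the study).

```python
# Illustrative churn-rate calculation with hypothetical figures.
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of the starting customer base lost during the period."""
    return customers_lost / customers_at_start

# Example: 450 of 10,000 customers lost during the period -> 4.5% churn rate.
print(f"{churn_rate(customers_at_start=10_000, customers_lost=450):.1%}")
```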

    Adaptive algorithms for real-world transactional data mining.

    The accurate identification of the right customer to target with the right product at the right time, through the right channel, to satisfy the customer's evolving needs, is a key performance driver and enhancer for businesses. Data mining is an analytic process designed to explore usually large amounts of data (typically business or market related) in search of consistent patterns and/or systematic relationships between variables for the purpose of generating explanatory/predictive data models from the detected patterns. It provides an effective and established mechanism for accurate identification and classification of customers. Data models derived from the data mining process can aid in effectively recognizing the status and preferences of customers, both individually and as a group. Such data models can be incorporated into the business's market segmentation, customer targeting and channelling decisions with the goal of maximizing the total customer lifetime profit. However, for cost, privacy and/or data protection reasons, the customer data available for data mining is often restricted to verified and validated data (in most cases, only the business-owned transactional data is available). Transactional data is a valuable resource for generating such data models: it can be collected electronically and readily made available for data mining in large quantity at minimum extra cost. Transactional data is, however, inherently sparse and skewed, and these characteristics give rise to poor performance in data models built on transactional customer data. Data models for identifying, describing, and classifying customers that are constructed from evolving transactional data thus need to handle its inherent sparseness and skewness effectively in order to be efficient and accurate. Using real-world transactional data, this thesis presents the findings and results from the investigation of data mining algorithms for analysing, describing, identifying and classifying customers with evolving needs. In particular, methods for handling the issues of scalability, uncertainty and adaptation whilst mining evolving transactional data are analysed and presented. A novel application of a new framework for integrating transactional data binning and classification techniques is presented alongside an effective prototype selection algorithm for efficient transactional data model building. A new change mining architecture for monitoring, detecting and visualizing the change in customer behaviour using transactional data is proposed and discussed as an effective means for analysing and understanding the change in customer buying behaviour over time. Finally, the challenging problem of discerning between a change in the customer profile (which may necessitate changing the customer's label) and a change in the performance of the model(s) (which may necessitate changing or adapting the model(s)) is introduced and discussed by way of a novel, flexible and efficient architecture for classifier model adaptation and customer profile class relabeling.
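    As a loose illustration of the binning-plus-classification idea mentioned above, the sketch below discretizes skewed per-customer transactional aggregates into quantile bins before fitting a classifier. The RFM-style column names, bin count and choice of classifier are assumptions, not the thesis's actual framework.

```python
# Illustrative sketch: quantile-binning sparse, skewed transactional features
# before classification. Column names and parameters are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def bin_transactional_features(df: pd.DataFrame, n_bins: int = 5) -> pd.DataFrame:
    """Discretize per-customer transactional aggregates into quantile bins."""
    binned = pd.DataFrame(index=df.index)
    for col in ["recency_days", "frequency", "monetary_value"]:  # hypothetical columns
        # rank(method="first") breaks ties so qcut always finds distinct bin edges
        binned[col + "_bin"] = pd.qcut(df[col].rank(method="first"),
                                       q=n_bins, labels=False)
    return binned

# Hypothetical usage with per-customer aggregates and known class labels:
# X = bin_transactional_features(transactions)
# clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
```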

    Antecedents of ESG-Related Corporate Misconduct: Theoretical Considerations and Machine Learning Applications

    The core objective of this cumulative dissertation is to generate new insights into the occurrence and prediction of unethical firm behavior disclosure. The first two papers investigate predictors and antecedents of (severe) unethical firm behavior disclosure. The third paper addresses frequently occurring methodological issues when applying machine learning approaches within marketing research. Hence, the three papers of this dissertation contribute to two recent topics within the field of marketing. First, marketing research has already focused intensively on the consequences of corporate misconduct and the accompanying media coverage, while the prediction and the process of occurrence of such threatening events have so far been examined only sporadically. Second, companies and researchers are increasingly implementing machine learning as a methodology to solve marketing-specific tasks. In this context, users of machine learning methods often face methodological challenges, for which this dissertation reviews possible solutions. Specifically, in study 1, machine learning algorithms are used to predict the future occurrence of severe threatening news coverage of corporate misconduct. Study 2 identifies relationships between the specific competitive situation of a company within its industry and unethical firm behavior disclosure. Study 3 addresses machine learning-based issues for marketing researchers and presents possible solutions by reviewing the computer science literature.