68 research outputs found

    A combination of multi-period training data and ensemble methods to improve churn classification of housing loan customers

    Customer retention has been the focus of customer relationship management in the financial sector during the past decade. The first important step in customer retention is to classify customers into possible churners, those likely to switch to another service provider, and non-churners. The second step is to take action to retain the most probable churners. The main challenge in churn classification is the rarity of churn events. Two aspects are found to improve the churn classification model in the face of this rarity: the training data and the algorithm. The recently proposed multi-period training data approach is found to outperform single-period training data thanks to its more effective use of longitudinal data. Regarding churn classification algorithms, the most advanced and widely employed is the ensemble method, which combines multiple models to produce a more powerful one. Two popular ensemble techniques, random forest and gradient boosting, are found to outperform logistic regression and decision trees in separating churners from non-churners. The study uses data on housing loan customers from a Nordic bank. The key finding is that models combining the multi-period training data approach with ensemble methods perform the best.
    Seppälä, T.; Thuy, L. (2018). A combination of multi-period training data and ensemble methods to improve churn classification of housing loan customers. In 2nd International Conference on Advanced Research Methods and Analytics (CARMA 2018). Editorial Universitat Politècnica de València. 141-144. https://doi.org/10.4995/CARMA2018.2018.8334
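
The abstract above does not spell out how the multi-period training set is built. One common reading, assumed here, is that each observation period contributes its own labelled rows, so rare churn events from earlier periods are not discarded. The sketch below contrasts that with using only the latest period; the `snapshots` layout, the customer ids, and the feature values are all hypothetical.

```python
def single_period_rows(snapshots):
    """Training rows taken from the most recent period only."""
    latest = max(snapshots)
    return [
        (latest, cid, feats, label)
        for cid, (feats, label) in sorted(snapshots[latest].items())
    ]


def multi_period_rows(snapshots):
    """Training rows stacked across every observed period."""
    rows = []
    for period in sorted(snapshots):
        for cid, (feats, label) in sorted(snapshots[period].items()):
            rows.append((period, cid, feats, label))
    return rows


# Hypothetical data: period -> customer id -> (features, churned in next period)
snapshots = {
    1: {"a": ([0.2, 1.0], 0), "b": ([0.9, 0.0], 1)},
    2: {"a": ([0.3, 1.0], 0), "c": ([0.5, 1.0], 0)},
    3: {"a": ([0.8, 0.0], 1), "c": ([0.4, 1.0], 0)},
}

single = single_period_rows(snapshots)
multi = multi_period_rows(snapshots)

# Count churn events available to the learner under each scheme.
churners_single = sum(label for *_, label in single)
churners_multi = sum(label for *_, label in multi)
```

In this toy data the stacked set holds more churn events than the latest period alone, which is the intuition behind using longitudinal data when churn is rare. Rows for the same customer across periods are correlated, so any validation split would need to account for that.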

    A knowledge-intensive methodology for explainable sales prediction

    Sales prediction in the food market is a complex issue that has recently been addressed with machine learning techniques. Despite some promising results, the experimental work described in this paper reveals drawbacks of this data-driven approach and motivates the definition of a novel methodology that strongly involves a priori knowledge…

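    Weighting Techniques for Handling Class Imbalance in Churn Prediction Using XGBoost, LightGBM, and CatBoost (Teknik Weighting untuk Mengatasi Ketidakseimbangan Kelas Pada Prediksi Churn Menggunakan XGBoost, LightGBM, dan CatBoost)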

    Churn is the condition in which a person moves from one service to another. Customer churn has grown significantly as a problem and is a major challenge for many banking companies because it plays an important role in company profits. A way to predict churn behavior in time is therefore needed so that customer retention measures can be applied. However, churn prediction models face class imbalance, which causes classification models to perform poorly. The most commonly used solutions to class imbalance fall into three approaches: data-level, algorithm-level, and ensemble. Each approach runs into problems that are hard to predict when it is used to handle class imbalance. In this study, the researchers experiment with boosting-based ensemble methods for customer churn prediction and try to improve their performance on an imbalanced dataset by tuning the scale_pos_weight parameter. The classification algorithms used are XGBoost (extreme gradient boosting), LightGBM (light gradient boosting machine), and CatBoost. The experiments compare the performance of the three boosting-based algorithms while adjusting the weight parameter three times. In testing, the CatBoost model achieved the highest recall, 0.79, while the lowest recall, 0.47, came from the default CatBoost model. Based on the experimental results, it can be concluded that the models work fairly well on imbalanced data when given the scale_pos_weight hyperparameter mechanism, which lets the model focus more on the hard-to-detect minority class.
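
The abstract above does not report which weight values were tried. As a rough illustration only, the sketch below computes the commonly suggested starting value for a positive-class weight, the ratio of negative to positive examples, together with recall, the metric the abstract reports. The toy labels and predictions are hypothetical, and no boosting library is called.

```python
def scale_pos_weight(labels):
    """Ratio of negative to positive examples (1 = churner, 0 = non-churner)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive (churn) examples in labels")
    return negatives / positives


def recall(y_true, y_pred):
    """Share of actual churners that the model flagged as churners."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if true_pos + false_neg == 0:
        return 0.0
    return true_pos / (true_pos + false_neg)


# Hypothetical toy data: 2 churners among 10 customers.
y_true = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]

weight = scale_pos_weight(y_true)  # 8 negatives / 2 positives
score = recall(y_true, y_pred)     # 1 of the 2 churners was found
```

A value like `weight` would typically be passed to a booster as its positive-class weight, and then varied around that starting point, which is one plausible reading of adjusting the weight three times.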

    Churn prediction based on text mining and CRM data analysis

    Within quantitative marketing, churn prediction at the individual customer level has become a major issue. An extensive body of literature shows that churn prediction today is based mainly on structured CRM data. In recent years, however, more and more digitized customer text data has become available, originating from emails, surveys, or transcripts of phone calls. To date, this data source remains largely untapped for churn prediction, and corresponding methods are rarely described in the literature. Filling this gap, we present a method for estimating churn probabilities directly from text data by adopting classical text mining methods and combining them with state-of-the-art statistical prediction modelling. We transform every customer text document into a vector in a high-dimensional word space after applying text mining pre-processing steps such as removal of stop words, stemming, and word selection. The churn probability is then estimated by statistical modelling, using random forest models. We applied these methods to customer text data from a major Swiss telecommunication provider, with data originating from transcripts of phone calls between customers and call-centre agents. In addition to the analysis of the text data, a similar churn prediction was performed for the same customers based on structured CRM data. This second approach serves as a benchmark for the text-based churn prediction and uses random forest on structured CRM data containing more than 300 variables. Comparing the two, we found that churn prediction based on text data performs as well as prediction using structured CRM data. Furthermore, we found that combining structured and text data can increase prediction accuracy by up to 10%. These results show clearly that text data contains valuable information and should be considered for churn estimation.
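
The pre-processing steps named in the abstract above (stop-word removal, stemming, mapping each document to a vector in word space) can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the stop-word list, the crude suffix-stripping stemmer, and the example sentences are assumptions, and the word-selection and random forest steps are left out.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "to", "and", "is", "am", "my", "i", "of"}


def crude_stem(word):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, strip punctuation, drop stop words, then stem."""
    words = (w.strip(".,!?;:") for w in text.lower().split())
    return [crude_stem(w) for w in words if w and w not in STOP_WORDS]


def vectorize(documents):
    """Return the sorted vocabulary and one term-count vector per document."""
    tokenized = [tokenize(d) for d in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


docs = [
    "I am cancelling my contract.",
    "The bills increased, cancelled contract!",
]
vocabulary, vectors = vectorize(docs)
```

Each row of `vectors` is the kind of fixed-length input a classifier such as a random forest could consume, optionally concatenated with structured CRM features for the combined model the abstract mentions.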

    Churn Management Optimization via Partial Least Square (PLS) Model with Controllable Marketing Instruments and Associated Management Costs

    In this paper, we use a partial least square (PLS) optimization method both as a prediction model to estimate customers' churn probabilities and as a control model, after configuring the optimization objective and constraints with the relative management costs of the controllable variables. In our experiment, we observe that although the training and test data sets differ dramatically in their churner distributions (50% vs. 1.8%), four controllable variables across three marketing strategies played a key role in the optimization process. We also observe that the most significant variable for prediction does not necessarily play an important role in the optimization model, because it carries the highest management cost. In addition, we show that marketing managers can further maximize the financial outcomes of marketing campaigns by selecting customers based on churn probability or management cost.