580 research outputs found

    Early Churn Prediction from Large Scale User-Product Interaction Time Series

    User churn, characterized by customers ending their relationship with a business, has profound economic consequences across various Business-to-Customer scenarios. For numerous system-to-user actions, such as promotional discounts and retention campaigns, predicting potential churners stands as a primary objective. In volatile sectors like fantasy sports, unpredictable factors such as international sports events can influence even regular spending habits. Consequently, while transaction history and user-product interaction are valuable in predicting churn, they demand deep domain knowledge and intricate feature engineering. Additionally, feature development for churn prediction systems can be resource-intensive, particularly in production settings serving 200m+ users, where inference pipelines largely focus on feature engineering. This paper conducts an exhaustive study on predicting user churn using historical data. We aim to create a model forecasting customer churn likelihood, facilitating businesses in comprehending attrition trends and formulating effective retention plans. Our approach treats churn prediction as multivariate time series classification, demonstrating that combining user activity and deep neural networks yields remarkable results for churn prediction in complex business-to-customer contexts.
    Comment: 12 pages, 3 tables, 8 figures, Accepted in ICML

    Twitter Analysis to Predict the Satisfaction of Saudi Telecommunication Companies’ Customers

    The flexibility in mobile communications allows customers to quickly switch from one service provider to another, making customer churn one of the most critical challenges for the data and voice telecommunication service industry. In 2019, the percentage of post-paid telecommunication customers in Saudi Arabia decreased; this represents a great deal of customer dissatisfaction and subsequent corporate fiscal losses. Many studies correlate customer satisfaction with customer churn. Telecom companies have depended on historical customer data to measure customer churn. However, historical data does not reveal current customer satisfaction or the future likelihood of switching between telecom companies. Current methods of analysing churn rates are inadequate and face several issues, particularly in the Saudi market. This research was conducted to understand the relationship between customer satisfaction and customer churn and how to use social media mining to measure customer satisfaction and predict customer churn. This research conducted a systematic review to address the problems of churn prediction models and their relation to Arabic Sentiment Analysis (ASA). The findings show that current churn models do not integrate structural data frameworks with real-time analytics to target customers in real time. In addition, the findings show that the specific issues in the existing churn prediction models in Saudi Arabia relate to the Arabic language itself, its complexity, and lack of resources. As a result, I have constructed the first gold standard corpus of Saudi tweets related to telecom companies, comprising 20,000 manually annotated tweets. It has been generated as a dialect sentiment lexicon extracted from a larger Twitter dataset collected by me to capture text characteristics in social media. I developed a new ASA prediction model for telecommunication that fills the detected gaps in the ASA literature and fits the telecommunication field. 
The proposed model proved its effectiveness for Arabic sentiment analysis and churn prediction. This is the first work to use Twitter mining to predict potential customer loss (churn) in Saudi telecom companies. Different fields, such as education, have different features, which makes applying the proposed model to them interesting because it is based on text mining.

    Table2Vec-automated universal representation learning of enterprise data DNA for benchmarkable and explainable enterprise data science.

    Enterprise data typically involves multiple heterogeneous data sources and external data that respectively record business activities, transactions, customer demographics, status, behaviors, interactions and communications with the enterprise, and the consumption and feedback of its products, services, production, marketing, operations, and management, etc. They involve enterprise DNA associated with domain-oriented transactions and master data, informational and operational metadata, and relevant external data. A critical challenge in enterprise data science is to enable an effective 'whole-of-enterprise' data understanding and data-driven discovery and decision-making on all-round enterprise DNA. Accordingly, here we introduce a neural encoder Table2Vec for automated universal representation learning of entities such as customers from all-round enterprise DNA with automated data characteristics analysis and data quality augmentation. The learned universal representations serve as representative and benchmarkable enterprise data genomes (similar to biological genomes and DNA in organisms) and can be used for enterprise-wide and domain-specific learning tasks. Table2Vec integrates automated universal representation learning on low-quality enterprise data and downstream learning tasks. Such automated universal enterprise representation and learning cannot be addressed by existing enterprise data warehouses (EDWs), business intelligence and corporate analytics systems, where 'enterprise big tables' are constructed with reporting and analytics conducted by specific analysts on respective domain subjects and goals. Table2Vec addresses critical limitations and gaps of existing representation learning, enterprise analytics and cloud analytics, which are specific to an analytical subject, task and dataset and therefore create analytical silos in an enterprise. 
We illustrate Table2Vec in characterizing all-round customer data DNA in an enterprise on complex heterogeneous multi-relational big tables to build universal customer vector representations. The learned universal representation of each customer is all-round, representative and benchmarkable to support both enterprise-wide and domain-specific learning goals and tasks in enterprise data science. Table2Vec significantly outperforms the existing shallow, boosting and deep learning methods typically used for enterprise analytics. We further discuss the research opportunities, directions and applications of automated universal enterprise representation and learning and the learned enterprise data DNA for automated, all-purpose, whole-of-enterprise and ethical machine learning and data science

    A comparative study of tree-based models for churn prediction : a case study in the telecommunication sector

    Dissertation presented as the partial requirement for obtaining a Master's degree in Statistics and Information Management, specialization in Marketing Research and CRM. In recent years, customer churn, the phenomenon of customers leaving one company for another, has gained increasing importance. Customer churn plays an important role especially in more saturated industries such as telecommunications, since existing customers are very valuable and the cost of acquiring new customers is very high. Companies want to know which of their customers are going to churn to another provider, and when, so that measures can be taken to retain the customers who are at risk of churning. Such measures could take the form of incentives offered to likely churners, but the downside is that misclassification is costly for the company, especially when incentives are given to customers who would not have churned. The common challenges in predicting customer churn are how to pre-process the data and which algorithm to choose, especially when the dataset is heterogeneous, which is very common for telecommunication companies' datasets. The presented thesis aims at predicting customer churn for the telecommunication sector using different decision tree algorithms and their ensemble models.

    Explaining Deep Learning Models for Tabular Data Using Layer-Wise Relevance Propagation

    Trust and credibility in machine learning models are bolstered by the ability of a model to explain its decisions. While explainability of deep learning models is a well-known challenge, a further challenge is clarity of the explanation itself for relevant stakeholders of the model. Layer-wise Relevance Propagation (LRP), an established explainability technique developed for deep models in computer vision, provides intuitive human-readable heat maps of input images. We present the novel application of LRP with tabular datasets containing mixed data (categorical and numerical) using a deep neural network (1D-CNN), for Credit Card Fraud detection and Telecom Customer Churn prediction use cases. We show how LRP is more effective than traditional explainability concepts of Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP) for explainability. This effectiveness is both local to a sample level and holistic over the whole testing set. We also discuss the significant computational time advantage of LRP (1–2 s) over LIME (22 s) and SHAP (108 s) on the same laptop, and thus its potential for real time application scenarios. In addition, our validation of LRP has highlighted features for enhancing model performance, thus opening up a new area of research of using XAI as an approach for feature subset selection

    INTEGRATING KANO MODEL WITH DATA MINING TECHNIQUES TO ENHANCE CUSTOMER SATISFACTION

    The business world is becoming increasingly competitive; therefore, businesses are forced to improve their strategies in every aspect. Determining the elements that contribute to clients' contentment is thus one of the critical needs of businesses that want to develop successful products. The Kano model is one of the models that help determine which features must be included in a product or service to improve customer satisfaction. The model focuses on highlighting the most relevant attributes of a product or service along with customers' estimation of how these attributes can be used to predict satisfaction with specific services or products. This research aims at developing a method to integrate the Kano model and data mining approaches to select relevant attributes that drive customer satisfaction, with a specific focus on higher education. The significant contribution of this research is to improve the quality of United Arab Emirates University academic support and development services provided to their students by solving the problem of selecting features that are not methodically correlated to customer satisfaction, which could reduce the risk of investing in features that could ultimately be irrelevant to enhancing customer satisfaction. Questionnaire data were collected from 646 students from United Arab Emirates University. The experiment suggests that Extreme Gradient Boosting Regression can produce the best results for this kind of problem. Based on the integration of the Kano model and the feature selection method, the number of features used to predict customer satisfaction is reduced to four. It was found that integrating either the Chi-Square or the Analysis of Variance (ANOVA) feature selection method with the Kano model gives higher values of the Pearson correlation coefficient and R². 
Moreover, the prediction was made using the union of the Kano model's most important features and the most frequent features among 8 clusters. This approach shows high-performance results.

    Can bank interaction during rating measurement of micro and very small enterprises ipso facto Determine the collapse of PD status?

    This paper begins with an analysis of trends, over the period 2012-2018, for total bank loans, non-performing loans, and the number of active, working enterprises. A review survey was done on national data from Italy with a comparison developed on a local subset from the Sardinia Region. Empirical evidence appears to support the hypothesis of the paper: can the rating class assigned by banks, using current IRB and A-IRB systems, to micro and very small enterprises, whose ability to replace financial resources using endogenous means is structurally impaired, ipso facto orient the results of performance in the same terms of PD assigned by the algorithm, thereby upending the principle of cause and effect? The thesis is developed through mathematical modeling that demonstrates the interaction of the measurement tool (the rating algorithm applied by banks) on the collapse of the loan status (default, performing, or some intermediate point) of the assessed micro-entity. Emphasis is given, in conclusion, to the phenomenon using evidence of the intrinsically mutualistic link of the two populations of banks and (micro) enterprises provided by a system of differential equations.

    Explaining neural network-based customer churn models to support decision-making (Neuroverkkopohjaisten asiakaspoistumamallien selittäminen päätöksenteon tueksi)

    The size of an organization's customer base can be influenced either by acquiring new customers or by trying to reduce active customer churn, and of these two, acquiring a new customer is many times more expensive. In churn prediction, a neural network is a machine learning model that can both generalize well from what it has learned and exploit large volumes of data. However, it is often impossible for a human to understand how it works. It can therefore be difficult for an organization to trust the results of a neural network-based churn model, even though trust is essential for the information to be usable. This master's thesis investigates how decision-making related to customer churn in the insurance industry can be improved by explaining the behavior of a neural network-based churn model with the help of a simpler model. The work covers the improvement in churn prediction accuracy and the additional customer information gained when explaining the neural network, as well as the benefits of the model used compared with the target organization's earlier models. At the start of the work, the organization's current state and business environment are studied by interviewing the organization's experts. The behavior of the neural network-based churn model is explained with the LIME and SHAP methods, both locally and across the whole model. The results obtained are compared between the explainers and also against earlier research and the organization's existing analyses. During the work, the findings are also discussed with the target organization's experts. The explanation is based on the same data with which the neural network was trained, validated and tested. The data used consists mainly of the target organization's own customer data. Based on the results of the study, explaining a neural network-based churn model can build trust in how the model works. In addition, a great deal of information about customer behavior is obtained at different levels. 
High-level information includes analyses covering the entire customer base, in which the importance of different features for churn and the directions of their effects can be examined. The analysis can also be refined by comparing different customer groups and their behavior with one another. The refinement can go as far as a single feature of a single customer. Explaining the model thus makes it possible to study how changes in different features affect customer groups, and on the basis of this information marketing can, for example, be targeted or personalized. It is also possible to examine how the organization's various metrics or practices affect churn in different customer groups. Where they overlap, the results of this work correspond well with earlier research and the target organization's analyses. Compared with the target organization's earlier models, the prediction model used is more detailed and more versatile. Of the explainers, SHAP works better than LIME on the data used, which is attributed to features that have only a few non-zero values. Overall, the study brings new knowledge to the field of insurance churn research, since churn models using deep neural networks have hardly been studied, let alone the information obtained by explaining a machine learning model and its use in business. The study also raises many new questions as topics for further research.