16 research outputs found

    Modeling Attrition in Organizations from Email Communication

    Modeling people’s online behavior in relation to their real-world social context is an interesting and important research problem. In this paper, we present our preliminary study of attrition behavior in real-world organizations based on two online datasets: one from a small startup (40+ users) and one from a large US company (3600+ users). The small startup dataset is collected using our privacy-preserving data logging tool, which removes personally identifiable information from content data and extracts only aggregated statistics such as word frequency counts and sentiment features. These privacy-preserving measures have enabled us to recruit participants to support this study. Correlation analysis over the startup dataset has shown that, statistically, there is often a change point in people’s online behavior, and that the data exhibits weak trends that may be a manifestation of real-world attrition. The same findings are also verified in the large company dataset. Furthermore, we have trained a classifier to predict real-world attrition with a moderate accuracy of 60-65% on the large company dataset. Given the incomplete and noisy nature of the data, this accuracy is encouraging.

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers, because the class boundary must be defined using knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures

    The effect of friends’ churn on consumer behavior in mobile networks

    We study how consumers decide which tariff plan to choose and whether to churn when their friends churn in the mobile industry. We develop a theoretical model showing conditions under which users remain with their carrier and conditions under which they churn when their friends do. We then use a large and rich anonymized longitudinal panel of call detail records to characterize the consumers’ path to death with an unprecedented level of detail. We explore the structure of the network inferred from these data to derive instruments for friends’ churn, which is typically endogenous in network settings. This allows us to econometrically identify the effect of peer influence in our setting. On average, we find that each additional friend that churns increases the monthly churn rate by 0.06 percent. The observed monthly churn rate across our dataset is 2.15 percent. We also find that firms that introduce pre-paid tariff plans charging the same price for calls to users inside and outside the carrier help retain consumers that would otherwise churn. In our setting, without this tariff plan the monthly churn rate could have been as high as 8.09 percent. We perform a number of robustness checks, in particular on how we define friends in the social graph, and show that our results remain unchanged. Our paper shows that the traditional definition of customer lifetime value underestimates the value of consumers, in particular that of consumers with more friends, because of the effect of contagious churn. Managers should therefore actively take into account the structure of the social network when prioritizing whom to target during retention campaigns.

    Data Mining for Anomaly Detection

    The Vehicle Integrated Prognostics Reasoner (VIPR) program describes methods for enhanced diagnostics as well as a prognostic extension to the current state-of-the-art Aircraft Diagnostic and Maintenance System (ADMS). VIPR introduced a new anomaly detection function for discovering previously undetected and undocumented situations, where there are clear deviations from nominal behavior. Once a baseline (nominal model of operations) is established, detection and analysis are split between on-aircraft outlier generation and off-aircraft expert analysis to characterize and classify events that may not have been anticipated by individual system providers. Offline expert analysis is supported by data curation and data mining algorithms that can be applied in the contexts of supervised and unsupervised learning. In this report, we discuss efficient methods to implement the Kolmogorov complexity measure using compression algorithms, and run a systematic empirical analysis to determine the best compression measure. Our experiments established that the combination of the DZIP compression algorithm and CiDM distance measure provides the best results for capturing relevant properties of time series data encountered in aircraft operations. This combination was used as the basis for developing an unsupervised learning algorithm to define "nominal" flight segments using historical flight segments.

    Intelligent data analysis approaches to churn as a business problem: a survey

    Globalization processes and market deregulation policies are rapidly changing the competitive environments of many economic sectors. The appearance of new competitors and technologies leads to an increase in competition and, with it, a growing preoccupation among service-providing companies with creating stronger customer bonds. In this context, anticipating the customer’s intention to abandon the provider, a phenomenon known as churn, becomes a competitive advantage. Such anticipation can be the result of the correct application of information-based knowledge extraction in the form of business analytics. In particular, the use of intelligent data analysis, or data mining, for the analysis of market surveyed information can be of great assistance to churn management. In this paper, we provide a detailed survey of recent applications of business analytics to churn, with a focus on computational intelligence methods. This is preceded by an in-depth discussion of churn within the context of customer continuity management. The survey is structured according to the stages identified as basic for the building of the predictive models of churn, as well as according to the different types of predictive methods employed and the business areas of their application. Peer reviewed. Postprint (author's final draft)

    Churn prediction models tested and evaluated in the Dutch indemnity industry

    Due to global developments, customer churn is becoming a growing concern for the insurance industry. Technological improvements like the internet make it much easier for customers to compare their policies, obtain new offers or even churn from one provider to another. The insurance industry has therefore become a heavily competitive market in which insurance companies have to compete to protect and expand their customer base in order to maintain or expand their market position. Retaining customers is thus becoming more and more important, and finding the customers who are most likely to leave is a central aspect of doing so. Many different techniques are available to identify customers who are most likely to leave; however, which technique is best is often not clear. Research indicates that performance depends mostly on the characteristics of the industry and/or the dataset used. It is impossible to determine the best-suited technique in advance if no previous research testing performance has been published. This study presents a data mining methodology in which the four most used prediction techniques in the literature are tested and evaluated using a real-life voluminous insurance company dataset to determine which technique performs best. Using the same dataset makes the results comparable and makes clear which technique performs best given the characteristics of the insurance data domain.

    Um modelo para previsão de churn na área do retalho [A model for churn prediction in the retail sector]

    Master's dissertation in Informatics Engineering. The highly competitive environment of the retail sector and the growing difficulty of attracting new customers lead firms to invest in strategies that promote the satisfaction of existing customers in order to motivate their loyalty. It is in this context that the importance of combating churn, i.e., the loss of customers, is beginning to be recognized. Customers at risk of churn must be identified, which requires a method that can do so in advance so that they can be targeted by proactive retention campaigns. The more effective the method is at identifying at-risk customers, the higher the return on the campaign. Many studies have addressed churn prediction in a variety of sectors. In retail, however, research has been very limited. This dissertation therefore set out to study the phenomenon of customer loss, with the aim of defining and implementing a churn model for the retail sector using data mining techniques. It surveys the main issues involved in predicting churn in retail, in constructing the dataset (customer signatures) and in applying data mining techniques to the prediction process. To that end, several models were built to predict churn cases based on five of the classification techniques most commonly used in churn prediction: Decision Trees, Logistic Regression, Neural Networks, Random Forests and SVM. The performance of the models was evaluated and compared using several measures, including accuracy, precision, sensitivity, specificity, F-measure and AUC; in addition, the impact on model precision of changing the density of churn events in the training set was tested.