5 research outputs found
A Non-Sequential Representation of Sequential Data for Churn Prediction
We investigate the length of event sequence giving best predictions
when using a continuous HMM approach to churn prediction from sequential
data. Motivated by observations that predictions based on only the few most recent
events seem to be the most accurate, a non-sequential dataset is constructed
from customer event histories by averaging features of the last few events. A simple
K-nearest neighbor algorithm on this dataset is found to give significantly
improved performance. It is quite intuitive to think that most people will react
only to events in the fairly recent past. Events related to telecommunications occurring
months or years ago are unlikely to have a large impact on a customer's
future behaviour, and these results bear this out. Methods that deal with sequential
data also tend to be much more complex than those dealing with simple nontemporal
data, giving an added benefit to expressing the recent information in a
non-sequential manner.
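The construction described in this abstract, averaging the features of the last few events and then applying K-nearest neighbours, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the feature names (`call_minutes`, `complaints`), the toy histories, the window size `K_EVENTS`, and the helper names `average_last_events` and `knn_predict` are all assumptions made for the example.

```python
from collections import Counter
from math import dist


def average_last_events(events, k):
    """Average each feature over the last k events of one customer's history."""
    recent = events[-k:]  # if fewer than k events exist, all of them are used
    n_features = len(recent[0])
    return [sum(e[j] for e in recent) / len(recent) for j in range(n_features)]


def knn_predict(train_x, train_y, query, n_neighbors=3):
    """Majority vote among the n_neighbors closest training points (Euclidean)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical toy histories: each event is (call_minutes, complaints).
histories = {
    "a": [(30.0, 0), (28.0, 0), (31.0, 0), (29.0, 0)],
    "b": [(25.0, 0), (10.0, 1), (5.0, 2), (3.0, 2)],
    "c": [(40.0, 0), (38.0, 0), (41.0, 0), (39.0, 0)],
    "d": [(20.0, 0), (8.0, 1), (4.0, 1), (2.0, 3)],
}
labels = {"a": 0, "b": 1, "c": 0, "d": 1}  # 1 = churned (assumed labels)

K_EVENTS = 2  # assumed window: only the two most recent events are kept
train_x = [average_last_events(histories[c], K_EVENTS) for c in histories]
train_y = [labels[c] for c in histories]

new_customer = [(27.0, 0), (9.0, 1), (6.0, 2)]
query = average_last_events(new_customer, K_EVENTS)
prediction = knn_predict(train_x, train_y, query, n_neighbors=3)
print(query, prediction)
```

Each customer history of arbitrary length collapses to one fixed-length vector, so any standard non-temporal classifier can be used in place of the nearest-neighbour vote.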
Building Combined Classifiers
This chapter covers different approaches that may be taken when building an
ensemble method, through studying specific examples of each approach from research
conducted by the authors. A method called Negative Correlation Learning illustrates a
decision level combination approach with individual classifiers trained co-operatively. The
Model level combination paradigm is illustrated via a tree combination method. Finally,
another variant of the decision level paradigm, with individuals trained independently
instead of co-operatively, is discussed as applied to churn prediction in the
telecommunications industry.
Data mining guided process for churn prediction in retail: from descriptive to predictive analytics
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management.
In recent years, the development of new technologies has permeated all industries, and with
its rapid introduction, technology has brought the need to resolve uncertainty in processes.
Companies' need to collect and understand data has become a central paradigm, but
the effort to transform that data into powerful insight into new processes,
goods, and services continues. In the grocery retail industry, it has been essential to
include academic research in order to understand different commercial purposes (Perloff &
Denbaly, 2007).
Understanding the data coming from all sources in the industry has become essential,
because it allows efforts to be focused on reducing the gap between vertical and horizontal
relationships and among the different stakeholders in the supply chain. That is why it has become
relevant to understand the customer experience along the supply chain and how the
marketing chain maximizes it.
The complexity of transactions and the growing number of customers pose challenges
for grocery retail stores in processing data and providing a high-quality, data-based service to
their customers. The key to gaining competitive advantage is to understand, classify, and
prevent customer churn in order to maximize profit. This understanding is used to attract and
retain customers through data-driven decisions. For this, it is necessary to identify and label
which customers are churners.
Organizations tend to focus more on developing plans to deal with customers,
using CRM (Customer Relationship Management) as the core strategy to handle, maintain,
and build new long-lasting relationships with the customer as a critical stakeholder
(Chorianopoulos, 2015).
Data mining techniques help CRM achieve its goals by building tools that lead to informed
decisions, creating better, stronger, and longer-lasting relationships through the analysis of
the customer-organization interaction and the application of complex models.
Building well-performing classifier ensembles: model and decision level combination.
There is a continuing drive for better, more robust generalisation performance from classification systems, and prediction systems in general. Ensemble methods, or the combining of multiple classifiers, have become an accepted and successful tool for doing this, though the reasons for success are not always entirely understood. In this thesis, we review the multiple classifier literature and consider the properties an ensemble of classifiers - or collection of subsets - should have in order to be combined successfully. We find that the framework of Stochastic Discrimination provides a well-defined account of these properties, which are shown to be strongly encouraged in a number of the most popular/successful methods in the literature via differing algorithmic devices. This uncovers some interesting and basic links between these methods, and aids understanding of their success and operation in terms of a kernel induced on the training data, with form particularly well suited to classification. One property that is desirable in both the SD framework and in a regression context, the ambiguity decomposition of the error, is de-correlation of individuals. This motivates the introduction of the Negative Correlation Learning method, in which neural networks are trained in parallel in a way designed to encourage de-correlation of the individual networks. The training is controlled by a parameter λ governing the extent to which correlations are penalised. Theoretical analysis of the dynamics of training results in an exact expression for the interval in which we can choose λ while ensuring stability of the training, and a value λ* for which the training has some interesting optimality properties. These values depend only on the size N of the ensemble. Decision level combination methods often result in a difficult to interpret model, and NCL is no exception. However in some applications, there is a need for understandable decisions and interpretable models.
In response to this, we depart from the standard decision level combination paradigm to introduce a number of model level combination methods. As decision trees are one of the most interpretable model structures used in classification, we chose to combine structure from multiple individual trees to build a single combined model. We show that extremely compact, well performing models can be built in this way. In particular, a generalisation of bottom-up pruning to a multiple-tree context produces good results in this regard. Finally, we develop a classification system for a real-world churn prediction problem, illustrating some of the concepts introduced in the thesis, and a number of more practical considerations which are of importance when developing a prediction system for a specific problem.
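The abstract does not state the Negative Correlation Learning penalty or the ambiguity decomposition explicitly. The sketch below assumes the commonly cited formulation, in which member i of a uniformly weighted ensemble minimises 0.5 * (f_i - d)^2 + λ * p_i with p_i = (f_i - f_bar) * sum over j != i of (f_j - f_bar), and checks numerically that the ensemble's squared error equals the average individual error minus the average ambiguity. The function names and the toy outputs are hypothetical, and the thesis may use a different parameterisation.

```python
def ensemble_mean(outputs):
    """Uniformly weighted ensemble output f_bar."""
    return sum(outputs) / len(outputs)


def ncl_penalty(outputs, i):
    """Assumed NCL penalty: p_i = (f_i - f_bar) * sum_{j != i} (f_j - f_bar)."""
    f_bar = ensemble_mean(outputs)
    others = sum(f - f_bar for j, f in enumerate(outputs) if j != i)
    return (outputs[i] - f_bar) * others


def ncl_error(outputs, target, i, lam):
    """Per-member error: squared error plus lam times the correlation penalty."""
    return 0.5 * (outputs[i] - target) ** 2 + lam * ncl_penalty(outputs, i)


def ambiguity_decomposition(outputs, target):
    """Return (ensemble error, average individual error, average ambiguity)."""
    f_bar = ensemble_mean(outputs)
    n = len(outputs)
    avg_individual = sum((f - target) ** 2 for f in outputs) / n
    ambiguity = sum((f - f_bar) ** 2 for f in outputs) / n
    return (f_bar - target) ** 2, avg_individual, ambiguity


# Hypothetical outputs of three networks for one example with target 1.0.
outputs = [0.2, 0.6, 0.7]
target = 1.0

ens_err, avg_err, amb = ambiguity_decomposition(outputs, target)
print(ens_err, avg_err - amb)        # the two values agree up to rounding
print(ncl_penalty(outputs, 0))       # equals -(f_0 - f_bar)^2 up to rounding
print(ncl_error(outputs, target, 0, lam=0.5))
```

Because the deviations from the mean sum to zero, the penalty reduces to minus the squared deviation of member i, so a larger λ rewards members for spreading out around the ensemble mean.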
Building well-performing classifier ensembles : model and decision level combination
There is a continuing drive for better, more robust generalisation performance from classification systems, and prediction systems in general. Ensemble methods, or the combining of multiple classifiers, have become an accepted and successful tool for doing this, though the reasons for success are not always entirely understood. In this thesis, we review the multiple classifier literature and consider the properties an ensemble of classifiers - or collection of subsets - should have in order to be combined successfully. We find that the framework of Stochastic Discrimination provides a well-defined account of these properties, which are shown to be strongly encouraged in a number of the most popular/successful methods in the literature via differing algorithmic devices. This uncovers some interesting and basic links between these methods, and aids understanding of their success and operation in terms of a kernel induced on the training data, with form particularly well suited to classification. One property that is desirable in both the SD framework and in a regression context, the ambiguity decomposition of the error, is de-correlation of individuals. This motivates the introduction of the Negative Correlation Learning method, in which neural networks are trained in parallel in a way designed to encourage de-correlation of the individual networks. The training is controlled by a parameter λ governing the extent to which correlations are penalised. Theoretical analysis of the dynamics of training results in an exact expression for the interval in which we can choose λ while ensuring stability of the training, and a value λ* for which the training has some interesting optimality properties. These values depend only on the size N of the ensemble. Decision level combination methods often result in a difficult to interpret model, and NCL is no exception. However in some applications, there is a need for understandable decisions and interpretable models.
In response to this, we depart from the standard decision level combination paradigm to introduce a number of model level combination methods. As decision trees are one of the most interpretable model structures used in classification, we chose to combine structure from multiple individual trees to build a single combined model. We show that extremely compact, well performing models can be built in this way. In particular, a generalisation of bottom-up pruning to a multiple-tree context produces good results in this regard. Finally, we develop a classification system for a real-world churn prediction problem, illustrating some of the concepts introduced in the thesis, and a number of more practical considerations which are of importance when developing a prediction system for a specific problem.
EThOS - Electronic Theses Online Service (GB)