Machine learning approach for segmenting glands in colon histology images using local intensity and texture features
Colon cancer is one of the most common types of cancer. Treatment is
planned according to the grade or stage of the cancer. One of the preconditions
for grading colon cancer is segmenting the glandular structures of the tissue.
Manual segmentation is very time-consuming, and it can put patients' lives at
risk. The principal objective of this project is to assist the pathologist in
accurately detecting colon cancer. In this paper, the authors propose an
algorithm for automatic segmentation of glands in colon histology images using
local intensity and texture features. The dataset images are cropped into
patches with different window sizes; the intensity of those patches is
recorded, and texture-based features are calculated. A random forest
classifier is used to assign each patch a label. A multilevel random forest
technique, applied hierarchically, is proposed. The solution is fast and
accurate, and it is well suited to a clinical setup.
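The patch-and-feature step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not name the exact features, so the patch mean stands in for local intensity and the patch standard deviation stands in for a texture feature. The function name `patch_features` and the toy image are hypothetical.

```python
from statistics import mean, pstdev


def patch_features(image, size, step):
    """Return (row, col, mean, std) for every size x size patch of a 2-D image."""
    rows, cols = len(image), len(image[0])
    features = []
    for r in range(0, rows - size + 1, step):
        for c in range(0, cols - size + 1, step):
            pixels = [image[r + i][c + j] for i in range(size) for j in range(size)]
            # Assumption: mean stands in for "local intensity",
            # standard deviation for a "texture" feature.
            features.append((r, c, mean(pixels), pstdev(pixels)))
    return features


# Toy 4x4 grayscale image: flat left half, high-contrast right half.
image = [
    [10, 10, 0, 255],
    [10, 10, 255, 0],
    [10, 10, 0, 255],
    [10, 10, 255, 0],
]

# Extract features at two window sizes, mirroring "different window sizes".
by_size = {size: patch_features(image, size, step=size) for size in (2, 4)}
```

Each feature row would then be passed to a patch classifier; a random forest implementation such as scikit-learn's `RandomForestClassifier` is one option, although the abstract does not say which library was used.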
Customer Attrition Prediction Models for an Ecuadorian Fund Administrator
A company's existence is justified by its customers, who are considered its most important assets. Faced with more competitive markets in which customer needs are increasingly demanding, companies seek efficiency in the use and analysis of data. Losing customers is more expensive than attracting new customers. The study of customer behavior, particularly attrition, has become a pressing need within the business environment. This research uses data mining techniques to build customer attrition prediction models that can be applied within the financial disintermediation market. The statistical models used are decision trees, random forests and logistic regression; they are evaluated in terms of accuracy by the area under the receiver operating characteristic curve (AUC). The evaluation of the results shows that the random forest performs better than the other models applied in the study.
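The area under the ROC curve used to compare the models can be computed directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below assumes binary labels (1 for a customer who left, 0 otherwise) and made-up scores; it is not the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores, for illustration only.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(labels, scores)
```

The pairwise form is quadratic in the number of examples, which is fine for a small illustration; library implementations typically sort the scores instead.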
A Dynamic Classification Approach to Churn Prediction in Banking Industry
Churn prediction is the process of using transaction data to identify customers who are likely to cease their relationship with a company. To date, most work in churn prediction focuses on sampling strategies and supervised modeling over a short period of time. Few have explored the mining of customer behavior patterns in longitudinal data. This research developed a dynamic approach to optimizing model specifications by using time-series predictors, multiple time periods, and rare event detection to enable accurate churn prediction. The study used a unique three-year dataset consisting of 32,000 transaction records from a retail bank in Florida, USA. It uses trend modeling to capture changes in customer behavior over time. Results show that data from multiple time periods helped to improve model precision and recall. This dynamic churn prediction approach can be generalized to other fields for which mining long-term customer data is necessary.
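One simple way to turn behavior over multiple time periods into a trend predictor is the least-squares slope of an activity measure against the period index. The abstract does not specify its trend model, so the sketch below is an assumption; the customer names and monthly transaction counts are hypothetical.

```python
def trend_slope(values):
    """Least-squares slope of values against period index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to estimate a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Hypothetical monthly transaction counts per customer.
monthly_txn = {
    "cust_a": [12, 11, 9, 6, 4, 2],  # steadily declining activity
    "cust_b": [5, 5, 6, 5, 6, 6],    # roughly stable activity
}
slopes = {cust: trend_slope(series) for cust, series in monthly_txn.items()}
```

A strongly negative slope, as for `cust_a`, is the kind of signal a trend-based feature could pass to a churn classifier alongside static attributes.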
Early Churn Prediction from Large Scale User-Product Interaction Time Series
User churn, characterized by customers ending their relationship with a
business, has profound economic consequences across various
Business-to-Customer scenarios. For numerous system-to-user actions, such as
promotional discounts and retention campaigns, predicting potential churners
stands as a primary objective. In volatile sectors like fantasy sports,
unpredictable factors such as international sports events can influence even
regular spending habits. Consequently, while transaction history and
user-product interaction are valuable in predicting churn, they demand deep
domain knowledge and intricate feature engineering. Additionally, feature
development for churn prediction systems can be resource-intensive,
particularly in production settings serving 200m+ users, where inference
pipelines largely focus on feature engineering. This paper conducts an
exhaustive study on predicting user churn using historical data. We aim to
create a model forecasting customer churn likelihood, facilitating businesses
in comprehending attrition trends and formulating effective retention plans.
Our approach treats churn prediction as multivariate time series
classification, demonstrating that combining user activity and deep neural
networks yields remarkable results for churn prediction in complex
business-to-customer contexts.
Comment: 12 pages, 3 tables, 8 figures, Accepted in ICML
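Treating churn prediction as multivariate time series classification requires each user's raw interaction log to be laid out as a fixed-length series with one value per channel per time step. The sketch below shows one way to do that. The event layout `(user, day, channel, value)`, the channel names, and the function `to_series` are all assumptions for illustration; the abstract does not describe its input format.

```python
from collections import defaultdict


def to_series(events, n_days, channels):
    """Aggregate events into per-user series of shape n_days x len(channels)."""
    index = {name: k for k, name in enumerate(channels)}
    series = defaultdict(lambda: [[0.0] * len(channels) for _ in range(n_days)])
    for user, day, channel, value in events:
        # Ignore events outside the window or on unknown channels.
        if 0 <= day < n_days and channel in index:
            series[user][day][index[channel]] += value
    return dict(series)


# Hypothetical interaction log: (user, day index, channel, value).
events = [
    ("u1", 0, "sessions", 1),
    ("u1", 0, "spend", 20.0),
    ("u1", 0, "sessions", 1),
    ("u1", 2, "spend", 5.0),
    ("u2", 1, "sessions", 1),
]
series = to_series(events, n_days=3, channels=["sessions", "spend"])
```

The resulting arrays can be fed to a sequence model without hand-crafted aggregate features, which is the trade-off the abstract points to.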
Supervised and unsupervised data mining approaches in loan default prediction
Given the paramount importance of data mining in organizations and the possible contribution of a data-driven customer classification recommender system for loan-extending financial institutions, the study applied supervised and unsupervised data mining approaches to derive the best classifier of loan default. A total of 900 instances with determined attributes and class labels were used for training and cross-validation, while prediction used 100 new instances without class labels. In the training phase, J48 with a confidence factor of 50% attained its highest classification accuracy (76.85%), k-nearest neighbors with k = 3 (k-NN 3) attained the highest accuracy among the IBk variants (78.38%), naïve Bayes reached 76.65%, and logistic regression reached 77.31%. k-NN 3 and logistic regression have the highest classification accuracy, F-measures, and kappa statistics. Applying these algorithms to the test set yielded 48 non-defaulters and 52 defaulters for k-NN 3, and 44 non-defaulters and 56 defaulters for logistic regression. Implications are discussed in the paper.
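The kappa statistic reported alongside accuracy corrects observed agreement for the agreement expected by chance. A minimal sketch for the binary case follows; the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement: product of predicted and actual marginals, per class.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 80% accuracy on a balanced two-class problem.
kappa = cohen_kappa(tp=40, fp=10, fn=10, tn=40)
```

With these counts the observed agreement is 0.8 and the chance agreement is 0.5, so kappa is 0.6: accuracy alone overstates how much better than chance the classifier is.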
Stacking Ensemble Approach for Churn Prediction: Integrating CNN and Machine Learning Models with CatBoost Meta-Learner
In the telecom industry, predicting customer churn is crucial for improving customer retention. The literature focuses predominantly on single classifiers. Customer data is complex: it is class-imbalanced and contains multiple factors that exhibit nonlinear dependencies. In such scenarios, single classifiers may be unable to fully utilize the available information to capture the underlying interactions effectively. In contrast, ensemble learning, which combines various base classifiers, enables a more thorough data analysis and leads to improved prediction performance. In this paper, a heterogeneous ensemble model is proposed for churn prediction in the telecom industry. The model involves exploratory data analysis, data pre-processing and data resampling to handle class imbalance. Multiple trained base classifiers with different characteristics are integrated through a stacking ensemble technique. Specifically, a convolutional neural network, logistic regression, a decision tree and a Support Vector Machine (SVM) serve as the base classifiers. The stacking ensemble exploits the unique strengths of each base classifier and combines their predictions through a meta-learner to improve prediction performance. The efficacy of the proposed model is assessed on a real-world dataset, Cell2Cell. The empirical results demonstrate the superiority of the proposed model in churn prediction, with a 62.4% F1-score and 60.62% recall.
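The core of stacking is that the meta-learner is trained on out-of-fold predictions, so that no base classifier predicts a row it was trained on. The sketch below shows only that step. The base learner `fit_stump` is a hypothetical one-feature threshold rule standing in for the CNN, logistic regression, decision tree and SVM named above, and the data is made up; a meta-learner such as CatBoost would then be fit on the resulting `meta` matrix.

```python
from functools import partial


def fit_stump(X, y, feature):
    """Hypothetical base learner: threshold one feature midway between class means."""
    pos = [x[feature] for x, t in zip(X, y) if t == 1]
    neg = [x[feature] for x, t in zip(X, y) if t == 0]
    pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (pos_mean + neg_mean) / 2
    sign = 1 if pos_mean >= neg_mean else -1
    return lambda x: 1 if sign * (x[feature] - threshold) > 0 else 0


def out_of_fold_meta_features(X, y, learners, k):
    """Build one meta-feature column per base learner from held-out predictions."""
    meta = [[None] * len(learners) for _ in X]
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held_out = [i for i in range(len(X)) if i % k == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, fit in enumerate(learners):
            model = fit(X_tr, y_tr)  # trained without the held-out rows
            for i in held_out:
                meta[i][j] = model(X[i])
    return meta


# Toy data: feature 0 is high for churners, feature 1 is low for churners.
X = [[1, 9], [2, 8], [8, 2], [9, 1], [1, 7], [3, 9], [7, 3], [8, 1]]
y = [0, 0, 1, 1, 0, 0, 1, 1]
learners = [partial(fit_stump, feature=0), partial(fit_stump, feature=1)]
meta = out_of_fold_meta_features(X, y, learners, k=2)
```

The interleaved fold assignment used here assumes each training fold contains both classes; with imbalanced churn data a stratified split would be the safer choice.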
A Hybrid Data Mining Method for Customer Churn Prediction
Because of increasing competition and business saturation, attracting new customers costs much more than retaining existing ones. Customer retention is therefore one of the leading factors in companies' marketing. Customer retention requires churn management, and effective management requires an exact and effective model for churn prediction. A variety of techniques and methodologies have been used for churn prediction, such as logistic regression, neural networks, genetic algorithms and decision trees. This article presents a hybrid method that predicts customer churn more accurately using data fusion and feature extraction techniques. After data preparation and feature selection, two algorithms, LOLIMOT and C5.0, were trained with feature sets of different sizes and applied to test data. The outputs of the individual classifiers were then combined by weighted voting. The results of applying this method to real data from a telecommunication company proved the effectiveness of the method.
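Weighted voting, the combination step mentioned above, can be sketched in a few lines. The abstract does not say how the weights were chosen, so the values here are hypothetical (validation accuracy is one common choice), as are the labels and the function name `weighted_vote`.

```python
def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across classifiers."""
    if len(predictions) != len(weights):
        raise ValueError("one weight per prediction is required")
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


# Hypothetical outputs of three trained models for one customer.
predictions = ["churn", "stay", "stay"]
weights = [0.5, 0.3, 0.3]
decision = weighted_vote(predictions, weights)
```

Here the two lower-weighted models outvote the single higher-weighted one, because their combined weight of 0.6 exceeds 0.5.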
Enhanced feature mining and classifier models to predict customer churn for an e-retailer
Customer churn, an event in which a customer
abandons an established relationship with a business, is an important
problem that is well researched in both academic and commercial
settings. In this work, we propose an improved prediction
model that emphasizes an effective data collection pipeline
across varied channels, capturing explicit and implicit customer
footprints. Our goal is to demonstrate how feature selection
algorithms can improve classifier efficiency. We also rank the
prominent features that play a vital role in customer churn. Our
contributions in this paper fall broadly
into three parts. First, we show how popular data mining tools in
the Hadoop stack help extract several implicit customer interaction
metrics, including sales and clickstream logs generated as a result
of customer interaction. Second, through feature engineering
techniques we verify that some of the new features we propose
have a definite impact on customer churn. Finally, we establish
that Regularized Logistic Regression, SVM and Gradient Boost
Random Forests are the best-performing models for predicting
customer churn, as verified through comprehensive cross-validation
techniques.
Customer Churn Prediction
Identifying churned customers plays an essential role in the functioning and growth of any business. It helps the business understand the reasons for churn and plan its market strategies accordingly. This research aims to develop a machine learning model that can precisely predict the churned customers among the total customers of a Credit Union (CU) financial institution. Quantitative and deductive research strategies are employed to build a supervised machine learning model that addresses the class imbalance problem, handles feature selection and efficiently predicts customer churn. Overall accuracy, the Receiver Operating Characteristic curve and the Area Under the Receiver Operating Characteristic Curve are used as the evaluation metrics to identify the best classifier. The study compares the most popular supervised machine learning methods for customer churn prediction in a CU context: Logistic Regression, Random Forest, Support Vector Machine (SVM) and Neural Network. In the first phase of the experiments, various feature selection techniques were studied. In the second phase, all models were applied to the imbalanced dataset and the results were evaluated. The SMOTE technique was then used to balance the data, and the same models were applied to the balanced dataset; the results were evaluated and compared. The best overall classifier was Random Forest, with accuracy of almost 97%, precision of 91% and recall of 98%.
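SMOTE balances a dataset by creating synthetic minority-class points on the line segment between a minority sample and one of its nearest minority neighbors. The sketch below is a simplified illustration of that idea, not the study's implementation; the function name `smote`, its parameters, and the sample points are assumptions.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic


# Hypothetical minority-class (churned) customers in a 2-D feature space.
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0]]
synthetic = smote(minority, n_new=5, k=2, seed=0)
```

Resampling of this kind is normally applied to the training split only, so that evaluation metrics are still measured on the original class distribution.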