21 research outputs found

Machine learning approach for segmenting glands in colon histology images using local intensity and texture features

Author: Chatterjee Soumick
Khatun Rupali
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/05/2019

Colon cancer is one of the most common types of cancer. Treatment is planned according to the grade or stage of the cancer. One precondition for grading colon cancer is segmenting the glandular structures of the tissue. Manual segmentation is very time-consuming, and it puts patients' lives at risk. The principal objective of this project is to assist pathologists in the accurate detection of colon cancer. In this paper, the authors propose an algorithm for automatic segmentation of glands in colon histology images using local intensity and texture features. The dataset images are cropped into patches with different window sizes; the intensities of those patches are taken, and texture-based features are calculated. A random forest classifier is used to classify each patch into different labels. A multilevel random forest technique, applied hierarchically, is proposed. The solution is fast and accurate, and it is well suited to a clinical setup.
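The patch-based feature step described in this abstract can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows and uses mean intensity and intensity variance as stand-ins for the paper's intensity and texture features. The function name `patch_features`, the window sizes, and the use of variance as a texture proxy are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, pvariance

def patch_features(image, row, col, window_sizes=(3, 5)):
    """Return [mean, variance] per window size for the patch centered at (row, col).

    Patches are clipped at the image border. Variance is only a crude
    texture proxy chosen for this sketch (an assumption).
    """
    height, width = len(image), len(image[0])
    features = []
    for size in window_sizes:
        half = size // 2
        pixels = [
            image[r][c]
            for r in range(max(0, row - half), min(height, row + half + 1))
            for c in range(max(0, col - half), min(width, col + half + 1))
        ]
        features.append(mean(pixels))
        features.append(pvariance(pixels))
    return features

# Toy 5x5 image: a bright 3x3 block on a dark background.
toy_image = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 9, 9, 0],
    [0, 0, 0, 0, 0],
]
center_features = patch_features(toy_image, 2, 2)
```

A feature vector of this kind, computed per pixel or per patch, is what a classifier such as a random forest would then receive as input.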

arXiv.org e-Print Archive

Crossref

MODELOS DE PREDICCIÓN DE DESERCIÓN DE CLIENTES PARA UNA ADMINISTRADORA DE FONDOS ECUATORIANA [Customer Churn Prediction Models for an Ecuadorian Fund Administrator]

Author: Bohórquez María
Paredes Milton
Torys Joyce
Publication venue: 'Escuela Superior Politecnica del Litoral'
Publication date: 01/01/2020

The existence of a company is justified by its customers, who are regarded as its most important assets. Faced with more competitive markets in which customer needs are increasingly demanding, companies seek efficiency in the use and analysis of data. Losing customers is more expensive than attracting new customers. The study of customer behavior, particularly attrition, has become a pressing need in the business environment. In this research, data mining techniques are used to build customer attrition prediction models that can be applied within the financial disintermediation market. The statistical models used are decision trees, random forests, and logistic regression; they are evaluated in terms of accuracy by the area under the receiver operating characteristic curve (AUC). The evaluation of the results shows that the random forest performs better than the other models applied in the study.
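The evaluation metric named in this abstract, the area under the ROC curve, can be computed directly from labels and scores. The sketch below uses the pairwise-comparison definition of AUC; the function name `roc_auc` and the sample data are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative; ties count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical churn labels (1 = churned) and model scores.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.4, 0.6, 0.7]
auc = roc_auc(labels, scores)
```

The quadratic loop is acceptable for a small illustration; for large datasets a rank-based computation would be the usual choice.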

Escuela Superior Politécnica del Litoral (ESPOL): Open Journal Systems

DIALNET

A Dynamic Classification Approach to Churn Prediction in Banking Industry

Author: Chung Wingyan
Leung Hoiyin Christina
Publication venue: AIS Electronic Library (AISeL)
Publication date: 10/08/2020

Churn prediction is the process of using transaction data to identify customers who are likely to cease their relationship with a company. To date, most work in churn prediction has focused on sampling strategies and supervised modeling over a short period of time. Few studies have explored mining customer behavior patterns in longitudinal data. This research developed a dynamic approach to optimizing model specifications by using time-series predictors, multiple time periods, and rare event detection to enable accurate churn prediction. The study used a unique three-year dataset consisting of 32,000 transaction records from a retail bank in Florida, USA. It used trend modeling to capture changes in customer behavior over time. Results show that data from multiple time periods helped to improve model precision and recall. This dynamic churn prediction approach can be generalized to other fields in which mining long-term customer data is necessary.
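One simple way to turn per-period activity into a trend predictor is a least-squares slope over the period index. The sketch below is an illustration of that idea only; the abstract does not specify how its trend modeling is done, and the function name `trend_slope` and the monthly counts are hypothetical.

```python
def trend_slope(values):
    """Ordinary least-squares slope of values against period index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator

# Hypothetical monthly transaction counts for two customers.
declining = trend_slope([10, 8, 6, 4])   # steadily less active
stable = trend_slope([5, 5, 5, 5])       # no change
```

A negative slope flags declining activity, which could then be fed to a classifier alongside static customer attributes.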

AIS Electronic Library (AISeL)

Developing a prediction model for customer churn from electronic banking services using data mining

Author: Abbas Keramati
Hajar Ghaneei
Seyed Mohammad Mirmohammadi
Publication venue: Springer Nature

Springer - Publisher Connector

Early Churn Prediction from Large Scale User-Product Interaction Time Series

Author: Bhattacharjee Shamik
Patil Nilesh
Thukral Utkarsh
Publication date: 25/09/2023

User churn, characterized by customers ending their relationship with a business, has profound economic consequences across various Business-to-Customer scenarios. For numerous system-to-user actions, such as promotional discounts and retention campaigns, predicting potential churners stands as a primary objective. In volatile sectors like fantasy sports, unpredictable factors such as international sports events can influence even regular spending habits. Consequently, while transaction history and user-product interaction are valuable in predicting churn, they demand deep domain knowledge and intricate feature engineering. Additionally, feature development for churn prediction systems can be resource-intensive, particularly in production settings serving 200m+ users, where inference pipelines largely focus on feature engineering. This paper conducts an exhaustive study on predicting user churn using historical data. We aim to create a model forecasting customer churn likelihood, facilitating businesses in comprehending attrition trends and formulating effective retention plans. Our approach treats churn prediction as multivariate time series classification, demonstrating that combining user activity and deep neural networks yields remarkable results for churn prediction in complex business-to-customer contexts.
Comment: 12 pages, 3 tables, 8 figures, Accepted in ICML

arXiv.org e-Print Archive

Supervised and unsupervised data mining approaches in loan default prediction

Author: Alejandrino Jovanne C.
Murcia John Vianne Bauya
P. Bolacoy Jovito Jr.
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/04/2023

Given the paramount importance of data mining in organizations and the possible contribution of a data-driven customer classification recommender system for loan-extending financial institutions, the study applied supervised and unsupervised data mining approaches to derive the best classifier of loan default. A total of 900 instances with determined attributes and class labels were used for the training and cross-validation processes, while prediction used 100 new instances without class labels. In the training phase, J48 with a confidence factor of 50% attained the highest classification accuracy (76.85%), k-nearest neighbors (k-NN) 3 attained the highest (78.38%) among the IBk variants, naïve Bayes had a classification accuracy of 76.65%, and logistic had a classification accuracy of 77.31%. k-NN 3 and logistic had the highest classification accuracy, F-measures, and kappa statistics. Applying these algorithms to the test set yielded 48 non-defaulters and 52 defaulters for k-NN 3, and 44 non-defaulters and 56 defaulters for logistic. Implications are discussed in the paper.
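The kappa statistic mentioned in this abstract measures agreement between predicted and actual labels beyond what chance would produce. A minimal sketch of Cohen's kappa from a confusion matrix follows; the function name `cohen_kappa` and the counts are hypothetical and are not the paper's results.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: actual, cols: predicted)."""
    total = sum(sum(row) for row in confusion)
    k = len(confusion)
    # Observed agreement: share of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of row and column marginals, summed per class.
    expected = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in range(k)
    )
    return (observed - expected) / (1 - expected)

# Hypothetical counts, not the paper's results.
confusion = [[40, 10],
             [5, 45]]
kappa = cohen_kappa(confusion)
```

A kappa of 0 means agreement no better than chance and 1 means perfect agreement, which is why it is often reported next to raw accuracy.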

ZENODO

Institute of Advanced Engineering and Science

Stacking Ensemble Approach for Churn Prediction: Integrating CNN and Machine Learning Models with CatBoost Meta-Learner

Author: Hiew Fu San
Khoh Wee How
Ooi Shih Yin
Pang Ying Han
Tan Yan Lin
Publication venue: MMU Press
Publication date: 01/09/2023

In the telecom industry, predicting customer churn is crucial for improving customer retention. The literature focuses predominantly on single classifiers. Customer data is complex: it is class-imbalanced and contains multiple factors that exhibit nonlinear dependencies. In these complex scenarios, single classifiers may be unable to fully utilize the available information to capture the underlying interactions effectively. In contrast, ensemble learning, which combines various base classifiers, enables a more thorough data analysis and leads to improved prediction performance. In this paper, a heterogeneous ensemble model is proposed for churn prediction in the telecom industry. The model involves exploratory data analysis, data pre-processing, and data resampling to handle class imbalance. In the proposed model, multiple trained base classifiers with different characteristics are integrated through a stacking ensemble technique. Specifically, a convolutional neural network, logistic regression, a decision tree, and a Support Vector Machine (SVM) are the base classifiers in this work. The proposed stacking ensemble model uses the unique strengths of each base classifier and leverages their collective knowledge to improve prediction performance with a meta-learner. The efficacy of the proposed model is assessed on a real-world dataset, Cell2Cell. The empirical results demonstrate the superiority of the proposed model in churn prediction, with a 62.4% F1-score and 60.62% recall.

Directory of Open Access Journals

A Hybrid Data Mining Method for Customer Churn Prediction

Author: E. Jamalian
R. Foukerdi
Publication venue: D. G. Pylarinos
Publication date: 01/06/2018

Because of increasing competition and business saturation, attracting new customers costs much more than retaining existing ones. Customer retention is therefore one of the leading factors in companies’ marketing. Customer retention requires churn management, and effective management requires an exact and effective model for churn prediction. A variety of techniques and methodologies have been used for churn prediction, such as logistic regression, neural networks, genetic algorithms, and decision trees. In this article, a hybrid method is presented that predicts customer churn more accurately, using data fusion and feature extraction techniques. After data preparation and feature selection, two algorithms, LOLIMOT and C5.0, were trained with feature sets of different sizes and applied to test data. The outputs of the individual classifiers were then combined by weighted voting. Applying this method to real data from a telecommunication company demonstrated its effectiveness.
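The final combination step, weighted voting over the outputs of individual classifiers, can be sketched as a weighted average of churn probabilities. This is an illustration under assumptions: the abstract does not say whether voting operates on probabilities or labels, nor how the weights are chosen, and the name `weighted_vote`, the weights, and the threshold are hypothetical.

```python
def weighted_vote(probabilities, weights, threshold=0.5):
    """Combine per-classifier churn probabilities by a weighted average.

    Returns (combined_probability, predicted_label).
    """
    if len(probabilities) != len(weights):
        raise ValueError("one weight per classifier is required")
    total_weight = sum(weights)
    combined = sum(p * w for p, w in zip(probabilities, weights)) / total_weight
    return combined, int(combined >= threshold)

# Hypothetical outputs of two classifiers for one customer, with the
# first classifier trusted three times as much as the second.
combined, label = weighted_vote([0.8, 0.3], weights=[3, 1])
```

In practice the weights might be derived from each classifier's validation performance, so that the more reliable model dominates when the two disagree.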

Directory of Open Access Journals

Enhanced feature mining and classifier models to predict customer churn for an e-retailer

Author: Subramanya Karthik B.
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2016

Customer churn, an event in which a customer abandons an established relationship with a business, is an important problem that has been well researched in both academic and commercial settings. In this work, we propose an improved prediction model that emphasizes an effective data collection pipeline across varied channels, capturing explicit and implicit customer footprints. Our goal is to demonstrate how feature selection algorithms can improve classifier efficiency. We also rank the prominent features that play a vital role in customer churn. Our contributions fall into three parts. First, we show how popular data mining tools in the Hadoop stack help extract several implicit customer interaction metrics, including sales and clickstream logs generated by customer interaction. Second, through feature engineering techniques, we verify that some of the new features we propose have a definite impact on customer churn. Finally, we establish that Regularized Logistic Regression, SVM, and Gradient Boost Random Forests are the best-performing models for predicting customer churn, as verified through comprehensive cross-validation techniques.

Digital Repository @ Iowa State University (ISU)

Customer Churn Prediction

Author: Wadikar Deepshikha
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2020

Identifying churned customers plays an essential role in the functioning and growth of any business. It can help the business understand the reasons for churn and plan its market strategies accordingly to enhance growth. This research aims to develop a machine learning model that can precisely predict the churned customers among the total customers of a Credit Union (CU) financial institution. Quantitative and deductive research strategies are employed to build a supervised machine learning model that addresses the class imbalance problem, handles feature selection, and efficiently predicts customer churn. Overall model accuracy, the Receiver Operating Characteristic curve, and the Area Under the Receiver Operating Characteristic Curve are used as the evaluation metrics to identify the best classifier. A comparative study applied the most popular supervised machine learning methods (Logistic Regression, Random Forest, Support Vector Machine (SVM), and Neural Network) to customer churn prediction in a CU context. In the first phase of the experiments, various feature selection techniques were studied. In the second phase, all models were applied to the imbalanced dataset and the results were evaluated. The SMOTE technique was then used to balance the data, and the same models were applied to the balanced dataset; the results were evaluated and compared. The best overall classifier was Random Forest, with accuracy of almost 97%, precision of 91%, and recall of 98%.
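The balancing step relies on SMOTE, which creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbors. The sketch below shows that core idea in plain Python; it is a simplified illustration, not the thesis's code or a library implementation, and the name `smote_like`, the value of `k`, and the sample points are hypothetical.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbors (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbor))
        )
    return synthetic

# Hypothetical two-feature minority (churner) samples.
churners = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0)]
new_points = smote_like(churners, n_new=4)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling of this kind should be applied to the training split only, so that evaluation data stays free of synthetic records.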

Arrow@TUDublin