21 research outputs found

    Machine learning approach for segmenting glands in colon histology images using local intensity and texture features

    Colon cancer is one of the most common types of cancer, and treatment is planned according to its grade or stage. A precondition for grading colon cancer is segmenting the glandular structures of the tissue. Manual segmentation is very time-consuming and can put patients' lives at risk. The principal objective of this project is to assist pathologists in detecting colon cancer accurately. In this paper, the authors propose an algorithm for automatic segmentation of glands in colon histology images using local intensity and texture features. The dataset images are cropped into patches with different window sizes; the intensity of each patch is recorded and texture-based features are computed. A random forest classifier is used to assign each patch to one of several labels, and a multilevel random forest technique applied in a hierarchical way is proposed. The solution is fast and accurate, and it is well suited to a clinical setup
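    As a rough illustration of the patch step described in this abstract, the sketch below computes the mean intensity of square patches of several window sizes around one pixel. It is only a sketch under stated assumptions: the function names, the odd window sizes, and the border clipping are hypothetical choices, and the texture features and the random forest stage are not shown.

```python
def patch_mean_intensity(image, row, col, window):
    """Mean intensity of the window x window patch centred at (row, col).

    `image` is a list of equal-length rows of numbers. The patch is clipped
    at the image borders (an assumption, not taken from the paper).
    """
    half = window // 2
    r0, r1 = max(0, row - half), min(len(image), row + half + 1)
    c0, c1 = max(0, col - half), min(len(image[0]), col + half + 1)
    values = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(values) / len(values)


def multiscale_features(image, row, col, windows=(3, 5)):
    """One intensity feature per window size (hypothetical default sizes)."""
    return [patch_mean_intensity(image, row, col, w) for w in windows]


# Hypothetical 5x5 image: uniform background with one bright centre pixel.
image = [[1] * 5 for _ in range(5)]
image[2][2] = 10
features = multiscale_features(image, 2, 2)
```

    Each pixel then carries one feature per window size, which is one way a classifier could see both fine and coarse context.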

    Customer Attrition Prediction Models for an Ecuadorian Fund Administrator (MODELOS DE PREDICCIÓN DE DESERCIÓN DE CLIENTES PARA UNA ADMINISTRADORA DE FONDOS ECUATORIANA)

    A company's existence is justified by its customers, who are considered its most important assets. Faced with more competitive markets and increasingly demanding customer needs, companies seek efficiency in the use and analysis of data. Losing customers is more expensive than attracting new customers. The study of customer behavior, particularly attrition, has become a pressing need in the business environment. In the present research, data mining techniques are used to build customer attrition prediction models that can be applied within the financial disintermediation market. The statistical models used are decision trees, random forests, and logistic regression; they are evaluated in terms of accuracy using the area under the receiver operating characteristic curve (AUC). The evaluation of the results shows that the random forest performs better than the other models applied in the study
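    The abstract above compares models by the area under the ROC curve. A minimal sketch of that comparison follows, using the rank-based identity that AUC equals the fraction of positive/negative pairs ordered correctly. The labels, scores, and model names are invented for illustration and are not results from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: iterable of 0/1 (1 = churned); scores: predicted churn scores.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores, for illustration only.
y_true = [1, 0, 1, 0, 0, 1]
model_scores = {
    "model_a": [0.9, 0.2, 0.8, 0.4, 0.1, 0.7],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.2, 0.9],
}
aucs = {name: roc_auc(y_true, s) for name, s in model_scores.items()}
best = max(aucs, key=aucs.get)
```

    The pairwise loop is quadratic in the number of examples, which is acceptable for a sketch; a sort-based version would be preferable on large datasets.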

    A Dynamic Classification Approach to Churn Prediction in Banking Industry

    Churn prediction is the process of using transaction data to identify customers who are likely to cease their relationship with a company. To date, most work in churn prediction has focused on sampling strategies and supervised modeling over a short period of time. Few studies have explored mining customer behavior patterns in longitudinal data. This research developed a dynamic approach to optimizing model specifications by using time-series predictors, multiple time periods, and rare event detection to enable accurate churn prediction. The study used a unique three-year dataset consisting of 32,000 transaction records of a retail bank in Florida, USA. It uses trend modeling to capture the change in customer behavior over time. Results show that data from multiple time periods helped to improve model precision and recall. This dynamic churn prediction approach can be generalized to other fields in which mining long-term customer data is necessary
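    One simple way to turn behavior over several time periods into a trend predictor is the least-squares slope of an activity measure against the period index. The sketch below assumes that form; the abstract does not say which trend model the study used, and the function name and sample counts are hypothetical.

```python
def trend_slope(values):
    """Least-squares slope of `values` against period index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return cov / var


# Hypothetical transaction counts for one customer over four periods.
txn_counts = [12, 10, 8, 6]
slope = trend_slope(txn_counts)  # negative slope: activity is declining
```

    A negative slope could then be fed to a classifier alongside static attributes as a signal of declining engagement.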

    Early Churn Prediction from Large Scale User-Product Interaction Time Series

    User churn, characterized by customers ending their relationship with a business, has profound economic consequences across various Business-to-Customer scenarios. For numerous system-to-user actions, such as promotional discounts and retention campaigns, predicting potential churners stands as a primary objective. In volatile sectors like fantasy sports, unpredictable factors such as international sports events can influence even regular spending habits. Consequently, while transaction history and user-product interaction are valuable in predicting churn, they demand deep domain knowledge and intricate feature engineering. Additionally, feature development for churn prediction systems can be resource-intensive, particularly in production settings serving 200m+ users, where inference pipelines largely focus on feature engineering. This paper conducts an exhaustive study on predicting user churn using historical data. We aim to create a model forecasting customer churn likelihood, facilitating businesses in comprehending attrition trends and formulating effective retention plans. Our approach treats churn prediction as multivariate time series classification, demonstrating that combining user activity and deep neural networks yields remarkable results for churn prediction in complex business-to-customer contexts.
    Comment: 12 pages, 3 tables, 8 figures, Accepted in ICML

    Supervised and unsupervised data mining approaches in loan default prediction

    Given the paramount importance of data mining in organizations and the possible contribution of a data-driven customer classification recommender system for loan-extending financial institutions, the study applied supervised and unsupervised data mining approaches to derive the best classifier of loan default. A total of 900 instances with determined attributes and class labels were used for the training and cross-validation processes, while prediction used 100 new instances without class labels. In the training phase, J48 with a confidence factor of 50% attained the highest classification accuracy (76.85%), k-nearest neighbors with k = 3 (k-NN 3) attained the highest accuracy among the IBk variants (78.38%), naïve Bayes had a classification accuracy of 76.65%, and logistic had a classification accuracy of 77.31%. k-NN 3 and logistic had the highest classification accuracy, F-measures, and kappa statistics. Applying these algorithms to the test set yielded 48 non-defaulters and 52 defaulters for k-NN 3, and 44 non-defaulters and 56 defaulters for logistic. Implications are discussed in the paper

    Stacking Ensemble Approach for Churn Prediction: Integrating CNN and Machine Learning Models with CatBoost Meta-Learner

    In the telecom industry, predicting customer churn is crucial for improving customer retention. The literature focuses predominantly on single classifiers. Customer data is complex because of class imbalance and because it contains multiple factors that exhibit nonlinear dependencies. In these complex scenarios, single classifiers may be unable to fully utilize the available information to capture the underlying interactions effectively. In contrast, ensemble learning, which combines various base classifiers, enables a more thorough data analysis, leading to improved prediction performance. In this paper, a heterogeneous ensemble model is proposed for churn prediction in the telecom industry. The model involves exploratory data analysis, data pre-processing and data resampling to handle class imbalance. In the proposed model, multiple trained base classifiers with different characteristics are integrated through a stacking ensemble technique. Specifically, a convolutional neural network (CNN), logistic regression, a decision tree and a Support Vector Machine (SVM) are the base classifiers in this work. The proposed stacking ensemble model utilizes the unique strengths of each base classifier and leverages their collective knowledge to improve prediction performance with a meta-learner. The efficacy of the proposed model is assessed on a real-world dataset, Cell2Cell. The empirical results demonstrate the superiority of the proposed model in churn prediction, with a 62.4% F1-score and 60.62% recall

    A Hybrid Data Mining Method for Customer Churn Prediction

    Because of increasing competition and business saturation, attracting new customers costs much more than retaining existing ones, so customer retention is one of the leading factors in companies’ marketing. Customer retention requires churn management, and effective management requires an accurate and effective model for churn prediction. A variety of techniques and methodologies have been used for churn prediction, such as logistic regression, neural networks, genetic algorithms, and decision trees. In this article, a hybrid method is presented that predicts customer churn more accurately, using data fusion and feature extraction techniques. After data preparation and feature selection, two algorithms, LOLIMOT and C5.0, were trained with feature sets of different sizes and applied to test data. The outputs of the individual classifiers were then combined with weighted voting. Applying this method to real data from a telecommunication company demonstrated its effectiveness
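    The weighted voting step mentioned above can be sketched as a weighted average of the individual classifiers' churn probabilities followed by a threshold. This is an assumption about the general technique, not the paper's exact scheme: the weights, the threshold, and the sample probabilities below are hypothetical.

```python
def weighted_vote(probabilities, weights, threshold=0.5):
    """Combine per-classifier churn probabilities with a weighted average.

    Returns (combined_probability, predicted_label), where the label is 1
    (churn) when the combined probability reaches `threshold`.
    """
    if len(probabilities) != len(weights):
        raise ValueError("one weight is required per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = sum(p * w for p, w in zip(probabilities, weights)) / total
    return combined, int(combined >= threshold)


# Hypothetical outputs of two classifiers for one customer, and
# hypothetical weights (for example, proportional to validation accuracy).
combined, label = weighted_vote([0.8, 0.3], [0.6, 0.4])
```

    With these illustrative numbers the more trusted classifier dominates, so the combined vote predicts churn even though the second classifier disagrees.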

    Enhanced feature mining and classifier models to predict customer churn for an e-retailer

    Customer churn, an event in which a customer abandons an established relationship with a business, is an important problem that has been well researched in both academic and commercial settings. In this work, we propose an improved prediction model that emphasizes an effective data collection pipeline across varied channels, capturing explicit and implicit customer footprints. Our goal is to demonstrate how feature selection algorithms can improve classifier efficiency. We also rank the prominent features that play a vital role in customer churn. Our contributions are threefold. First, we show how popular data mining tools in the Hadoop stack help extract several implicit customer interaction metrics, including sales and clickstream logs generated as a result of customer interaction. Second, through feature engineering techniques, we verify that some of the new features we propose have a definite impact on customer churn. Finally, we establish that Regularized Logistic Regression, SVM and Gradient Boost Random Forests are the best performing models for predicting customer churn, as verified through comprehensive cross-validation techniques

    Customer Churn Prediction

    Identifying churned customers plays an essential role in the functioning and growth of any business. It helps the business understand the reasons for churn and plan its market strategies accordingly. This research aims to develop a machine learning model that can precisely predict churned customers among the total customers of a Credit Union (CU) financial institution. Quantitative and deductive research strategies are employed to build a supervised machine learning model that addresses the class imbalance problem, handles feature selection, and efficiently predicts customer churn. Overall accuracy, the Receiver Operating Characteristic (ROC) curve, and the area under the ROC curve are used as the evaluation metrics to identify the best classifier. A comparative study of the most popular supervised machine learning methods – Logistic Regression, Random Forest, Support Vector Machine (SVM) and Neural Network – was conducted for customer churn prediction in a CU context. In the first phase of the experiments, various feature selection techniques were studied. In the second phase, all models were applied to the imbalanced dataset and the results were evaluated. The SMOTE technique was then used to balance the data, and the same models were applied to the balanced dataset, with the results evaluated and compared. The best overall classifier was Random Forest, with an accuracy of almost 97%, a precision of 91% and a recall of 98%
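    SMOTE balances a dataset by creating synthetic minority-class points on the line segments between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that interpolation idea in plain Python. It is a simplified stand-in, not the reference SMOTE implementation and not the study's code; the function name, the default `k`, and the sample points are hypothetical.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create `n_new` synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class (churned) feature vectors.
churned = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_like_oversample(churned, n_new=5)
```

    Because every synthetic point lies between two real minority samples, oversampling of this kind should be applied to the training split only, so that evaluation data stays free of synthetic points.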