119 research outputs found

    Enhancing Unbalanced Data Classification with Cross-Validation and Extreme Gradient Boosting: A Comprehensive Analysis

    As a novel and efficient ensemble learning algorithm, XGBoost has been widely applied thanks to its many advantages, but its classification performance on imbalanced data is often poor. To address this problem, this work optimizes the combination of XGBoost and cross-validation. The main idea is to apply cross-validation together with XGBoost when processing the unbalanced data, and then to train the final XGBoost-based model. At the same time, optimization algorithms automatically search for and adjust the optimal parameters to achieve more accurate classification predictions. In the testing phase, the area under the curve (AUC) is used as the evaluation metric to compare and analyze the classification performance of various sampling methods and algorithm models. The AUC-based analysis is expected to verify the feasibility and effectiveness of the proposed algorithm.
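
    The evaluation idea in this abstract (cross-validation on imbalanced data, scored by AUC) can be sketched in plain Python. This is a minimal illustration, not the paper's pipeline: the stratified splitter, the pairwise AUC, and the `fit_mean_difference` stand-in model (used here in place of XGBoost so the block has no dependencies) are all assumptions introduced for the example.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds that keep the class ratio of `labels`."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal each class round-robin so rare classes appear in every fold.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in the sample size, fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(X_train, y_train):
    """Hypothetical stand-in for XGBoost: score = projection onto the
    difference between the positive and negative class means."""
    n_features = len(X_train[0])

    def class_mean(cls):
        rows = [x for x, y in zip(X_train, y_train) if y == cls]
        return [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]

    mean_pos, mean_neg = class_mean(1), class_mean(0)
    weights = [p - n for p, n in zip(mean_pos, mean_neg)]
    return lambda x: sum(w * v for w, v in zip(weights, x))


def cross_validated_auc(X, y, fit, k=5, seed=0):
    """Mean held-out AUC over k stratified folds."""
    results = []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        score = fit([X[i] for i in train], [y[i] for i in train])
        results.append(auc([y[i] for i in held_out],
                           [score(X[i]) for i in held_out]))
    return sum(results) / len(results)
```

    Stratifying the folds matters on imbalanced data because a plain random split can leave a fold with no minority samples, in which case AUC is undefined. Any model with the same `fit(X, y) -> score` shape, including a real gradient-boosting library, could be passed in place of the stand-in.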

    Prediction of Customers Churn in Telecommunication Industry

    In the developed world, mobile markets have reached saturation in subscriber penetration and connections growth. The challenge for operators has evolved from attracting new customers to retaining existing ones. Various components have an impact on churn. It is therefore very important to understand customers' behaviour, encourage them to spend more, and predict and prevent their attrition. As the industry evolves, the biggest challenge for operators is to engage with consumers and retain their loyalty by delivering more competitive and innovative value-added services. While understanding consumer needs remains essential to improving customer retention, other emerging tariffs and services are likely to have a long-term impact on churn (including national, international and roaming bundle tariffs and mobile services). Churn may be voluntary, when customers choose to leave the network they are using, or involuntary, as in the case of unpaid bills. The range of methodologies used to make sound evaluations and achieve strong results in this field is large and varied. The scope of this thesis is to identify and analyse appropriate models that can help data analysts find churners in the telecommunication industry. The thesis discusses two important topics in telecommunication markets and their respective predictive models, which aim to capture customer behaviour towards different competitors: market share in the telecommunication industry and customer churn.

    Churn Identification and Prediction from a Large-Scale Telecommunication Dataset Using NLP

    The identification of customer churn is a major issue for large telecom businesses. Managing the data of current customers, as well as acquiring and managing new customers, generates a substantial volume of data every day. It is therefore crucial to identify the causes of customer churn so that appropriate steps can be taken to lower it. Numerous researchers have already discussed their efforts to combine static and dynamic approaches in order to reduce churn in big data sets, but these systems still have many issues when it comes to actually identifying churn. In this paper, we propose two methods: the first is churn identification, and the second uses Natural Language Processing (NLP) methods and machine learning techniques to make predictions on a vast telecommunication data set. The NLP process involves data pre-processing, normalization, feature extraction, and feature selection. For feature extraction, we employ techniques such as TF-IDF, Stanford NLP, and occurrence correlation methods. Throughout, a machine learning classification algorithm is used for training and testing. Finally, the system employs a variety of cross-validation techniques to train and evaluate the machine learning algorithms. The experimental analysis shows the system's efficacy and accuracy.
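
    As a rough illustration of the TF-IDF feature-extraction step mentioned in this abstract, the sketch below computes TF-IDF weights with the standard library only. The tokenizer, the function names, and the weighting variant (relative term frequency times `log(N / df)`) are assumptions chosen for the example; the abstract does not say which variant or toolkit the authors used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Pre-processing and normalization: lowercase, keep letters, digits, apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    weight = (count of term in doc / doc length) * log(N / document frequency).
    A term that occurs in every document gets weight 0.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical customer messages, invented for illustration only.
messages = [
    "Dropped call again, dropped twice today",
    "Bill is too high, call me back",
]
features = tf_idf([tokenize(m) for m in messages])
```

    Terms shared by every message (here "call") receive zero weight, while terms concentrated in one message (here "dropped") are weighted up, which is what makes the vectors useful as classifier inputs.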

    Customer churn prediction for web browsers

    In the competitive web browser market, identifying potential churners is critical to decreasing the loss of existing customers. Churn prediction based on customer behaviors plays a vital role in customer retention strategies. However, traditional churn prediction algorithms such as tree-based models cannot exploit the temporal characteristics of browser customers' behaviors, while sequence models cannot explicitly extract the information between multiple behaviors. To meet this challenge, we propose a novel model named Multivariate Behavior Sequence Transformer (MBST) with two complementary attention mechanisms that explore the temporal and behavioral information separately. Furthermore, a tree-based classifier is attached for churn prediction instead of a multilayer perceptron. Extensive experiments on a real-world Tencent QQ browser dataset with over 600,000 samples demonstrate that the proposed MBST achieves an F-score of 82.72% and an Area Under Curve (AUC) of 93.75%, significantly outperforming state-of-the-art methods in churn prediction.

    Toward a model of customer experience

    Retaining high-value and profitable customers is a major strategic objective for many companies. In mature mobile phone markets where growth has slowed, the defection of customers from one network to another has intensified and is strongly fuelled by poor Customer Experience. Trends in the service economy suggest that experience can be exploited as a means of supplying the basis of a new economic offering, ignited in part by the shift that is taking place in the analysis of people’s interaction with digital products. In this light, the research describes a strategic approach to the use of Information Systems as a means of improving Customer Experience. Using Action Research in a mobile telecommunications operator, a Customer Experience Monitoring and Action Response model (CEMAR) is developed that evaluates disparate customer data, residing across many systems, builds experience profiles and suggests appropriate contextual actions where experience is poor. The model provides value in identifying issues, understanding them in the context of the overall Customer Experience (over time) and dealing with them appropriately. The novelty of the approach is the synthesis of data analysis with an enhanced understanding of Customer Experience which is developed implicitly, in real-time and in advance of any instigation by the customer.
    EThOS - Electronic Theses Online Service; Royal Academy of Engineering; GB

    Characterization of the clients retention in the telecommunications companies

    A company's ability to predict churn precisely, so that it can act on the prediction, is paramount. For this reason, Deloitte set me the challenge of characterizing client retention in telecom companies. To do so, a comprehensive tool was created that enables Deloitte to evaluate the churn management maturity level of a telecom operator and highlight its strengths and weaknesses. The development of this matrix was based on in-depth churn research, market research comprising 40 interviews and 2 focus groups, and valuable feedback from Deloitte consultants.

    X-TIME: An in-memory engine for accelerating machine learning on tabular data with CAMs

    Structured, or tabular, data is the most common format in data science. While deep learning models have proven formidable in learning from unstructured data such as images or speech, they are less accurate than simpler approaches when learning from tabular data. In contrast, modern tree-based Machine Learning (ML) models shine in extracting relevant information from structured data. An essential requirement in data science is to reduce model inference latency in cases where, for example, models are used in a closed loop with simulation to accelerate scientific discovery. However, the hardware acceleration community has mostly focused on deep neural networks and largely ignored other forms of machine learning. Previous work has described the use of an analog content addressable memory (CAM) component for efficiently mapping random forests. In this work, we focus on an overall analog-digital architecture implementing a novel increased precision analog CAM and a programmable network on chip allowing the inference of state-of-the-art tree-based ML models, such as XGBoost and CatBoost. Results evaluated in a single chip at 16nm technology show 119x lower latency at 9740x higher throughput compared with a state-of-the-art GPU, with a 19W peak power consumption
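
    One way to picture how a content addressable memory can serve tree inference is to flatten each root-to-leaf path into a row of per-feature value ranges, so that a query matches exactly one row per tree. The software sketch below illustrates only that general idea. The tuple-based tree encoding, the function names, and the half-open `[low, high)` ranges are assumptions made for the example and are not details taken from the abstract or the chip design.

```python
import math


def tree_to_rows(node, n_features, bounds=None):
    """Flatten a decision tree into (ranges, leaf_value) rows.

    A node is ("leaf", value) or ("split", feature, threshold, left, right);
    a sample goes left when x[feature] < threshold. Each row stores one
    half-open range [low, high) per feature.
    """
    if bounds is None:
        bounds = [(-math.inf, math.inf)] * n_features
    if node[0] == "leaf":
        return [(list(bounds), node[1])]

    _, feature, threshold, left, right = node
    low, high = bounds[feature]
    left_bounds = list(bounds)
    left_bounds[feature] = (low, min(high, threshold))
    right_bounds = list(bounds)
    right_bounds[feature] = (max(low, threshold), high)
    return (tree_to_rows(left, n_features, left_bounds)
            + tree_to_rows(right, n_features, right_bounds))


def cam_lookup(rows, x):
    """Sum the leaf values of every row whose ranges all contain x.

    Hardware would compare all rows in parallel; this loop is sequential.
    With rows from several trees, the sum is the ensemble's raw score.
    """
    return sum(
        value
        for ranges, value in rows
        if all(low <= xi < high for (low, high), xi in zip(ranges, x))
    )


def traverse(node, x):
    """Reference implementation: ordinary top-down tree traversal."""
    while node[0] != "leaf":
        _, feature, threshold, left, right = node
        node = left if x[feature] < threshold else right
    return node[1]


# Two small hypothetical trees over two features, invented for illustration.
tree_a = ("split", 0, 0.5,
          ("leaf", -1.0),
          ("split", 1, 2.0, ("leaf", 0.25), ("leaf", 1.5)))
tree_b = ("split", 1, 1.0, ("leaf", 0.5), ("leaf", -0.5))
ensemble_rows = tree_to_rows(tree_a, 2) + tree_to_rows(tree_b, 2)
```

    The lookup replaces a sequence of dependent branch decisions with independent range comparisons, which is the property that makes the approach attractive for parallel hardware.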