5,777 research outputs found

    A General Spatio-Temporal Clustering-Based Non-local Formulation for Multiscale Modeling of Compartmentalized Reservoirs

    Representing the reservoir as a network of discrete compartments with neighbor and non-neighbor connections is a fast yet accurate method for analyzing oil and gas reservoirs. Automatic and rapid detection of coarse-scale compartments with distinct static and dynamic properties is an integral part of such high-level reservoir analysis. In this work, we present a hybrid framework specific to reservoir analysis for the automatic detection of clusters in space using spatial and temporal field data, coupled with a physics-based multiscale modeling approach. The novel hybrid approach couples a physics-based non-local modeling framework with data-driven clustering techniques to provide fast and accurate multiscale modeling of compartmentalized reservoirs. This research also adds to the literature by presenting a comprehensive treatment of spatio-temporal clustering for reservoir studies that accounts for the clustering complexities, the intrinsically sparse and noisy nature of the data, and the interpretability of the outcome. Keywords: Artificial Intelligence; Machine Learning; Spatio-Temporal Clustering; Physics-Based Data-Driven Formulation; Multiscale Modeling

    Auto Insurance Business Analytics Approach for Customer Segmentation Using Multiple Mixed-Type Data Clustering Algorithms

    Customer segmentation is critical for auto insurance companies to gain competitive advantage by mining useful customer-related information. While some efforts have been made to use customer segmentation to support auto insurance decision making, the segmentation results tend to be affected by the characteristics of the algorithm used and lack validation across multiple algorithms. To this end, we propose an auto insurance business analytics approach that segments customers using three mixed-type data clustering algorithms: k-prototypes, improved k-prototypes, and similarity-based agglomerative clustering. The customer segmentation results of these algorithms can complement and reinforce each other and reveal as much information as possible to support decision making. To confirm its practical value, the proposed approach extracts seven rules for an auto insurance company that may help the company make customer-related decisions and develop insurance products.
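
As a rough illustration of how a k-prototypes-style algorithm compares mixed-type records, the sketch below uses the standard k-prototypes dissimilarity: squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches. It is not the paper's implementation; the function names, the example customer fields, and the value of `gamma` are assumptions chosen for illustration.

```python
from typing import Sequence


def mixed_dissimilarity(x: Sequence, prototype: Sequence,
                        num_idx: Sequence[int], cat_idx: Sequence[int],
                        gamma: float = 1.0) -> float:
    """k-prototypes-style dissimilarity between a record and a prototype.

    Numeric part: sum of squared differences.
    Categorical part: count of mismatching attributes, weighted by gamma.
    """
    numeric = sum((x[i] - prototype[i]) ** 2 for i in num_idx)
    mismatches = sum(1 for i in cat_idx if x[i] != prototype[i])
    return numeric + gamma * mismatches


def nearest_prototype(x: Sequence, prototypes: Sequence[Sequence],
                      num_idx: Sequence[int], cat_idx: Sequence[int],
                      gamma: float = 1.0) -> int:
    """Index of the prototype with the smallest mixed dissimilarity to x."""
    distances = [mixed_dissimilarity(x, p, num_idx, cat_idx, gamma)
                 for p in prototypes]
    return min(range(len(prototypes)), key=distances.__getitem__)


# Hypothetical records: (age, annual claims, vehicle type, region).
prototypes = [(30.0, 1.0, "sedan", "urban"), (60.0, 3.0, "suv", "rural")]
customer = (35.0, 1.2, "sedan", "urban")
label = nearest_prototype(customer, prototypes,
                          num_idx=[0, 1], cat_idx=[2, 3], gamma=2.0)
```

In practice the numeric attributes would usually be standardized first, since the squared-difference term otherwise dominates the categorical term.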

    Clustering Mixed Numeric and Categorical Data with Cuckoo Search

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. Outliers may be instances of error or may indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and to discover interesting and useful knowledge about unusual events within numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees to select the most suitable technique based on the data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework.

    Applications of Clustering with Mixed Type Data in Life Insurance

    Death benefits are generally the largest cash flow item that affects the financial statements of life insurers, yet some insurers still do not have a systematic process to track and monitor death claims experience. In this article, we explore data clustering to examine and understand how actual death claims differ from expected, an early stage of developing a monitoring system crucial for risk management. We extend the k-prototypes clustering algorithm to draw inference from a life insurance dataset using only the insured's characteristics and policy information, without regard to known mortality. This clustering efficiently handles categorical, numerical, and spatial attributes. The optimal clusters, selected using the gap statistic, are then used to compare actual to expected death claims experience of the life insurance portfolio. Our empirical data contains observations, during 2014, of approximately 1.14 million policies with a total insured amount of over 650 billion dollars. For this portfolio, the algorithm produced three natural clusters, with each cluster having lower actual-to-expected death claims but with differing variability. The analytical results provide management with a process to identify policyholders' attributes that dominate significant mortality deviations, and thereby enhance decision making for taking necessary actions. Comment: 25 pages, 6 figures, 5 tables
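
The comparison of actual to expected claims per cluster can be sketched as a simple ratio of sums. This is only an illustrative assumption about how such a comparison might be tabulated, not the article's procedure; the function name, the tuple layout, and the numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def actual_to_expected(
    policies: Iterable[Tuple[Hashable, float, float]]
) -> Dict[Hashable, float]:
    """Actual-to-expected (A/E) claims ratio per cluster.

    Each policy is (cluster_label, actual_claims, expected_claims).
    Clusters whose expected claims sum to zero are skipped.
    """
    actual = defaultdict(float)
    expected = defaultdict(float)
    for cluster, a, e in policies:
        actual[cluster] += a
        expected[cluster] += e
    return {c: actual[c] / expected[c] for c in expected if expected[c] > 0}


# Hypothetical amounts, already labelled by a clustering step.
ratios = actual_to_expected([
    (0, 80.0, 100.0),
    (0, 10.0, 50.0),
    (1, 45.0, 50.0),
])
```

A ratio below 1 means the cluster's actual claims came in under expectation.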

    An Extended RFM Model for Customer Behaviour and Demographic Analysis in Retail Industry

    Background: Customer segmentation has become one of the most innovative ways to help businesses adopt appropriate marketing campaigns and reach targeted customers. The combination of the RFM model and machine learning has been widely applied in various areas. Motivations: With the rapid increase of transactional data, the RFM model can accurately segment customers and provide deeper insights into customers’ purchasing behaviour. However, the traditional RFM model is limited to 3 variables, Recency, Frequency and Monetary, and does not reveal segments based on demographic features. Meanwhile, the contribution of demographic characteristics to marketing strategies is extremely important. Methods/Approach: The article proposes an extended RFMD model (D for Demographic) that combines behavioural and demographic variables. Customer segmentation can be performed effectively using the RFMD model with the K-Means and K-Prototypes algorithms. Results: The extended model is applied to a retail dataset, and the experimental result shows 5 clusters with different features. The effectiveness of the new model is measured by the Adjusted Rand Index and Adjusted Mutual Information. Furthermore, we use Cohort analysis to analyse customer retention rates and recommend marketing strategies for each segment. Conclusions: According to the evaluation, the proposed RFMD model was deployed with stable results from the two clustering algorithms. Businesses can apply this model to deeply understand customer behaviour together with demographics and launch efficient campaigns.
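
For readers unfamiliar with RFM, the three behavioural variables can be derived from a transaction log roughly as follows. This is a minimal sketch and not the article's code; the function name, the tuple layout, and the snapshot date are assumptions. A demographic extension would append customer attributes to each row before clustering.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Hashable, Iterable, Tuple


def rfm_table(
    transactions: Iterable[Tuple[Hashable, date, float]],
    snapshot: date,
) -> Dict[Hashable, Tuple[int, int, float]]:
    """Map each customer to (recency_days, frequency, monetary).

    recency_days: days between the customer's last purchase and snapshot.
    frequency:    number of transactions.
    monetary:     total amount spent.
    """
    last_purchase: Dict[Hashable, date] = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((snapshot - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }


# Hypothetical transaction log: (customer_id, purchase_date, amount).
log = [
    ("a", date(2023, 1, 10), 20.0),
    ("a", date(2023, 3, 1), 30.0),
    ("b", date(2023, 2, 15), 5.0),
]
rfm = rfm_table(log, snapshot=date(2023, 3, 31))
```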