Applications of Clustering with Mixed Type Data in Life Insurance
Death benefits are generally the largest cash flow item affecting the
financial statements of life insurers, yet some insurers still lack a systematic
process to track and monitor their death claims experience. In this article, we
explore data clustering to examine and understand how actual death claims
differ from expected, an early stage of developing a monitoring system crucial
for risk management. We extend the k-prototypes clustering algorithm to draw
inference from a life insurance dataset using only the insured's
characteristics and policy information without regard to known mortality. This
clustering algorithm efficiently handles categorical, numerical, and
spatial attributes. Using gap statistics, the optimal clusters obtained from
the algorithm are then used to compare actual to expected death claims
experience of the life insurance portfolio. Our empirical data contains
observations, during 2014, of approximately 1.14 million policies with a total
insured amount of over 650 billion dollars. For this portfolio, the algorithm
produced three natural clusters, with each cluster having a lower actual-to-expected
death claims ratio but with differing variability. The analytical results
provide management with a process to identify policyholder attributes that
dominate significant mortality deviations, thereby enhancing decision making
when taking necessary actions.
Comment: 25 pages, 6 figures, 5 tables
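To make the clustering step concrete, below is a minimal sketch of k-prototypes clustering on mixed numerical and categorical attributes using the open-source kmodes Python package. The toy column names, values, and the cost-based diagnostic are illustrative assumptions; the paper works with the insurer's own portfolio and selects the number of clusters with gap statistics.

    # Minimal sketch: k-prototypes on mixed numerical/categorical policy attributes.
    # The toy columns below are assumptions, not the paper's portfolio data.
    import pandas as pd
    from kmodes.kprototypes import KPrototypes

    policies = pd.DataFrame({
        "age":         [34, 52, 47, 61, 29, 55],
        "face_amount": [250_000, 1_000_000, 500_000, 750_000, 100_000, 300_000],
        "gender":      ["F", "M", "M", "F", "F", "M"],
        "smoker":      ["N", "N", "Y", "N", "N", "Y"],
    })
    X = policies.to_numpy()
    categorical_idx = [2, 3]            # positions of the categorical columns

    # Fit over a range of k; the paper chooses k with gap statistics, here we
    # simply record the within-cluster cost as a stand-in diagnostic.
    for k in range(2, 5):
        kproto = KPrototypes(n_clusters=k, init="Huang", n_init=5, random_state=0)
        labels = kproto.fit_predict(X, categorical=categorical_idx)
        print(k, kproto.cost_)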
Dynamic Thresholding GA-Based ECG Feature Selection in Cardiovascular Disease Diagnosis
Electrocardiogram (ECG) data are commonly used to diagnose cardiovascular disease (CVD) with the help of automated algorithms. Feature selection is a crucial step in the development of accurate and reliable diagnostic models for CVDs. This research introduces the dynamic threshold genetic algorithm (DTGA), a genetic algorithm variant used for optimization problems, and discusses its use in the context of feature selection. The research shows that DTGA succeeds in selecting relevant ECG features that ultimately enhance the accuracy and efficiency of CVD diagnosis. The work also demonstrates the benefits of employing DTGA in clinical practice, including a reduction in the time spent diagnosing patients and an increase in the precision with which individuals at risk of CVD can be identified.
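As a rough sketch of how GA-based feature selection of this kind can work, the following Python snippet evolves binary feature masks and tightens a survival threshold each generation. The threshold rule, classifier, and synthetic data are illustrative assumptions, not the paper's DTGA specification.

    # Hedged sketch of GA-based feature selection with a "dynamic" survival threshold.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=300, n_features=30, n_informative=8, random_state=0)

    def fitness(mask):
        # Cross-validated accuracy of a simple classifier on the selected features.
        if mask.sum() == 0:
            return 0.0
        clf = LogisticRegression(max_iter=1000)
        return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

    pop = rng.integers(0, 2, size=(20, X.shape[1]))        # random feature masks
    threshold = 0.0                                         # survival threshold, raised over time
    for gen in range(15):
        scores = np.array([fitness(ind) for ind in pop])
        threshold = max(threshold, np.median(scores))       # illustrative "dynamic threshold" rule
        survivors = pop[scores >= threshold]
        if len(survivors) < 2:
            survivors = pop[np.argsort(scores)[-2:]]
        children = []
        while len(children) < len(pop):                      # crossover + mutation to refill
            a, b = survivors[rng.integers(len(survivors), size=2)]
            cut = rng.integers(1, X.shape[1])
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(X.shape[1]) < 0.02             # small mutation rate
            child[flip] = 1 - child[flip]
            children.append(child)
        pop = np.array(children)

    best = pop[np.argmax([fitness(ind) for ind in pop])]
    print("selected features:", np.flatnonzero(best))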
Security aware information classification in health care big data
These days, e-healthcare systems are becoming popular for caring for patients in remote locations, so large amounts of healthcare information such as the patient's name, location, contact number, and physical condition are collected remotely to treat the patients. The large volume of information gathered from these different sources is termed big data. This patient information contains sensitive details such as systolic blood pressure, pulse, temperature, current physical condition, and the patient's contact number, which must be identified and categorized appropriately to protect it from misuse. This article presents a weight-based similarity (WBS) method to classify healthcare big data into two categories: sensitive data and normal data. In the proposed method, a training dataset is used to sort the data, and the method comprises three main steps: extraction of the data, mapping of the data with the help of the training dataset, and comparison of the weight of the input data against a threshold value to classify it. The proposed method produces better results on various evaluation metrics such as precision, recall, F1 score, and accuracy, achieving an accuracy of 92% in categorizing the big data. The Weka tool is used for comparison between WBS and other existing classification techniques.
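A minimal sketch of the three steps (extraction, mapping against training data, weight-versus-threshold comparison) might look like the Python below. The field weights and the threshold are made-up illustrative values, not the weights learned in the paper.

    # Illustrative weight-vs-threshold classification of a health-care record.
    SENSITIVE_WEIGHTS = {            # assumed weights standing in for a trained dataset
        "systolic_bp": 0.9,
        "pulse": 0.8,
        "temperature": 0.6,
        "contact_number": 0.7,
        "name": 0.2,
    }
    THRESHOLD = 1.5                  # assumed decision threshold

    def classify_record(record: dict) -> str:
        """Label a health-care record as 'sensitive' or 'normal'."""
        # Step 1: extract the fields present in the incoming record.
        fields = [f for f, v in record.items() if v is not None]
        # Step 2: map each field to its weight (unknown fields weigh 0).
        weight = sum(SENSITIVE_WEIGHTS.get(f, 0.0) for f in fields)
        # Step 3: compare the accumulated weight with the threshold.
        return "sensitive" if weight >= THRESHOLD else "normal"

    print(classify_record({"name": "A. Patient", "systolic_bp": 142, "pulse": 88}))  # sensitive
    print(classify_record({"name": "A. Patient", "location": "ward 3"}))             # normal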
An overview of clustering methods with guidelines for application in mental health research
Cluster analyses have been widely used in mental health research to decompose inter-individual heterogeneity
by identifying more homogeneous subgroups of individuals. However, despite advances in new algorithms and
increasing popularity, there is little guidance on model choice, analytical framework and reporting requirements.
In this paper, we aimed to address this gap by introducing the philosophy, design, advantages/disadvantages and
implementation of major algorithms that are particularly relevant in mental health research. Extensions of basic
models, such as kernel methods, deep learning, semi-supervised clustering, and clustering ensembles, are subsequently
introduced. How to choose algorithms to address common issues as well as methods for pre-clustering
data processing, clustering evaluation and validation are then discussed. Importantly, we also provide general
guidance on clustering workflow and reporting requirements. To facilitate the implementation of different algorithms,
we provide information on R functions and libraries.
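The paper points readers to R functions and libraries; purely as a language-agnostic illustration of the workflow it describes (pre-clustering data processing, clustering, internal validation), a small Python/scikit-learn sketch with assumed synthetic data is given below.

    # Generic workflow sketch: pre-processing, clustering, internal validation.
    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
    X = StandardScaler().fit_transform(X)          # pre-clustering data processing

    # Evaluate candidate solutions with an internal validity index (silhouette).
    for k in range(2, 6):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        print(k, round(silhouette_score(X, labels), 3))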
Coping with new Challenges in Clustering and Biomedical Imaging
Recent years have seen a tremendous increase in data acquisition in different scientific fields such as molecular biology, bioinformatics, and biomedicine. Therefore, novel methods are needed for the automatic processing and analysis of this large amount of data. Data mining is the process of applying methods like clustering or classification to large databases in order to uncover hidden patterns. Clustering is the task of partitioning the points of a data set into distinct groups in order to maximize the intra-cluster similarity and to minimize the inter-cluster similarity. In contrast to unsupervised learning like clustering, the classification problem is known as supervised learning and aims at predicting the group membership of data objects on the basis of rules learned from a training set in which the group membership is known.
Specialized methods have been proposed for hierarchical and partitioning clustering. However, these methods suffer from several drawbacks. In the first part of this work, new clustering methods are proposed that cope with problems of conventional clustering algorithms. ITCH (Information-Theoretic Cluster Hierarchies) is a hierarchical clustering method based on a hierarchical variant of the Minimum Description Length (MDL) principle, which finds hierarchies of clusters without requiring input parameters. As ITCH may converge only to a local optimum, we propose GACH (Genetic Algorithm for Finding Cluster Hierarchies), which combines the benefits of genetic algorithms with information theory. In this way, the search space is explored more effectively.
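ITCH and GACH themselves are not reproduced here, but the underlying idea of parameter-free model selection by an information criterion can be illustrated with a simplified analogue (BIC with Gaussian mixtures rather than the hierarchical MDL scoring used in the thesis):

    # Simplified analogue (not ITCH): pick the number of clusters that minimizes
    # an information criterion, so no user-supplied k is required.
    from sklearn.datasets import make_blobs
    from sklearn.mixture import GaussianMixture

    X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
    bic = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
           for k in range(1, 8)}
    print(min(bic, key=bic.get))        # typically recovers the generating k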
Furthermore, we propose INTEGRATE, a novel clustering method for data with mixed numerical and categorical attributes. Supported by the MDL principle, our method integrates the information provided by heterogeneous numerical and categorical attributes and thus naturally balances the influence of both sources of information. A competitive evaluation illustrates that INTEGRATE is more effective than existing clustering methods for mixed-type data. Besides clustering methods for single data objects, we provide a solution for clustering different data sets that are represented by their skylines. The skyline operator is a well-established database primitive for finding database objects that minimize two or more attributes with an unknown weighting between these attributes. In this thesis, we define a similarity measure, called SkyDist, for comparing skylines of different data sets that can directly be integrated into different data mining tasks such as clustering or classification. The experiments show that SkyDist, in combination with different clustering algorithms, can give useful insights into many applications.
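SkyDist itself is not reproduced here; the snippet below only illustrates the skyline operator it builds on, i.e. keeping the points that are not dominated in all (minimized) attributes by any other point. The hotel-style example values are assumptions.

    # Skyline (Pareto front) under minimization of all attributes.
    import numpy as np

    def skyline(points):
        """Return the rows of `points` not dominated by any other row."""
        keep = []
        for i, p in enumerate(points):
            dominated = any(np.all(q <= p) and np.any(q < p)
                            for j, q in enumerate(points) if j != i)
            if not dominated:
                keep.append(i)
        return points[keep]

    # Columns: price, distance. The point (100, 2.5) is dominated and drops out.
    hotels = np.array([[80, 2.0], [120, 0.5], [90, 1.5], [100, 2.5]])
    print(skyline(hotels))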
In the second part, we focus on the analysis of high-resolution magnetic resonance images (MRI) that are clinically relevant and may allow for the early detection and diagnosis of several diseases. In particular, we propose a framework for the classification of Alzheimer's disease in MR images that combines the data mining steps of feature selection, clustering, and classification. As a result, a set of highly selective features discriminating patients with Alzheimer's disease from healthy people has been identified. However, the analysis of the high-dimensional MR images is extremely time-consuming. Therefore, we developed JGrid, a scalable distributed computing solution designed to allow for a large-scale analysis of MRI and thus an optimized prediction of diagnosis. In another study, we apply efficient algorithms for motif discovery to task-fMRI scans in order to identify patterns in the brain that are characteristic of patients with somatoform pain disorder. We find groups of brain compartments that occur frequently within the brain networks and discriminate well between healthy and diseased people.
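The thesis' exact features and classifiers are not given here; the following scikit-learn sketch only illustrates the generic feature-selection-plus-classification chain, with assumed synthetic data standing in for MRI-derived features.

    # Generic feature-selection + classification chain (illustrative only).
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    # Synthetic stand-in for voxel-wise features of patients vs. healthy controls.
    X, y = make_classification(n_samples=120, n_features=500, n_informative=10,
                               random_state=0)
    pipe = make_pipeline(SelectKBest(f_classif, k=20), SVC(kernel="linear"))
    print(cross_val_score(pipe, X, y, cv=5).mean())   # cross-validated accuracy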