Search CORE

20 research outputs found

Unsupervised machine learning of integrated health and social care data from the Macmillan Improving the Cancer Journey service in Glasgow

Author: Brewster D
Butcher H
Catto J
Cross W
Donnelly D
Downing A
Gavin A
Glaser A
Hounsome L
Huws D
Kind P
Selby P
Wagland R
Watson E
Wilding S
Wright P
Publication date: 01/01/2018

Background: Improving the Cancer Journey (ICJ) was launched in 2014 by Glasgow City Council and Macmillan Cancer Support. As part of the routine service, data are collected on ICJ users, including demographic and health information, results from holistic needs assessments, and quality-of-life scores measured by the EQ-5D health status instrument. Data are also recorded on the number and type of referrals made and on user feedback about the overall service. By applying artificial intelligence and interactive visualization technologies to these data, we seek to improve service provision and optimize resource allocation.

Method: An unsupervised machine-learning algorithm was deployed to cluster the data. The classical k-means algorithm was extended with the k-modes technique for categorical data, and the gap heuristic was used to identify the number of clusters automatically. The resulting clusters are used to summarize complex data sets and to produce three-dimensional visualizations of the data landscape. In addition, the traits of new ICJ clients are predicted by approximately matching their details to the nearest existing cluster center.

Results: Cross-validation showed the model's effectiveness over a wide range of traits. For example, the model can predict marital status, employment status and housing type with an accuracy between 2.4 and 4.8 times greater than random selection. One of the most interesting preliminary findings is that area deprivation (measured by the Scottish Index of Multiple Deprivation, SIMD) is a better predictor of an ICJ client's needs than primary diagnosis (cancer type).

Conclusion: A key strength of this system is its ability to ingest new data rapidly and automatically and to derive new predictions from them. This means the model can guide service provision by forecasting demand based on actual or hypothesized data. The aim is to provide intelligent, person-centered recommendations.
The machine-learning model described here is part of a prototype software tool currently under development for use by the cancer support community.

Disclosure: Funded by Macmillan Cancer Support.
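
The clustering-and-matching pipeline described in the Method section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ICJ tool's code: records are assumed to be tuples of categorical values, distances use simple matching dissimilarity, initialization is plain random sampling, and the gap heuristic is omitted, so the number of clusters `k` is passed in by hand. The function names and the example records are hypothetical.

```python
import random
from collections import Counter


def matching_distance(a, b):
    """Simple matching dissimilarity: count of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(rows):
    """Per-attribute most frequent value; ties go to the value seen first."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, k, max_iter=100, seed=0):
    """Lloyd-style k-modes: alternate nearest-mode assignment and mode updates."""
    rng = random.Random(seed)
    modes = rng.sample(data, k)  # random initial modes drawn from the data
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: matching_distance(row, modes[c]))
            for row in data
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous mode
                modes[c] = mode_of(members)
    return modes, labels


def predict_from_nearest_mode(modes, record, known):
    """Match a new record on its known attribute indices to the nearest mode;
    the mode's remaining values serve as predictions for the unknown traits."""
    return min(modes, key=lambda m: sum(record[i] != m[i] for i in known))


# Hypothetical records: (marital status, employment status, housing type)
records = [
    ("married", "employed", "owner"),
    ("married", "employed", "owner"),
    ("married", "employed", "renter"),
    ("single", "unemployed", "social"),
    ("single", "unemployed", "social"),
    ("single", "retired", "social"),
]
modes, labels = k_modes(records, k=2)
# A new client whose only known trait is marital status:
print(predict_from_nearest_mode(modes, ("married", None, None), known=[0]))
# → ('married', 'employed', 'owner')
```

Matching on only the known attributes is what lets a partially completed record borrow the remaining values from its nearest cluster center.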

University of Lincoln Institutional Repository

Abertay Research Portal

Glasgow School of Art: RADAR

Sheffield Hallam University Research Archive

Spiral - Imperial College Digital Repository

Enlighten

White Rose Research Online

Wolverhampton Intellectual Repository and E-theses

espace@Curtin

UCL Discovery

Teesside University's Research Repository

Oxford University Research Archive

An Enhanced Initialization Method to Find an Initial Center for K-modes Clustering

Author: S. Saranya, Dr. P. Jayanthi
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 26/02/2017

Data mining is a technique for extracting information from large amounts of data. Clustering is used to group objects with similar characteristics. The K-means clustering algorithm is very efficient for large data sets with numerical attributes, but it does not work well for real-world data sets in which most attributes are categorical; the K-modes algorithm is used in its place. The existing system considers the initialization of K-modes clustering from the viewpoint of outlier detection, which avoids drawing several initial cluster centers from the same cluster. To overcome the limitation noted above, it uses the Initial_Distance and Initial_Entropy algorithms, which apply a new weighting formula to calculate the degree of outlierness of each object, so that the K-modes algorithm can guarantee that the chosen initial cluster centers are not outliers. To improve performance further, a modified distance metric, the weighted matching distance, is used to calculate the distance between two objects during initialization. In addition, a data pre-processing method is applied to improve data quality. Experiments were carried out on several data sets from the UCI repository, and the results demonstrate the effectiveness of the initialization method in the proposed algorithm.
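
The abstract does not give the Initial_Distance or Initial_Entropy formulas, so the sketch below swaps in a common density-times-distance initialization heuristic for k-modes to show the general idea of choosing initial centers that are neither outliers nor from the same cluster. The attribute weights in `weighted_matching_distance` are a hypothetical form of the paper's weighted matching distance. All names and data are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def weighted_matching_distance(a, b, weights):
    """Hypothetical weighted matching distance: each mismatched attribute adds its weight."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def densities(data):
    """Average relative frequency of each object's attribute values.
    Low density marks likely outliers; high density marks typical objects."""
    n, m = len(data), len(data[0])
    freqs = [Counter(col) for col in zip(*data)]
    return [sum(freqs[j][row[j]] for j in range(m)) / (n * m) for row in data]


def init_centers(data, k, weights=None):
    """Pick k initial modes: the densest object first, then repeatedly the object
    maximizing density x distance to its nearest chosen center. Dense objects far
    from existing centers win, so outliers and same-cluster picks are disfavored."""
    weights = weights or [1.0] * len(data[0])
    dens = densities(data)
    chosen = [max(range(len(data)), key=lambda i: dens[i])]
    while len(chosen) < k:
        candidates = [i for i in range(len(data)) if i not in chosen]

        def score(i):
            nearest = min(weighted_matching_distance(data[i], data[c], weights) for c in chosen)
            return dens[i] * nearest

        chosen.append(max(candidates, key=score))
    return [data[i] for i in chosen]


records = (
    [("a", "x", "p")] * 3      # dense group 1
    + [("b", "y", "r")] * 3    # dense group 2
    + [("c", "w", "s")]        # isolated outlier
)
print(init_centers(records, k=2))  # → [('a', 'x', 'p'), ('b', 'y', 'r')]
```

The outlier is far from the first center but has low density, so its score (3/21 × 3) loses to the second dense group (9/21 × 3).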

International Journal on Recent and Innovation Trends in Computing and Communication

Optimal mathematical programming and variable neighborhood search for k-modes categorical data clustering

Author: Changhao Huang
Ikou Kaku
Jiaoying Huang
Yiyong Xiao
Yuchun Xu
Publication venue: 'Elsevier BV'
Publication date: 01/06/2019

The conventional k-modes algorithm and its variants have been used extensively for categorical data clustering. However, these algorithms have some drawbacks; for example, they can become trapped in local optima and are sensitive to the initial clusters/modes. Our numerical experiments even showed that the k-modes algorithm could not identify the optimal clustering for some special datasets regardless of the choice of initial centers. In this paper, we developed an integer linear programming (ILP) approach to k-modes clustering that is independent of the initial solution and can directly obtain optimal results for small datasets. We also developed a heuristic algorithm, known as IPO-ILP-VNS, that performs iterative partial optimization within the ILP approach in a variable neighborhood search framework, to search for near-optimal results on medium and large datasets with controlled computing time. Experiments on 38 datasets, including 27 synthesized small datasets and 11 known benchmark datasets from the UCI site, were carried out to test the proposed ILP approach and the IPO-ILP-VNS algorithm. The proposed methods outperformed the conventional and other existing enhanced k-modes algorithms in the literature and updated 9 of the UCI benchmark datasets with new, improved results.
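
The ILP formulation itself is not given in the abstract. As a stand-in, the sketch below finds the globally optimal k-modes solution for a tiny dataset by exhaustive search over candidate modes. Like an exact ILP, it does not depend on any initial solution, but its cost grows combinatorially, which is why heuristics such as IPO-ILP-VNS are needed for larger data. This is an illustration of the objective being optimized, not the authors' ILP or their VNS algorithm; names and data are hypothetical.

```python
from itertools import combinations, product


def matching_distance(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def kmodes_cost(data, modes):
    """k-modes objective: each record pays its distance to the nearest mode."""
    return sum(min(matching_distance(row, m) for m in modes) for row in data)


def exact_kmodes(data, k):
    """Globally optimal k-modes by exhaustive search (tiny datasets only).

    Optimal modes can be restricted to the Cartesian product of observed
    attribute values, and for fixed modes the best assignment is nearest-mode,
    so minimizing over all k-subsets of candidate modes finds the optimum.
    Requires at least k candidate modes.
    """
    domains = [sorted(set(col)) for col in zip(*data)]
    candidates = list(product(*domains))
    best = min(combinations(candidates, k), key=lambda modes: kmodes_cost(data, modes))
    return best, kmodes_cost(data, best)


records = [
    ("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"),
    ("b", "y", "r"), ("b", "y", "r"), ("b", "z", "r"),
]
modes, cost = exact_kmodes(records, k=2)
print(cost)  # → 2
```

Here 18 candidate modes give only 153 pairs to check; with more attributes or values the candidate set explodes, which matches the abstract's restriction of exact optimization to small datasets.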

Crossref

Aston Publications Explorer

A fair-multicluster approach to clustering of categorical data

Author: Heras Martínez Antonio José
Santos Mangudo Carlos
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 08/11/2022

In the last few years, the need to prevent classification biases due to race, gender, social status, etc. has increased interest in designing fair clustering algorithms. The main idea is to ensure that the output of a clustering algorithm is not biased toward or against specific subgroups of the population. There is a growing specialized literature on this topic, dealing with the problem of clustering numerical databases. Nevertheless, to our knowledge, there are no previous papers devoted to the problem of fair clustering of purely categorical attributes. In this paper, we show that the Multicluster methodology proposed by Santos and Heras (Interdiscip J Inf Knowl Manag 15:227–246, 2020. https://doi.org/10.28945/4643) for clustering categorical data can be modified to increase the fairness of the clusters. Of course, there is a tradeoff between fairness and efficiency, so an increase in the fairness objective usually leads to a loss of classification efficiency. Yet it is possible to reach a reasonable compromise between these goals, since the methodology proposed by Santos and Heras (2020) can easily be adapted to obtain homogeneous and fair clusters.
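
The abstract does not define its fairness objective. One widely used way to quantify fairness in the clustering literature is the balance of protected-group counts inside each cluster; the sketch below implements a multi-group variant (smallest group count divided by largest, per cluster) as an assumption. It only scores a given clustering and is not the Multicluster methodology; names and data are hypothetical.

```python
from collections import Counter


def cluster_balance(labels, groups):
    """Per-cluster balance: smallest protected-group count divided by the largest,
    counting every group that appears in the data (an absent group counts as 0).
    1.0 means the groups are equally represented; 0.0 means one is missing."""
    all_groups = sorted(set(groups))
    result = {}
    for c in sorted(set(labels)):
        counts = Counter(g for lab, g in zip(labels, groups) if lab == c)
        sizes = [counts[g] for g in all_groups]
        result[c] = min(sizes) / max(sizes)
    return result


def overall_balance(labels, groups):
    """Fairness of a whole clustering: the balance of its least balanced cluster."""
    return min(cluster_balance(labels, groups).values())


labels = [0, 0, 0, 0, 1, 1, 1, 1]
groups = ["F", "M", "F", "M", "F", "F", "F", "M"]
print(cluster_balance(labels, groups))  # → {0: 1.0, 1: 0.3333333333333333}
print(overall_balance(labels, groups))  # → 0.3333333333333333
```

A fairness-aware method would trade some within-cluster homogeneity to raise a score like this, which is the tradeoff the abstract describes.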

Docta Complutense

Congruence between latent class and k-modes analyses in the identification of oncology patients with distinct symptom experiences

Author: Apostolidis Kathi
Armes Jo
Barnaghi Payam
Conley Yvette P.
Cooper Bruce A.
Hammer Marilyn
Hu Xiao
Katsaragakis Stylianos
Kober Kord M.
Levine Jon D.
Maguire Roma
McCann Lisa
Miaskowski Christine
Papachristou Nikolaos
Patiraki Elisabeth
Paul Steven M.
Ream Emma
Wright Fay
Publication date: 28/08/2017

CONTEXT: Risk profiling of oncology patients based on their symptom experience helps clinicians provide more personalized symptom management interventions. Recent findings suggest that oncology patients with distinct symptom profiles can be identified using a variety of analytic methods.

OBJECTIVES: The objective of this study was to evaluate the concordance between the number and types of subgroups of patients with distinct symptom profiles identified using latent class analysis and K-modes analysis.

METHODS: Data on the occurrence of 25 symptoms were taken from the Memorial Symptom Assessment Scale, which 1329 patients completed prior to their next dose of chemotherapy (CTX). Cohen's kappa coefficient was used to evaluate concordance between the two analytic methods. For both latent class analysis and K-modes, differences among the subgroups in demographic, clinical, and symptom characteristics, as well as quality of life (QOL) outcomes, were determined using parametric and nonparametric statistics.

RESULTS: Using both analytic methods, four subgroups of patients with distinct symptom profiles were identified (i.e., all low, moderate physical and lower psychological, moderate physical and higher psychological, and all high). The percent agreement between the two methods was 75.32%, which suggests a moderate level of agreement. In both analyses, patients in the all high group were significantly younger and had a higher comorbidity profile, worse Memorial Symptom Assessment Scale subscale scores, and poorer QOL outcomes.

CONCLUSION: Both analytic methods can be used to identify subgroups of oncology patients with distinct symptom profiles. Additional research is needed to determine which analytic methods and which dimension of the symptom experience provide the most sensitive and specific risk profile.
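
The comparison in the Methods can be sketched as follows, assuming each method yields one subgroup label per patient. Because subgroup ids from two methods are arbitrary, they are first aligned (here by brute force over permutations, which is fine for four subgroups; a Hungarian-algorithm matching would scale better), and then percent agreement and Cohen's kappa, (p_o − p_e) / (1 − p_e), are computed. The labels below are invented and the function names are hypothetical; this is not the study's analysis code.

```python
from collections import Counter
from itertools import permutations


def align_labels(reference, other):
    """Relabel `other` so its cluster ids best match `reference`.
    Both clusterings must have the same number of clusters."""
    ref_ids, other_ids = sorted(set(reference)), sorted(set(other))
    assert len(ref_ids) == len(other_ids), "expected equal numbers of clusters"
    best_map, best_hits = None, -1
    for perm in permutations(ref_ids):
        mapping = dict(zip(other_ids, perm))
        hits = sum(r == mapping[o] for r, o in zip(reference, other))
        if hits > best_hits:
            best_map, best_hits = mapping, hits
    return [best_map[o] for o in other]


def agreement_and_kappa(a, b):
    """Percent agreement p_o and Cohen's kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)  # chance agreement
    return p_o, (p_o - p_e) / (1 - p_e)


lca = [0, 0, 0, 1, 1, 1, 2, 2]       # hypothetical latent class subgroups
kmodes = [5, 5, 7, 7, 7, 7, 9, 9]    # hypothetical K-modes subgroups
p_o, kappa = agreement_and_kappa(lca, align_labels(lca, kmodes))
print(p_o, round(kappa, 4))  # → 0.875 0.8095
```

Kappa is lower than raw agreement because it discounts the agreement expected by chance given each method's subgroup sizes.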

Crossref

University of Strathclyde Institutional Repository

ZENODO

eScholarship - University of California

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

King's Research Portal

Surrey Research Insight