    Health and data mining: cluster analysis with the PAM algorithm to improve the management of medical appointments at a health center

    PAM (Partitioning Around Medoids) is a data mining algorithm used for cluster analysis, that is, the identification of groups or clusters in a dataset. We applied this algorithm to the database of cancelled appointments in ophthalmology, a medical specialty with high demand for care, at a public health center. The analysis identified three groups of cancelled appointments, whose description and characterization make it possible to improve the management of scheduled medical care in this specialty. Sociedad Argentina de Informática e Investigación Operativa
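
    PAM's two phases, a greedy BUILD step that picks initial medoids and a SWAP step that exchanges medoids with non-medoids while the total distance decreases, can be sketched in Python as below. This is a minimal, unoptimised sketch assuming a precomputed distance matrix; the function names are illustrative, and it is not the implementation used in the study (in practice one would use an established routine such as `pam` from the R `cluster` package).

```python
from itertools import product


def total_cost(dist, medoids):
    """Sum over all points of the distance to their nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Minimal PAM (BUILD + SWAP) on a precomputed n x n distance matrix.

    Returns (medoids, labels): medoids are point indices, and labels[i] is
    the position in `medoids` of the medoid nearest to point i.
    """
    n = len(dist)
    # BUILD: start from the point with the smallest total distance to all
    # others, then greedily add the point that lowers the total cost most.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates,
                           key=lambda c: total_cost(dist, medoids + [c])))
    # SWAP: try exchanging each medoid with each non-medoid and keep any
    # exchange that lowers the total cost, until no swap improves it.
    cost = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for slot, o in product(range(k), range(n)):
            if o in medoids:
                continue
            trial = medoids[:slot] + [o] + medoids[slot + 1:]
            trial_cost = total_cost(dist, trial)
            if trial_cost < cost:
                medoids, cost, improved = trial, trial_cost, True
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]])
              for i in range(n)]
    return medoids, labels
```

    For the one-dimensional points `[1, 2, 3, 10, 11, 12]` with absolute-difference distances, `pam(d, 2)` returns `([1, 4], [0, 0, 0, 1, 1, 1])`: the medoids are the points 2 and 11. Unlike k-means, every centre is an actual observation, which is why PAM works directly from a dissimilarity matrix.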

    Clustering based feature selection using Partitioning Around Medoids (PAM)

    High-dimensional data contain a large number of features and therefore require immense computational resources, in both space and time. Several studies indicate that not all features of high-dimensional data are relevant to the classification result, so dimensionality reduction is often unavoidable and is needed to improve classifier performance. Existing dimensionality reduction techniques include feature selection and feature extraction. Sequential forward and backward feature selection are greedy feature-selection approaches. Heuristic approaches are also applied to feature selection, using the Genetic Algorithm, PSO, and the Forest Optimization Algorithm. PCA is the best-known feature extraction method; others include multidimensional scaling and linear discriminant analysis. In this work, a different approach to feature selection is taken: cluster-analysis-based feature selection using Partitioning Around Medoids (PAM) clustering. Our experimental results showed that classification accuracy is high, above 80%, when the medoids of the feature vectors are used to represent the original dataset.

    Partition clustering for GIS map data protection

    A fast and recursive algorithm for clustering large datasets with k-medians

    Clustering large samples of high-dimensional data with fast algorithms is an important challenge in computational statistics. Borrowing ideas from MacQueen (1967), who introduced a sequential version of the k-means algorithm, a new class of recursive stochastic gradient algorithms designed for the k-medians loss criterion is proposed. Because they are recursive, these algorithms are very fast and are well suited to large samples of data that may arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. Particular attention is paid to the averaged versions, which are known to perform better, and a data-driven procedure for automatically selecting the value of the descent step is proposed. The performance of the averaged sequential estimator is compared in a simulation study, in terms of both computation speed and accuracy of the estimates, with more classical partitioning techniques such as k-means, trimmed k-means and PAM (partitioning around medoids). Finally, this new online clustering technique is illustrated by determining television audience profiles from a sample of more than 5000 individual television audiences measured every minute over a period of 24 hours. Comment: Under revision for Computational Statistics and Data Analysis
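
    The general idea of a sequential, averaged k-medians update can be sketched as follows: each arriving point is assigned to its nearest centre, that centre takes a stochastic gradient step on the Euclidean distance ||x - m_j|| (a step of length gamma_n along the unit vector from m_j towards x), and a running average of the iterates is kept. This is a minimal sketch, not the authors' estimator: the step sequence c / n**alpha, the defaults c = 1 and alpha = 0.75 (alpha between 1/2 and 1, as is usual for averaged stochastic gradient), and the given initial centres are assumptions, and the paper's data-driven step selection is not reproduced.

```python
import math


def online_kmedians(stream, init_centres, c=1.0, alpha=0.75):
    """Sequential k-medians by stochastic gradient, with averaged iterates.

    stream: iterable of points (tuples of floats), read one at a time.
    init_centres: k starting centres (assumed given).
    Centre j's n-th update uses the assumed step gamma = c / n**alpha.
    Returns the averaged centres.
    """
    centres = [list(m) for m in init_centres]
    averaged = [list(m) for m in init_centres]
    counts = [0] * len(centres)
    for x in stream:
        # Assign the incoming point to its nearest current centre.
        j = min(range(len(centres)), key=lambda i: math.dist(x, centres[i]))
        counts[j] += 1
        n = counts[j]
        diff = [xi - mi for xi, mi in zip(x, centres[j])]
        norm = math.hypot(*diff)
        if norm > 0:
            # Stochastic gradient step on ||x - m_j||: move the centre a
            # distance gamma along the unit vector pointing towards x.
            gamma = c / n ** alpha
            centres[j] = [mi + gamma * di / norm
                          for mi, di in zip(centres[j], diff)]
        # Running mean of the iterates of centre j (including its start).
        averaged[j] = [a + (m - a) / (n + 1)
                       for a, m in zip(averaged[j], centres[j])]
    return averaged
```

    Each point is touched once and only the k centres, their averages and their counts are stored, which is what makes this kind of recursion suitable for data arriving as a stream.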

    Comparison of machine learning clustering algorithms for detecting heterogeneity of treatment effect in acute respiratory distress syndrome: A secondary analysis of three randomised controlled trials

    BACKGROUND: Heterogeneity in Acute Respiratory Distress Syndrome (ARDS), a consequence of its non-specific definition, has led to a multitude of negative randomised controlled trials (RCTs). Investigators have sought to identify heterogeneity of treatment effect (HTE) in RCTs using clustering algorithms. We evaluated the proficiency of several commonly used machine-learning algorithms in identifying clusters in which HTE may be detected. METHODS: Five unsupervised algorithms (latent class analysis (LCA), K-means, partitioning around medoids, hierarchical clustering, and spectral clustering) and four supervised algorithms (model-based recursive partitioning, Causal Forest (CF), X-learner with Random Forest (XL-RF), and Bayesian Additive Regression Trees) were individually applied to three prior ARDS RCTs. Clinical data and research protein biomarkers were used as partitioning variables, with the latter excluded in secondary analyses. For each clustering schema, HTE was evaluated based on the interaction term between treatment group and cluster, with day-90 mortality as the dependent variable. FINDINGS: No single algorithm identified clusters with significant HTE in all three trials. LCA, XL-RF, and CF identified HTE most frequently (2/3 RCTs). Important partitioning variables in the unsupervised approaches were consistent across algorithms and RCTs. In supervised models, important partitioning variables varied between algorithms and across RCTs. In algorithms where clusters demonstrated HTE in the same trial, patients frequently moved between treatment-benefit and treatment-harm clusters across algorithms. LCA aside, results from all other algorithms were subject to significant alteration in cluster composition and HTE when the random seed was changed. Removing research biomarkers as partitioning variables greatly reduced the chances of detecting HTE across all algorithms.
INTERPRETATION: Machine-learning algorithms were inconsistent in their abilities to identify clusters with significant HTE. Protein biomarkers were essential in identifying clusters with HTE. Investigations using machine-learning approaches to identify clusters to seek HTE require cautious interpretation. FUNDING: NIGMS R35 GM142992 (PS), NHLBI R35 HL140026 (CSC); NIGMS R01 GM123193, Department of Defense W81XWH-21-1-0009, NIA R21 AG068720, NIDA R01 DA051464 (MMC)

    Personalized Web-Based Cognitive Rehabilitation Treatments for Patients with Traumatic Brain Injury : Cluster Analysis

    Funding: This study was partially funded by the INNOBRAIN project: New Technologies for Innovation in Cognitive Stimulation and Rehabilitation (COMRDI15-1-0017). ACCIÓ-Comunitat RIS3CAT d'innovació en salut NEXTHEALTH (COM15-1-0004) cofinanced this project under the FEDER Catalonia 2014-2020 Operational Program. Traumatic brain injury (TBI) is a leading cause of disability worldwide. TBI is a highly heterogeneous disease, which complicates effective therapeutic interventions. Cluster analysis has been extensively applied in previous research to identify homogeneous subgroups based on performance in baseline neuropsychological tests. Nevertheless, the analyzed samples rarely exceed 100 patients, and different cluster analysis approaches and cluster validity indices have rarely been compared or applied in web-based rehabilitation treatments. The aims of our study were as follows: (1) to apply state-of-the-art cluster validity indices to different cluster strategies: hierarchical, partitional, and model-based, (2) to apply combined strategies of dimensionality reduction by using principal component analysis and random forests and perform stability assessment of the final profiles, (3) to characterize the identified profiles by using demographic and clinically relevant variables, and (4) to study the external validity of the obtained clusters by considering 3 relevant aspects of TBI rehabilitation: Glasgow Coma Scale, functional independence measure, and execution of web-based cognitive tasks. This study was performed from August 2008 to July 2019. Different cluster strategies were run with the Mclust, factoextra, and cluster R packages. For combined strategies, we used the FactoMineR and random forest R packages. Stability analysis was performed with the fpc R package. Between-group comparisons for external validation were performed using the 2-tailed t test, the chi-square test, or the Mann-Whitney U test, as appropriate.
    We analyzed 574 adult patients with TBI (mostly severe) who were undergoing web-based rehabilitation. We identified and characterized 3 clusters with strong internal validation: (1) moderate attentional impairment and moderate dysexecutive syndrome with mild memory impairment and normal spatiotemporal perception, with almost 66% (111/170) of the patients being highly educated (P <.05); (2) severe dysexecutive syndrome with severe attentional and memory impairments and normal spatiotemporal perception, with 49.2% (153/311) of the patients being highly educated (P <.05); and (3) very severe cognitive impairment, with 45.2% (42/93) of the patients being highly educated (P <.05). We externally validated the clusters with injury severity (P =.006) and functional independence assessments: cognitive (P <.001), motor (P <.001), and total (P <.001). We mapped the 151,763 web-based cognitive rehabilitation tasks executed over the whole period to the 3 clusters (P <.001) and confirmed the identified patterns. Stability analysis rated clusters 1 and 2 at 0.60 and 0.75, respectively, indicating that they capture a pattern in the data, while cluster 3 was rated as highly stable. Cluster analysis in web-based cognitive rehabilitation treatments enables the identification and characterization of strong response patterns in neuropsychological tests, the external validation of the obtained clusters, and the tailoring of the web-based cognitive tasks on the platform to the identified profiles, thereby providing clinicians with a tool for treatment personalization; the approach can also be extended to other medical conditions.
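
    The stability ratings can be read in the light of bootstrap cluster stability as computed by the `clusterboot` function of the fpc package named above: each original cluster is compared, by Jaccard similarity, with its best-matching cluster on each bootstrap resample, and the similarities are averaged. According to the fpc documentation, a mean of 0.75 or more indicates a stable cluster, values between 0.6 and 0.75 indicate a pattern whose exact membership is doubtful, and 0.85 or more indicates high stability. The pure-Python sketch below assumes that this is how the abstract's ratings were obtained; it is not fpc's code, and `cluster_fn` is a hypothetical callable standing in for whatever clustering method is being assessed.

```python
import random


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets of point indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def clusterwise_jaccard(orig_labels, boot_indices, boot_labels):
    """Best Jaccard match of each original cluster among the clusters found
    on one resample, comparing only points drawn into the resample.

    orig_labels: cluster label of every point in the full data set.
    boot_indices: indices of the points drawn into the resample.
    boot_labels: labels from re-clustering the resample, aligned with
                 boot_indices.
    """
    present = set(boot_indices)
    orig, boot = {}, {}
    for i, lab in enumerate(orig_labels):
        if i in present:
            orig.setdefault(lab, set()).add(i)
    for i, lab in zip(boot_indices, boot_labels):
        boot.setdefault(lab, set()).add(i)
    return {lab: max(jaccard(members, other) for other in boot.values())
            for lab, members in orig.items()}


def bootstrap_stability(data, cluster_fn, n_boot=100, seed=0):
    """Mean clusterwise Jaccard over n_boot bootstrap resamples.

    cluster_fn: hypothetical callable mapping a list of points to labels.
    """
    rng = random.Random(seed)
    labels = cluster_fn(data)
    scores = {lab: [] for lab in set(labels)}
    for _ in range(n_boot):
        # Draw a resample of the same size, with replacement.
        idx = [rng.randrange(len(data)) for _ in data]
        boot_labels = cluster_fn([data[i] for i in idx])
        for lab, s in clusterwise_jaccard(labels, idx, boot_labels).items():
            scores[lab].append(s)
    return {lab: sum(v) / len(v) for lab, v in scores.items() if v}
```

    A cluster that is recovered exactly in every resample scores 1.0; a cluster that splits or merges across resamples drifts towards the 0.6 to 0.75 band described above.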

    Consensus clustering and functional interpretation of gene-expression data

    Microarray analysis using clustering algorithms can suffer from a lack of inter-method consistency in assigning related gene-expression profiles to clusters. Obtaining a consensus set of clusters from a number of clustering methods should improve confidence in gene-expression analysis. Here we introduce consensus clustering, which provides such an advantage. When coupled with a statistically based gene functional analysis, our method allowed the identification of novel genes regulated by NFκB and the unfolded protein response in certain B-cell lymphomas.
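
    One common way to form a consensus from several clusterings, sketched below, is a co-association matrix: for every pair of genes, the fraction of methods that place them in the same cluster, with genes then grouped when that fraction reaches a threshold. This is a generic evidence-accumulation sketch, assumed for illustration; it is not necessarily the consensus procedure the authors introduced, and the grouping rule (connected components of pairs at or above `threshold`) is an assumption.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of items shares a cluster.

    partitions: list of label lists, one per clustering method, all over
    the same items in the same order.
    """
    n, m = len(partitions[0]), len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]


def consensus_clusters(partitions, threshold=1.0):
    """Label items by connected components of pairs whose co-association
    is at least `threshold` (1.0 = every method agrees)."""
    a = co_association(partitions)
    n = len(a)
    label = [-1] * n
    current = 0
    for start in range(n):
        if label[start] != -1:
            continue
        # Depth-first search over sufficiently co-associated items.
        label[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if label[j] == -1 and a[i][j] >= threshold:
                    label[j] = current
                    stack.append(j)
        current += 1
    return label
```

    With the default threshold of 1.0, only pairs that every method co-clusters end up together, a strict consensus; lower thresholds let groups chain together, much like single-linkage clustering on the co-association matrix.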

    An Integrative Analysis of microRNA and mRNA Expression—A Case Study

    Background: MicroRNAs are believed to play an important role in the regulation of gene expression. They have been shown to be involved in cell cycle regulation and cancer. MicroRNA expression profiling has become available owing to recent technological advances. In some studies, both microRNA expression and mRNA expression are measured, which allows an integrated analysis of the two. Results: We demonstrated three aspects of an integrated analysis of microRNA and mRNA expression through a case study of human cancer data. We showed that (1) microRNA expression efficiently sorts tumors from normal tissues regardless of tumor type, while gene expression does not; (2) many microRNAs are down-regulated in tumors, and these microRNAs can be clustered in two ways: microRNAs similarly affected by cancer and microRNAs that interact similarly with genes; (3) taking let-7f as an example, target genes can be identified and clustered based on their relationship with let-7f expression. Discussion: Our findings were made using novel applications of existing statistical methods: hierarchical clustering was applied with a new distance measure, the co-clustering frequency, to identify stable sample clusters; microRNA-gene correlation profiles were subjected to hierarchical clustering to identify microRNAs that interact similarly with genes and are hence likely to be functionally related; and the clustering of regression models method was applied to identify microRNAs similarly related to cancer while adjusting for tissue type, and genes similarly related to microRNA while adjusting for disease status. These analytic methods are applicable to interrogating multiple types of -omics data in general.