
    KAMILA Clustering for a Mixed-Type Data Analysis of Illinois Medicare Data

    The Centers for Medicare & Medicaid Services (CMS) releases annual reports on the Market Saturation and Utilization of Medicare coverage nationwide. CMS data provide an opportunity for an in-depth analysis of Medicare usage patterns within the United States, which may offer insight into socioeconomic conditions in certain regions. To discover potential patterns, the KAMILA (KAy-means for MIxed LArge data sets) clustering algorithm was applied to the most recent CMS dataset, from 2018. Because of the size of the original dataset, this research focuses on Illinois Medicare data, grouped by the state's 102 counties. The KAMILA algorithm extends the well-known k-means clustering algorithm to mixed-type data through a weighted semi-parametric procedure, which balances the contributions of quantitative and qualitative variables. The optimal number of clusters is chosen partly by the operator of the algorithm, taking into account the number of cross-validation runs. Applying KAMILA both to the main CMS dataset and to a modified version that excludes Cook County yielded two clusters in each case. This offers insight into the structure of Medicare services in the state of Illinois.
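    To make the mixed-type idea concrete, here is a heavily simplified Python sketch of a KAMILA-style iteration. It is not the reference implementation and assumes the structure usually described for KAMILA: continuous variables enter through a kernel density estimate of each point's distance to its nearest centroid (converted to a radial density), and categorical variables enter through per-cluster category probabilities. The Gaussian kernel, the Silverman-style bandwidth, the Laplace smoothing parameter alpha, the user-supplied initial centroids, and the helper names kamila_sketch and _radial_log_density are all assumptions for illustration; variable weights and the choice of the number of clusters are omitted.

```python
import math
import statistics
from collections import Counter


def _radial_log_density(r, nearest, p, h):
    """Log radial density at distance r (up to a constant shared by all clusters).

    f_V is a Gaussian kernel density estimate of the nearest-centroid
    distances; dividing by r**(p-1) converts it to a density on R^p.
    """
    n = len(nearest)
    f_v = sum(math.exp(-0.5 * ((r - d) / h) ** 2) for d in nearest)
    f_v /= n * h * math.sqrt(2 * math.pi)
    r = max(r, 1e-6)  # avoid log(0) for a point sitting on its centroid
    return math.log(max(f_v, 1e-300)) - (p - 1) * math.log(r)


def kamila_sketch(cont, cat, init_idx, max_iter=20, alpha=1.0):
    """Simplified KAMILA-style clustering (hypothetical helper, not the reference code).

    cont: list of equal-length numeric tuples (continuous variables)
    cat: list of equal-length tuples of category labels (categorical variables)
    init_idx: indices of the rows used as initial centroids (one per cluster)
    alpha: Laplace smoothing for the categorical probabilities (an assumption)
    """
    n, p, k, q = len(cont), len(cont[0]), len(init_idx), len(cat[0])
    levels = [sorted({row[j] for row in cat}) for j in range(q)]
    centroids = [list(cont[i]) for i in init_idx]
    # Start with uniform categorical probabilities in every cluster.
    theta = [[{lv: 1.0 / len(levels[j]) for lv in levels[j]} for j in range(q)]
             for _ in range(k)]
    labels = None
    for _ in range(max_iter):
        dist = [[math.dist(x, c) for c in centroids] for x in cont]
        nearest = [min(row) for row in dist]
        sd = statistics.pstdev(nearest)
        h = 1.06 * (sd if sd > 0 else 1.0) * n ** -0.2  # Silverman-style bandwidth
        new_labels = []
        for i in range(n):
            # Score = continuous (radial KDE) term + categorical log-probabilities.
            scores = [_radial_log_density(dist[i][g], nearest, p, h)
                      + sum(math.log(theta[g][j][cat[i][j]]) for j in range(q))
                      for g in range(k)]
            new_labels.append(max(range(k), key=scores.__getitem__))
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for g in range(k):
            members = [i for i in range(n) if labels[i] == g]
            if not members:
                continue  # keep the previous parameters of an empty cluster
            centroids[g] = [sum(cont[i][d] for i in members) / len(members)
                            for d in range(p)]
            for j in range(q):
                counts = Counter(cat[i][j] for i in members)
                denom = len(members) + alpha * len(levels[j])
                theta[g][j] = {lv: (counts[lv] + alpha) / denom for lv in levels[j]}
    return labels, centroids
```

    In practice the number of clusters would be selected separately (the abstract mentions cross-validation runs) and several initializations would be compared; this sketch only shows how one continuous and one categorical contribution are combined into a single assignment score.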

    Mixtures of Regression Models for Time-Course Gene Expression Data: Evaluation of Initialization and Random Effects

    Finite mixture models are routinely applied to time-course microarray data. Because of the complexity and size of this type of data, the choice of good starting values plays an important role. So far, initialization strategies have only been investigated for data from a mixture of multivariate normal distributions. In this work, several initialization procedures are evaluated for mixtures of regression models with and without random effects in an extensive simulation study on different artificial datasets. Finally, these procedures are also applied to a real dataset from E. coli.
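    Mixtures of regression models are commonly fitted with the EM algorithm, whose result depends on the starting values. The following minimal Python sketch shows one generic initialization strategy (EM from several random partitions, keeping the fit with the highest log-likelihood) for a mixture of simple linear regressions without random effects. It assumes a single predictor and Gaussian errors, and it is not necessarily one of the procedures evaluated in the paper; all function names are hypothetical.

```python
import math
import random


def _log_normal(y, mean, var):
    """Log density of N(mean, var) at y."""
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)


def _m_step(x, y, resp, k):
    """Weighted least squares per component -> (weight, intercept, slope, variance)."""
    n, params = len(x), []
    for g in range(k):
        w = [r[g] for r in resp]
        sw = sum(w) + 1e-12  # guard against an empty component
        sx = sum(wi * xi for wi, xi in zip(w, x))
        sy = sum(wi * yi for wi, yi in zip(w, y))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        denom = sw * sxx - sx * sx
        b = (sw * sxy - sx * sy) / denom if abs(denom) > 1e-12 else 0.0
        a = (sy - b * sx) / sw
        var = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sw
        params.append((sw / n, a, b, max(var, 1e-6)))  # variance floor avoids collapse
    return params


def _e_step(x, y, params):
    """Posterior component probabilities and the observed-data log-likelihood."""
    resp, loglik = [], 0.0
    for xi, yi in zip(x, y):
        logs = [math.log(pi) + _log_normal(yi, a + b * xi, var) for pi, a, b, var in params]
        m = max(logs)
        total = m + math.log(sum(math.exp(v - m) for v in logs))  # log-sum-exp
        loglik += total
        resp.append([math.exp(v - total) for v in logs])
    return resp, loglik


def fit_em(x, y, k, init_labels, max_iter=200, tol=1e-8):
    """EM for a k-component mixture of simple linear regressions from a hard start."""
    resp = [[1.0 if g == lab else 0.0 for g in range(k)] for lab in init_labels]
    params, prev = _m_step(x, y, resp, k), -math.inf
    for _ in range(max_iter):
        resp, loglik = _e_step(x, y, params)
        params = _m_step(x, y, resp, k)
        if loglik - prev < tol:
            break
        prev = loglik
    return params, loglik  # loglik is from the last E-step


def best_of_random_partitions(x, y, k, n_starts=10, seed=0):
    """Run EM from n_starts random balanced partitions; keep the highest log-likelihood."""
    rng, best = random.Random(seed), None
    for _ in range(n_starts):
        idx = list(range(len(x)))
        rng.shuffle(idx)
        labels = [0] * len(x)
        for j, i in enumerate(idx):
            labels[i] = j % k  # deal points round-robin so no component starts empty
        params, loglik = fit_em(x, y, k, labels)
        if best is None or loglik > best[1]:
            best = (params, loglik)
    return best
```

    Because EM only climbs to a local maximum, comparing several starts by log-likelihood is a simple way to reduce the dependence on any single starting value; more elaborate strategies differ mainly in how the starting partitions or parameters are generated.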

    Single-tree detection in high-density LiDAR data from UAV-based survey

    UAV-based LiDAR surveys provide very-high-density point clouds, which contain rich information about detailed forest structure and allow individual trees to be detected, but which also impose a high computational load. Single-tree detection is of great interest for forest management and ecology, and the task is relatively well solved for forests made up of a single or largely dominant species, with trees that have a clearly pointed upper canopy (in particular conifers). Most authors have proposed methods based wholly or partly on searching for local maxima in the canopy; these perform poorly for species with a flat or irregular upper canopy and for mixed forests, especially where taller trees hide smaller ones. These considerations apply in particular to Mediterranean hardwood forests. In such a context, it is imperative to use the whole volume of the point cloud while keeping the computational load tractable. The authors propose a methodology based on modelling the 3D shape of the tree, which improves performance with respect to maxima-based models. A case study on a hazel grove documents the performance improvement on a relatively simple but significant case.
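    To illustrate the local-maxima baseline that the abstract contrasts with the proposed 3D-shape approach, here is a minimal Python sketch of tree-top detection on a rasterized canopy height model (CHM). It assumes the point cloud has already been gridded into heights, a square search window, and a minimum tree height; the function name local_maxima and its parameters are hypothetical, and the code does not implement the authors' 3D-shape method.

```python
def local_maxima(chm, radius=1, min_height=2.0):
    """Return (row, col) cells that are strict maxima of a square window.

    chm: 2D list of canopy heights (a rasterized canopy height model)
    radius: half-width of the square search window, in cells
    min_height: cells lower than this are ignored (ground, shrubs)
    """
    rows, cols = len(chm), len(chm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:
                continue
            # Neighbouring heights inside the window, clipped at the raster edges.
            window = [chm[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                      if (rr, cc) != (r, c)]
            # Strict maximum: a flat crown top (equal neighbours) is not detected.
            if all(h > v for v in window):
                tops.append((r, c))
    return tops
```

    With the strict-maximum rule, a flat-topped crown (a plateau of equal heights) produces no detection at all, and a small tree next to a taller one is masked whenever it falls inside the taller tree's window; both are instances of the weaknesses described above.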

    A survey of popular R packages for cluster analysis

    Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring datasets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans and hclust functions; the mclust library; the poLCA library; and the clustMD library. The packages/functions cover a variety of cluster analysis methods for continuous data, categorical data or a combination of the two. The contrasting methods in the different packages are briefly introduced and basic usage of the functions is discussed. The use of the different methods is compared and contrasted and then illustrated on example data. In the discussion, links to information on other available libraries for different clustering methods and extensions beyond basic clustering methods are given. The code for the worked examples in Section 2 is available at http://www.stats.gla.ac.uk/~nd29c/Software/ClusterReviewCode.
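    The kmeans function from the stats library implements k-means clustering; R's kmeans uses the Hartigan-Wong algorithm by default and also offers Lloyd's algorithm as an option. As an illustration of the idea only, here is a minimal Python sketch of Lloyd's algorithm with user-supplied starting centres (an assumption; the function name lloyd_kmeans is hypothetical and this is not R's implementation). The returned total within-cluster sum of squares is the quantity R reports as tot.withinss.

```python
import math


def lloyd_kmeans(points, init_centers, max_iter=100):
    """Lloyd's k-means: alternate nearest-centre assignment and centre updates."""
    centers = [tuple(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre (Euclidean).
        new_labels = [min(range(len(centers)), key=lambda g: math.dist(p, centers[g]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centre moves to the mean of its assigned points.
        for g in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:  # an empty cluster keeps its previous centre
                centers[g] = tuple(sum(coord) / len(members) for coord in zip(*members))
    # Total within-cluster sum of squares (the quantity k-means minimises).
    wss = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, wss
```

    Because Lloyd's algorithm only reaches a local optimum, it is usually run from several starting configurations and the solution with the smallest within-cluster sum of squares is kept; R's kmeans exposes this through its nstart argument.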