    Multiple imputation of missing categorical data using latent class models: State of the art

    This paper provides an overview of recent proposals for using latent class models for the multiple imputation of missing categorical data in large-scale studies. While latent class (or finite mixture) modeling is mainly known as a clustering tool, it can also be used for density estimation, i.e., to get a good description of the lower- and higher-order associations among the variables in a dataset. For multiple imputation, the latter aspect is essential in order to be able to draw meaningful imputed values from the conditional distribution of the missing data given the observed data. We explain the general logic underlying the use of latent class analysis for multiple imputation. Moreover, we present several variants developed within either a frequentist or a Bayesian framework, each of which overcomes certain limitations of the standard implementation. The different approaches are illustrated and compared using a real-data psychological assessment application.

    Latent class trees

    A survey of popular R packages for cluster analysis

    Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring datasets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans and hclust functions; the mclust library; the poLCA library; and the clustMD library. The packages/functions cover a variety of cluster analysis methods for continuous data, categorical data or a collection of the two. The contrasting methods in the different packages are briefly introduced and basic usage of the functions is discussed. The use of the different methods is compared and contrasted and then illustrated on example data. In the discussion, links to information on other available libraries for different clustering methods and extensions beyond basic clustering methods are given. The code for the worked examples in Section 2 is available at http://www.stats.gla.ac.uk/~nd29c/Software/ClusterReviewCode.

    Mixed Statistical Matching Approaches Using a Latent Class Model: Simulation Studies

    In the era of the data revolution, the availability of data is a great asset that should be utilized. Instead of conducting new surveys, one can benefit from data that already exist. As enormous amounts of data become available, it is becoming essential to undertake research that integrates data from multiple sources in order to make the best use of them. Statistical Data Integration (SDI) is the statistical tool for addressing this issue. SDI can be used to integrate data files that have common units, and it also allows merging unrelated files that do not share any common units, depending on the input data. The appropriate method of data integration is determined by the nature of the input data. SDI has two main methods, Record Linkage (RL) and Statistical Matching (SM). SM techniques typically aim to produce a complete data file from different sources that do not contain the same units. A number of traditional matching techniques are mentioned in the literature. Among these, there are various approaches for continuous data, but fewer methods for categorical data. This paper proposes a Statistical Matching technique for categorical data based on latent class models within a Bayesian framework. A Dirichlet Process Mixture of Products of Multinomial distributions model, which is a fully Bayesian estimation method for latent class models, is used for Statistical Matching throughout this paper. The performance of the proposed latent class model for Statistical Matching is evaluated through an empirical comparison with several existing matching procedures based on simulation studies.

    APPLICATION OF CLUSTER ANALYSIS IN THE BEHAVIOUR OF TRAFFIC PARTICIPANTS RELATING TO THE USE OF SAFETY SYSTEMS AND MOBILE PHONES

    This paper presents a cluster analysis of the behavior of traffic participants with respect to the use of safety systems and mobile phones. The data on traffic behavior were downloaded from an open data portal in Serbia. Three types of cluster analysis have been applied: hierarchical clustering, Bayesian Information Criterion (BIC) clustering and model clustering. The obtained results point to the various possibilities of using these three clustering methods in the field of traffic and suggest further research.

    Latent class trees with the three-step approach

    Latent class (LC) analysis is widely used in the social and behavioral sciences to find meaningful clusters based on a set of categorical variables. To deal with the common problem that a standard LC analysis may yield a large number of classes and thus a solution that is difficult to interpret, an alternative approach has recently been proposed, called Latent Class Tree (LCT) analysis. It involves starting with a solution with a small number of "basic" classes, which may subsequently be split into subclasses at the next stages of an analysis. However, in most LC analysis applications, we not only wish to identify the relevant classes, but also want to see how they relate to external variables (covariates or distal outcomes). For this purpose, researchers nowadays prefer using the bias-adjusted three-step method. Here, we show how this bias-adjusted three-step procedure can be applied in the context of LCT modeling. More specifically, an R package is presented that performs a three-step LCT analysis: it builds an LCT and allows checking how splits are related to the relevant external variables. The new tool is illustrated using a cross-sectional application with multiple indicators on social capital and demographics as external variables, and a longitudinal application with a mood variable measured multiple times during the day and personality traits as external variables.