276 research outputs found

    Structural Equation Modeling and simultaneous clustering through the Partial Least Squares algorithm

    The identification of homogeneous groups of observations and their appropriate analysis in PLS-SEM has become a critical issue in many application fields. Usually, both SEM and PLS-SEM assume the homogeneity of all units on which the model is estimated, and the segmentation approaches in the literature consist of estimating separate models for each segment of statistical units, with the segments obtained by assigning the units to classes defined a priori. However, these approaches are not fully acceptable because no causal structure among the variables is postulated. In other words, a modeling approach should be used in which the obtained clusters are homogeneous with respect to the structural causal relationships. In this paper, a new methodology for simultaneous non-hierarchical clustering and PLS-SEM is proposed. This methodology is motivated by the fact that the sequential approach of first applying SEM or PLS-SEM and then a clustering algorithm such as K-means to the latent scores of the SEM/PLS-SEM may fail to find the correct clustering structure in the data. A simulation study and an application to real data are included to evaluate the performance of the proposed methodology.
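
The sequential (tandem) approach that this abstract argues against can be sketched as follows. This is only an illustration under stated assumptions: the first principal component stands in for the SEM/PLS-SEM latent scores (a deliberate simplification, not PLS-SEM itself), and the function names `first_component_scores`, `kmeans_1d`, and `tandem_clustering` are hypothetical.

```python
import numpy as np

def first_component_scores(X):
    """Scores on the first principal component (stand-in for latent scores)."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred data; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]

def kmeans_1d(scores, k, n_iter=100):
    """Plain Lloyd k-means on one-dimensional scores with a deterministic start."""
    # Initial centres are evenly spaced quantiles of the scores.
    centers = np.quantile(scores, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assignment step: nearest centre for every unit.
        labels = np.argmin(np.abs(scores[:, None] - centers[None, :]), axis=1)
        # Update step: mean of each cluster (an empty cluster keeps its centre).
        new_centers = np.array([
            scores[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def tandem_clustering(X, k):
    """Step 1: reduce to scores. Step 2: cluster the scores."""
    return kmeans_1d(first_component_scores(X), k)
```

Because the scores in step 1 are computed without reference to the clusters, the direction they retain need not be the one that separates the groups, which is the motivation the abstract gives for a simultaneous method.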

    Partitioning predictors in multivariate regression models

    A Multivariate Regression Model Based on the Optimal Partition of Predictors (MRBOP) useful in applications in the presence of strongly correlated predictors is presented. Such classes of predictors are synthesized by latent factors, which are obtained through an appropriate linear combination of the original variables and are forced to be weakly correlated. Specifically, the proposed model assumes that the latent factors are determined by subsets of predictors characterizing only one latent factor. MRBOP is formalized in a least squares framework optimizing a penalized quadratic objective function through an alternating least-squares (ALS) algorithm. The performance of the methodology is evaluated on simulated and real data sets. © 2013 Springer Science+Business Media New York

    A composite indicator via hierarchical disjoint factor analysis for measuring the Italian football teams’ performances

    In recent years, with the data revolution and the use of new technologies, phenomena are frequently described by a huge quantity of information useful for making strategic decisions. In the current "big data" era, the interest of statistics in sports is increasing: sporting and economic data are collected for all teams, which use statistical analysis to improve their performance. Dealing with this amount of information requires appropriate statistical analysis. A priority is having statistical tools that synthesise the information arising from the data. Such tools are represented by composite indicators, that is, non-observable latent variables defined as linear combinations of observed variables. The strategy used in this paper to construct a composite indicator is based on a non-negative disjoint and hierarchical model for a set of quantitative variables. This is a factor model with a hierarchical structure formed by factors associated with subsets of manifest variables with positive loadings. In this paper, a composite indicator for measuring the Italian football teams’ performances, in terms of sporting and economic variables, is proposed.
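
The structure described in this abstract (disjoint variable groups, non-negative loadings, and a second aggregation level) can be illustrated with a small sketch. It assumes the loadings and dimension weights are already known, whereas the paper estimates them from a factor model; the function `composite_indicator` and all of its arguments are hypothetical names chosen for this example.

```python
import numpy as np

def composite_indicator(X, groups, loadings, weights):
    """Two-level aggregation: variables -> group dimensions -> one index.

    X        : (n_units, n_vars) data matrix, no constant columns (assumption)
    groups   : list of lists of column indices; must be disjoint and exhaustive
    loadings : one non-negative loading per variable (assumed given)
    weights  : one non-negative weight per group (assumed given)
    """
    X = np.asarray(X, dtype=float)
    loadings = np.asarray(loadings, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(loadings < 0) or np.any(weights < 0):
        raise ValueError("loadings and weights must be non-negative")
    # Disjointness: every variable belongs to exactly one group.
    if sorted(j for g in groups for j in g) != list(range(X.shape[1])):
        raise ValueError("groups must partition the variables")
    # Standardise so that variables on different scales are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # First level: each dimension is a linear combination of its own variables.
    dims = np.column_stack([Z[:, g] @ loadings[g] for g in groups])
    # Second level: the index is a linear combination of the dimensions.
    return dims @ weights, dims
```

For example, `composite_indicator(X, [[0, 1], [2, 3]], [0.5, 0.5, 0.7, 0.3], [0.6, 0.4])` returns one index value per row of `X` together with the two dimension scores.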

    Multi-mode partitioning for text clustering to reduce dimensionality and noises

    Co-clustering in text mining has been proposed to partition words and documents simultaneously. Although the main advantage of this approach is that it may improve the interpretation of clusters in the data, there are still few proposals for these methods, while one-way partitioning is still widely used for information retrieval. In contrast to structured information, textual data suffer from high dimensionality and sparse matrices, so it is strictly necessary to pre-process texts before applying clustering techniques. In this paper, we propose a new procedure to reduce the high dimensionality of corpora and to remove noise from the unstructured data. We test two different processes for treating the data, applying two co-clustering algorithms; based on the results, we present the procedure that provides the best interpretation of the data.

    Exploring drug consumption via an ultrametric correlation matrix

    In many real applications, the existence of a general concept (a multidimensional phenomenon) composed of nested specific ones is often theorised. In the specialised literature, different sequential methodologies have been proposed to identify a hierarchy of latent dimensions. In this paper, we investigate drug consumption via an ultrametric correlation matrix, which allows us to detect different, nonoverlapping groups of drugs and their hierarchical relationships, starting from the correlation matrix of the observed data. Given its social and economic relevance, a model-based approach to drug consumption can provide an in-depth understanding of this challenging phenomenon, which in turn can be fundamental for defining policies aimed at reducing it.
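
As a rough illustration of what "ultrametric" means for a correlation matrix, the sketch below checks the usual three-point condition for similarity matrices, r_ij >= min(r_ik, r_kj) for every triple of variables. That this is the relevant condition is an assumption; the paper's own definition may impose further constraints (for example non-negative entries), and the function name `is_ultrametric` is hypothetical.

```python
import numpy as np

def is_ultrametric(R, tol=1e-12):
    """True if the symmetric matrix R satisfies r_ij >= min(r_ik, r_kj) for all i, j, k."""
    R = np.asarray(R, dtype=float)
    if R.ndim != 2 or R.shape[0] != R.shape[1] or not np.allclose(R, R.T):
        return False
    # mins[i, j, k] = min(R[i, k], R[k, j])
    mins = np.minimum(R[:, None, :], R.T[None, :, :])
    return bool(np.all(R[:, :, None] >= mins - tol))

# Two disjoint groups {0, 1} and {2, 3}: high correlation within each group,
# one lower common value between groups. This nested pattern is ultrametric.
R_blocks = np.array([
    [1.0, 0.8, 0.2, 0.2],
    [0.8, 1.0, 0.2, 0.2],
    [0.2, 0.2, 1.0, 0.6],
    [0.2, 0.2, 0.6, 1.0],
])

# Variable 0 is close to both 1 and 2, yet 1 and 2 are far apart: not ultrametric.
R_chain = np.array([
    [1.0, 0.9, 0.9],
    [0.9, 1.0, 0.1],
    [0.9, 0.1, 1.0],
])
```

The block pattern in `R_blocks` is what lets such a matrix encode nonoverlapping groups together with the level at which they merge.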

    Multiple Correspondence K-Means: Simultaneous Versus Sequential Approach for Dimension Reduction and Clustering

    In this work, a discrete model for clustering and a continuous factorial one for dimension reduction are simultaneously fitted to categorical data, with the aim of identifying the best partition of the objects, described by the best orthogonal linear combinations of the factors, according to the least-squares criterion. This new methodology, named multiple correspondence k-means, is a useful alternative to Tandem Analysis in the case of categorical data. The approach thus has a double objective: data reduction and synthesis, simultaneously in the direction of the rows and the columns of the data matrix.

    Completed suicide during pregnancy and postpartum

    Both pregnancy and the postpartum are typical periods for the onset or relapse of psychiatric symptoms and disorders, with depression and anxiety being the most common. The prevalence of suicide spectrum behaviour is significantly higher among women with a diagnosis of depressive or bipolar disorder. Suicide during pregnancy and postpartum is a multifactorial phenomenon and a history of psychiatric illness is only one of the possible risk factors involved in suicide spectrum behaviour. The present paper highlights the importance of a complete screening for both depression and suicide risk during peripartum

    A composite indicator for the waste management in the EU via Hierarchical Disjoint Non-Negative Factor Analysis

    In recent years, the quantity of information and statistics about waste management has become increasingly substantial, but so far few studies are available in this field. The goal of this paper is to produce a model-based Composite Indicator of "good" Waste Management, in order to provide a useful support tool for EU countries' policy-makers and institutions. Composite Indicators (CIs) are usually multidimensional concepts with a hierarchical structure characterized by a set of specific dimensions, each one corresponding to a subset of manifest variables. Thus, we propose a CI for Waste Management in Europe by using a hierarchical model-based approach with positive loadings. This approach guarantees compliance with all the good properties on which a composite indicator should be based and detects the main dimensions (i.e., aspects) of the Waste Management phenomenon. In other terms, this paper provides a hierarchically aggregated index that best describes Waste Management in the EU and its main features by identifying the most important higher-order (i.e., hierarchical) relationships among subsets of manifest variables. All the parameters are estimated by maximum likelihood estimation (MLE) in order to make inference on the parameters and on the validity of the model.

    Model-based clustering with parsimonious covariance structure

    Complex multidimensional concepts are often explained by a tree-shaped structure built from nested partitions of variables, where each variable group is associated with a specific concept. Recalling that relations among variables can be detected from their covariance matrix, this paper introduces a covariance structure that reconstructs hierarchical relationships among variables, highlighting three features of the variable groups. We finally present an application of this covariance structure to model-based clustering.