Structural Equation Modeling and simultaneous clustering through the Partial Least Squares algorithm
The identification of distinct homogeneous groups of observations and their
appropriate analysis in PLS-SEM has become a critical issue in many application
fields. Usually, both SEM and PLS-SEM assume the homogeneity of all units on
which the model is estimated, and the segmentation approaches found in the
literature consist of estimating separate models for each segment of
statistical units, where the segments are obtained by assigning the units to
groups defined a priori. However, these approaches are not fully acceptable
because no causal structure among the variables is postulated. In other words,
a modeling approach should be used in which the obtained clusters are homogeneous
with respect to the structural causal relationships. In this paper, a new
methodology for simultaneous non-hierarchical clustering and PLS-SEM is
proposed. This methodology is motivated by the fact that the sequential
approach of first applying SEM or PLS-SEM and then a clustering algorithm
such as K-means to the latent scores of the SEM/PLS-SEM may fail to find the
correct clustering structure existing in the data. A simulation study and an
application to real data are included to evaluate the performance of the
proposed methodology.
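As a rough illustration of the sequential approach that the abstract above argues against, the following sketch first computes one score per block of indicators and then runs K-means on those scores. It is not the proposed simultaneous method and not a PLS-SEM estimator: the block scores are plain averages of standardized indicators, used here only as a stand-in for latent scores, and the data, the block layout and the function names are all hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Center a column and scale it to unit (population) standard deviation."""
    m, s = mean(column), pstdev(column)
    return [(x - m) / s for x in column]


def block_scores(data, blocks):
    """Step 1 (stand-in for latent scores): average the standardized
    indicators of each block, giving one score per block and per unit."""
    n_cols = len(data[0])
    cols = [standardize([row[j] for row in data]) for j in range(n_cols)]
    return [
        [mean([cols[j][i] for j in block]) for block in blocks]
        for i in range(len(data))
    ]


def kmeans(points, k, n_iter=100):
    """Step 2: plain K-means, initialized with the first k points
    (a deterministic choice made only to keep the sketch reproducible)."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [mean(dim) for dim in zip(*members)]
    return labels, centers


# Hypothetical data: 6 units, 4 indicators, two blocks of two indicators each.
data = [
    [1, 2, 1, 1],
    [9, 8, 9, 9],
    [2, 1, 2, 1],
    [8, 9, 8, 9],
    [1, 1, 1, 2],
    [9, 9, 9, 8],
]
blocks = [[0, 1], [2, 3]]

scores = block_scores(data, blocks)
labels, centers = kmeans(scores, k=2)
print(labels)
```

Because the scores are computed without any reference to the cluster structure, the second step can only recover groups that happen to be visible in those scores, which is the weakness the abstract points to.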
Partitioning predictors in multivariate regression models
A Multivariate Regression Model Based on the Optimal Partition of Predictors (MRBOP) useful in applications in the presence of strongly correlated predictors is presented. Such classes of predictors are synthesized by latent factors, which are obtained through an appropriate linear combination of the original variables and are forced to be weakly correlated. Specifically, the proposed model assumes that the latent factors are determined by subsets of predictors characterizing only one latent factor. MRBOP is formalized in a least squares framework optimizing a penalized quadratic objective function through an alternating least-squares (ALS) algorithm. The performance of the methodology is evaluated on simulated and real data sets. © 2013 Springer Science+Business Media New York
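The constraint that each predictor characterizes only one latent factor can be pictured as a loading matrix with a single nonzero entry per row. The sketch below builds such a matrix from a given partition and computes the resulting factor scores. It does not estimate anything: the partition, the weights and the function names are hypothetical, and the penalized ALS estimation described in the abstract is not reproduced here.

```python
def disjoint_loadings(membership, weights, n_factors):
    """Build a J x Q loading matrix in which predictor j has a nonzero
    weight only on the factor given by membership[j]."""
    V = [[0.0] * n_factors for _ in membership]
    for j, (q, w) in enumerate(zip(membership, weights)):
        V[j][q] = w
    return V


def factor_scores(X, V):
    """Compute the factor scores X V, one row of scores per observation."""
    n_predictors, n_factors = len(V), len(V[0])
    return [
        [sum(x[j] * V[j][q] for j in range(n_predictors))
         for q in range(n_factors)]
        for x in X
    ]


# Hypothetical partition: predictors 0 and 1 form factor 0, predictor 2 forms factor 1.
membership = [0, 0, 1]
weights = [0.5, 0.5, 1.0]
V = disjoint_loadings(membership, weights, n_factors=2)

X = [[2.0, 4.0, 1.0],
     [0.0, 2.0, 3.0]]
scores = factor_scores(X, V)
print(scores)
```

With a disjoint structure like this, each factor is a linear combination of its own subset of predictors only, which is what makes the factors directly interpretable in terms of the original variables.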
A composite indicator via hierarchical disjoint factor analysis for measuring the Italian football teams’ performances
In recent years, with the data revolution and the use of new technologies, phenomena are frequently described by a huge quantity of information useful for making strategic decisions. In the current "big data" era, interest in applying statistics to sports keeps growing: sporting and economic data are collected for all teams, which use statistical analysis to improve their performance.
Dealing with this amount of information requires appropriate statistical analysis. A priority is having statistical tools that synthesise the information arising from the data. Such tools are represented by composite indicators, that is, non-observable latent variables that are linear combinations of observed variables. The strategy used in this paper to construct a composite indicator is based on a non-negative disjoint and hierarchical model for a set of quantitative variables. This is a factor model with a hierarchical structure formed by factors associated with subsets of manifest variables with positive loadings.
In this paper, a composite indicator for measuring the Italian football teams’ performances, in terms of sporting and economic variables, is proposed.
Multi-mode partitioning for text clustering to reduce dimensionality and noises
Co-clustering in text mining has been proposed to partition words and documents simultaneously. Although the
main advantage of this approach is that it may improve the interpretation of clusters in the data, there are still
few proposals for these methods, while one-way partitioning is still widely used for information retrieval. In contrast to
structured information, textual data suffer from high dimensionality and sparse matrices, so it is strictly necessary
to pre-process texts before applying clustering techniques. In this paper, we propose a new procedure to reduce the high
dimensionality of corpora and to remove noise from the unstructured data. We test two different processes
for treating the data, applying two co-clustering algorithms; based on the results, we present the procedure that provides
the best interpretation of the data.
Exploring drug consumption via an ultrametric correlation matrix
In many real applications, the existence of a general concept (a multidimensional phenomenon) composed of nested specific ones is often theorised. In the specialised literature, different sequential methodologies have been proposed to identify a hierarchy of latent dimensions. In this paper, we investigate drug consumption via an ultrametric correlation matrix, which allows the detection of different, nonoverlapping groups of drugs and their hierarchical relationships, starting from the correlation matrix of the observed data. Given its social and economic relevance, a model-based approach to drug consumption can provide an in-depth understanding of this challenging phenomenon, which is fundamental for designing policies aimed at reducing it.
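To make the notion of ultrametricity concrete, the sketch below checks the standard ultrametric inequality on a small correlation matrix: for every triple of distinct variables i, j, k, the correlation r_ij must be at least min(r_ik, r_jk). This is the usual textbook condition and is assumed here; the exact parameterization used in the paper may differ, and the example matrices are invented for illustration.

```python
from itertools import permutations


def is_ultrametric(r, tol=1e-12):
    """Return True if r is symmetric and satisfies
    r[i][j] >= min(r[i][k], r[j][k]) for all distinct i, j, k."""
    n = len(r)
    for i in range(n):
        for j in range(n):
            if abs(r[i][j] - r[j][i]) > tol:
                return False  # not symmetric
    for i, j, k in permutations(range(n), 3):
        if r[i][j] < min(r[i][k], r[j][k]) - tol:
            return False  # ultrametric inequality violated
    return True


# Hypothetical example: variables {0, 1} and {2, 3} form two disjoint groups,
# with a single, lower correlation between the groups.
R_ultra = [
    [1.0, 0.7, 0.2, 0.2],
    [0.7, 1.0, 0.2, 0.2],
    [0.2, 0.2, 1.0, 0.5],
    [0.2, 0.2, 0.5, 1.0],
]

# Same matrix with one between-group correlation lowered, which breaks the condition.
R_not = [row[:] for row in R_ultra]
R_not[0][2] = R_not[2][0] = 0.1

print(is_ultrametric(R_ultra), is_ultrametric(R_not))
```

In the first matrix the within-group correlations (0.7 and 0.5) exceed the common between-group correlation (0.2), so the variables can be read as a two-level hierarchy of nonoverlapping groups.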
Multiple Correspondence K-Means: Simultaneous Versus Sequential Approach for Dimension Reduction and Clustering
In this work, a discrete model for clustering and a continuous factorial one for dimension reduction are simultaneously fitted to categorical data, with the aim of identifying the best partition of the objects, described by the best orthogonal linear combinations of the factors, according to the least-squares criterion. This new methodology, named multiple correspondence k-means, is a useful alternative to Tandem Analysis in the case of categorical data. The approach thus has a double objective: data reduction and synthesis, simultaneously in the direction of the rows and columns of the data matrix.
Completed suicide during pregnancy and postpartum
Both pregnancy and the postpartum period are typical times for the onset or relapse of psychiatric symptoms and disorders, with depression and anxiety being the most common. The prevalence of suicide spectrum behaviour is significantly higher among women with a diagnosis of depressive or bipolar disorder. Suicide during pregnancy and postpartum is a multifactorial phenomenon, and a history of psychiatric illness is only one of the possible risk factors involved in suicide spectrum behaviour. The present paper highlights the importance of complete screening for both depression and suicide risk during the peripartum period.
A composite indicator for the waste management in the EU via Hierarchical Disjoint Non-Negative Factor Analysis
In recent years, the quantity of information and statistics about waste management has grown considerably, but so far few studies are available in this field. The goal of this paper is to produce a model-based Composite Indicator of "good" Waste Management, in order to provide a useful support tool for EU countries' policy-makers and institutions.
Composite Indicators (CIs) are usually multidimensional concepts with a hierarchical structure characterized by the presence of a set of specific dimensions, each one corresponding to a subset of manifest variables. Thus, we propose a CI for Waste Management in Europe by using a hierarchical model-based approach with positive loadings. This approach ensures compliance with all the good properties on which a composite indicator should be based and detects the main dimensions (i.e., aspects) of the Waste Management phenomenon.
In other terms, this paper provides a hierarchically aggregated index that best describes Waste Management in the EU with its main features by identifying the most important higher-order (i.e., hierarchical) relationships among subsets of manifest variables. All the parameters are estimated by maximum likelihood estimation (MLE) in order to make inference on the parameters and on the validity of the model.
Model-based clustering with parsimonious covariance structure
Complex multidimensional concepts are often explained by a tree-shaped structure that considers nested partitions of variables, where each variable group is associated with a specific concept. Recalling that relations among variables can be detected from their covariance matrix, this paper introduces a covariance structure that reconstructs hierarchical relationships among variables, highlighting three features of the variable groups. Finally, we present an application of this covariance structure to model-based clustering.