
    Structural Equation Modeling and simultaneous clustering through the Partial Least Squares algorithm

    The identification of different homogeneous groups of observations and their appropriate analysis in PLS-SEM has become a critical issue in many application fields. Usually, both SEM and PLS-SEM assume the homogeneity of all units on which the model is estimated, and the segmentation approaches present in the literature consist in estimating separate models for each segment of statistical units, where the segments have been obtained by assigning the units to a priori defined segments. However, these approaches are not fully acceptable because no causal structure among the variables is postulated. In other words, a modeling approach should be used in which the obtained clusters are homogeneous with respect to the structural causal relationships. In this paper, a new methodology for simultaneous non-hierarchical clustering and PLS-SEM is proposed. This methodology is motivated by the fact that the sequential approach of applying first SEM or PLS-SEM and then a clustering algorithm such as K-means on the latent scores of the SEM/PLS-SEM may fail to find the correct clustering structure existing in the data. A simulation study and an application on real data are included to evaluate the performance of the proposed methodology.
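
The sequential approach mentioned in this abstract (compute latent scores first, then cluster them) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's method: the latent scores are approximated by an equal-weight average of each block's indicators, which stands in for PLS-SEM outer weights, and `composite_scores`, `kmeans`, and the toy data are hypothetical names introduced here.

```python
import random

def composite_scores(rows, blocks):
    """Approximate one latent score per block as the mean of its indicators.

    Assumption: equal weights stand in for estimated PLS-SEM outer weights.
    """
    return [[sum(row[j] for j in block) / len(block) for block in blocks]
            for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means (Lloyd's algorithm) with randomly chosen initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy data (hypothetical): six units, four indicators, two blocks of two.
rows = [
    [1.0, 1.2, 0.9, 1.1],
    [0.8, 1.0, 1.1, 0.9],
    [1.1, 0.9, 1.0, 1.0],
    [5.0, 5.2, 4.9, 5.1],
    [4.8, 5.0, 5.1, 4.9],
    [5.1, 4.9, 5.0, 5.0],
]
scores = composite_scores(rows, blocks=[[0, 1], [2, 3]])
labels, centroids = kmeans(scores, k=2)
```

On well-separated toy data like this, the two steps recover the groups. The abstract's point is that on real data the scores are estimated without regard to the cluster structure, so the second step may miss it.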

    Regional development assessment using parametric and non-parametric ranking methods: A comparative analysis of Slovenia and Croatia

    In this paper we describe several regional development-assessment methods and subsequently apply them in a comparative development-level analysis of the Slovenian and Croatian municipalities. The aim is to compare the performance and suitability of several parametric and non-parametric ranking methods and to develop a suitable multivariate methodological framework for distinguishing the development level of particular territorial units. However, the usefulness and appropriateness of various multivariate techniques for regional development assessment is generally questionable, and there is no clear consensus about how to carry out such an analysis. The two main methodological approaches are based on parametric and non-parametric methods. In the former, an explicit econometric model containing theory-implied causal and possibly simultaneous relationships is estimated using likelihood-based methods and formally assessed in terms of goodness of fit and other test statistics, subsequently allowing for estimation of the development level on a metric scale. In the latter, territorial units or regions are essentially classified into clusters or groups differing in development level, but no formal inferential methods are applied to confirm the validity of the model or to establish the difference in development level on a metric scale. The possible advantages of the first approach lie in the existence of formal testing and evaluation procedures, as well as in producing interval ranks of the analysed units, while its disadvantages are the lack of robustness, often unrealistic distributional assumptions, and the possible invalidity of the theoretically implied causal relationships. In this paper we consider a parametric, inferential approach based on maximum likelihood estimation of the linear structural equation model with latent variables for metric-scale development ranking, and a non-parametric approach based on cluster analysis for development grouping.
Our analysis is based on ten regional development variables such as income per capita, population density, age index, etc., which are similarly collected and generally compatible for both analysed countries. Within the parametric approach, a simultaneous equation econometric model is estimated and latent scores are computed for each underlying latent development variable, where three latent constructs are postulated corresponding to economic, structural and demographic development dimensions. In the non-parametric approach, a combination of Ward's hierarchical method and the K-means clustering procedure is applied to classify the territorial units. We apply both methodological frameworks to Slovenian and Croatian municipality data and assess their regional development level. We further compare the performance of both methods and show the degree to which their results are compatible. Finally, we propose a unified framework based on both parametric and non-parametric methods, where clustering techniques are performed both on the original development indicators and on the computed latent scores from the structural equation model, and compare these results with the results from each of the two methods applied separately. We show that a combined parametric/non-parametric approach is superior to each approach applied individually and propose a methodological framework capable of estimating the development level of territorial units or regions on a metric scale, while at the same time preserving the robustness of the non-parametric techniques.
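
One common way to combine Ward's hierarchical method with K-means is to use the Ward solution to seed the K-means centroids. The abstract does not say how the two are combined, so the sketch below is an assumption rather than the authors' procedure. The function names and the toy indicator values are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Ward's criterion: increase in within-cluster sum of
                # squares caused by merging clusters a and b.
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * sq_dist(mean(a), mean(b)))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def kmeans_from(points, centroids, n_iter=100):
    """K-means refinement starting from the given centroids."""
    centroids = [list(c) for c in centroids]  # do not mutate the caller's list
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(len(centroids)),
                          key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels

# Toy data (hypothetical): six territorial units, two standardized indicators.
points = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0],
          [2.0, 2.1], [2.1, 1.9], [1.9, 2.0]]
seeds = [mean(cluster) for cluster in ward_clusters(points, k=2)]
labels = kmeans_from(points, seeds)
```

Seeding K-means this way removes its dependence on random starting points, while the K-means pass lets units move between the groups that the hierarchical merges fixed early on.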

    Network Cosmology

    Prediction and control of the dynamics of complex networks is a central problem in network science. Structural and dynamical similarities of different real networks suggest that some universal laws might accurately describe the dynamics of these networks, although the nature and common origin of such laws remain elusive. Here we show that the causal network representing the large-scale structure of spacetime in our accelerating universe is a power-law graph with strong clustering, similar to many complex networks such as the Internet, social, or biological networks. We prove that this structural similarity is a consequence of the asymptotic equivalence between the large-scale growth dynamics of complex networks and causal networks. This equivalence suggests that unexpectedly similar laws govern the dynamics of complex networks and spacetime in the universe, with implications for network science and cosmology.
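
In network science, "clustering" usually refers to how often a node's neighbours are themselves connected, which differs from the cluster analysis in the other entries on this page. The sketch below shows the standard local clustering coefficient on a toy undirected graph. The graph and function names are hypothetical, and nothing here reproduces the paper's causal-network construction or its power-law degree analysis.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbours that are linked to each other."""
    neighbours = sorted(adj[node])
    k = len(neighbours)
    if k < 2:
        return 0.0  # no neighbour pairs, so the coefficient is taken as 0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Toy undirected graph (hypothetical): triangle a-b-c plus a pendant node d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
avg = average_clustering(adj)
```

Here nodes `a` and `b` have coefficient 1, node `c` has 1/3 because only one of its three neighbour pairs is linked, and `d` has 0.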

    Detection of regulator genes and eQTLs in gene networks

    Genetic differences between individuals associated with quantitative phenotypic traits, including disease states, are usually found in non-coding genomic regions. These genetic variants are often also associated with differences in expression levels of nearby genes (they are "expression quantitative trait loci", or eQTLs for short) and presumably play a gene regulatory role, affecting the status of molecular networks of interacting genes, proteins and metabolites. Computational systems biology approaches to reconstruct causal gene networks from large-scale omics data have therefore become essential to understand the structure of networks controlled by eQTLs together with other regulatory genes, and to generate detailed hypotheses about the molecular mechanisms that lead from genotype to phenotype. Here we review the main analytical methods and software to identify eQTLs and their associated genes, to reconstruct co-expression networks and modules, to reconstruct causal Bayesian gene and module networks, and to validate predicted networks in silico.

    The Importance of Being Clustered: Uncluttering the Trends of Statistics from 1970 to 2015

    In this paper we retrace the recent history of statistics by analyzing all the papers published in five prestigious statistical journals since 1970, namely: Annals of Statistics, Biometrika, Journal of the American Statistical Association, Journal of the Royal Statistical Society, Series B and Statistical Science. The aim is to construct a kind of "taxonomy" of the statistical papers by organizing and clustering them into main themes. In this sense, being identified in a cluster means being important enough to be uncluttered in the vast and interconnected world of statistical research. Since the main statistical research topics are naturally born, evolve or die over time, we also develop a dynamic clustering strategy, where a group in one time period is allowed to migrate or to merge into different groups in the following one. Results show that statistics is a very dynamic and evolving science, stimulated by the rise of new research questions and types of data.