8 research outputs found

    Simultaneous Gaussian Model-Based Clustering for Samples of Multiple Origins

    Gaussian mixture model-based clustering is now a standard tool for estimating a hypothetical underlying partition of a single dataset. In this paper, we aim to cluster several different datasets at the same time in a context where the underlying populations, even though different, are not completely unrelated: all individuals are described by the same features, and partitions of identical meaning are expected. Using natural arguments to justify a stochastic linear link between the components of the mixtures associated with each dataset, we propose parsimonious and meaningful models for a so-called simultaneous clustering method. Maximum likelihood mixture parameters, subject to the linear link constraint, can be easily estimated by a Generalized Expectation Maximization (GEM) algorithm that we describe. Promising results are obtained in a biological context where simultaneous clustering outperforms independent clustering for partitioning three different subspecies of birds. Further results on ornithological data show that the proposed strategy is robust to relaxing exact descriptor concordance, which is one of its main assumptions.
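    As a rough illustration of what a linear link between Gaussian components implies, the sketch below maps one component's parameters through an affine transformation. The diagonal form of the link matrix, the function name `link_component`, and all numeric values are assumptions made for this example; they are not taken from the abstract.

```python
import numpy as np

def link_component(mean, cov, scale, shift):
    """Push a Gaussian component N(mean, cov) through x -> D x + b.

    D = diag(scale) is assumed diagonal here (an illustrative choice).
    The image is again Gaussian, with mean D @ mean + b and
    covariance D @ cov @ D.T.
    """
    D = np.diag(scale)
    return D @ mean + shift, D @ cov @ D.T

# Hypothetical parameters of one component in the first dataset.
mean_a = np.array([1.0, 2.0])
cov_a = np.array([[1.0, 0.5],
                  [0.5, 2.0]])

# Hypothetical link parameters relating the two datasets.
scale = np.array([2.0, 0.5])
shift = np.array([0.0, 1.0])

# Parameters of the matching component in the second dataset.
mean_b, cov_b = link_component(mean_a, cov_a, scale, shift)
print(mean_b)
print(cov_b)
```

    Under such a link, only the link parameters need to be estimated for the additional datasets, which is one way a constrained model can stay parsimonious.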

    Multilevel latent class model: extension to the multidimensional model (Modello multilevel a classi latenti: estensione al modello multidimensionale)

    Simultaneous clustering with mixtures of factor analysers

    This work details the method of Simultaneous Model-Based Clustering. It also presents an extension to this method by reformulating it as a model with a mixture of factor analysers. This allows the technique, known as Simultaneous Model-Based Clustering with a Mixture of Factor Analysers, to cluster high-dimensional gene-expression data. A new table of allowable and non-allowable models is formulated, along with a parameter estimation scheme for one such allowable model. Several numerical procedures are tested and various datasets, both real and generated, are clustered. Clustering the Iris data shows that a 3-component VEV model has the lowest misclassification rate, with BIC values comparable to those of the best-scoring model. The clustering of genetic data was less successful: the 2-component model could successfully uncover the healthy tissue, but partitioned the cancerous tissue in half.

    Multilevel mixed-type data analysis for validating partitions of scrapie isolates

    The dissertation arises from a joint study with the Department of Food Safety and Veterinary Public Health of the Istituto Superiore di Sanità. The aim is to investigate and validate the existence of distinct strains of scrapie, taking into account an a priori benchmark partition formulated by researchers. Scrapie of small ruminants is caused by prions, which are unconventional infectious agents of proteinaceous nature affecting humans and animals. Due to the absence of nucleic acids, which precludes direct analysis of strain variation by molecular methods, the presence of different sheep scrapie strains is usually investigated by bioassay in laboratory rodents. Data were collected in an experimental study on scrapie conducted at the Istituto Superiore di Sanità by experimental transmission of scrapie isolates to bank voles. We discuss the validation of a given partition in a statistical classification framework using a multi-step procedure. First, we use unsupervised classification to see how alternative clustering results match researchers' understanding of the heterogeneity of the isolates. We discuss whether and how clustering results can be exploited to extend the preliminary partition elicited by researchers. We then motivate the subsequent partition validation based on the predictive performance of several supervised classifiers. Our data-driven approach contains two main original methodological contributions. First, we advocate the use of partition validation measures to investigate a given benchmark partition: we discuss how the data can be used to evaluate a preliminary benchmark partition and possibly modify it with statistical results to find a conclusive partition that could be used as a "gold standard" in future studies. Second, the collected data have a multilevel structure and, for each lower-level unit, mixed-type data are available. Each step in the procedure is therefore adapted to deal with multilevel mixed-type data. We extend distance-based clustering algorithms to deal with multilevel mixed-type data, whereas in supervised classification we propose a two-step approach to classify the higher-level units starting from the lower-level observations. In this framework, we also need to define an ad hoc cross-validation algorithm.

    Advertising to low-income consumers: portrayals of women in Drum magazine advertisements 1981-2010

    This research examines the portrayal of women as message sources in advertisements appearing in Drum magazine from 1981 to 2010, an important period that captures South Africa's transition from apartheid rule to a time when the equality of women has been recognised more formally.

    Perceptions of the causes of poverty in Europe: an application of multilevel latent class models (Percepções das causas da pobreza na Europa: uma aplicação de modelos multinível com classes latentes)

    The perception of the causes of poverty has proved very important in the broader study of this social phenomenon, given its direct implications for social interaction with the poor and, indirectly, for the legitimacy of social support measures and efforts to combat poverty. This study considers three types of poverty attributions: individualistic, structuralistic, and fatalistic. Data were collected by Eurobarometer in 2007 in all 27 EU member states and in Croatia, a candidate country. Given the importance of culture in shaping attitudes, multilevel latent class models were estimated, allowing the simultaneous study of two levels of analysis: the individual level, where it is possible to define the profile of individuals within each country regarding their perceptions of poverty, and the country level, where it is possible to identify the similarities and differences between European countries in this field. The model structure has six segments of countries and seven segments of individuals. Although social explanations of poverty are overall the causes most often cited by individuals, there are also groups that emphasize more individualistic explanations, blaming the poor for their condition. The variable that seems to have the most impact on the distinction between groups of individuals is their socio-economic condition: individuals with more economic difficulties attribute poverty to social causes more than those in a better economic and social situation do. At the country level, the most developed segment attributes poverty to individualistic and fatalistic causes, while the less developed segments explain poverty on the basis of the injustices of society. There are groups of countries that demonstrate diversity in how they think about poverty.

    Hierarchical mixture models for nested data structures

    A hierarchical extension of the finite mixture model is presented that can be used for the analysis of nested data structures. The model permits a simultaneous model-based clustering of lower- and higher-level units. Lower-level observations within higher-level units are assumed to be mutually independent given cluster membership of the higher-level units. The proposed model can be seen as a finite mixture model in which the prior class membership probabilities are assumed to be random, which makes it very similar to the grade-of-membership (GoM) model. The new model is illustrated with an example from organizational psychology.
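    One way to write down the structure described in this abstract is sketched below. The notation is an assumption made for illustration and is not taken from the abstract: $K$ higher-level classes with weights $\pi_k$, $L$ lower-level classes with conditional weights $\pi_{l \mid k}$, component densities $f_l$, and $n_j$ lower-level observations $y_{ij}$ in higher-level unit $j$.

```latex
% Marginal density of the observations in higher-level unit j (assumed notation):
% a mixture over higher-level classes k, within which the lower-level
% observations are independent draws from a mixture over lower-level classes l.
\[
  f(\mathbf{y}_j)
  = \sum_{k=1}^{K} \pi_k
    \prod_{i=1}^{n_j} \sum_{l=1}^{L} \pi_{l \mid k}\, f_l(y_{ij})
\]
```

    The product over $i$ expresses the conditional independence of lower-level observations given the higher-level class, and letting $\pi_{l \mid k}$ depend on $k$ corresponds to the lower-level class proportions varying across higher-level units.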

    Hierarchical mixture models for nested data structures
