Simultaneous Gaussian Model-Based Clustering for Samples of Multiple Origins
Gaussian mixture model-based clustering is now a standard tool for estimating a hypothetical underlying partition of a single dataset. In this paper, we aim to cluster several different datasets at the same time in a context where the underlying populations, even though different, are not completely unrelated: all individuals are described by the same features, and partitions of identical meaning are expected. Justifying from some natural arguments a stochastic linear link between the components of the mixtures associated with each dataset, we propose some parsimonious and meaningful models for a so-called simultaneous clustering method. Maximum likelihood mixture parameters, subject to the linear link constraint, can be easily estimated by a Generalized Expectation Maximization (GEM) algorithm that we describe. Some promising results are obtained in a biological context where simultaneous clustering outperforms independent clustering for partitioning three different subspecies of birds. Further results on ornithological data show that the proposed strategy is robust to the relaxation of the exact descriptor concordance, which is one of its main assumptions.
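As a point of reference for the estimation step mentioned above, the following is a minimal sketch of the ordinary EM algorithm for a univariate Gaussian mixture fitted to a single dataset. It is not the GEM algorithm of the paper and does not include the linear link constraint between datasets. The function names (`normal_pdf`, `em_gmm_1d`), the variance floor, and the toy data are assumptions made for illustration only.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """Plain EM for a univariate Gaussian mixture on one dataset (illustrative only).

    Returns the fitted parameters and the log-likelihood recorded at the
    start of each iteration.
    """
    n, k = len(data), len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    loglik_trace = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        loglik = 0.0
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([p / total for p in joint])
        loglik_trace.append(loglik)
        # M-step: weighted updates of proportions, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # assumed floor against a collapsing component
    return means, variances, weights, loglik_trace


# Hypothetical toy data: two well-separated groups.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(6.0, 1.0) for _ in range(200)]
means, variances, weights, trace = em_gmm_1d(data, [1.0, 4.0], [1.0, 1.0], [0.5, 0.5])
```

A simultaneous variant in the spirit of the abstract would fit several datasets jointly and tie their component parameters together; the sketch above treats only the unconstrained, single-dataset case.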
Simultaneous clustering with mixtures of factor analysers
This work details the method of Simultaneous Model-Based Clustering. It also presents an extension to this method by reformulating it as a model with a mixture of factor analysers. This allows the technique, known as Simultaneous Model-Based Clustering with a Mixture of Factor Analysers, to cluster high-dimensional gene-expression data. A new table of allowable and non-allowable models is formulated, along with a parameter estimation scheme for one such allowable model. Several numerical procedures are tested and various datasets, both real and generated, are clustered. Clustering the Iris data shows that a 3-component VEV model has the lowest misclassification rate, with BIC values comparable to those of the best-scoring model. The clustering of the genetic data was less successful: the 2-component model successfully uncovered the healthy tissue but partitioned the cancerous tissue in half.
Multilevel mixed-type data analysis for validating partitions of scrapie isolates
The dissertation arises from a joint study with the Department of Food Safety and Veterinary Public Health of the Istituto Superiore di Sanità. The aim is to investigate and validate the existence of distinct strains of the scrapie disease taking into account the availability of a priori benchmark partition formulated by researchers. Scrapie of small ruminants is caused by prions, which are unconventional infectious agents of proteinaceous nature affecting humans and animals. Due to the absence of nucleic acids, which precludes direct analysis of strain variation by molecular methods, the presence of different sheep scrapie strains is usually investigated by bioassay in laboratory rodents. Data are collected by an experimental study on scrapie conducted at the Istituto Superiore di Sanità by experimental transmission of scrapie isolates to bank voles.
We aim to discuss the validation of a given partition in a statistical classification framework using a multi-step procedure. Firstly, we use unsupervised classification to see how alternative clustering results match researchers' understanding of the heterogeneity of the isolates. We discuss whether and how clustering results can be eventually exploited to extend the preliminary partition elicited by researchers. Then we motivate the subsequent partition validation based on the predictive performance of several supervised classifiers.
Our data-driven approach contains two main original methodological contributions. First, we advocate the use of partition validation measures to investigate a given benchmark partition: we discuss how the data can be used to evaluate a preliminary benchmark partition and eventually modify it with statistical results to find a conclusive partition that could be used as a "gold standard" in future studies. Second, the collected data have a multilevel structure, and mixed-type data are available for each lower-level unit. Each step in the procedure is therefore adapted to deal with multilevel mixed-type data. We extend distance-based clustering algorithms to deal with multilevel mixed-type data, whereas in supervised classification we propose a two-step approach to classify the higher-level units starting from the lower-level observations. In this framework, we also need to define an ad hoc cross-validation algorithm.
Advertising to low-income consumers: portrayals of women in Drum magazine advertisements 1981-2010
This research examines the portrayal of women as message sources in advertisements appearing in Drum magazine from 1981 to 2010, an important period that captures South Africa's transition from apartheid rule to a time when the equality of women has been recognised more formally.
Percepções das causas da pobreza na Europa: uma aplicação de modelos multinível com classes latentes (Perceptions of the causes of poverty in Europe: an application of multilevel latent class models)
The perception of the causes of poverty has been very important in the broader study of this
social phenomenon, given its direct implications for social interaction with the poor and,
indirectly, for the legitimacy of social support and anti-poverty measures. This
study considers three types of poverty attributions: individualistic, structuralist, and
fatalistic. Data were collected by the Eurobarometer in 2007 in all 27 EU member states and
Croatia, a candidate country. Given the importance of culture in shaping attitudes,
multilevel latent class models were estimated, allowing the simultaneous study of two levels
of analysis: the individual level, where respondents can be profiled within each
country according to their perceptions of poverty, and the country level, where it is possible to
identify similarities and differences between European countries. The model
structure has six segments of countries and seven segments of individuals. Although
social explanations of poverty are the causes most often cited by
individuals overall, some groups also emphasize more individualistic explanations, blaming
the poor for their condition. The variable that seems to have the most impact on the distinction
between groups of individuals is reported socio-economic condition: individuals with more
economic difficulties attribute poverty to social causes more than those in a better economic
and social situation. At the country level, the most developed segment attributes poverty to
individualistic and fatalistic causes, while the less developed segment explains poverty by
the injustices of society. Some groups of countries show plurality in how
they think about poverty.
Hierarchical mixture models for nested data structures
A hierarchical extension of the finite mixture model is presented that can be used for the analysis of nested data structures. The model permits a simultaneous model-based clustering of lower- and higher-level units. Lower-level observations within higher-level units are assumed to be mutually independent given cluster membership of the higher-level units. The proposed model can be seen as a finite mixture model in which the prior class membership probabilities are assumed to be random, which makes it very similar to the grade-of-membership (GoM) model. The new model is illustrated with an example from organizational psychology.
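One way to write down the structure described in this abstract is sketched below. The notation is an assumption and is not taken from the source: $j$ indexes higher-level units, $i = 1, \dots, n_j$ indexes lower-level observations within unit $j$, $\pi_l$ is the probability that a higher-level unit belongs to cluster $l$, $\pi_{k \mid l}$ is the probability that a lower-level observation belongs to class $k$ given higher-level cluster $l$, and $f_k(\cdot \mid \theta_k)$ is the class-specific density.

```latex
% Marginal density of all observations in higher-level unit j (assumed notation).
% The product over i reflects independence of lower-level observations
% given the cluster membership of the higher-level unit.
f(\mathbf{y}_j) \;=\; \sum_{l=1}^{L} \pi_l
  \prod_{i=1}^{n_j} \left[ \sum_{k=1}^{K} \pi_{k \mid l}\, f_k\!\left(y_{ij} \mid \theta_k\right) \right]
```

Under this reading, the lower-level class proportions $\pi_{k \mid l}$ vary with the latent cluster of the higher-level unit, which is the sense in which the prior class membership probabilities can be regarded as random.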