33 research outputs found

    Clusterwise analysis for multiblock component methods

    Multiblock component methods are applied to data sets in which several blocks of variables are measured on the same set of observations, with the goal of analyzing the relationships between these blocks of variables. In this article, we focus on multiblock component methods that integrate the information found in several blocks of explanatory variables in order to describe and explain one set of dependent variables. In the following, multiblock PLS and multiblock redundancy analysis are chosen as particular cases of multiblock component methods in which one set of variables is explained by a set of predictor variables that is organized into blocks. Because these multiblock techniques assume that the observations come from a homogeneous population, they will provide suboptimal results when the observations actually come from different populations. A strategy to alleviate this problem, presented in this article, is to use a technique such as clusterwise regression in order to identify homogeneous clusters of observations. This approach creates two new methods that provide clusters that have their own sets of regression coefficients. This combination of clustering and regression improves the overall quality of the prediction and facilitates the interpretation. In addition, the minimization of a well-defined criterion, by means of a sequential algorithm, ensures that the algorithm converges monotonically. Finally, the proposed method is distribution-free and can be used when the explanatory variables outnumber the observations within clusters. The proposed clusterwise multiblock methods are illustrated with a simulation study and a (simulated) example from marketing.
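
    As a concrete illustration of the alternating idea described above, the following is a minimal sketch, assuming the explanatory blocks are simply concatenated and each cluster is fitted with a PLS regression via scikit-learn; the function name `clusterwise_pls`, the random initialization, and the stopping rule are illustrative, not the authors' implementation.

```python
# Rough sketch of a clusterwise multiblock regression loop (illustrative only):
# alternate between fitting one PLS model per cluster and reassigning each
# observation to the cluster whose model predicts it best.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def clusterwise_pls(X_blocks, Y, n_clusters=2, n_components=2, n_iter=20, seed=0):
    X = np.hstack(X_blocks)                      # concatenate the explanatory blocks
    Y = Y.reshape(-1, 1) if Y.ndim == 1 else Y
    labels = np.random.default_rng(seed).integers(n_clusters, size=X.shape[0])
    for _ in range(n_iter):
        models = {}
        for k in range(n_clusters):
            idx = labels == k
            if idx.sum() > n_components:         # skip clusters too small to fit
                models[k] = PLSRegression(n_components=n_components).fit(X[idx], Y[idx])
        # reassign each observation to the cluster with the smallest squared residual
        errors = np.column_stack([((Y - m.predict(X)) ** 2).sum(axis=1)
                                  for m in models.values()])
        new_labels = np.array(list(models))[errors.argmin(axis=1)]
        if np.array_equal(new_labels, labels):   # criterion can no longer decrease
            break
        labels = new_labels
    return labels, models
```

    Reassigning observations to the cluster with the smallest residual can only decrease the total squared prediction error, which mirrors the monotone convergence property mentioned in the abstract.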

    Clusterwise methods, past and present

    Instead of fitting a single, global model (regression, PCA, etc.) to a set of observations, clusterwise methods look simultaneously for a partition into k clusters and k local models optimizing some criterion. There are two main approaches: (1) the least squares approach introduced by E. Diday in the 1970s, derived from k-means, and (2) mixture models using maximum likelihood; only the first one easily enables prediction. After a survey of classical methods, we will present recent extensions to functional, symbolic and multiblock data.
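
    To make the least squares approach concrete, here is a minimal sketch in the same spirit as the multiblock example above, but with plain ordinary least squares fits per cluster; the function name `clusterwise_ols` and the initialization are illustrative.

```python
# Minimal illustration of least-squares clusterwise regression (k-means spirit):
# alternate per-cluster OLS fits with reassignment by smallest squared residual.
import numpy as np

def clusterwise_ols(X, y, n_clusters=2, n_iter=50, seed=0):
    X1 = np.column_stack([np.ones(len(X)), X])       # design matrix with intercept
    labels = np.random.default_rng(seed).integers(n_clusters, size=len(X))
    for _ in range(n_iter):
        betas, kept = [], []
        for k in range(n_clusters):
            idx = labels == k
            if idx.sum() >= X1.shape[1]:             # enough observations to fit
                betas.append(np.linalg.lstsq(X1[idx], y[idx], rcond=None)[0])
                kept.append(k)
        resid = np.column_stack([(y - X1 @ b) ** 2 for b in betas])
        new_labels = np.array(kept)[resid.argmin(axis=1)]
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, dict(zip(kept, betas))
```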

    50 Years of Data Analysis: From Exploratory Data Analysis to Predictive Modeling and Machine Learning


    Novel clustering methods for complex cluster structures in behavioral sciences

    Large-scale data sets with a large number of variables are becoming increasingly available in behavioral research. Encompassing a wide range of measurements and indicators, they provide behavioral scientists with unprecedented opportunities to synthesize different pieces of information so that novel, and sometimes subtle, subgroups (also called clusters) of populations can be identified. The successful detection of clusters is of great practical significance for a wide range of social and behavioral research topics. For example, in treating depressed patients, the first step in generating personalized recommendations is to accurately link the patients to the many subtypes of depression. In the organizational context, it is highly problematic to assume that all leaders should follow the same developmental paths; in fact, tailoring training programs to the unique strengths of different leadership subgroups (e.g., down-to-earth leaders and excessively charismatic leaders) is always more effective than general developmental programs. When trying to understand the cognitive process underlying voting behavior, once again, a one-size-fits-all approach likely produces erroneous descriptions. The broad social context, as well as the surrounding environment in which a person grows up, likely yields clusters of voters; only those belonging to the same cluster share a similar decision-making process for voting. To provide behavioral researchers with the best tools for accurately recovering the clusters hidden in large, complex data sets, this dissertation developed new statistical models and computational tools and implemented these novel approaches in publicly accessible software. Generally speaking, the novel methods developed here advance previous approaches by addressing the following three major challenges. First, as noise is ubiquitous in psychological measures, a considerable number of the variables collected may be completely irrelevant to the hidden clusters. These irrelevant variables have to be completely and automatically filtered out during data analysis. Second, when integrating variables from diverse data sources (for example, questionnaires, genetic information, GPS coordinates, social media footprints, etc.), it is desirable to capture both the unique characteristics pertaining to each data source and the shared or connected characteristics across the many data sources. Third, when translating data-analytic results into substantive conclusions so as to inform critical decisions (e.g., medical decisions, personnel selection, etc.), effective and accurate communication is vital yet not necessarily easy to achieve. The two most prominent difficulties are communicating the confidence and (un)certainty in the clusters recovered and visualizing the results through accessible graphs. With a variety of computer-simulated data and empirical behavioral data covering topics in clinical, social, personality, and organizational psychology, we were able to conclude that the various methods developed in the dissertation are more versatile, effective, and accurate in identifying subtle clusters in complex data sets, provide rich and unique insights in interpreting these clusters, and, thanks to the accompanying software packages, can be readily accessed without many technical barriers. These methods are therefore useful for behavioral researchers who want to navigate an increasingly digitized world and recognize structure in massive amounts of information.

    Older adults’ affective experiences across 100 days are less variable and less complex than younger adults’.

    Older adults are often described as being more emotionally competent than younger adults, and higher levels of affect complexity are seen as an indicator of this competence. We argue, however, that once age differences in affect variability are taken into account, older adults' everyday affective experiences will be characterized by lower affect complexity when compared with younger adults'. In addition, reduced affect complexity seems more likely from a theoretical point of view. We tested this hypothesis with a study in which younger and older adults reported their momentary affect on 100 days. Affect complexity was examined using clusterwise simultaneous component analysis based on covariance matrices to take into account differences in affect variability. We found that in the majority of older adults (55%), structures of affect were comparatively simpler than those of younger adults because they were reduced to a positive affect component. Most remaining older adults (35%) were characterized by differentiated rather than undifferentiated affective responding, as were a considerable number of younger adults (43%). When affect variability was made comparable across age groups, affect complexity also became comparable across age groups. Interestingly, individuals with the least complex structures had the highest levels of well-being. We conclude that affective experiences are not only less variable in the majority of older adults, but also less complex. Implications for understanding emotions across the life span are discussed.

    How to detect which variables are causing differences in component structure among different groups

    When comparing the component structures of a multitude of variables across different groups, the conclusion often is that the component structures are very similar in general and differ in a few variables only. Detecting such "outlying variables" is substantively interesting. Conversely, it can help to determine what is common across the groups. This article proposes and evaluates two formal detection heuristics to determine, in a systematic and objective way, which variables are outlying. The heuristics are based on clusterwise simultaneous component analysis, which was recently presented as a useful tool for capturing the similarities and differences in component structures across groups. The heuristics are evaluated in a simulation study and illustrated using cross-cultural data on values.
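
    The heuristics build on clusterwise simultaneous component analysis, so a rough sketch of that underlying idea may help (this is not the proposed detection heuristic itself, and the names `clusterwise_sca` and `reconstruction_loss` are illustrative): groups are assigned to clusters, cluster-specific loadings are obtained from an SVD of the concatenated data of the member groups, and each group is then reassigned to the cluster whose loadings reconstruct its data best.

```python
# Rough sketch of the clusterwise SCA idea: whole groups are assigned to clusters,
# cluster-specific loadings come from an SVD of the concatenated member data, and
# each group is reassigned to the cluster whose loadings reconstruct it best.
import numpy as np

def reconstruction_loss(Xg, loadings):
    scores = Xg @ loadings                       # least-squares scores (orthonormal loadings)
    return ((Xg - scores @ loadings.T) ** 2).sum()

def clusterwise_sca(groups, n_clusters=2, n_components=2, n_iter=20, seed=0):
    groups = [g - g.mean(axis=0) for g in groups]          # center each group
    labels = np.random.default_rng(seed).integers(n_clusters, size=len(groups))
    for _ in range(n_iter):
        loadings = {}
        for k in range(n_clusters):
            members = [g for g, l in zip(groups, labels) if l == k]
            if members:
                _, _, vt = np.linalg.svd(np.vstack(members), full_matrices=False)
                loadings[k] = vt[:n_components].T          # variables x components
        new_labels = np.array([
            min(loadings, key=lambda k: reconstruction_loss(g, loadings[k]))
            for g in groups
        ])
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, loadings
```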

    Improving stacking methodology for combining classifiers: applications to cosmetic industry

    Stacking (Wolpert, 1992; Breiman, 1996) is known to be a successful way of linearly combining several models. We modify the usual stacking methodology for the case in which the response is binary and the predictions are highly correlated, by combining predictions with PLS-Discriminant Analysis instead of ordinary least squares. For small data sets, we develop a strategy based on repeated split samples in order to select relevant variables and ensure the robustness of the final model. Five base (or level-0) classifiers are combined in order to get an improved rule, which is applied to a classical benchmark from the UCI Machine Learning Repository. Our methodology is then applied to the prediction of the dangerousness of 165 chemicals used in the cosmetic industry, described by 35 in vitro and in silico characteristics; given safety constraints, one cannot rely on a single prediction method, especially when the sample size is low.
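
    A hedged sketch of the general recipe, assuming scikit-learn estimators, with PLSRegression on the binary 0/1 response standing in for PLS-Discriminant Analysis; three illustrative base learners replace the five used in the paper, and the repeated split-sample variable selection step is omitted.

```python
# Sketch of stacking with a PLS-based combiner: level-0 classifiers produce
# out-of-fold predicted probabilities, which a PLS regression on the binary
# response then combines into a single level-1 score.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.cross_decomposition import PLSRegression

def fit_pls_stack(X, y, level0=None, n_components=2, cv=5):
    level0 = level0 or [LogisticRegression(max_iter=1000),
                        RandomForestClassifier(n_estimators=200),
                        GaussianNB()]
    # out-of-fold probabilities avoid leaking the training labels to the combiner
    Z = np.column_stack([
        cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]
        for clf in level0
    ])
    combiner = PLSRegression(n_components=min(n_components, Z.shape[1])).fit(Z, y)
    fitted = [clf.fit(X, y) for clf in level0]
    return fitted, combiner

def predict_pls_stack(fitted, combiner, X_new, threshold=0.5):
    Z_new = np.column_stack([clf.predict_proba(X_new)[:, 1] for clf in fitted])
    return (combiner.predict(Z_new).ravel() >= threshold).astype(int)
```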

    What's hampering measurement invariance: Detecting non-invariant items using clusterwise simultaneous component analysis

    The issue of measurement invariance is ubiquitous in the behavioral sciences nowadays, as more and more studies yield multivariate multigroup data. When measurement invariance cannot be established across groups, this is often due to different loadings on only a few items. Within the multigroup CFA framework, methods have been proposed to trace such non-invariant items, but these methods have some disadvantages: they require researchers to run a multitude of analyses, and they imply assumptions that are often questionable. In this paper, we propose an alternative strategy which builds on clusterwise simultaneous component analysis (SCA). Clusterwise SCA, being an exploratory technique, assigns the groups under study to a few clusters based on differences and similarities in the covariance matrices, and thus based on the component structure of the items. Non-invariant items can then be traced by comparing the cluster-specific component loadings via congruence coefficients, which is far more parsimonious than comparing the component structures of all separate groups. In this paper, we present a heuristic for this procedure. Afterwards, one can return to the multigroup CFA framework and check whether removing the non-invariant items, or removing some of the equality restrictions for these items, yields satisfactory invariance test results. An empirical application concerning cross-cultural emotion data is used to demonstrate that this novel approach is useful and can co-exist with the traditional CFA approaches.
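
    As an illustration of the comparison step, the following sketch computes Tucker's congruence coefficient per component between two cluster-specific loading matrices and then scans which items raise the congruence most when left out; the leave-one-item-out scan is an illustrative simplification, not necessarily the exact heuristic presented in the paper.

```python
# Illustrative congruence check between two cluster-specific loading matrices:
# Tucker's phi per component, plus a leave-one-item-out scan showing which
# items raise the mean congruence most when removed (a rough non-invariance flag).
import numpy as np

def tucker_phi(a, b):
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def congruence_per_component(L1, L2):
    return np.array([tucker_phi(L1[:, q], L2[:, q]) for q in range(L1.shape[1])])

def leave_one_item_out(L1, L2):
    base = congruence_per_component(L1, L2).mean()
    gains = []
    for j in range(L1.shape[0]):
        keep = np.arange(L1.shape[0]) != j
        gains.append(congruence_per_component(L1[keep], L2[keep]).mean() - base)
    return np.array(gains)      # a large positive gain suggests item j hurts congruence
```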

    Latent Class Probabilistic Latent Feature Analysis of Three-Way Three-Mode Binary Data

    The analysis of binary three-way data (i.e., persons who indicate which attributes apply to each of a set of objects) may be of interest in several substantive domains, such as sensory profiling, marketing research, or personality assessment. Latent class probabilistic latent feature models (LCPLFMs) may be used to explain binary object-attribute associations on the basis of a small number of binary latent variables (called latent features). As LCPLFMs aim to model object-attribute associations using a small number of latent features, they may be better suited to analyzing data with many objects/attributes than standard multilevel latent class models, which do not include such a dimension reduction. In this paper, we describe new functions of the plfm package for analyzing binary three-way data with LCPLFMs. The new functions provide a flexible modeling approach, as they allow users to (1) specify different assumptions for modeling statistical dependencies between object-attribute pairs, (2) use different assumptions for modeling parameter heterogeneity across persons, (3) conduct a confirmatory analysis by constraining specific parameters to pre-specified values, and (4) inspect results with print, summary, and plot methods. As an illustration, the models are applied to analyze data on the perception of midsize cars and to study the situational determinants of anger-related behavior.
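
    As background on how such models link latent features to observed associations, here is a small numpy sketch of the disjunctive mapping rule commonly used in probabilistic latent feature models; the parameter names `sigma` (object-feature activation probabilities) and `rho` (attribute-feature link probabilities) are assumptions for illustration, not the plfm package's API.

```python
# Sketch of the disjunctive mapping rule often used in probabilistic latent
# feature models: an object-attribute association occurs if at least one latent
# feature is activated for the object and linked to the attribute.
import numpy as np

def disjunctive_association_probs(object_params, attribute_params):
    """object_params: (n_objects, n_features) activation probabilities;
    attribute_params: (n_attributes, n_features) link probabilities."""
    # P(o, a) = 1 - prod_f (1 - sigma_of * rho_af)
    return 1.0 - np.prod(
        1.0 - object_params[:, None, :] * attribute_params[None, :, :], axis=2
    )

# Example: 3 objects, 2 attributes, 2 latent features (made-up values)
sigma = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
rho = np.array([[0.7, 0.3], [0.1, 0.9]])
print(disjunctive_association_probs(sigma, rho))   # (3, 2) matrix of probabilities
```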

    Mixture simultaneous factor analysis for capturing differences in latent variables between higher level units of multilevel data

    Given multivariate data, many research questions pertain to the covariance structure: whether and how the variables (for example, personality measures) covary. Exploratory factor analysis (EFA) is often used to look for latent variables that may explain the covariances among variables; for example, the Big Five personality structure. In the case of multilevel data, one may wonder whether or not the same covariance (factor) structure holds for each so-called ‘data block’ (containing the data of one higher-level unit). For instance, is the Big Five personality structure found in each country, or do cross-cultural differences exist? The well-known multigroup EFA framework falls short in answering such questions, especially for numerous groups/blocks. We introduce mixture simultaneous factor analysis (MSFA), which performs a mixture model clustering of data blocks based on their factor structure. A simulation study shows excellent results with respect to parameter recovery, and an empirical example is included to illustrate the value of MSFA.
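
    A hedged numerical sketch of the clustering idea behind MSFA (not the maximum likelihood estimation itself): each data block is scored under every cluster's implied covariance structure, Lambda_k Lambda_k' + Psi_k, and the block-level log-likelihoods are combined with the mixing proportions to obtain posterior cluster memberships; the function and argument names are illustrative.

```python
# Sketch of block-level posteriors in a mixture of factor models: each block's
# rows are scored under every cluster's implied covariance, and the block-level
# log-likelihoods are combined with the mixing proportions in log space.
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def block_posteriors(blocks, loadings, uniquenesses, proportions):
    """blocks: list of (n_i, J) centered arrays; loadings: list of (J, Q) arrays;
    uniquenesses: list of length-J arrays; proportions: length-K mixing weights."""
    K = len(loadings)
    post = []
    for Xi in blocks:
        logp = np.empty(K)
        for k in range(K):
            cov = loadings[k] @ loadings[k].T + np.diag(uniquenesses[k])
            logp[k] = np.log(proportions[k]) + multivariate_normal(
                mean=np.zeros(Xi.shape[1]), cov=cov).logpdf(Xi).sum()
        post.append(np.exp(logp - logsumexp(logp)))     # normalize in log space
    return np.vstack(post)                               # (n_blocks, K) memberships
```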