199 research outputs found

    From explained variance of correlated components to PCA without orthogonality constraints

    Block Principal Component Analysis (Block PCA) of a data matrix A, where the loadings Z are determined by maximizing $\|AZ\|^2$ over unit-norm orthogonal loadings, is difficult to use for the design of sparse PCA by $\ell_1$ regularization, because the orthogonality constraint on the loadings and the non-differentiable $\ell_1$ penalty must both be handled at once. Our objective in this paper is to relax the orthogonality constraint on loadings by introducing new objective functions expvar(Y), which measure the part of the variance of the data matrix A explained by correlated components Y = AZ. We first propose a comprehensive study of the mathematical and numerical properties of expvar(Y) for two existing definitions (Zou et al. [2006], Shen and Huang [2008]) and four new definitions. We then show that only two of these explained-variance definitions are fit to use as objective functions in block PCA formulations for A free of orthogonality constraints.
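
To make the notion of explained variance for correlated components concrete, here is a minimal NumPy sketch. It assumes the two existing definitions are the QR-based adjusted variance usually attributed to Zou et al. [2006] and the projection-based total variance usually attributed to Shen and Huang [2008]; the function names `expvar_qr` and `expvar_projection` are hypothetical, and the four new definitions studied in the paper are not shown.

```python
import numpy as np

def expvar_qr(A, Z):
    """Adjusted variance (assumed Zou et al. style): sum of squared
    diagonal entries of R in the QR decomposition of Y = A Z."""
    Y = A @ Z
    _, R = np.linalg.qr(Y)
    return float(np.sum(np.diag(R) ** 2))

def expvar_projection(A, Z):
    """Total variance (assumed Shen and Huang style): squared Frobenius
    norm of A projected onto the subspace spanned by the loadings Z."""
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)  # orthogonal projector onto span(Z)
    return float(np.linalg.norm(A @ P, "fro") ** 2)

# Toy data: centered random matrix (illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
A -= A.mean(axis=0)

# Orthonormal PCA loadings: the top two right singular vectors of A.
_, s, Vt = np.linalg.svd(A, full_matrices=False)
Z_pca = Vt[:2].T

# Unit-norm but non-orthogonal loadings spanning the same subspace.
Z_oblique = Z_pca @ np.array([[1.0, 0.6], [0.0, 0.8]])

print(expvar_qr(A, Z_pca), expvar_projection(A, Z_pca))
print(expvar_qr(A, Z_oblique), expvar_projection(A, Z_oblique))
```

With the PCA loadings both quantities reduce to $\|AZ\|^2$; with the oblique loadings they differ, which is the kind of discrepancy a study of expvar(Y) has to sort out.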

    ClustGeo: an R package for hierarchical clustering with spatial constraints

    In this paper, we propose a Ward-like hierarchical clustering algorithm including spatial/geographical constraints. Two dissimilarity matrices $D_0$ and $D_1$ are given as input, along with a mixing parameter $\alpha \in [0,1]$. The dissimilarities can be non-Euclidean and the weights of the observations can be non-uniform. The first matrix gives the dissimilarities in the "feature space" and the second matrix gives the dissimilarities in the "constraint space". The criterion minimized at each stage is a convex combination of the homogeneity criterion calculated with $D_0$ and the homogeneity criterion calculated with $D_1$. The idea is then to determine a value of $\alpha$ which increases the spatial contiguity without deteriorating too much the quality of the solution based on the variables of interest, i.e. those of the feature space. This procedure is illustrated on a real dataset using the R package ClustGeo.
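
The convex combination described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ClustGeo implementation: it assumes the homogeneity criterion of a cluster is the weighted pseudo-inertia computed from squared dissimilarities, and the names `pseudo_inertia`, `mixed_inertia` and `merge_cost` are hypothetical.

```python
def pseudo_inertia(D, members, weights):
    """Assumed homogeneity criterion: weighted pseudo-inertia of a cluster,
    computed from pairwise dissimilarities only (no coordinates needed)."""
    mu = sum(weights[i] for i in members)
    total = sum(weights[i] * weights[j] * D[i][j] ** 2
                for i in members for j in members)
    return total / (2.0 * mu)

def mixed_inertia(D0, D1, members, weights, alpha):
    """Convex combination of the feature-space and constraint-space criteria."""
    return ((1.0 - alpha) * pseudo_inertia(D0, members, weights)
            + alpha * pseudo_inertia(D1, members, weights))

def merge_cost(D0, D1, a, b, weights, alpha):
    """Increase of the mixed criterion when clusters a and b are merged."""
    return (mixed_inertia(D0, D1, a + b, weights, alpha)
            - mixed_inertia(D0, D1, a, weights, alpha)
            - mixed_inertia(D0, D1, b, weights, alpha))

# Toy example: four observations with one feature and one spatial coordinate.
features = [0.0, 1.0, 10.0, 11.0]
positions = [0.0, 10.0, 1.0, 11.0]
D0 = [[abs(x - y) for y in features] for x in features]
D1 = [[abs(p - q) for q in positions] for p in positions]
w = [0.25] * 4

for alpha in (0.0, 0.5, 1.0):
    print(alpha,
          merge_cost(D0, D1, [0], [1], w, alpha),
          merge_cost(D0, D1, [0], [2], w, alpha))
```

In this toy example, observations 0 and 1 are close in the feature space while observations 0 and 2 are close in space, so the cheapest first merge switches as $\alpha$ moves from 0 to 1.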

    Multivariate Analysis of Mixed Data: The R Package PCAmixdata

    Mixed data arise when observations are described by a mixture of numerical and categorical variables. The R package PCAmixdata extends standard multivariate analysis methods to incorporate this type of data. The key techniques/methods included in the package are principal component analysis for mixed data (PCAmix), varimax-like orthogonal rotation for PCAmix, and multiple factor analysis for mixed multi-table data. This paper gives a synthetic presentation of the three algorithms with details to help the user understand graphical and numerical outputs of the corresponding R functions. The three main methods are illustrated on a real dataset composed of four data tables characterizing living conditions in different municipalities in the Gironde region of southwest France

    Block approach to group-sparse PCA: the sparsePCA package (Approche bloc en ACP group-sparse: le package sparsePCA)


    A semiparametric approach for a multivariate sample selection model

    Most of the common estimation methods for sample selection models rely heavily on parametric and normality assumptions. We consider in this paper a multivariate semiparametric sample selection model and develop a geometric approach to the estimation of the slope vectors in the outcome equation and in the selection equation. Contrary to most existing methods, we deal symmetrically with both slope vectors. Moreover, the estimation method is link-free and distribution-free. It works in two main steps: a multivariate sliced inverse regression step, and a canonical analysis step. We establish $\sqrt{n}$-consistency and asymptotic normality of the estimates. We describe how to estimate the observation and selection link functions. The theory is illustrated with a simulation study.

    Classification

    Classification aims to group data into classes with similar characteristics. It may be supervised, when a labeled training set is available, semi-supervised, or unsupervised. It arises in many applications, such as text mining, speech recognition, and genomic data analysis. The aim of this session is to offer an overview of statistical approaches to data classification (mixture models, SVM, Dirichlet processes, etc.) and to present various applications.

    On central tendency and dispersion measures for intervals and hypercubes

    The uncertainty or variability of data may be treated by considering, rather than a single value for each observation, the interval of values in which it may fall. This paper studies the derivation of basic descriptive statistics for interval-valued datasets. We propose a geometrical approach to the determination of summary statistics (central tendency and dispersion measures) for interval-valued variables.

    The effects of the mandatory adoption of IFRS on intangibles: the case of France (Les effets de l'adoption obligatoire des normes IFRS sur les incorporels : le cas de la France)

    This article examines the effects of the mandatory adoption of IFRS on intangibles in the French context. Using a sample of 83 companies from the SBF 120, we look for a typology of accounting practices related to intangibles during the transition to IFRS. The results reveal three classes of companies affected differently by the move to international standards. The first class is characterized by a major change, with a sharp increase in goodwill linked to the restatement of intangible assets such as market shares. It illustrates the specificity of French regulation. The second class is characterized by stability, explained by the predominant weight of goodwill under the French accounting framework. Finally, the third class does not undergo any change either, given the presence of brands under French standards. The inertia phenomenon described by Nobes (2006), whereby pre-IFRS accounting treatments may persist under IFRS, is confirmed.

    The adoption in France of the IFRS standards on intangibles: upheaval of practices or inertia? (L'adoption en France des normes IFRS relatives aux incorporels : bouleversement des pratiques ou inertie ?)

    This article examines the mandatory adoption in France of the IFRS standards on intangibles. We look for a typology of accounting practices related to intangibles during the transition to IFRS. The results reveal three classes of companies affected differently by the move to international standards. The first class is characterized by a major change, with a sharp increase in goodwill linked to the restatement of intangible assets such as market shares. The second and third classes are characterized by stability. The inertia phenomenon (Nobes, 2006), whereby pre-IFRS accounting treatments may persist under IFRS, is confirmed. Keywords: intangible capital; cluster analysis; DIV; goodwill; intangibles; IFRS; transition

    Clustering of categorical variables around latent variables

    In the framework of clustering, the usual aim is to cluster observations and not variables. However, the issue of variable clustering clearly appears for dimension reduction, selection of variables or in some case studies (sensory analysis, biochemistry, marketing, etc.). Clustering of variables is then studied as a way to arrange variables into homogeneous clusters, thereby organizing data into meaningful structures. Once the variables are clustered into groups such that variables are similar to the other variables belonging to their cluster, the selection of a subset of variables is possible. Several specific methods have been developed for the clustering of numerical variables. For categorical variables, however, far fewer methods have been proposed. In this paper we extend the criterion used by Vigneau and Qannari (2003) in their Clustering around Latent Variables approach for numerical variables to the case of categorical data. The homogeneity criterion of a cluster of categorical variables is defined as the sum of the correlation ratios between the categorical variables and a latent variable, which is in this case a numerical variable. We show that the latent variable maximizing the homogeneity of a cluster can be obtained with Multiple Correspondence Analysis. Different algorithms for the clustering of categorical variables are proposed: an iterative relocation algorithm, and ascendant and divisive hierarchical clustering. The proposed methodology is illustrated by a real data application to the satisfaction of pleasure craft operators. Keywords: clustering of categorical variables, correlation ratio, iterative relocation algorithm, hierarchical clustering
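
The homogeneity criterion described above can be illustrated with a short Python sketch. It assumes the standard definition of the correlation ratio (between-group sum of squares divided by total sum of squares) and takes the latent variable as given rather than computing it by Multiple Correspondence Analysis; the function names are hypothetical.

```python
from collections import defaultdict

def correlation_ratio(categories, values):
    """Correlation ratio (eta squared) between a categorical variable and a
    numerical variable: between-group sum of squares over total sum of squares."""
    n = len(values)
    mean = sum(values) / n
    total_ss = sum((v - mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # a constant numerical variable explains nothing
    groups = defaultdict(list)
    for c, v in zip(categories, values):
        groups[c].append(v)
    between_ss = sum(len(g) * (sum(g) / len(g) - mean) ** 2
                     for g in groups.values())
    return between_ss / total_ss

def cluster_homogeneity(categorical_vars, latent):
    """Homogeneity of a cluster: sum of the correlation ratios between each
    categorical variable in the cluster and the latent variable."""
    return sum(correlation_ratio(x, latent) for x in categorical_vars)

# Toy example: the latent variable is assumed given here.
latent = [1.0, 1.0, 3.0, 3.0]
x1 = ["a", "a", "b", "b"]  # perfectly separates the latent values
x2 = ["a", "b", "a", "b"]  # unrelated to the latent values
print(cluster_homogeneity([x1, x2], latent))
```

In this example the first variable contributes its maximum and the second contributes nothing, so the cluster is only half as homogeneous as it could be.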