127 research outputs found

    Another look at principal curves and surfaces

    This manuscript version is made available under the CC-BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Principal curves have been defined as smooth curves passing through the “middle” of a multidimensional data set. They are nonlinear generalizations of the first principal component, a characterization of which is the basis of the definition of principal curves. We establish a new characterization of the first principal component and base our new definition of a principal curve on this property. We introduce the notion of principal oriented points and we prove the existence of principal curves passing through these points. We extend the definition of principal curves to multivariate data sets and propose an algorithm to find them. The new notions lead us to generalize the definition of total variance. Successive principal curves are recursively defined from this generalization. The new methods are illustrated on simulated and real data sets. Peer Reviewed. Postprint (author's final draft).

    Measuring non-linear dependence for two random variables distributed along a curve

    The final publication is available at link.springer.com. We propose new dependence measures for two real random variables not necessarily linearly related. Covariance and linear correlation are expressed in terms of principal components and are generalized for variables distributed along a curve. Properties of these measures are discussed. The new measures are estimated using principal curves and are computed for simulated and real data sets. Finally, we present several statistical applications for the new dependence measures. Peer Reviewed. Postprint (author's final draft).

    Analysing musical performance through functional data analysis: rhythmic structure in Schumann's Träumerei

    Functional data analysis (FDA) is a relatively new branch of statistics devoted to describing and modelling data that are complete functions. Many relevant aspects of musical performance and perception can be understood and quantified as dynamic processes evolving as functions of time. In this paper, we show that FDA is a statistical methodology well suited for research into the field of quantitative musical performance analysis. To demonstrate this suitability, we consider tempo data for 28 performances of Schumann's Träumerei and analyse them by means of functional principal component analysis (one of the most powerful descriptive tools included in FDA). Specifically, we investigate the commonalities and differences between different performances regarding (expressive) timing, and we cluster similar performances together. We conclude that musical data considered as functional data reveal performance structures that might otherwise go unnoticed. Peer Reviewed. Postprint (author's final draft).
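As a rough illustration of functional principal component analysis on sampled curves, the sketch below discretizes each curve on a common grid and takes an SVD of the centered data matrix. It is only a minimal sketch, not the procedure used in the paper: the function name `functional_pca`, the equally spaced grid, and the synthetic curves (which merely stand in for tempo data) are all assumptions made for illustration.

```python
import numpy as np


def functional_pca(curves, n_components=2):
    """Discretized functional PCA (hypothetical helper, not from the paper).

    curves: array of shape (n_curves, n_grid), each row a curve sampled
    on the same equally spaced grid.
    """
    curves = np.asarray(curves, dtype=float)
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are the discretized principal component functions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Scores: coordinates of each centered curve on the components.
    scores = centered @ components.T
    explained = singular_values**2 / np.sum(singular_values**2)
    return mean_curve, components, scores, explained[:n_components]


# Synthetic stand-in for 28 tempo curves (NOT the Träumerei data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
a = rng.normal(size=28)
b = rng.normal(scale=0.3, size=28)
curves = 60.0 + np.outer(a, np.sin(2 * np.pi * t)) + np.outer(b, np.cos(2 * np.pi * t))

mean_curve, components, scores, explained = functional_pca(curves)
```

The rows of `scores` could then be passed to any standard clustering routine to group similar curves, which is one simple way to read the clustering step mentioned in the abstract.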

    Optimal level sets for representing a bivariate density function

    We deal with the problem of representing a bivariate density function by level sets. The choice of which levels are used in this representation is commonly arbitrary (the most usual choices being those with probability contents .25, .5 and .75). Choosing which level is (or which levels are) of most interest is an important practical question which depends on the kind of problem one has to deal with as well as the kind of feature one wishes to highlight in the density. The approach we develop is based on minimum distance ideas. Peer Reviewed. Postprint (author's final draft).
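To make the conventional choice concrete, the sketch below finds the density level whose upper level set holds a given probability content (.25, .5 or .75) for a density evaluated on a regular grid. It illustrates only that conventional choice, not the minimum-distance approach the abstract proposes; the function name `level_for_probability` and the standard bivariate normal test density are assumptions.

```python
import numpy as np


def level_for_probability(density, cell_area, prob):
    """Level c such that {f >= c} holds roughly `prob` of the mass.

    density: 2-D array of density values on a regular grid.
    cell_area: area of one grid cell.
    (Hypothetical helper for illustration.)
    """
    values = np.sort(np.ravel(density))[::-1]   # highest density first
    mass = np.cumsum(values) * cell_area        # mass of growing level sets
    mass /= mass[-1]                            # renormalise for grid truncation
    idx = np.searchsorted(mass, prob)           # first set reaching `prob`
    return values[min(idx, values.size - 1)]


# Assumed example: standard bivariate normal density on [-5, 5]^2.
x = np.linspace(-5.0, 5.0, 401)
xx, yy = np.meshgrid(x, x)
density = np.exp(-0.5 * (xx**2 + yy**2)) / (2.0 * np.pi)
cell_area = (x[1] - x[0]) ** 2

levels = {p: level_for_probability(density, cell_area, p) for p in (0.25, 0.5, 0.75)}
```

For this particular density the level with probability content p is known in closed form, (1 - p) / (2π), which gives a quick sanity check on the grid computation.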

    Choosing the most relevant level sets for depicting a sample of densities

    The final publication is available at link.springer.com. When exploring a sample composed of bivariate density functions, visualising the data raises the question of which level set(s) to display. The approach proposed in this paper consists in defining the optimal level set(s) as the one(s) allowing for the best reconstruction of the whole density. A fully data-driven procedure is developed in order to estimate the link between the level set(s) and their corresponding density, to construct optimal level set(s) and to choose automatically the number of relevant level set(s). The method is based on recent advances in functional data analysis when both response and predictors are functional. After a detailed description of the methodology, finite sample studies are presented (including both real and simulated data), while theoretical results are deferred to a final appendix. Peer Reviewed. Postprint (author's final draft).

    Dimensionality reduction for samples of bivariate density level sets: an application to electoral results

    The final publication is available at link.springer.com. A bivariate density can be represented as a density level set containing a fixed amount of probability (0.75, for instance). Then a functional dataset where the observations are bivariate density functions can be analyzed as if the functional data are density level sets. We compute distances between sets and perform standard Multidimensional Scaling. This methodology is applied to analyze electoral results. Peer Reviewed. Postprint (author's final draft).
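The pipeline described here (sets, then pairwise distances, then Multidimensional Scaling) can be sketched as follows. The abstract does not say which distance between sets is used, so the area of the symmetric difference below is purely an assumption, as are the helper names and the toy discs standing in for density level sets; the scaling step is classical (Torgerson) MDS.

```python
import numpy as np


def symmetric_difference_distances(masks, cell_area=1.0):
    """Pairwise area of the symmetric difference between boolean masks.

    ASSUMPTION: this set distance is chosen for illustration only.
    """
    flat = np.asarray(masks, dtype=bool).reshape(len(masks), -1)
    n = len(flat)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = np.logical_xor(flat[i], flat[j]).sum() * cell_area
    return dist


def classical_mds(dist, n_components=2):
    """Classical MDS: double-center squared distances, then eigendecompose."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist**2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    top = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top)


# Toy level sets: unit discs with shifted centers on a regular grid.
x = np.linspace(-3.0, 3.0, 121)
xx, yy = np.meshgrid(x, x)
masks = [(xx - cx) ** 2 + yy**2 <= 1.0 for cx in (0.0, 0.5, 1.0, 1.5)]

dist = symmetric_difference_distances(masks, cell_area=(x[1] - x[0]) ** 2)
coords = classical_mds(dist, n_components=2)
```

Each row of `coords` is a low-dimensional representation of one set, so sets with a small symmetric difference should land close together in the resulting map.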

    Identifying and classifying aberrant response patterns through functional data analysis

    We propose new methods for identifying and classifying aberrant response patterns (ARPs) by means of functional data analysis. These methods take the person response function (PRF) of an individual and compare it with the pattern that would correspond to a generic individual of the same ability according to the item-person response surface. ARPs correspond to atypical difference functions. The ARP classification is done with functional data clustering applied to the PRFs identified as ARPs. We apply these methods to two sets of simulated data (the first is used to illustrate the ARP identification methods and the second demonstrates classification of the response patterns flagged as ARPs) and a real data set (a Grade 12 science assessment test, SAT, with 32 items answered by 600 examinees). For comparative purposes, ARPs are also identified with three nonparametric person-fit indices (Ht, Modified Caution Index, and ZU3). Our results indicate that the ARP detection ability of one of our proposed methods is comparable to that of person-fit indices. Moreover, the proposed classification methods enable ARPs associated with either spuriously low or spuriously high scores to be distinguished. Peer Reviewed. Postprint (author's final draft).

    Optimal level sets for bivariate density representation

    In bivariate density representation there is an extensive literature on level set estimation when the level is fixed, but much less on choosing which level is (or which levels are) of most interest. This is an important practical question which depends on the kind of problem one has to deal with as well as the kind of feature one wishes to highlight in the density. Answering it requires both a definition of what the optimal level is and a method for finding it. We consider two scenarios for this problem. The first one corresponds to situations in which one has just a single density function to be represented. However, as a result of technical progress in data collection, problems are emerging in which one has to deal with a sample of densities. In these situations, the need arises to develop a joint representation for all these densities, and this is the second scenario considered in this paper. For each case, we provide consistency results for the estimated levels and present extensive Monte Carlo simulation experiments illustrating the relevance and feasibility of the proposed method. (C) 2015 Elsevier Inc. All rights reserved. Peer Reviewed. Postprint (author's final draft).

    Distance-based LISA maps for multivariate lattice data

    In the context of areal data (a particular case of data with spatial dependence) we propose an algorithm to define spatial clusters. Our proposal is based on the distance between the characteristics observed in different areas (individuals). It can therefore be applied to any kind of observable characteristic, provided that an inter-individual distance can be defined. In this way we provide a generalization of the well-known LISA maps that have been widely used for univariate data. We apply our proposals to the results of the 2004 Spanish General Elections recorded in 248 neighborhoods in Barcelona.

    The country factor on regional income distributions in Europe: A functional ANOVA approach

    The distribution of regional Gini indices in Europe using the income distribution before taxes and transfers is not explained by the country to which the region belongs, i.e. the dispersion of the Ginis is not significantly reduced when we control for the country variable. On the contrary, there is a clear dependency between the regional Ginis and the country when the distribution of income after taxes and transfers is considered. This evidence is based on EUROMOD, a multicountry tax-benefit model of the EU-15 (see Mercader and Levy 2004). We study to what extent this conclusion holds when we consider the complete income distributions instead of a summary inequality measure such as the Gini index. We use functional ANOVA (following Cuevas et al. 2004) in order to study the explanatory power of the country on the dispersion of regional income density functions (estimated non-parametrically) before and after taxes and transfers. Our statistical evidence suggests that regional income distributions in different countries are different, both before and after redistribution takes place. However, the null hypothesis of equality of mean regional distributions among countries (a country factor equal to zero) is rejected more strongly in the after-redistribution case.