
    A kernel regression procedure in the 3D shape space with an application to online sales of children's wear

    Shape regression is of key importance in many scientific fields. In this paper, we focus on the case where the shape of an object is represented by a configuration matrix of landmarks. It is well known that this shape space has a finite-dimensional, non-Euclidean Riemannian manifold structure, which makes it difficult to work with. Papers on regression in this space are scarce in the literature. Most of them are restricted to a single explanatory variable, usually time or age, and many work with a tangent space approximation. In this paper, we adapt the general kernel regression method for manifold-valued data proposed by Davis et al. (2007) to the three-dimensional case of Kendall's shape space and generalize it to multiple explanatory variables. We also propose bootstrap confidence intervals for prediction. A simulation study is carried out to assess the performance of the procedure, and finally it is applied to a 3D database obtained from an anthropometric survey of the Spanish child population, with a potential application to online sales of children's wear.
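
The kernel regression idea can be illustrated with a minimal sketch, assuming Nadaraya-Watson weights from a product Gaussian kernel over several covariates and a weighted Fréchet (Karcher) mean computed on the pre-shape sphere. This is a simplification, not the authors' implementation: it ignores the quotient by rotations that defines Kendall's shape space (a fuller version would Procrustes-align each shape to the current estimate before taking the log map), and the helper names and bandwidths `h` are hypothetical.

```python
import numpy as np

def preshape(config):
    """Center a (k, 3) landmark configuration and scale it to unit Frobenius norm."""
    z = config - config.mean(axis=0)
    return (z / np.linalg.norm(z)).ravel()

def kernel_weights(X, x0, h):
    """Nadaraya-Watson weights from a product Gaussian kernel over p covariates."""
    u = (np.atleast_2d(X) - np.asarray(x0)) / np.asarray(h)  # (n, p) scaled gaps
    w = np.exp(-0.5 * np.sum(u ** 2, axis=1))
    return w / w.sum()

def sphere_log(p, q):
    """Log map on the unit sphere: tangent vector at p pointing towards q."""
    c = np.clip(np.dot(p, q), -1.0, 1.0)
    v = q - c * p
    nv = np.linalg.norm(v)
    return np.zeros_like(p) if nv < 1e-12 else np.arccos(c) * v / nv

def sphere_exp(p, v):
    """Exp map on the unit sphere: follow the geodesic from p with velocity v."""
    nv = np.linalg.norm(v)
    return p.copy() if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def kernel_regression(X, configs, x0, h, iters=100, tol=1e-10):
    """Weighted Frechet mean of pre-shapes, with kernel weights centered at x0."""
    Y = np.array([preshape(c) for c in configs])
    w = kernel_weights(X, x0, h)
    mu = Y[np.argmax(w)].copy()          # start at the most heavily weighted shape
    for _ in range(iters):               # Riemannian gradient descent for the Karcher mean
        step = sum(wi * sphere_log(mu, yi) for wi, yi in zip(w, Y))
        mu = sphere_exp(mu, step)
        if np.linalg.norm(step) < tol:
            break
    return mu.reshape(configs[0].shape)

# Toy usage: 30 noisy copies of one 10-landmark configuration, two hypothetical covariates
rng = np.random.default_rng(0)
base = rng.normal(size=(10, 3))
configs = [base + 0.05 * rng.normal(size=base.shape) for _ in range(30)]
X = rng.uniform(size=(30, 2))
pred = kernel_regression(X, configs, x0=[0.5, 0.5], h=[0.2, 0.2])
```

The predicted shape stays on the pre-shape sphere because every update goes through the exponential map; bootstrap intervals could be obtained by resampling `(X, configs)` pairs and repeating the fit.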

    Archetypal analysis with missing data: see all samples by looking at a few based on extreme profiles

    In this paper, we propose several methodologies for handling missing or incomplete data in archetypal analysis (AA) and archetypoid analysis (ADA). AA seeks archetypes, which are convex combinations of data points, and approximates the samples as mixtures of those archetypes. In ADA, the representative archetypal data belong to the sample, i.e. they are actual data points. With the proposed procedures, missing data are neither discarded nor imputed beforehand, and, unlike in previous approaches, the theoretical properties regarding the location of the archetypes are guaranteed. The new procedures adapt the AA algorithm either by taking the missing values into account when computing the solution or by skipping them. In the first case, the solutions of previous approaches are modified to comply with the theory, and a new procedure is proposed in which the missing values are updated with the fitted values. In the second case, the procedure estimates dissimilarities between samples and projects them into a new space, where AA or ADA is applied; those results are then used to provide a solution in the original space. A comparative simulation study yields favorable results. The methodology is also applied to two real data sets: a well-known climate data set and a global development data set. We illustrate how these unsupervised methodologies allow complex data to be understood, even by non-experts.
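
The second strategy (skip the missing values, work from dissimilarities, then project) might look like the minimal sketch below. It assumes a Euclidean dissimilarity computed on co-observed coordinates and rescaled by the fraction of coordinates used (similar to the rule R's `dist()` applies to incomplete rows), followed by classical multidimensional scaling as the projection. Both choices are assumptions for illustration; the paper's exact dissimilarity and projection may differ, and AA or ADA would then be run on the returned coordinates.

```python
import numpy as np

def dist_with_missing(X):
    """Pairwise Euclidean distances between rows of X (NaN = missing), using only
    coordinates observed in both rows and rescaling the sum of squares by
    p / (number of shared coordinates)."""
    n, p = X.shape
    obs = ~np.isnan(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            m = obs[i] & obs[j]
            if not m.any():
                raise ValueError(f"rows {i} and {j} share no observed coordinates")
            d2 = np.sum((X[i, m] - X[j, m]) ** 2) * p / m.sum()
            D[i, j] = D[j, i] = np.sqrt(d2)
    return D

def classical_mds(D, k):
    """Embed a distance matrix in R^k by classical (Torgerson) scaling."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:k]         # indices of the k largest eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

# Toy data with missing entries; Y holds complete coordinates on which AA/ADA can run
X = np.array([[0.0, 0.0, 1.0],
              [1.0, np.nan, 2.0],
              [np.nan, 3.0, 0.5],
              [2.0, 1.0, np.nan]])
Y = classical_mds(dist_with_missing(X), k=2)
```

With complete data and `k` equal to the data's dimension, classical MDS reproduces the original distances exactly, which is a useful sanity check before applying it to incomplete data.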

    Geometric analysis of planar shapes with applications to cell deformations

    Shape analysis is of great importance in many fields, such as computer vision, medical imaging, and computational biology. In this paper, we focus on a shape space in which shapes are represented by planar closed curves. A new metric was recently introduced on this shape space, under which it is isometric to an infinite-dimensional Grassmann manifold of 2-dimensional subspaces. Using this isometry, as shown by Younes et al. (2008), geodesics can be described explicitly, a task that was previously far from easy. Our aim is twofold: first, to use this general theory to present applications to the study of erythrocytes, using digital images of peripheral blood smears, in the context of the treatment of sickle cell disease; and second, since normal erythrocytes are almost circular and many sickle cells are elliptical, to specialize the computation of geodesics and distances under this metric to planar objects considered as deformations of a template (a circle or an ellipse). The applications considered include shape interpolation, shape classification, and shape clustering.
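
A minimal sketch of the Grassmann distance between two closed planar curves follows, assuming the square-root representation underlying the Younes et al. (2008) metric: write the curve's derivative as the square of a complex function e + i f, take the 2-plane spanned by e and f, and measure the geodesic distance between planes through their principal angles. It assumes both curves are sampled with the same parametrization and number of points, and it omits the optimization over reparametrizations (including the choice of starting point) that a full shape distance would require.

```python
import numpy as np

def sqrt_subspace(c):
    """Orthonormal basis (N x 2) of span{e, f}, where e + i f is a continuous
    square root of the derivative of the closed curve c (complex samples)."""
    dc = (np.roll(c, -1) - np.roll(c, 1)) / 2.0      # periodic central difference
    phase = np.unwrap(np.angle(dc))                  # continuous branch of arg c'
    q = np.sqrt(np.abs(dc)) * np.exp(0.5j * phase)   # q**2 equals dc
    Q, _ = np.linalg.qr(np.column_stack([q.real, q.imag]))
    return Q

def grassmann_distance(c1, c2):
    """Geodesic distance between the two 2-planes: sqrt(theta_1^2 + theta_2^2),
    where theta_i are the principal angles between them."""
    s = np.linalg.svd(sqrt_subspace(c1).T @ sqrt_subspace(c2), compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))
    return float(np.sqrt(np.sum(theta ** 2)))

t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
circle = np.exp(1j * t)
ellipse = 2.0 * np.cos(t) + 1j * np.sin(t)
moved = 3.0 * np.exp(0.7j) * circle + (1.0 - 2.0j)   # scaled, rotated, translated circle
print(grassmann_distance(circle, moved))    # close to zero: similarity-invariant
print(grassmann_distance(circle, ellipse))  # strictly positive
```

Translation drops out because only the derivative is used, while scaling and rotation multiply the square root by a complex constant and therefore leave the spanned plane unchanged; this is why a circle or ellipse template can be compared directly with a deformed cell outline.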

    Archetypal shapes based on landmarks and extension to handle missing data

    Archetypal and archetypoid analysis are extended to shapes. The objective is to find representative shapes; archetypal shapes are pure (extreme) shapes. We focus on the case where the shape of an object is represented by a configuration matrix of landmarks. Since shape space is not a vector space, we work in the tangent space, the linearized space about the mean shape. Each observation is then approximated by a convex combination of either actual observations (archetypoids) or archetypes, which are themselves convex combinations of observations in the data set. As in the usual multivariate case, these tools can contribute to the understanding of shapes, since they lie somewhere between clustering and matrix factorization methods. A new simplex visualization tool is also proposed to provide a picture of the archetypal analysis results. We also propose new algorithms for performing archetypal analysis with missing data and extend them to incomplete shapes. A well-known data set is used to illustrate the methodologies developed. The proposed methodology is applied to an apparel design problem for children.
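
Working "in the tangent space about the mean shape" can be sketched for planar landmarks, assuming the standard complex-number representation of 2D configurations: pre-shapes are centered, unit-norm complex vectors, the full Procrustes mean is the dominant eigenvector of the sum of their outer products, and each shape is rotated onto the mean and projected orthogonally to it. The planar restriction and the function names are assumptions for illustration (3D landmarks need a matrix Procrustes step instead); the resulting real vectors are what AA or ADA would be applied to.

```python
import numpy as np

def preshape(landmarks):
    """(k, 2) planar landmarks -> centered, unit-norm complex vector."""
    z = landmarks[:, 0] + 1j * landmarks[:, 1]
    z = z - z.mean()
    return z / np.linalg.norm(z)

def procrustes_mean(zs):
    """Full Procrustes mean: dominant eigenvector of sum_j z_j z_j^*."""
    S = sum(np.outer(z, z.conj()) for z in zs)
    _, vecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    return vecs[:, -1]

def tangent_coords(z, mu):
    """Procrustes tangent coordinates of pre-shape z at the mean mu."""
    w = z * np.exp(-1j * np.angle(np.vdot(mu, z)))   # rotate z so mu^* w is real, positive
    v = w - mu * np.vdot(mu, w)                       # remove the component along mu
    return np.concatenate([v.real, v.imag])           # real vector for AA / ADA

# A triangle and a scaled, rotated, translated copy of it: same shape
tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.2, 0.9]])
c = 2.0 * np.exp(0.5j) * (tri[:, 0] + 1j * tri[:, 1]) + (3.0 - 1.0j)
copy = np.column_stack([c.real, c.imag])
zs = [preshape(tri), preshape(copy)]
mu = procrustes_mean(zs)
coords = np.array([tangent_coords(z, mu) for z in zs])   # both rows are ~0
```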

    Generalized Linear Models for Geometrical Current predictors. An application to predict garment fit

    The aim of this paper is to model an ordinal response variable in terms of vector-valued functional data belonging to a vector-valued reproducing kernel Hilbert space (RKHS). In particular, we focus on the vector-valued RKHS obtained when a geometrical object (a body) is characterized by a current, and on the ordinal regression model. A common way to solve this problem in functional data analysis is to express the data in the orthonormal basis given by the decomposition of the covariance operator. However, our data differ in important ways from the usual functional data setting: they are vector-valued functions, and they are functions in an RKHS with a previously defined norm. We propose three different bases: the orthonormal basis given by the kernel that defines the RKHS, a basis obtained from the decomposition of the integral operator defined by the covariance function, and a third basis that combines the previous two. The three approaches are compared and applied to an interesting problem: building a model to predict the fit of children's garment sizes, based on a 3D database of the Spanish child population. Our proposal has been compared with alternative classifiers (support vector machines and k-NN), and with the proposed classification method applied to other characterizations of the objects (landmarks and multivariate anthropometric measurements instead of currents); all of these alternatives yielded worse results.
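
Once each body is summarized by its coefficients in one of the bases, the ordinal part of the model can be sketched as a proportional-odds (cumulative logit) link. This is a minimal sketch assuming a logit link, a linear predictor formed from the basis coefficients, and hypothetical coefficients, cut points, and fit labels; the paper's exact link and estimation procedure are not reproduced here.

```python
import numpy as np

def cumulative_logit_probs(eta, cutpoints):
    """Category probabilities under P(Y <= k | eta) = sigmoid(theta_k - eta)."""
    theta = np.asarray(cutpoints, dtype=float)
    if np.any(np.diff(theta) <= 0):
        raise ValueError("cutpoints must be strictly increasing")
    eta = np.atleast_1d(np.asarray(eta, dtype=float))
    cdf = 1.0 / (1.0 + np.exp(-(theta[None, :] - eta[:, None])))   # (n, K-1)
    n = eta.size
    cdf = np.hstack([np.zeros((n, 1)), cdf, np.ones((n, 1))])      # P(Y<=0)=0, P(Y<=K)=1
    return np.diff(cdf, axis=1)                                     # (n, K)

# Hypothetical: 4 bodies, each summarized by 3 basis coefficients of its current
scores = np.array([[0.1, -0.3, 0.2],
                   [1.2, 0.4, -0.1],
                   [-0.8, 0.0, 0.5],
                   [0.3, 0.9, 0.0]])
beta = np.array([1.5, -0.7, 0.4])        # hypothetical fitted coefficients
probs = cumulative_logit_probs(scores @ beta, cutpoints=[-1.0, 1.0])
pred = probs.argmax(axis=1)              # hypothetical labels: 0 small, 1 fits, 2 large
```

Because the cut points are ordered, increasing the linear predictor shifts probability mass monotonically towards the higher categories, which is what distinguishes this model from treating garment fit as an unordered class.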

    Parameter estimation in non-homogeneous boolean models: an application to plant defense response

    Many medical and biological problems require extracting information from microscopy images. Boolean models have been extensively used to analyze binary images of random clumps in many scientific fields. In this paper, a particular type of Boolean model with an underlying non-stationary point process is considered. The intensity of the underlying point process is formulated as a fixed function of the distance to a region of interest. A method to estimate the parameters of this Boolean model is introduced, and its performance is checked in two different settings. First, a comparative study with other existing methods is carried out using simulated data. Second, the method is applied to the longleaf data set, a very popular point-process data set included in the R package spatstat. The results show that the new method provides estimates as accurate as those obtained with more complex methods developed for the general case. Finally, to illustrate the application of this model and method, a particular type of phytopathological image is analyzed. These images show callose depositions in leaves of Arabidopsis plants. The analysis of callose depositions is very popular in the phytopathological literature for quantifying the activity of plant immunity.
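
The model itself can be made concrete with a small simulation, assuming a hypothetical intensity that decays exponentially with the distance to a point-shaped region of interest, disc-shaped grains with exponentially distributed radii, and simulation of the non-homogeneous germ process by thinning a homogeneous Poisson process. These choices are illustrative only; the paper's intensity function, grain distribution, and estimation method are not reproduced, and edge effects from germs outside the window are ignored.

```python
import numpy as np

def simulate_boolean_model(lam0, beta, roi, window, mean_radius, rng):
    """Germs: Poisson process with intensity lam0 * exp(-beta * d(x, roi)),
    simulated by thinning a homogeneous process of rate lam0.
    Grains: discs with exponentially distributed radii (hypothetical choices)."""
    (x0, x1), (y0, y1) = window
    n = rng.poisson(lam0 * (x1 - x0) * (y1 - y0))
    pts = np.column_stack([rng.uniform(x0, x1, n), rng.uniform(y0, y1, n)])
    d = np.linalg.norm(pts - np.asarray(roi, dtype=float), axis=1)
    keep = rng.uniform(size=n) < np.exp(-beta * d)   # retain with prob lambda(x) / lam0
    germs = pts[keep]
    radii = rng.exponential(mean_radius, size=len(germs))
    return germs, radii

def covered_fraction(germs, radii, window, res=200):
    """Fraction of a res x res grid over the window covered by the union of discs."""
    (x0, x1), (y0, y1) = window
    gx, gy = np.meshgrid(np.linspace(x0, x1, res), np.linspace(y0, y1, res))
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    covered = np.zeros(len(grid), dtype=bool)
    for g, r in zip(germs, radii):
        covered |= np.sum((grid - g) ** 2, axis=1) <= r ** 2
    return float(covered.mean())

rng = np.random.default_rng(1)
window = ((0.0, 10.0), (0.0, 10.0))
germs, radii = simulate_boolean_model(lam0=2.0, beta=0.5, roi=(5.0, 5.0),
                                      window=window, mean_radius=0.3, rng=rng)
frac = covered_fraction(germs, radii, window)
```

Only the binary union of discs is observed in practice, so an estimation method has to recover `lam0`, `beta`, and the radius distribution from coverage patterns like `frac` measured as a function of distance to the region of interest.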

    Non-homogeneous temporal Boolean models to study endocytosis

    Many medical and biological problems require the analysis of long sequences of microscope images. These images capture phenomena of interest, and it is essential to characterize their spatial and temporal properties. The purpose of this paper is to present a new statistical methodology for estimating these parameters of interest in image sequences obtained by observing endocytosis. Endocytosis is a process by which cells traffic molecules from the extracellular space into different intracellular compartments. These images are obtained using a highly specialized microscopy technique called Total Internal Reflection Fluorescence Microscopy (TIRFM). The Homogeneous Temporal Boolean Model (HTBM) has recently been used to analyze this type of image sequence. Using an HTBM requires assuming spatial homogeneity of events in the cell membrane, but this is an open question in the biological understanding of the endocytic process. Our aim in this paper is to generalize this methodology to overcome this drawback. Methodologically, this work has a threefold aim: to broaden the notion of HTBM by introducing the concept of a Non-Homogeneous Temporal Boolean Model; to introduce a hypothesis testing procedure to check the spatial homogeneity assumption; and finally, to reformulate the existing methodology to work with underlying non-homogeneous point processes. We check the performance of our methodology on a simulated data set and compare our results with those provided by visual inspection and by assuming spatial homogeneity. The accuracy of the results obtained with simulated data supports the validity of our methodology. Finally, we apply it, as an illustration, to three sequences of a particular type of endocytosis images. The spatial homogeneity test confirms that spatial homogeneity cannot be assumed. As a result, our methodology provides more accurate estimates of the duration of the events, as well as information about areas of the membrane with higher accumulation of the
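
To give a concrete sense of what "checking the spatial homogeneity assumption" involves, the sketch below uses the classical quadrat-count chi-squared test, a standard test of homogeneity for point patterns. It is a swapped-in technique for illustration, not necessarily the test proposed in the paper; applying it to event locations pooled over the frames of a sequence is likewise an assumption.

```python
import numpy as np

def quadrat_chi2(points, window, nx, ny):
    """Pearson chi-squared statistic for quadrat counts under spatial homogeneity
    (equal expected counts in equal-area quadrats). Returns (statistic, df)."""
    (x0, x1), (y0, y1) = window
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=[nx, ny],
                                  range=[[x0, x1], [y0, y1]])
    expected = counts.sum() / (nx * ny)
    stat = float(np.sum((counts - expected) ** 2 / expected))
    return stat, nx * ny - 1

# One event per quadrat: perfectly even, statistic 0
even = np.array([[0.25, 0.25], [0.75, 0.25], [0.25, 0.75], [0.75, 0.75]])
stat_even, df = quadrat_chi2(even, ((0.0, 1.0), (0.0, 1.0)), 2, 2)

# Twenty events in a single quadrat: strongly clustered
clumped = np.full((20, 2), 0.1)
stat_clumped, _ = quadrat_chi2(clumped, ((0.0, 1.0), (0.0, 1.0)), 2, 2)
```

The statistic is compared with a chi-squared distribution with `df` degrees of freedom (for example via `scipy.stats.chi2.sf(stat, df)`); large values reject homogeneity. Quadrat tests lose power when quadrats are too small for the expected counts, so the grid size matters.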

    Estadística. Volum 1 (Statistics, Volume 1)

    Diplomatura en Ciències Empresarials (Diploma in Business Studies). C23: Statistics

    Unsupervised classification of children’s bodies using currents

    Classifying objects according to their shape and size is of key importance in many scientific fields. This work focuses on the case where the size and shape of an object are characterized by a current. A current is a mathematical object that has proved relevant for modeling geometrical data, such as submanifolds, through the integration of vector fields along them. Because a vector-valued reproducing kernel Hilbert space (RKHS) is chosen as the test space for integrating over manifolds, shapes can be regarded as embedded in this Hilbert space. A vector-valued RKHS is a Hilbert space of vector fields; therefore, it is possible to compute a mean of shapes or a distance between two manifolds. This embedding enables us to consider size-and-shape clustering algorithms. These algorithms are applied to a 3D database obtained from an anthropometric survey of the Spanish child population, with a potential application to online sales of children's wear.
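
The mean and distance of currents can be sketched for triangulated surfaces, assuming the usual discrete approximation in which each face contributes a Dirac at its center carrying its area-weighted normal, and assuming a Gaussian kernel with a hypothetical width `sigma`. Inner products between surfaces then reduce to kernel-weighted sums of normal dot products, and the mean of several currents is obtained by pooling their Diracs with vectors divided by the number of shapes. This illustrates the general construction, not the paper's exact kernel or discretization.

```python
import numpy as np

def mesh_to_current(V, F):
    """Triangle mesh -> Dirac current: face centers and area-weighted normals."""
    a, b, c = V[F[:, 0]], V[F[:, 1]], V[F[:, 2]]
    return (a + b + c) / 3.0, 0.5 * np.cross(b - a, c - a)

def currents_inner(S, T, sigma):
    """<S, T> = sum_i sum_j exp(-|x_i - y_j|^2 / sigma^2) <u_i, v_j>."""
    (x, u), (y, v) = S, T
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return float(np.sum(np.exp(-d2 / sigma ** 2) * (u @ v.T)))

def currents_distance(S, T, sigma):
    """RKHS-dual norm of S - T, expanded through inner products."""
    d2 = (currents_inner(S, S, sigma) - 2.0 * currents_inner(S, T, sigma)
          + currents_inner(T, T, sigma))
    return float(np.sqrt(max(d2, 0.0)))   # clip tiny negative round-off

def currents_mean(shapes):
    """Mean current: pool all Diracs and divide every vector by the number of shapes."""
    xs = np.vstack([x for x, _ in shapes])
    us = np.vstack([u for _, u in shapes]) / len(shapes)
    return xs, us

# Toy surfaces: a tetrahedron and a translated copy
V = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
F = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]])
S = mesh_to_current(V, F)
T = mesh_to_current(V + np.array([0.5, 0.0, 0.0]), F)
d_ST = currents_distance(S, T, sigma=0.5)
```

Because no point-to-point correspondence between meshes is needed, this distance can feed directly into standard clustering algorithms such as k-means, with cluster centers computed by `currents_mean`.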

    A data-driven classification of 3D foot types by archetypal shapes based on landmarks

    The taxonomy of foot shapes, or of other parts of the body, is important, especially for design purposes. We propose a methodology based on archetypoid analysis (ADA) that overcomes the weaknesses of previous methodologies used to establish typologies. ADA is an objective, data-driven methodology that seeks extreme patterns, the archetypal profiles in the data. ADA also explains the data as percentages of the archetypal patterns, which makes this technique understandable and accessible even to non-experts. Clustering techniques are usually considered for establishing taxonomies, but we show that finding the purest or most extreme patterns is more appropriate than using the central points returned by clustering techniques. We apply the methodology to an anthropometric database of 775 3D right-foot scans representing the Spanish adult female and male population for footwear design. Each foot is described by a 5626 × 3 configuration matrix of landmarks. No multivariate features are used to establish the taxonomy; instead, all the information gathered from the 3D scanning is employed. We use ADA for shapes described by landmarks. Women's and men's feet are analyzed separately, and three archetypal feet are analyzed for each. These archetypal feet could not have been recovered using multivariate techniques.
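
"Explaining the data as percentages of the archetypal patterns" means finding, for each observation, non-negative weights summing to one that best reconstruct it from the archetypoids. A minimal sketch of that step follows, assuming each configuration matrix is flattened into a vector (a 5626 × 3 foot would become a vector of length 16878) and solving the simplex-constrained least-squares problem by projected gradient descent. The selection of which observations serve as archetypoids is not shown, and the solver choice is an assumption, not the authors' algorithm.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - tau, 0.0)

def archetypal_weights(x, Z, iters=500):
    """Minimize ||x - Z.T @ a||^2 over the simplex by projected gradient descent.
    Z holds one (flattened) archetypoid per row; x is one flattened observation."""
    L = 2.0 * np.linalg.norm(Z @ Z.T, 2)         # Lipschitz constant of the gradient
    a = np.full(Z.shape[0], 1.0 / Z.shape[0])    # start at the barycentre
    for _ in range(iters):
        grad = 2.0 * Z @ (Z.T @ a - x)
        a = project_simplex(a - grad / L)
    return a

# Toy example: three archetypoids in 2D and an observation that mixes them 20/30/50
Z = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
x = 0.2 * Z[0] + 0.3 * Z[1] + 0.5 * Z[2]
alpha = archetypal_weights(x, Z)   # should be close to [0.2, 0.3, 0.5]
```

The weights in `alpha` are exactly the "percentages" reported to non-experts: each foot is described as a mixture of the extreme feet rather than by its distance to a cluster center.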