
    Classification of geometrical objects by integrating currents and functional data analysis. An application to a 3D database of Spanish child population

    This paper focuses on the application of Discriminant Analysis to a set of geometrical objects (bodies) characterized by currents. A current is a relevant mathematical object for modelling geometrical data, such as hypersurfaces, through the integration of vector fields along them. As a consequence of choosing a vector-valued Reproducing Kernel Hilbert Space (RKHS) as the test space for integrating hypersurfaces, the hypersurfaces can be considered as embedded in this Hilbert space. This embedding enables us to consider classification algorithms for geometrical objects. A method to apply Functional Discriminant Analysis in the resulting vector-valued RKHS is given, based on the eigenfunction decomposition of the kernel. The novelty of this paper is thus the reformulation of a size-and-shape classification problem in Functional Data Analysis terms using the theory of currents and vector-valued RKHS. This approach is applied to a 3D database obtained from an anthropometric survey of the Spanish child population, with a potential application to online sales of children's wear.
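    The currents embedding described above can be sketched numerically: a polyline is mapped into the RKHS by pairing each edge's length-weighted tangent vector with a scalar kernel at the edge midpoint, which yields a closed-form inner product between shapes. The sketch below is illustrative only; the function names, the Gaussian kernel, and the 2D polyline setting are assumptions, not the paper's implementation.

```python
import numpy as np

def current_inner(c1, c2, sigma=1.0):
    """Currents inner product between two polylines (arrays of shape (n, 2)).

    Each edge contributes its midpoint m and length-weighted tangent t;
    <C1, C2> = sum_ij exp(-|m_i - m_j|^2 / sigma^2) * (t_i . t_j).
    """
    def edges(c):
        return 0.5 * (c[:-1] + c[1:]), np.diff(c, axis=0)
    m1, t1 = edges(c1)
    m2, t2 = edges(c2)
    d2 = ((m1[:, None, :] - m2[None, :, :]) ** 2).sum(-1)
    return float(np.sum(np.exp(-d2 / sigma**2) * (t1 @ t2.T)))

def current_dist(c1, c2, sigma=1.0):
    """Distance induced by the currents embedding in the vector-valued RKHS."""
    return np.sqrt(current_inner(c1, c1, sigma)
                   - 2.0 * current_inner(c1, c2, sigma)
                   + current_inner(c2, c2, sigma))
```

    The induced RKHS distance can then feed any kernel-based discriminant rule, for instance assigning a shape to the class whose mean is nearest in the embedding space.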

    Generalized Linear Models for Geometrical Current predictors. An application to predict garment fit

    The aim of this paper is to model an ordinal response variable in terms of vector-valued functional data embedded in a vector-valued RKHS. In particular, we focus on the vector-valued RKHS obtained when a geometrical object (body) is characterized by a current, and on the ordinal regression model. A common way to solve this problem in functional data analysis is to express the data in the orthonormal basis given by the decomposition of the covariance operator. But our data present important differences with respect to the usual functional data setting: on the one hand, they are vector-valued functions, and on the other, they are functions in an RKHS with a previously defined norm. We propose to use three different bases: the orthonormal basis given by the kernel that defines the RKHS, a basis obtained from the decomposition of the integral operator defined using the covariance function, and a third basis that combines the previous two. The three approaches are compared and applied to an interesting problem: building a model to predict the fit of children's garment sizes, based on a 3D database of the Spanish child population. Our proposal has been compared with alternative methods that explore the performance of other classifiers (Support Vector Machine and k-NN), and with the result of applying the classification method proposed in this work to different characterizations of the objects (landmarks and multivariate anthropometric measurements instead of currents), obtaining worse results in all these cases.
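    The basis obtained from the decomposition of the integral operator associated with the covariance function amounts, in sample terms, to a kernel PCA of the Gram matrix of RKHS inner products. A minimal sketch, assuming a precomputed Gram matrix `K` (the function name and interface are hypothetical):

```python
import numpy as np

def kpca_scores(K, n_comp=2):
    """Coordinates of the samples in the eigenbasis of the centred Gram matrix.

    K[i, j] is the RKHS inner product between objects i and j (for example
    the currents inner product); the resulting score matrix can feed any
    multivariate classifier, ordinal or otherwise.
    """
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    Kc = J @ K @ J
    w, v = np.linalg.eigh(Kc)                  # ascending eigenvalues
    idx = np.argsort(w)[::-1][:n_comp]         # keep the leading eigenpairs
    w, v = w[idx], v[:, idx]
    return v * np.sqrt(np.maximum(w, 0.0))     # (n, n_comp) score matrix
```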

    Archetypal analysis with missing data: see all samples by looking at a few based on extreme profiles

    In this paper we propose several methodologies for handling missing or incomplete data in archetype analysis (AA) and archetypoid analysis (ADA). AA seeks to find archetypes, which are convex combinations of data points, and to approximate the samples as mixtures of those archetypes. In ADA, the representative archetypal data belong to the sample, i.e. they are actual data points. With the proposed procedures, missing data are neither discarded nor filled in beforehand by imputation, and the theoretical properties regarding the location of archetypes are guaranteed, unlike in previous approaches. The new procedures adapt the AA algorithm either by considering the missing values in the computation of the solution or by skipping them. In the first case, the solutions of previous approaches are modified in order to fulfill the theory, and a new procedure is proposed in which the missing values are updated by the fitted values. In the second case, the procedure is based on the estimation of dissimilarities between samples and the projection of these dissimilarities into a new space, where AA or ADA is applied; those results are then used to provide a solution in the original space. A comparative analysis is carried out in a simulation study, with favorable results. The methodology is also applied to two real data sets: a well-known climate data set and a global development data set. We illustrate how these unsupervised methodologies allow complex data to be understood, even by non-experts.
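    The dissimilarity-estimation step of the second procedure can be illustrated with a common heuristic: compute pairwise distances using only the coordinates observed in both samples, rescaled to the full dimension. A minimal sketch (the rescaling rule is an assumption, not necessarily the paper's exact estimator):

```python
import numpy as np

def nan_dist(x, y):
    """Euclidean distance between two samples with missing entries (NaN):
    only coordinates observed in both vectors are used, and the squared
    distance is rescaled to the full dimension."""
    mask = ~np.isnan(x) & ~np.isnan(y)
    if not mask.any():
        return np.nan                      # no common observed coordinate
    d2 = np.sum((x[mask] - y[mask]) ** 2)
    return np.sqrt(d2 * x.size / mask.sum())
```

    The resulting dissimilarity matrix can then be projected into a new space (for instance via multidimensional scaling) where AA or ADA is applied on complete vectors.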

    Archetypal shapes based on landmarks and extension to handle missing data

    Archetype and archetypoid analysis are extended to shapes. The objective is to find representative shapes. Archetypal shapes are pure (extreme) shapes. We focus on the case where the shape of an object is represented by a configuration matrix of landmarks. As shape space is not a vector space, we work in the tangent space, the linearized space about the mean shape. Each observation is then approximated by a convex combination of actual observations (archetypoids) or of archetypes, which are themselves convex combinations of observations in the data set. These tools can contribute to the understanding of shapes, as in the usual multivariate case, since they lie somewhere between clustering and matrix factorization methods. A new simplex visualization tool is also proposed to provide a picture of the archetypal analysis results. We also propose new algorithms for performing archetypal analysis with missing data and extend them to incomplete shapes. A well-known data set is used to illustrate the methodologies developed, and the proposed methodology is applied to a children's apparel design problem.
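    Working in the tangent space requires first removing translation, scale, and rotation from the landmark configurations. A minimal ordinary-Procrustes sketch (the full pipeline would use generalized Procrustes alignment to the mean shape; this two-shape version, with hypothetical names, only illustrates the superimposition step):

```python
import numpy as np

def procrustes_align(X, Y):
    """Ordinary Procrustes: superimpose configuration Y (k landmarks x d)
    onto X by translation, scaling and rotation; returns the aligned copy
    of Y. Aligned configurations can then be treated as points of the
    tangent space at the reference shape. (U @ Vt may contain a
    reflection; full shape analysis constrains det = +1.)"""
    Xc = X - X.mean(axis=0)                # remove translation
    Yc = Y - Y.mean(axis=0)
    Xc = Xc / np.linalg.norm(Xc)           # remove scale
    Yc = Yc / np.linalg.norm(Yc)
    U, s, Vt = np.linalg.svd(Yc.T @ Xc)
    R = U @ Vt                             # optimal rotation of Yc onto Xc
    return s.sum() * (Yc @ R)              # optimally scaled, rotated Y
```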

    A review of spatiotemporal models for count data in R packages. A case study of COVID-19 data

    Spatiotemporal models for count data are required in a wide range of scientific fields, and they have become particularly crucial today because of their ability to analyze COVID-19-related data. The main objective of this paper is to present a review describing the most important approaches and to compare their performance on the same dataset. For this review, we focus on the three R packages that can be used for this purpose, and the models assessed are representative of the two most widespread methodologies used to analyze spatiotemporal count data: the classical approach and the Bayesian point of view. A COVID-19-related case study is analyzed as an illustration of these different methodologies. Because of the current urgent need to monitor and predict data in the COVID-19 pandemic, this case study is, in itself, of particular importance and can be considered the secondary objective of this work. Satisfactory and promising results have been obtained for this second goal. With respect to the main objective, we have seen that, although the three models provide similar results in our case study, their different properties and flexibility allow the model to be chosen depending on the application at hand.
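    As a toy illustration of the classical approach, a Poisson log-linear model can be fitted by iteratively reweighted least squares. The reviewed R packages implement far richer spatiotemporal structures; the sketch below (hypothetical names, plain Python rather than R) shows only the core GLM fitting step:

```python
import numpy as np

def poisson_irls(X, y, n_iter=50):
    """Poisson log-linear regression (log link) fitted by iteratively
    reweighted least squares; X is the (n, p) design matrix, y the counts."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)              # current fitted means
        z = X @ beta + (y - mu) / mu       # working response
        XtW = (X * mu[:, None]).T          # Poisson working weights W = mu
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta
```

    The spatiotemporal flavour enters through the design matrix, for example region indicators plus lagged log-counts of neighbouring regions as covariates.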

    Curvature approximation from parabolic sectors

    We propose an invariant three-point curvature approximation for plane curves based on the arc of a parabolic sector, and we analyze how close this approximation is to the true curvature of the curve. We compare our results with those obtained with other invariant three-point curvature approximations. Finally, an application is discussed.
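    The parabolic-sector estimator itself is not reproduced here; for context, the classical invariant three-point baseline such proposals are compared against is the reciprocal of the circumradius of the circle through three consecutive points (Menger curvature). A sketch, assuming 2D points as NumPy arrays:

```python
import numpy as np

def menger_curvature(p, q, r):
    """Three-point curvature estimate: reciprocal of the radius of the
    circle through p, q, r, i.e. kappa = 4 * area(pqr) / (a * b * c)."""
    a = np.linalg.norm(q - p)
    b = np.linalg.norm(r - q)
    c = np.linalg.norm(r - p)
    u, v = q - p, r - p
    twice_area = abs(u[0] * v[1] - u[1] * v[0])  # |cross product| in 2D
    return 2.0 * twice_area / (a * b * c)
```

    For three points sampled on a circle of radius R this recovers 1/R exactly; on a general curve the estimate degrades as the points spread apart, which is the regime such comparisons address.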

    Factors determining waste generation in Spanish towns and cities

    This paper analyzes the generation and composition of municipal solid waste in Spanish towns and cities with more than 5000 inhabitants, which altogether account for 87 % of the Spanish population. To do so, the total composition and generation of municipal solid waste fractions were obtained from 135 towns and cities. Homogeneity tests revealed heterogeneity in the proportions of municipal solid waste fractions from one city to another. Statistical analyses identified significant differences in the generation of glass in cities of different sizes, and in the generation of all fractions depending on the hydrographic area. Finally, linear regression models and residual analysis were applied to study the effect of different demographic, geographic, and socioeconomic variables on the generation of waste fractions. The conclusions show that more densely populated towns, towns in certain hydrographic areas, and cities with over 50,000 inhabitants have higher waste generation rates, while certain socioeconomic variables (people per car) decrease that generation. Other socioeconomic variables show a positive (foreigners) or null (unemployment) influence on waste generation.

    Non-homogeneous temporal Boolean models to study endocytosis

    Many medical and biological problems require the analysis of large sequences of microscope images. These images capture phenomena of interest, and it is essential to characterize their spatial and temporal properties. The purpose of this paper is to present a new statistical methodology for estimating these parameters of interest in image sequences obtained from the observation of endocytosis, the process by which cells traffic molecules from the extracellular space into different intracellular compartments. These images are obtained using a highly specialized microscopy technique called Total Internal Reflection Fluorescence Microscopy (TIRFM). The Homogeneous Temporal Boolean Model (HTBM) has recently been used to analyze this type of image sequence. Using an HTBM requires assuming spatial homogeneity of events in the cell membrane, but whether this assumption holds is an open question in the biological understanding of the endocytic process. Our aim in this paper is to generalize this methodology to overcome this drawback. Methodologically, this work has a threefold aim: to broaden the notion of the HTBM by introducing the concept of the Non-Homogeneous Temporal Boolean Model; to introduce a hypothesis testing procedure to check the spatial homogeneity assumption; and, finally, to reformulate the existing methodology to work with underlying non-homogeneous point processes. We check the goodness of our methodology on a simulated data set and compare our results with those provided by visual inspection and by assuming spatial homogeneity. The accuracy of the results obtained with simulated data supports the validity of our methodology. Finally, as an illustration, we apply it to three sequences of a particular type of endocytosis image. The spatial homogeneity test confirms that spatial homogeneity cannot be assumed. As a result, our methodology provides more accurate estimates of the duration of the events, as well as information about areas of the membrane with a higher accumulation of events.
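    A non-homogeneous spatial pattern of events of the kind such models target can be simulated by thinning: generate a homogeneous Poisson pattern at a dominating rate and keep each candidate with probability proportional to the local intensity. A minimal sketch (the function name, unit window, and intensity interface are assumptions; this is the simulation side only, not the paper's estimation procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_events(intensity, bound, window=1.0):
    """Locations of events from a non-homogeneous spatial Poisson process
    on [0, window]^2, simulated by thinning: candidates from a homogeneous
    process of rate `bound` are kept with probability intensity(x, y)/bound
    (requires intensity <= bound everywhere)."""
    n = rng.poisson(bound * window**2)             # number of candidates
    pts = rng.uniform(0.0, window, size=(n, 2))    # homogeneous candidates
    probs = np.array([intensity(x, y) for x, y in pts]) / bound
    return pts[rng.uniform(size=n) < probs]        # retained events
```

    With a constant intensity the thinning is vacuous and the homogeneous model is recovered, which is exactly the special case the spatial homogeneity test checks.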