18 research outputs found

    Archetypoid analysis for sports analytics

    We aim to make sense of the growing amount of sports performance data by finding extreme data points, which makes human interpretation easier. In archetypoid analysis, each datum is expressed as a mixture of actual observations (archetypoids). It therefore allows us to identify not only extreme athletes and teams, but also the composition of other athletes (or teams) in terms of the archetypoid athletes, and to establish a ranking. The utility of archetypoids in sports is illustrated with basketball and soccer data in three scenarios. In the first, with multivariate data, archetypoids are compared with alternative methods and give the best results. In the second, we extend archetypoid analysis to sparse functional data: although functional data (time series or trajectories) are common in sports, functional data analysis has not been exploited until now because of the sparseness of the functions, and this scenario also shows its potential in sports analytics. In the third, features are not available, so we use proximities, and we extend archetypoid analysis to data with asymmetric relations. This study provides valuable knowledge about player, team and league performance that can be used to analyze athletes' careers. This work has been partially supported by Grant DPI2013-47279-C2-1-R. The databases and R code (including the web application) to reproduce the results can be freely accessed at www.uv.es/vivigui/software

    Archetype analysis of student responses to teaching surveys (Análisis de arquetipos de las respuestas del estudiantado a las encuestas docentes)

    Paper presented at the 2º Congreso Virtual Avances en Tecnologías, Innovación y Desafíos de la Educación Superior ATIDES 2018 (15-31 October 2018, online). A common way of assessing teaching is, in part, through student surveys. The raw, unsummarized data can be examined directly. This paper illustrates the use of archetype analysis with missing data (not all students answer all questions), a statistical technique that gives a snapshot of how students responded to the survey in a given year and subject, and a clearer picture of their opinions. It also shows, by means of random forests, which factors most influenced overall satisfaction with the teaching staff. In particular, data from two cases that show two different situations are analyzed. This methodology can be used in other data mining problems in Education

    Archetypal analysis for ordinal data

    Archetypoid analysis (ADA) is an exploratory approach that explains a set of continuous observations as mixtures of pure (extreme) patterns. Those patterns (archetypoids) are actual observations of the sample, which makes the results of this technique easily interpretable, even for non-experts. Each observation is approximated as a convex combination of the archetypoids. In its current form, archetypoid analysis cannot be applied directly to ordinal data. We propose and describe a two-step method for applying ADA to ordinal responses based on the ordered stereotype model. One of the main advantages of this model is that it converts the ordinal data to numerical values using a new data-driven spacing that better reflects the ordinal patterns of the data; this numerical conversion then allows ADA to be applied straightforwardly. The results of the novel method are presented for two behavioural science applications. Finally, the proposed method is compared with other unsupervised statistical learning methods
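
The convex-combination step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the archetypoids are already given and only computes the mixture weights of one observation by projected gradient descent onto the probability simplex; it is not the ADA algorithm itself, nor the authors' R code. The function names `project_to_simplex` and `mixture_weights` are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {a : a_i >= 0, sum(a) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:  # holds for a prefix of j; keep the last one
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def mixture_weights(x, archetypoids, steps=2000):
    """Weights alpha minimising ||x - sum_j alpha_j * z_j||^2 on the simplex.

    `archetypoids` is a list of observations z_j (assumed to be chosen
    beforehand); `steps` is an illustrative iteration budget.
    """
    k, dim = len(archetypoids), len(x)
    alpha = [1.0 / k] * k  # start from the uniform mixture
    # 1 / ||Z||_F^2 is a safe step size: it never exceeds 1 / Lipschitz constant.
    frob_sq = sum(c * c for z in archetypoids for c in z)
    step = 1.0 / frob_sq if frob_sq > 0 else 1.0
    for _ in range(steps):
        recon = [sum(alpha[j] * archetypoids[j][d] for j in range(k))
                 for d in range(dim)]
        residual = [x[d] - recon[d] for d in range(dim)]
        grad = [-sum(residual[d] * archetypoids[j][d] for d in range(dim))
                for j in range(k)]
        alpha = project_to_simplex([alpha[j] - step * grad[j] for j in range(k)])
    return alpha


if __name__ == "__main__":
    corners = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy archetypoids
    print([round(a, 3) for a in mixture_weights([0.2, 0.3], corners)])
```

The weights are non-negative and sum to one, so they can be read as percentages of each archetypoid, which is the interpretability property the abstract emphasises.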

    Classifying top economists using archetypoid analysis

    Updating the study by Seiler and Wohlrabe (2013), we use archetypoid analysis to classify top economists. The approach allows us to identify typical characteristics of extreme (archetypal) values in a multivariate data set. In contrast to its predecessor, archetypal analysis, archetypoids always represent actual observed units in the data. Using bibliometric data from 776 top economists, we identify four archetypoids. These types represent solid, low, top and diligent performers. Each economist is assigned to one or more of these archetypoids

    Archetypal Curves in the Shape and Size Space: Discovering the Salient Features of Curved Big Data by Representative Extremes

    Curves are complex data. Tools for visualizing, exploring, and discovering the structure of a data set of curves are valuable. In this paper, we propose a scalable methodology to address this challenge. On the one hand, we consider two distances in the shape and size space, one well known and another recently proposed, which separate the contributions of shape and of size when computing the distance between elements. On the other hand, we use archetypoid analysis (ADA) for the first time in elastic shape analysis. ADA is a recent technique in unsupervised statistical learning whose objective is to find a set of archetypal observations (curves in this case) such that the data set can be described as convex combinations of these archetypal curves. This makes interpretation easy, even for non-experts. Archetypal curves or pure types are extreme cases, which also facilitates human understanding. The methodology is illustrated with a simulated data set and applied to a real problem. It is important to know the distribution of foot shapes to design suitable footwear that accommodates the population. For this purpose, we apply our proposed methodology to a real data set composed of foot contours from the adult Spanish population. Funding for open access charge: CRUE-Universitat Jaume

    Biarchetype analysis: simultaneous learning of observations and features based on extremes

    A new exploratory technique called biarchetype analysis is defined. We extend archetype analysis to find the archetypes of both observations and features simultaneously. The idea of this new unsupervised machine learning tool is to represent observations and features by instances of pure types (biarchetypes) that can be easily interpreted, as they are mixtures of observations and features. Furthermore, the observations and features are expressed as mixtures of the biarchetypes, which also helps in understanding the structure of the data. We propose an algorithm to solve biarchetype analysis. We show that biarchetype analysis offers advantages over biclustering, especially in terms of interpretability. This is because biarchetypes are extreme instances, as opposed to the centroids returned by biclustering, which favors human understanding. Biarchetype analysis is applied to several machine learning problems to illustrate its usefulness

    A data-driven classification of 3D foot types by archetypal shapes based on landmarks

    The taxonomy of foot shapes or other parts of the body is important, especially for design purposes. We propose a methodology based on archetypoid analysis (ADA) that overcomes the weaknesses of previous methodologies used to establish typologies. ADA is an objective, data-driven methodology that seeks extreme patterns, the archetypal profiles in the data. ADA also explains the data as percentages of the archetypal patterns, which makes this technique understandable and accessible even for non-experts. Clustering techniques are usually considered for establishing taxonomies, but we will show that finding the purest or most extreme patterns is more appropriate than using the central points returned by clustering techniques. We apply the methodology to an anthropometric database of 775 3D right foot scans representing the Spanish adult female and male population for footwear design. Each foot is described by a 5626 × 3 configuration matrix of landmarks. No multivariate features are used for establishing the taxonomy, but all the information gathered from the 3D scanning is employed. We use ADA for shapes described by landmarks. Women’s and men’s feet are analyzed separately. We have analyzed 3 archetypal feet for both men and women. These archetypal feet could not have been recovered using multivariate techniques

    Robust multivariate and functional archetypal analysis with application to financial time series analysis

    The code and data for reproducing the examples are available at http://www3.uji.es/epifanio/RESEARCH/rofada.rar. A preliminary version of this work was presented at the 8th International Conference on Mathematical and Statistical Methods for Actuarial Sciences and Finance (MAF 2018) (Moliner and Epifanio (2018)), where the application data were analyzed in a non-robust way. Archetypal analysis approximates data by means of mixtures of actual extreme cases (archetypoids) or of archetypes, which are convex combinations of cases in the data set. Archetypes lie on the boundary of the convex hull, which makes the analysis very sensitive to outliers. A robust methodology by means of M-estimators for classical multivariate and functional data is proposed. This unsupervised methodology allows complex data to be understood even by non-experts. The performance of the new procedure is assessed in a simulation study, where a comparison with a previous methodology for the multivariate case is also carried out, and our proposal obtains favorable results. Finally, robust bivariate functional archetypoid analysis is applied to a set of companies in the S&P 500 described by two time series of stock quotes. A new graphic representation is also proposed to visualize the results. The analysis shows how the information can be easily interpreted and how even non-experts can gain a qualitative understanding of the data

    Archetype analysis: A new subspace outlier detection approach

    A new method is proposed for detecting outliers in multivariate data sets with continuous numerical features. The method combines projections into relevant subspaces by archetype analysis with a nearest neighbor algorithm, through an appropriate ensemble of the results. Our method is able to detect an anomaly in a simple data set with a linear correlation between two features, while other methods fail to recognize that anomaly. Our method performs among the top in an extensive comparison with 23 state-of-the-art outlier detection algorithms on several benchmark data sets. Finally, a novel industrial data set is introduced, and an outlier analysis is carried out to improve the fit of footwear, since this kind of analysis has never been fully exploited in the anthropometric field. Funding for open access charge: CRUE-Universitat Jaume

    Archetypal analysis with missing data: see all samples by looking at a few based on extreme profiles

    In this paper we propose several methodologies for handling missing or incomplete data in archetype analysis (AA) and archetypoid analysis (ADA). AA seeks to find archetypes, which are convex combinations of data points, and to approximate the samples as mixtures of those archetypes. In ADA, the representative archetypal data belong to the sample, i.e. they are actual data points. With the proposed procedures, missing data are neither discarded nor filled in beforehand by imputation, and the theoretical properties regarding the location of archetypes are guaranteed, unlike in previous approaches. The new procedures adapt the AA algorithm either by considering the missing values in the computation of the solution or by skipping them. In the first case, the solutions of previous approaches are modified in order to fulfill the theory, and a new procedure is proposed in which the missing values are updated by the fitted values. In the second case, the procedure is based on estimating dissimilarities between samples and projecting these dissimilarities into a new space, where AA or ADA is applied; those results are then used to provide a solution in the original space. A comparative analysis is carried out in a simulation study, with favorable results. The methodology is also applied to two real data sets: a well-known climate data set and a global development data set. We illustrate how these unsupervised methodologies allow complex data to be understood, even by non-experts
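
The verbal definitions of AA and ADA in this abstract correspond to the standard optimisation problem sketched below. The notation ($x_i$ for the $n$ samples, $z_j$ for the $k$ archetypes, $\alpha$ and $\beta$ for the mixture coefficients) is assumed here for illustration and is not taken from the abstract; the missing-data adaptations the paper proposes are not shown.

```latex
\min_{\alpha,\beta}\ \sum_{i=1}^{n} \Bigl\| x_i - \sum_{j=1}^{k} \alpha_{ij} z_j \Bigr\|^2,
\qquad z_j = \sum_{l=1}^{n} \beta_{jl}\, x_l,
\\[4pt]
\text{subject to}\quad
\alpha_{ij} \ge 0,\ \ \sum_{j=1}^{k} \alpha_{ij} = 1 \ \ (i = 1,\dots,n),
\qquad
\beta_{jl} \ge 0,\ \ \sum_{l=1}^{n} \beta_{jl} = 1 \ \ (j = 1,\dots,k).
```

In ADA the second constraint is tightened to $\beta_{jl} \in \{0, 1\}$, so each $z_j$ coincides with exactly one observed sample.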