163 research outputs found

    Archetypes for histogram-valued data

    The main innovative contribution of this work is an extension of archetypal analysis to histogram-valued data. Methodologically, because histogram-valued data are complex in nature, the work draws on insights from "Symbolic Data Analysis" (SDA) and on the intrinsic relationships between interval-valued and histogram-valued data. After discussing the technique, developed in Matlab, together with its behaviour and properties on a toy example, the applied section proposes it as a tool for quantitative benchmarking. Specifically, it presents the main results of applying archetypes for histogram-valued data to a case of internal benchmarking of the school system, using data from the INVALSI test for the 2015/2016 school year. In this setting the unit of analysis is the individual school, operationally defined through the distributions of its pupils' scores, considered jointly as histogram-valued symbolic objects

    3rd Workshop in Symbolic Data Analysis: book of abstracts

    This workshop is the third regular meeting of researchers interested in Symbolic Data Analysis. The main aim of the event is to favor the meeting of people and the exchange of ideas from different fields - Mathematics, Statistics, Computer Science, Engineering, Economics, among others - that contribute to Symbolic Data Analysis

    Clustering measure-valued data with Wasserstein barycenters

    In this work, learning schemes for measure-valued data are proposed, i.e. data whose structure can be more efficiently represented as probability measures instead of points in $\mathbb{R}^d$, employing the concept of probability barycenters as defined with respect to the Wasserstein metric. Such learning approaches are highly appreciated in many fields where the observational/experimental error is significant (e.g. astronomy, biology, remote sensing, etc.) or where the data are more complex in nature and traditional learning algorithms are not applicable or effective (e.g. network data, interval data, high-frequency records, matrix data, etc.). Under this perspective, each observation is identified with an appropriate probability measure, and the proposed statistical learning schemes rely on discrimination criteria that exploit the geometric structure of the space of probability measures through core techniques from optimal transport theory. The discussed approaches are implemented in two real-world applications: (a) clustering eurozone countries according to their observed government bond yield curves and (b) classifying the areas of a satellite image into land-use categories, which is a standard task in remote sensing. In both case studies the results are particularly interesting and meaningful, and the accuracy obtained is high. Comment: 18 pages, 3 figures
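
    The idea of clustering around Wasserstein barycenters can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional measures, each given by the same number of equally weighted sample points, where the 2-Wasserstein distance and barycenter reduce to operations on sorted samples. The function names (`w2_distance_1d`, `w2_barycenter_1d`, `wasserstein_kmeans_1d`) are hypothetical.

```python
import numpy as np


def w2_distance_1d(x, y):
    """2-Wasserstein distance between two 1D empirical measures.

    Assumption: x and y hold the same number of equally weighted atoms,
    so the optimal coupling matches sorted samples one to one.
    """
    xs, ys = np.sort(x), np.sort(y)
    return float(np.sqrt(np.mean((xs - ys) ** 2)))


def w2_barycenter_1d(samples, weights=None):
    """Atoms of the 2-Wasserstein barycenter of several 1D empirical measures.

    In one dimension the barycenter's quantile function is the weighted
    average of the input quantile functions, i.e. of the sorted samples.
    """
    sorted_samples = np.sort(np.asarray(samples, dtype=float), axis=1)
    return np.average(sorted_samples, axis=0, weights=weights)


def wasserstein_kmeans_1d(samples, k, n_iter=20, seed=0):
    """k-means-style clustering with W2 distances and barycenter centers."""
    rng = np.random.default_rng(seed)
    q = np.sort(np.asarray(samples, dtype=float), axis=1)  # shape (n, m)
    centers = q[rng.choice(len(q), size=k, replace=False)].copy()
    labels = np.zeros(len(q), dtype=int)
    for _ in range(n_iter):
        # Squared W2 distance from every measure to every center.
        d2 = ((q[:, None, :] - centers[None, :, :]) ** 2).mean(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = q[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return labels, centers
```

    In higher dimensions no such closed form is available in general, and the barycenter has to be computed with a numerical optimal transport solver.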

    CLADAG 2021 BOOK OF ABSTRACTS AND SHORT PAPERS

    The book collects the short papers presented at the 13th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS). The meeting has been organized by the Department of Statistics, Computer Science and Applications of the University of Florence, under the auspices of the Italian Statistical Society and the International Federation of Classification Societies (IFCS). CLADAG is a member of the IFCS, a federation of national, regional, and linguistically-based classification societies. It is a non-profit, non-political scientific organization, whose aims are to further classification research

    Morphologie, Géométrie et Statistiques en imagerie non-standard

    Digital image processing has followed the evolution of electronics and computer science. It is now common to deal with images valued not in {0,1} or in gray-scale, but in manifolds or probability distributions. This is for instance the case for color images or in diffusion tensor imaging (DTI). Each kind of image has its own algebraic, topological and geometric properties. Thus, existing image processing techniques have to be adapted when applied to new imaging modalities. When dealing with new kinds of value spaces, former operators can rarely be used as they are. Even if the underlying notion still has a meaning, work must be carried out in order to express it in the new context. The thesis is composed of two independent parts. The first one, "Mathematical morphology on non-standard images", concerns the extension of mathematical morphology to specific cases where the value space of the image does not have a canonical order structure. Chapter 2 formalizes and demonstrates the irregularity issue of total orders in metric spaces. The main result states that for any total order in a multidimensional vector space, there are images for which the morphological dilations and erosions are irregular and inconsistent. Chapter 3 is an attempt to generalize morphology to images valued in a set of unordered labels. The second part, "Probability density estimation on Riemannian spaces", concerns the adaptation of standard nonparametric density estimation techniques to specific Riemannian manifolds. Chapter 5 is a work on color image histograms under perceptual metrics. The main idea of this chapter consists in computing histograms using local Euclidean approximations of the perceptual metric, and not a global Euclidean approximation as in standard perceptual color spaces. Chapter 6 addresses the problem of nonparametric density estimation when data lie in spaces of Gaussian laws. Different techniques are studied; the main result is an expression of kernels for the Wasserstein metric