411 research outputs found

    A Locally Stable Edit Distance for Merge Trees

    In this paper we define a novel edit distance for merge trees. We then study the resulting metric space, obtaining completeness and compactness results as well as local approximations of the space by Euclidean spaces. We also present results on its geodesic structure, with particular attention to Fréchet means.

    Wasserstein Principal Component Analysis for Circular Measures

    We consider the 2-Wasserstein space of probability measures supported on the unit circle, and propose a framework for Principal Component Analysis (PCA) for data living in such a space. We build on a detailed investigation of the optimal transportation problem for measures on the unit circle, which might be of independent interest. In particular, we derive an expression for optimal transport maps in (almost) closed form and propose an alternative definition of the tangent space at an absolutely continuous probability measure, together with the associated exponential and logarithmic maps. PCA is performed by mapping data onto the tangent space at the Wasserstein barycentre, which we approximate via an iterative scheme, and for which we establish a sufficient a posteriori condition to assess its convergence. Our methodology is illustrated on several simulated scenarios and a real data analysis of measurements of optical nerve thickness.

    Projected Statistical Methods for Distributional Data on the Real Line with the Wasserstein Metric

    We present a novel class of projected methods to perform statistical analysis on a data set of probability distributions on the real line, with the 2-Wasserstein metric. We focus in particular on Principal Component Analysis (PCA) and regression. To define these models, we exploit a representation of the Wasserstein space closely related to its weak Riemannian structure, by mapping the data to a suitable linear space and using a metric projection operator to constrain the results in the Wasserstein space. By carefully choosing the tangent point, we are able to derive fast empirical methods, exploiting a constrained B-spline approximation. As a byproduct of our approach, we are also able to derive faster routines for previous work on PCA for distributions. By means of simulation studies, we compare our approaches to previously proposed methods, showing that our projected PCA has similar performance for a fraction of the computational cost and that the projected regression is extremely flexible even under misspecification. Several theoretical properties of the models are investigated and asymptotic consistency is proven. Two real-world applications, to Covid-19 mortality in the US and to wind speed forecasting, are discussed.
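As a rough illustration of the "mapping to a linear space" idea in the abstract above: for distributions on the real line, the 2-Wasserstein distance equals the L2 distance between quantile functions on (0, 1). The sketch below uses that standard identity with empirical quantiles on a grid. It is not the paper's method: the function names `quantile_embedding` and `wasserstein2` and the grid size are assumptions made for this example, and the paper's constrained B-spline approximation and projection operator are not reproduced.

```python
import numpy as np


def quantile_embedding(sample, grid):
    """Evaluate the empirical quantile function of `sample` at the points in `grid`.

    The resulting vector is a discretised point in L2([0, 1]); quantile
    functions are nondecreasing, so valid embeddings form a convex cone.
    """
    return np.quantile(np.asarray(sample, dtype=float), grid)


def wasserstein2(sample_a, sample_b, n_grid=1000):
    """Approximate the 2-Wasserstein distance between two 1-D samples.

    Uses W2^2 = integral over (0, 1) of (Qa(t) - Qb(t))^2 dt, approximated
    by a midpoint rule on a uniform grid (n_grid is an illustrative choice).
    """
    grid = (np.arange(n_grid) + 0.5) / n_grid  # midpoints of (0, 1)
    qa = quantile_embedding(sample_a, grid)
    qb = quantile_embedding(sample_b, grid)
    return float(np.sqrt(np.mean((qa - qb) ** 2)))


# Toy data (hypothetical): a sample and a copy shifted by 2.
x = np.array([0.0, 1.0, 2.0, 3.0])
shifted = x + 2.0

w2_same = wasserstein2(x, x)         # identical samples: distance 0
w2_shift = wasserstein2(x, shifted)  # pure translation by 2: distance 2
```

A translation moves every quantile by the same amount, so the distance between a sample and its shifted copy is the size of the shift. Once distributions are stored as quantile vectors, ordinary linear tools such as PCA or regression can be applied to them, provided the results are constrained back to nondecreasing functions.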

    Functional Data Representation with Merge Trees

    In this paper we address the problem of representing functional data with the tools of algebraic topology. We represent functions by means of merge trees and this representation is compared with that offered by persistence diagrams. We show that these two tree structures, although not equivalent, are both invariant under homeomorphic re-parametrizations of the functions they represent, thus allowing for a statistical analysis which is indifferent to functional misalignment. We employ a novel metric for merge trees and we prove a few theoretical results related to its specific implementation when merge trees represent functions. To showcase the good properties of our topological approach to functional data analysis, we first go through a few examples using data generated in silico, employed to illustrate and compare the different representations provided by merge trees and persistence diagrams, and then we test it on the Aneurisk65 dataset, replicating, from our different perspective, the supervised classification analysis which helped make this dataset a benchmark for methods dealing with misaligned functional data.

    Being on time in magnetic reconnection

    The role of magnetic reconnection in the evolution of the Kelvin-Helmholtz instability is investigated in a plasma configuration with a velocity shear field. It is shown that the rate at which the large-scale dynamics drives the formation of steep current sheets, leading to the onset of secondary magnetic reconnection instabilities, and the rate at which magnetic reconnection occurs compete in shaping the final state of the plasma configuration. These conclusions are reached within a two-fluid plasma description on the basis of a series of two-dimensional numerical simulations. Special attention is given to the role of the Hall term. In these simulations, the boundary conditions, the symmetry of the initial configuration and the simulation box size have been optimized in order not to affect the evolution of the system artificially.

    Radiation Reaction Effects on Electron Nonlinear Dynamics and Ion Acceleration in Laser-solid Interaction

    Radiation Reaction (RR) effects in the interaction of an ultra-intense laser pulse with a thin plasma foil are investigated analytically and by two-dimensional (2D3P) Particle-In-Cell (PIC) simulations. It is found that the radiation reaction force leads to significant electron cooling and to an increased spatial bunching of both electrons and ions. A fully relativistic kinetic equation including RR effects is discussed and it is shown that RR leads to a contraction of the available phase space volume. The results of our PIC simulations are in qualitative agreement with the predictions of the kinetic theory.

    Data analysis with merge trees

    Today’s data are increasingly complex, and classical statistical techniques need increasingly refined mathematical tools to model and investigate them. Paradigmatic situations are represented by data which need to be considered up to some kind of transformation, and by all those circumstances in which the analyst needs to define a general concept of shape. Topological Data Analysis (TDA) is a field which is fundamentally contributing to such challenges by extracting topological information from data with a plethora of interpretable and computationally accessible pipelines. We contribute to this field by developing a series of novel tools, techniques and applications to work with a particular topological summary called merge tree. To analyze sets of merge trees we introduce a novel metric structure along with an algorithm to compute it, define a framework to compare different functions defined on merge trees and investigate the metric space obtained with the aforementioned metric. Different geometric and topological properties of the space of merge trees are established, with the aim of obtaining a deeper understanding of such trees. To showcase the effectiveness of the proposed metric, we develop an application in the field of Functional Data Analysis, working with functions up to homeomorphic reparametrization, and in the field of radiomics, where each patient is represented via a clustering dendrogram.

    Kelvin-Helmholtz vortices and secondary instabilities in super-magnetosonic regimes

    The nonlinear behaviour of the Kelvin-Helmholtz instability is investigated with a two-fluid simulation code in both sub-magnetosonic and super-magnetosonic regimes in a two-dimensional configuration chosen so as to represent typical conditions observed at the Earth's magnetopause flanks. It is shown that in super-magnetosonic regimes the plasma density inside the vortices produced by the development of the Kelvin-Helmholtz instability is approximately uniform, making the plasma inside the vortices effectively stable against the onset of secondary instabilities. However, the motion of the vortices relative to the plasma flow can cause the formation of shock structures. It is shown that in the region where the shocks are attached to the vortex boundaries the plasma conditions change rapidly and develop large gradients that allow for the onset of secondary instabilities not observed in sub-magnetosonic regimes.

    Spatially dependent mixture models via the Logistic Multivariate CAR prior

    We consider the problem of spatially dependent areal data, where for each area independent observations are available, and propose to model the density of each area through a finite mixture of Gaussian distributions. The spatial dependence is introduced via a novel joint distribution for a collection of vectors in the simplex, that we term logisticMCAR. We show that salient features of the logisticMCAR distribution can be described analytically, and that a suitable augmentation scheme based on the Pólya-Gamma identity allows us to derive an efficient Markov Chain Monte Carlo algorithm. When compared to competitors, our model proves better at estimating densities in different (disconnected) areal locations when they have different characteristics. We discuss an application to a real dataset of Airbnb listings in the city of Amsterdam, also showing how to easily incorporate additional covariate information in the model.

    Imaging-based representation and stratification of intra-tumor heterogeneity via tree-edit distance

    Personalized medicine is the future of medical practice. In oncology, tumor heterogeneity assessment represents a pivotal step for effective treatment planning and prognosis prediction. Despite new procedures for DNA sequencing and analysis, non-invasive methods for tumor characterization are needed to have an impact on daily routine. To this end, imaging texture analysis is rapidly scaling, holding the promise of serving as a surrogate for histopathological assessment of tumor lesions. In this work, we propose a tree-based representation strategy for describing intra-tumor heterogeneity of patients affected by metastatic cancer. We leverage radiomics information extracted from PET/CT imaging and we provide an exhaustive and easily readable summary of the disease spread. We exploit this novel patient representation to perform cancer subtyping using hierarchical clustering. To this purpose, a new heterogeneity-based distance between trees is defined and applied to a case study of prostate cancer. Cluster interpretation is explored in terms of concordance with severity status, tumor burden and biological characteristics. Results are promising, as the proposed method outperforms current literature approaches. Ultimately, the proposed method provides a general analysis framework that would allow knowledge to be extracted from routinely acquired imaging data of patients and provide insights for effective treatment planning.