264 research outputs found

    Statistical distances for model validation and clustering. Applications to flow cytometry and fair learning.

    This thesis has been developed at the University of Valladolid and IMUVA within the framework of the project "Sampling, trimming, and probabilistic metric techniques. Statistical applications", whose principal investigators are Carlos Matrán Bea and Eustasio del Barrio Tellado. The lines of research associated with the project include model validation, Wasserstein distances, and robust cluster analysis, and the work carried out in these fields gives rise to Chapters 1, 2, and 4 of this report. The work done in the field of fair learning with Professor Jean-Michel Loubes, a frequent collaborator of the Valladolid team, during an international stay at Paul Sabatier University in Toulouse is the basis of Chapter 3. This thesis is therefore an exposition of the problems studied and the results obtained in these different fields. Because of the diversity of topics, we have based the chapters on the works published or submitted to date, so each chapter has a structure that is relatively independent of the others. Chapter 1 is based on [del Barrio et al., 2019e, del Barrio et al., 2019d], Chapter 2 on [del Barrio et al., 2019c], and Chapter 3 on [del Barrio et al., 2019b], while Chapter 4 presents results of a work in progress. In this introduction our objective is to present the main challenges we have faced and to briefly present our most relevant results. Each chapter also has its own introduction, where we delve further into the topics discussed below. Our intention is that the reader gets a general idea of what he or she will find in each chapter and thus has the information needed to face the more technical discussions found there. Given the diversity of topics dealt with in this report, we propose a non-linear reading.
We suggest that the reader, after reading a section of the Introduction, move to the corresponding chapter. In this way the relevant information will be more at hand and the exposition in each chapter will be easier to follow. If, on the other hand, the document is read sequentially, we apologize in advance for some repetitions, which nevertheless seem to us to contribute positively to the understanding of this work.
Departamento de Estadística e Investigación Operativa (Department of Statistics and Operations Research), Doctorado en Matemática (Doctoral Program in Mathematics)

    The triangulation of manifolds

    A mostly expository account of old questions about the relationship between polyhedra and topological manifolds. Topics are old topological results, new gauge theory results (with speculations about next directions), and the history of the questions. Comment: 26 pages, 2 figures. Version 2: spellings corrected, analytic speculations in 4.8.2 sharpened.

    Drawing Binary Tanglegrams: An Experimental Evaluation

    A binary tanglegram is a pair of binary trees whose leaf sets are in one-to-one correspondence; matching leaves are connected by inter-tree edges. In applications such as phylogenetics or software engineering, the individual trees must be drawn without crossings. A natural optimization problem, called the tanglegram layout problem, is thus to minimize the number of crossings between inter-tree edges. The tanglegram layout problem is NP-hard and is currently studied both in application domains and in theory. In this paper we present an experimental comparison of a recursive algorithm of Buchin et al., our variant of their algorithm, the hierarchy sort algorithm of Holten and van Wijk, and an integer quadratic program that yields optimal solutions. Comment: see http://www.siam.org/proceedings/alenex/2009/alx09_011_nollenburgm.pd
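The objective of the tanglegram layout problem can be made concrete with a small sketch. Trees are encoded here as nested pairs of leaf labels (an assumption of this sketch, as are all function names). Two inter-tree edges cross exactly when their leaves appear in opposite relative order in the two drawings, so the crossing number of a fixed layout is an inversion count. The exhaustive minimizer only illustrates the problem and is feasible for tiny trees; it is not one of the algorithms compared in the paper.

```python
from itertools import combinations


def leaf_orders(tree):
    """Yield every leaf order obtainable by swapping children at internal nodes.

    A tree is a leaf label (str) or a pair (left, right); each yielded order
    corresponds to one crossing-free drawing of that tree.
    """
    if isinstance(tree, str):
        yield [tree]
        return
    left, right = tree
    for lo in leaf_orders(left):
        for ro in leaf_orders(right):
            yield lo + ro  # children in their original order
            yield ro + lo  # children swapped at this node


def count_crossings(left_order, right_order):
    """Number of crossing inter-tree edges for two fixed leaf orders.

    Edges (a, a) and (b, b) cross exactly when a and b appear in opposite
    relative order on the two sides, so this counts inversions.
    """
    pos = {label: k for k, label in enumerate(right_order)}
    seq = [pos[label] for label in left_order]
    return sum(1 for p, q in combinations(seq, 2) if p > q)


def min_crossings(t1, t2):
    """Exhaustive tanglegram layout: exponential in the tree size, tiny trees only."""
    return min(count_crossings(a, b)
               for a in leaf_orders(t1) for b in leaf_orders(t2))
```

For instance, the trees (("a", "b"), ("c", "d")) and (("a", "c"), ("b", "d")) cannot be drawn without a crossing, and min_crossings returns 1 for them.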

    Euclidean distance geometry and applications

    Euclidean distance geometry is the study of Euclidean geometry based on the concept of distance. This is useful in several applications where the input data consists of an incomplete set of distances and the output is a set of points in Euclidean space that realizes the given distances. We survey some of the theory of Euclidean distance geometry and some of its most important applications: molecular conformation, localization of sensor networks, and statics. Comment: 64 pages, 21 figures.
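For the simplest case, where all pairwise distances are known exactly and the points live in the plane, a realization can be computed directly. The sketch below uses the standard identity <p_i - p_0, p_j - p_0> = (d_0i^2 + d_0j^2 - d_ij^2) / 2 to recover inner products and then fixes coordinates. It assumes a complete, noise-free matrix with d_01 > 0, and the function names are hypothetical. It does not address the incomplete-distance setting that the survey focuses on, which is generally much harder.

```python
import math


def gram_from_distances(D):
    """Gram matrix of the vectors p_i - p_0, recovered from pairwise distances."""
    n = len(D)
    return [[(D[0][i] ** 2 + D[0][j] ** 2 - D[i][j] ** 2) / 2 for j in range(n)]
            for i in range(n)]


def realize_in_plane(D, tol=1e-9):
    """Place points in R^2 realizing a complete, exact distance matrix D.

    p_0 goes to the origin, p_1 onto the positive x-axis, and the first point
    off that axis into the upper half-plane; all other coordinates follow from
    inner products, so the result is unique up to these normalizations.
    """
    G = gram_from_distances(D)
    n = len(D)
    d01 = D[0][1]
    if d01 <= tol:
        raise ValueError("this sketch assumes D[0][1] > 0")
    xs = [G[1][i] / d01 for i in range(n)]
    # First point with positive height above the line through p_0 and p_1, if any.
    k = next((i for i in range(n) if G[i][i] - xs[i] ** 2 > tol), None)
    if k is None:
        return [(x, 0.0) for x in xs]  # all points are collinear
    yk = math.sqrt(G[k][k] - xs[k] ** 2)
    ys = [(G[k][i] - xs[k] * xs[i]) / yk for i in range(n)]
    return list(zip(xs, ys))


def max_distance_error(points, D):
    """Largest discrepancy between realized and prescribed distances."""
    n = len(points)
    return max(abs(math.dist(points[i], points[j]) - D[i][j])
               for i in range(n) for j in range(n))
```

Because the construction only uses the inner products with p_1 and with one off-axis point, a matrix that is not realizable in the plane (for example, four mutually equidistant points) produces a configuration whose max_distance_error is clearly positive.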

    Times series averaging from a probabilistic interpretation of time-elastic kernel

    In light of regularized dynamic time warping kernels, this paper reconsiders the concept of time elastic centroid (TEC) for a set of time series. From this perspective, we first show how TEC can easily be addressed as a preimage problem. Unfortunately, this preimage problem is ill-posed and may suffer from over-fitting, especially for long time series, and obtaining a sub-optimal solution involves heavy computational costs. We then derive two new algorithms based on a probabilistic interpretation of kernel alignment matrices, expressed in terms of probability distributions over sets of alignment paths. The first algorithm is an iterative agglomerative heuristic inspired by the state-of-the-art DTW barycenter averaging (DBA) algorithm, which was proposed specifically for the Dynamic Time Warping measure. The second algorithm performs a classical averaging of the aligned samples but also averages the times of occurrence of the aligned samples; it relies on a straightforward progressive agglomerative heuristic. An experiment on 45 time series datasets, comparing the classification error rates of first-nearest-neighbor classifiers that represent each category by a single medoid or centroid estimate, shows that: i) centroid-based approaches significantly outperform medoid-based approaches; ii) in the considered experiments, the two proposed algorithms outperform the state-of-the-art DBA algorithm; and iii) the second proposed algorithm, which averages jointly in the sample space and along the time axis, emerges as significantly the most robust time elastic averaging heuristic, with an interesting noise reduction capability. Index Terms: time series averaging, time elastic kernel, dynamic time warping, time series clustering and classification.
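The DBA algorithm that the first proposed method builds on alternates DTW alignment and averaging. The sketch below shows one DBA iteration for 1-D series under classic (unregularized) DTW with a squared-difference cost. It is a baseline illustration with hypothetical function names, not the kernel-based TEC algorithms proposed in the paper.

```python
def dtw_path(a, b):
    """Classic DTW with squared-difference cost; returns (cost, optimal warping path)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # C[i][j] = cost of the best alignment of a[:i] with b[:j]
    C = [[inf] * (m + 1) for _ in range(n + 1)]
    C[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            C[i][j] = step + min(C[i - 1][j - 1], C[i - 1][j], C[i][j - 1])
    # Backtrack from (n, m) to (1, 1), always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        preds = [(p, q) for p, q in ((i - 1, j - 1), (i - 1, j), (i, j - 1))
                 if p >= 1 and q >= 1]
        i, j = min(preds, key=lambda pq: C[pq[0]][pq[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return C[n][m], path


def dba_update(centroid, series_set):
    """One DBA iteration: align every series to the centroid with DTW, then
    replace each centroid sample by the mean of all samples aligned to it."""
    sums = [0.0] * len(centroid)
    counts = [0] * len(centroid)
    for s in series_set:
        _, path = dtw_path(centroid, s)
        for ci, si in path:
            sums[ci] += s[si]
            counts[ci] += 1
    # Every DTW path visits every centroid index, so no count is zero.
    return [sums[k] / counts[k] for k in range(len(centroid))]
```

In practice the update is repeated from an initial centroid (for example, the medoid of the set) until it stabilizes.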

    Asymptotics of Some Plancherel Averages via Polynomiality Results

    Consider Young diagrams of $n$ boxes distributed according to the Plancherel measure. Such diagrams can be viewed as the output of the RSK algorithm applied to random permutations of the set $\{1,\ldots,n\}$. Here we are interested in the asymptotics, as $n\to\infty$, of expectations of certain functions of random Young diagrams, such as the number of bumping steps of the RSK algorithm that leads to the diagram, the side length of its Durfee square, or the logarithm of its probability. We can express these functions in terms of the hook lengths or contents of the boxes of the diagram, which opens the door to applying known polynomiality results for Plancherel averages. We thus obtain representations of expectations as binomial convolutions, which can be further analyzed with the help of Rice's integral or Poisson generating functions. Among our results is a very explicit expression for the constant appearing in the almost equipartition property of the Plancherel measure.
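For small n, Plancherel averages can be computed exactly by enumeration, which is a handy sanity check for asymptotic formulas. The sketch relies on two standard facts: the Plancherel measure gives a diagram λ of n boxes probability (f^λ)^2 / n!, where f^λ is the number of standard Young tableaux of shape λ, and the hook length formula gives f^λ = n! / (product of all hook lengths). Shapes are weakly decreasing lists of row lengths; all function names are hypothetical, and this brute-force computation is not the polynomiality method of the paper.

```python
from fractions import Fraction
from math import factorial, prod


def conjugate(shape):
    """Column lengths of a Young diagram given by its row lengths."""
    return [sum(1 for row in shape if row > j) for j in range(shape[0])] if shape else []


def hook_lengths(shape):
    """Hook length arm + leg + 1 of every box, listed row by row."""
    conj = conjugate(shape)
    return [shape[i] - j + conj[j] - i - 1
            for i in range(len(shape)) for j in range(shape[i])]


def num_syt(shape):
    """f^lambda via the hook length formula."""
    return factorial(sum(shape)) // prod(hook_lengths(shape))


def plancherel_prob(shape):
    """Exact Plancherel probability (f^lambda)^2 / n!."""
    return Fraction(num_syt(shape) ** 2, factorial(sum(shape)))


def partitions(n, max_part=None):
    """All partitions of n as weakly decreasing lists."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield [k] + rest


def durfee_side(shape):
    """Side length of the largest square fitting in the top-left corner."""
    return sum(1 for i, row in enumerate(shape) if row > i)


def plancherel_average(n, f):
    """Exact expectation of f(lambda) when lambda is Plancherel-distributed on n boxes."""
    return sum(plancherel_prob(p) * f(p) for p in partitions(n))
```

For example, plancherel_average(4, durfee_side) returns Fraction(7, 6): only the 2×2 square, of probability 4/24, has a Durfee square of side 2.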

    Spectral inequalities in quantitative form

    We review some results about quantitative improvements of sharp inequalities for eigenvalues of the Laplacian. Comment: 71 pages, 4 figures, 6 open problems, 76 references. This is a chapter of the forthcoming book "Shape Optimization and Spectral Theory", edited by Antoine Henrot and published by De Gruyter.
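To illustrate what a quantitative improvement of a sharp inequality looks like (this example is recalled from the literature rather than quoted from the abstract), consider the Faber–Krahn inequality for the first Dirichlet eigenvalue λ_1 of the Laplacian on an open set Ω in R^N, with B a ball. Its sharp quantitative form bounds the deficit from below by the square of the Fraenkel asymmetry A(Ω), the exponent 2 being optimal; σ_N denotes a positive constant depending only on the dimension.

```latex
% Faber--Krahn (scale-invariant form): balls minimize \lambda_1 among sets of given volume
|\Omega|^{2/N}\,\lambda_1(\Omega) \;\ge\; |B|^{2/N}\,\lambda_1(B)

% Quantitative form: the deficit controls the Fraenkel asymmetry of \Omega
|\Omega|^{2/N}\,\lambda_1(\Omega) - |B|^{2/N}\,\lambda_1(B) \;\ge\; \sigma_N\,\mathcal{A}(\Omega)^2,
\qquad
\mathcal{A}(\Omega) = \inf\Big\{ \tfrac{|\Omega \,\Delta\, (x + rB)|}{|\Omega|} : x \in \mathbb{R}^N,\ |rB| = |\Omega| \Big\}
```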

    The Bounded Confidence Model Of Opinion Dynamics

    The bounded confidence model of opinion dynamics, introduced by Deffuant et al., is a stochastic model for the evolution of continuous-valued opinions within a finite group of peers. We prove that, as time goes to infinity, the opinions evolve globally into a random set of clusters too far apart to interact, and thereafter all opinions in every cluster converge to their barycenter. We then prove a mean-field limit result, propagation of chaos: as the number of peers goes to infinity in adequately started systems and time is rescaled accordingly, the opinion processes converge to i.i.d. nonlinear Markov (or McKean-Vlasov) processes; each limit opinion process evolves as if under the influence of opinions drawn from its own instantaneous law, which is the unique solution of a nonlinear integro-differential equation of Kac type. This implies that the (random) empirical distribution process converges to this (deterministic) solution. We then prove that, as time goes to infinity, this solution converges to a law concentrated on isolated opinions too far apart to interact, and we identify sufficient conditions for the limit not to depend on the initial condition and to be concentrated at a single opinion. Finally, we prove that if the equation has an initial condition with a density, then its solution has a density at all times; we also develop a numerical scheme for the corresponding functional equation and show numerically that bifurcations may occur. Comment: 43 pages, 7 figures.
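The pairwise interaction underlying the model can be sketched in a few lines. This is a discrete-time simulation under its own assumptions: a uniformly random pair interacts at each step, the convergence parameter mu lies in (0, 1/2], and the confidence threshold is a strict inequality. The paper's time scaling and parametrization may differ, and the function names are hypothetical. Because each interaction preserves the pair's sum, the mean opinion is conserved, which is consistent with opinions in an isolated cluster converging to that cluster's barycenter.

```python
import random


def deffuant(opinions, threshold, mu=0.5, steps=10_000, seed=0):
    """Pairwise bounded confidence dynamics (discrete-time sketch).

    At each step a uniformly random pair of distinct agents is drawn; if their
    opinions differ by less than `threshold`, each moves a fraction `mu` of the
    gap toward the other. Each update preserves the pair's sum, so the mean
    opinion is conserved, and opinions stay within the initial range.
    """
    rng = random.Random(seed)
    x = list(opinions)
    for _ in range(steps):
        i, j = rng.sample(range(len(x)), 2)
        if abs(x[i] - x[j]) < threshold:
            xi, xj = x[i], x[j]
            x[i] = xi + mu * (xj - xi)
            x[j] = xj + mu * (xi - xj)
    return x


def clusters(x, gap):
    """Group sorted opinions; a new cluster starts wherever consecutive values differ by more than `gap`."""
    xs = sorted(x)
    groups = [[xs[0]]]
    for v in xs[1:]:
        if v - groups[-1][-1] > gap:
            groups.append([v])
        else:
            groups[-1].append(v)
    return groups
```

Running it with a confidence threshold smaller than the initial spread and grouping the result with clusters typically shows several well-separated opinion clusters, whereas a threshold larger than the spread drives all agents to consensus.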