264 research outputs found
Statistical distances for model validation and clustering. Applications to flow cytometry and fair learning.
This thesis has been developed at the University of Valladolid and IMUVA within the
framework of the project "Sampling, trimming, and probabilistic metric techniques.
Statistical applications", whose main researchers are Carlos Matrán Bea and Eustasio del Barrio
Tellado. Among the lines of research associated with the project are model validation,
Wasserstein distances and robust cluster analysis. It is precisely the work carried out in
these fields that gives rise to chapters 1, 2 and 4 of this report.
The work done in the field of fair learning with Professor Jean-Michel Loubes, a frequent
collaborator of the Valladolid team, during the international stay at the Paul Sabatier
University of Toulouse, is the basis of Chapter 3 of this report.
Therefore, this thesis is an exposition of the problems and results obtained in the
different fields previously mentioned. Due to the diversity of topics, we have decided to
base chapters on the works published or submitted to the present date, and therefore
each chapter has a structure relatively independent of the others. In this way Chapter 1
is based on the works [del Barrio et al., 2019e, del Barrio et al., 2019d], Chapter 2 is based
on the work [del Barrio et al., 2019c], Chapter 3 on the work [del Barrio et al., 2019b]
and Chapter 4 shows results of a work in progress.
In this introduction our objective is to present the main challenges we have faced, as
well as to briefly present our most relevant results. On the other hand, each chapter will
have its own introduction where we will delve into the topics discussed below. With this
in mind, our intention is that the reader will have a general idea of what he or she will
find in each chapter and in this way will have the necessary information to face the more
technical discussions that will be found there.
Due to the diversity of topics dealt with in this report, we propose a non-linear reading.
We suggest that the reader, after reading a section of the Introduction, moves to the
corresponding chapter. In this way the reader will have the relevant information more
at hand and will be able to better follow the exposition in each chapter. If on the other
hand there is a sequential reading of the document, we apologize in advance for some
repetitions and reiterations, which nevertheless seem to us to contribute positively to the
understanding of this work.
Departamento de Estadística e Investigación Operativa. Doctorado en Matemática
The triangulation of manifolds
A mostly expository account of old questions about the relationship between
polyhedra and topological manifolds. Topics are old topological results, new
gauge theory results (with speculations about next directions), and history of
the questions.
Comment: 26 pages, 2 figures. version 2: spellings corrected, analytic
speculations in 4.8.2 sharpene
Drawing Binary Tanglegrams: An Experimental Evaluation
A binary tanglegram is a pair of binary trees whose leaf sets are in
one-to-one correspondence; matching leaves are connected by inter-tree edges.
For applications, for example in phylogenetics or software engineering, it is
required that the individual trees are drawn crossing-free. A natural
optimization problem, denoted tanglegram layout problem, is thus to minimize
the number of crossings between inter-tree edges.
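As a rough illustration of the objective only: once both trees are drawn, each drawing fixes an order of its leaves, and two inter-tree edges cross exactly when their endpoints appear in opposite relative order on the two sides. The sketch below counts crossings for two given leaf orders. The function name `count_crossings` and the list-based input format are assumptions made for this example and are not taken from the paper; none of the compared layout algorithms is implemented here.

```python
from itertools import combinations


def count_crossings(left_order, right_order):
    """Count crossings between inter-tree edges for two fixed leaf orders.

    left_order and right_order list the same leaf labels in the order in
    which each tree's drawing places them (hypothetical input format).
    """
    pos_right = {leaf: i for i, leaf in enumerate(right_order)}
    # Position on the right side of each leaf, read in left-side order.
    seq = [pos_right[leaf] for leaf in left_order]
    # Two edges cross exactly when their endpoints are in opposite order.
    return sum(1 for a, b in combinations(seq, 2) if a > b)


crossings = count_crossings(["a", "b", "c", "d"], ["b", "a", "d", "c"])
```

This brute-force count is quadratic in the number of leaves, which is enough to evaluate a given layout; finding the layout that minimizes the count is the hard part of the problem.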
The tanglegram layout problem is NP-hard and is currently considered both in
application domains and theory. In this paper we present an experimental
comparison of a recursive algorithm of Buchin et al., our variant of their
algorithm, the algorithm hierarchy sort of Holten and van Wijk, and an integer
quadratic program that yields optimal solutions.
Comment: see http://www.siam.org/proceedings/alenex/2009/alx09_011_nollenburgm.pd
Euclidean distance geometry and applications
Euclidean distance geometry is the study of Euclidean geometry based on the
concept of distance. This is useful in several applications where the input
data consists of an incomplete set of distances, and the output is a set of
points in Euclidean space that realizes the given distances. We survey some of
the theory of Euclidean distance geometry and some of the most important
applications: molecular conformation, localization of sensor networks and
statics.
Comment: 64 pages, 21 figure
Times series averaging from a probabilistic interpretation of time-elastic kernel
In the light of regularized dynamic time warping kernels, this paper
reconsiders the concept of time elastic centroid (TEC) for a set of time series.
From this perspective, we first show how TEC can easily be addressed as a
preimage problem. Unfortunately this preimage problem is ill-posed, may suffer
from over-fitting, especially for long time series, and getting a sub-optimal
solution involves heavy computational costs. We then derive two new algorithms
based on a probabilistic interpretation of kernel alignment matrices, which are
expressed in terms of probability distributions over sets of alignment paths.
The first algorithm is an iterative agglomerative heuristic inspired by the
state of the art DTW barycenter averaging (DBA) algorithm proposed specifically
for the Dynamic Time Warping measure. The second proposed algorithm achieves a
classical averaging of the aligned samples but also implements an averaging of
the times of occurrence of the aligned samples. It exploits a straightforward
progressive agglomerative heuristic. An experiment that compares, for 45
time series datasets, the classification error rates obtained by first nearest
neighbor classifiers exploiting a single medoid or centroid estimate to
represent each category shows that: i) centroid-based approaches
significantly outperform medoid-based approaches, ii) in the considered
experiment, the two proposed algorithms outperform the state of the art DBA
algorithm, and iii) the second proposed algorithm, which implements an averaging
jointly in the sample space and along the time axis, emerges as the most
significantly robust time elastic averaging heuristic, with an interesting noise
reduction capability.
Index Terms: time series averaging, time elastic kernel, Dynamic Time Warping, time series clustering and classification
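For readers unfamiliar with the Dynamic Time Warping measure that the abstract builds on, the following is a minimal sketch of the classical DTW cost between two numeric sequences, using the absolute difference as local cost. It is an assumption-laden illustration: it is neither the regularized kernel nor either of the averaging algorithms proposed in the paper, and the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(x, y):
    """Classical dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(x), len(y)
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

Here the second sequence only repeats one sample, so the warping absorbs the repetition; averaging methods such as DBA rely on the alignment paths behind this cost rather than on the cost alone.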
Asymptotics of Some Plancherel Averages via Polynomiality Results
Consider Young diagrams of $n$ boxes distributed according to the Plancherel
measure. So those diagrams could be the output of the RSK algorithm, when
applied to random permutations of the set $\{1, \dots, n\}$. Here we are
interested in asymptotics, as $n \to \infty$, of expectations of certain
functions of random Young diagrams, such as the number of bumping steps of the
RSK algorithm that leads to that diagram, the side length of its Durfee square,
or the logarithm of its probability. We can express these functions in terms of
hook lengths or contents of the boxes of the diagram, which opens the door for
application of known polynomiality results for Plancherel averages. We thus
obtain representations of expectations as binomial convolutions, that can be
further analyzed with the help of Rice's integral or Poisson generating
functions. Among our results is a very explicit expression for the constant
appearing in the almost equipartition property of the Plancherel measure
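To make the RSK connection concrete, here is a minimal sketch of Schensted row insertion that returns the shape of the resulting Young diagram together with a count of bumping steps. The function name `rsk_shape_and_bumps` is hypothetical, and counting each displacement of an entry into the next row as one bumping step is an assumption; the paper's exact convention may differ.

```python
from bisect import bisect_right


def rsk_shape_and_bumps(perm):
    """Row-insert perm; return (row lengths of the shape, bumping steps)."""
    rows = []  # insertion tableau, every row kept increasing
    bumps = 0
    for value in perm:
        for row in rows:
            k = bisect_right(row, value)  # index of first entry > value
            if k == len(row):
                row.append(value)  # value fits at the end of this row
                break
            # Displace the larger entry and carry it to the next row.
            row[k], value = value, row[k]
            bumps += 1
        else:
            rows.append([value])  # carried value starts a new row
    return [len(r) for r in rows], bumps


shape, bumps = rsk_shape_and_bumps([2, 1, 3])
```

Applying this to uniformly random permutations would produce shapes distributed according to the Plancherel measure, which is the setting whose averages the abstract studies.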
Spectral inequalities in quantitative form
We review some results about quantitative improvements of sharp inequalities
for eigenvalues of the Laplacian.
Comment: 71 pages, 4 figures, 6 open problems, 76 references. This is a
chapter of the forthcoming book "Shape Optimization and Spectral Theory",
edited by Antoine Henrot and published by De Gruyte
The Bounded Confidence Model Of Opinion Dynamics
The bounded confidence model of opinion dynamics, introduced by Deffuant et
al., is a stochastic model for the evolution of continuous-valued opinions
within a finite group of peers. We prove that, as time goes to infinity, the
opinions evolve globally into a random set of clusters too far apart to
interact, and thereafter all opinions in every cluster converge to their
barycenter. We then prove a mean-field limit result, propagation of chaos: as
the number of peers goes to infinity in adequately started systems and time is
rescaled accordingly, the opinion processes converge to i.i.d. nonlinear Markov
(or McKean-Vlasov) processes; each limit opinion process evolves as if under
the influence of opinions drawn from its own instantaneous law, which is the
unique solution of a nonlinear integro-differential equation of Kac type. This
implies that the (random) empirical distribution process converges to this
(deterministic) solution. We then prove that, as time goes to infinity, this
solution converges to a law concentrated on isolated opinions too far apart to
interact, and identify sufficient conditions for the limit not to depend on the
initial condition, and to be concentrated at a single opinion. Finally, we
prove that if the equation has an initial condition with a density, then its
solution has a density at all times, develop a numerical scheme for the
corresponding functional equation, and show numerically that bifurcations may
occur.
Comment: 43 pages, 7 figure
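The abstract does not spell out the update rule, so the sketch below assumes the form in which the Deffuant model is commonly stated: two distinct peers are picked uniformly at random, and if their opinions differ by less than a threshold, each moves a fraction `mu` of the gap toward the other. The names `deffuant_step` and `simulate` and all default parameter values are illustrative assumptions, not taken from the paper.

```python
import random


def deffuant_step(opinions, threshold, mu, rng):
    """One interaction of a Deffuant-style update, applied in place."""
    i, j = rng.sample(range(len(opinions)), 2)  # two distinct peers
    gap = opinions[j] - opinions[i]
    if abs(gap) < threshold:  # peers interact only if close enough
        opinions[i] += mu * gap
        opinions[j] -= mu * gap


def simulate(n_peers=50, threshold=0.2, mu=0.5, steps=20000, seed=0):
    """Run the assumed dynamics from uniform initial opinions on [0, 1]."""
    rng = random.Random(seed)
    opinions = [rng.random() for _ in range(n_peers)]
    for _ in range(steps):
        deffuant_step(opinions, threshold, mu, rng)
    return sorted(opinions)


final_opinions = simulate()
```

Under this assumed rule each interaction preserves the sum of the two opinions involved, so the overall mean stays constant while the sorted output can be inspected for groups of opinions separated by more than the threshold.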