4 research outputs found
Computing Wasserstein Barycenter via operator splitting: the method of averaged marginals
The Wasserstein barycenter (WB) is an important tool for summarizing sets of
probabilities. It finds applications in applied probability, clustering, image
processing, etc. When the probability supports are finite and fixed, the
problem of computing a WB is formulated as a linear optimization problem whose
dimensions generally exceed standard solvers' capabilities. For this reason,
the WB problem is often replaced with a simpler nonlinear optimization model
constructed via an entropic regularization function so that specialized
algorithms can be employed to compute an approximate WB efficiently. In
contrast to this widespread inexact scheme, we propose an exact approach, based
on the Douglas-Rachford splitting method applied directly to the WB linear
optimization problem, for applications that require an accurate WB. Our algorithm,
which has the interesting interpretation of being built upon averaging
marginals, performs a series of simple (and exact) projections that can be
parallelized and even randomized, making it suitable for large-scale datasets.
As a result, our method achieves good performance in terms of speed while still
attaining accuracy. Furthermore, the same algorithm can be applied to compute
generalized barycenters of sets of measures with different total masses by
allowing for mass creation and destruction upon setting an additional
parameter. Our contribution to the field lies in the development of an exact
and efficient algorithm for computing barycenters, enabling its wider use in
practical applications. The approach's mathematical properties are examined,
and the method is benchmarked against state-of-the-art methods on several
datasets from the literature.
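The Douglas-Rachford iteration at the heart of the approach can be illustrated on a toy feasibility problem. The sketch below is our own simplification, not the paper's averaged-marginals algorithm: it alternates two simple, exact projections (onto the affine constraint sum(x) = 1 and onto the nonnegative orthant), which is the same mechanism the method exploits on the WB linear program.

```python
import numpy as np

def proj_affine(x):
    # Exact projection onto the hyperplane {x : sum(x) = 1}
    return x + (1.0 - x.sum()) / x.size

def proj_nonneg(x):
    # Exact projection onto the nonnegative orthant
    return np.maximum(x, 0.0)

def douglas_rachford(z, iters=2000):
    # Standard DR iteration: z <- z + P2(2 P1(z) - z) - P1(z).
    # The "shadow" sequence P1(z) converges to a point in the intersection,
    # here the probability simplex.
    for _ in range(iters):
        p = proj_affine(z)
        z = z + proj_nonneg(2.0 * p - z) - p
    return proj_affine(z)

x = douglas_rachford(np.array([3.0, -2.0, 0.5, -1.0]))
```

Both projections are closed-form, which is what makes each iteration cheap and, as the abstract notes, easy to parallelize across many such blocks.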
Analyzing covariate clustering effects in healthcare cost subgroups: insights and applications for prediction
Healthcare cost prediction is a challenging task due to the
high-dimensionality and high correlation among covariates. Additionally, the
skewed, heavy-tailed, and often multi-modal nature of cost data can complicate
matters further due to unobserved heterogeneity. In this study, we propose a
novel framework for finite mixture regression models that incorporates
covariate clustering methods to better account for the effects of clustered
covariates on subgroups of the outcome, which enables a more accurate
characterization of the complex distribution of the data. The proposed
framework can be formulated as a convex optimization problem with an additional
penalty term based on the prior similarity of the covariates. To efficiently
solve this optimization problem, a specialized EM-ADMM algorithm is proposed
that integrates the alternating direction method of multipliers (ADMM) into the
iterative process of the expectation-maximization (EM) algorithm. The convergence
of the algorithm and the efficiency of the covariate clustering method are
verified using simulation data, and the superiority of the approach over
traditional regression techniques is demonstrated using two real Chinese
medical expenditure datasets. Our empirical results provide valuable insights
into the complex network graph of the covariates and can inform business
practices, such as the design and pricing of medical insurance products.
Comment: 36 pages; 7 figures
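The EM backbone of such a finite mixture regression can be sketched without the paper's covariate-clustering penalty or ADMM step. The toy example below (synthetic data and all settings are our own, not the authors') fits a two-component Gaussian mixture of linear regressions by alternating responsibilities (E-step) with weighted least squares and variance updates (M-step):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, 2 * n)
# Synthetic data from two regression lines, y = 2x and y = -2x, plus noise
y = np.where(np.arange(2 * n) < n, 2.0 * x, -2.0 * x) + 0.1 * rng.standard_normal(2 * n)
X = np.column_stack([np.ones_like(x), x])      # intercept + slope design

K = 2
betas = np.array([[0.0, 1.0], [0.0, -1.0]])    # initial coefficients
sigma2 = np.ones(K)                            # component noise variances
weights = np.full(K, 1.0 / K)                  # mixing proportions

for _ in range(100):
    # E-step: posterior responsibility of each component for each point
    dens = np.stack([
        weights[k] / np.sqrt(2.0 * np.pi * sigma2[k])
        * np.exp(-(y - X @ betas[k]) ** 2 / (2.0 * sigma2[k]))
        for k in range(K)
    ], axis=1)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted least squares and variance update per component
    for k in range(K):
        w = r[:, k]
        betas[k] = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        sigma2[k] = np.sum(w * (y - X @ betas[k]) ** 2) / w.sum()
    weights = r.mean(axis=0)

slopes = np.sort(betas[:, 1])
```

The proposed framework adds a similarity-based penalty on the coefficients, which is why the M-step above is replaced by an ADMM solve in the authors' EM-ADMM algorithm.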
Functional and structural decompositions in probabilistic graphical models applied to haplotype reconstruction
This thesis revolves around two topics: decomposition in graphical models, among which are Bayesian networks and cost function networks (WCSP), and haplotype reconstruction in pedigrees. We apply WCSP techniques to Bayesian networks, exploiting the structural and functional properties of instances, both exactly and approximately, for inference (and the closely related problem of counting solutions) and for optimization. In particular, we define a decomposition of functions that produces functions over a smaller number of variables. One application in optimization is haplotype reconstruction, which is essential for better predicting disease severity and for understanding particular physical traits. Haplotype reconstruction is modeled as a Bayesian network, and the functional decomposition reduces this Bayesian network to a WCSP optimization problem (Max-2SAT).
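The idea of decomposing a cost function into functions over fewer variables can be illustrated on Boolean variables. The sketch below is a toy construction of our own, not the thesis's algorithm: it tests whether a ternary cost function f(a, b, c) splits as g(a, b) + h(b, c), using the candidates g(a, b) = f(a, b, 0) and h(b, c) = f(0, b, c) - f(0, b, 0), which succeed exactly when some such split exists.

```python
from itertools import product

def decompose(f):
    """Try to split a ternary cost f(a, b, c) into g(a, b) + h(b, c).

    Returns (g, h) as dicts if the decomposition is exact, else None.
    """
    g = {(a, b): f(a, b, 0) for a, b in product((0, 1), repeat=2)}
    h = {(b, c): f(0, b, c) - f(0, b, 0) for b, c in product((0, 1), repeat=2)}
    ok = all(f(a, b, c) == g[a, b] + h[b, c]
             for a, b, c in product((0, 1), repeat=3))
    return (g, h) if ok else None

# Decomposable through b: f = (a AND b) + 2*(b AND c)
result = decompose(lambda a, b, c: a * b + 2 * b * c)
# XOR couples a and c directly, so no such split exists
no_split = decompose(lambda a, b, c: a ^ c)
```

Replacing one arity-3 function by two arity-2 functions in this way is what shrinks the reconstruction model toward a binary problem such as Max-2SAT.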