4 research outputs found
Set-valued Data: Regression, Design and Outliers
The focus of this dissertation is to study set-valued data from three aspects, namely regression, optimal design and outlier identification. This dissertation consists of three peer-reviewed published articles, each of them addressing one aspect. Their titles and abstracts are listed below:
1. Local regression smoothers with set-valued outcome data:
This paper proposes a method to conduct local linear regression smoothing in the presence of set-valued outcome data. The proposed estimator is shown to be consistent, and its mean squared error and asymptotic distribution are derived. A method to build error tubes around the estimator is provided, and a small Monte Carlo exercise is conducted to confirm the good finite sample properties of the estimator. The usefulness of the method is illustrated on a novel dataset from a clinical trial to assess the effect of certain genes' expressions on the outcomes of different lung cancer treatments.
2. Optimal design for multivariate multiple linear regression with set-identified response:
We consider the partially identified regression model with set-identified responses, where the estimator is the set of least squares estimators obtained for all possible choices of points sampled from the set-identified observations. We address the issue of determining the optimal design for this case and show that, for objective functions mimicking those for several classical optimal designs, their set-identified analogues coincide with the optimal designs for point-identified real-valued responses.
3. Depth and outliers for samples of sets and random sets distributions:
We suggest several constructions suitable to define the depth of set-valued observations with respect to a sample of convex sets or with respect to the distribution of a random closed convex set. With the concept of a depth, it is possible to determine if a given convex set should be regarded as an outlier with respect to a sample of convex closed sets. Some of our constructions are motivated by the known concepts of half-space depth and band depth for function-valued data. A novel construction derives the depth from a family of non-linear expectations of random sets. Furthermore, we address the role of positions of sets for evaluation of their depth. Two case studies concern interval regression for Greek wine data and detection of outliers in a sample of particles.
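To make the set estimator of the second article concrete, here is a minimal sketch for the simplest case of a scalar response observed as an interval. It is an illustration under stated assumptions, not the dissertation's own code: the function name `ols_coefficient_bounds` and the toy data are hypothetical. Because the least squares estimate is linear in the response, each coefficient attains its extremes at the interval endpoints, which gives coordinate-wise bounds on the set of estimators. These bounds describe the bounding box of that set, not the set itself, which is in general a zonotope.

```python
import numpy as np


def ols_coefficient_bounds(X, y_lower, y_upper):
    """Coordinate-wise bounds on the OLS estimates over all selections
    y with y_lower <= y <= y_upper (hypothetical helper for illustration)."""
    X = np.asarray(X, dtype=float)
    y_lower = np.asarray(y_lower, dtype=float)
    y_upper = np.asarray(y_upper, dtype=float)

    # Linear map from responses to coefficients: beta = A @ y.
    # For full column rank X this equals (X'X)^{-1} X'.
    A = np.linalg.pinv(X)

    # Each coefficient is a linear function of y, so over the box of
    # admissible responses its minimum and maximum are reached by picking,
    # entry by entry, the endpoint that decreases or increases the sum.
    lower = np.minimum(A * y_lower, A * y_upper).sum(axis=1)
    upper = np.maximum(A * y_lower, A * y_upper).sum(axis=1)
    return lower, upper


if __name__ == "__main__":
    # Toy design: intercept plus one covariate, three interval responses.
    X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
    y_lower = np.array([0.0, 1.0, 2.0])
    y_upper = y_lower + 1.0
    lower, upper = ols_coefficient_bounds(X, y_lower, y_upper)
    print("intercept bounds:", lower[0], upper[0])
    print("slope bounds:", lower[1], upper[1])
```

When the intervals collapse to points, the lower and upper bounds coincide with the ordinary least squares estimate, which is a quick sanity check on the construction.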
Penalization of Barycenters in the Wasserstein Space
A regularization of Wasserstein barycenters for random measures supported on R^d is introduced via convex penalization. The existence and uniqueness of such barycenters are proved for a large class of penalization functions. A stability result for regularized barycenters, stated in terms of the Bregman distance associated with the penalization term, is also given. This allows us to compare the case of data made of n probability measures with the more realistic setting where we only have access to a dataset of random variables sampled from unknown distributions. We also analyze the convergence of the regularized empirical barycenter of a set of n iid random probability measures towards its population counterpart, and we discuss its rate of convergence. This approach is shown to be appropriate for the statistical analysis of discrete or absolutely continuous random measures. In this setting, we propose efficient algorithms for the computation of penalized Wasserstein barycenters. This approach is finally illustrated with simulated and real data sets.
Generalized Optimal Transport Models for Image processin