4 research outputs found

    Set-valued Data: Regression, Design and Outliers

    This dissertation studies set-valued data from three aspects: regression, optimal design and outlier identification. It consists of three peer-reviewed published articles, each addressing one aspect. Their titles and abstracts are listed below.
    1. Local regression smoothers with set-valued outcome data: This paper proposes a method to conduct local linear regression smoothing in the presence of set-valued outcome data. The proposed estimator is shown to be consistent, and its mean squared error and asymptotic distribution are derived. A method to build error tubes around the estimator is provided, and a small Monte Carlo exercise is conducted to confirm the good finite-sample properties of the estimator. The usefulness of the method is illustrated on a novel dataset from a clinical trial assessing the effect of the expression of certain genes on the outcomes of different lung cancer treatments.
    2. Optimal design for multivariate multiple linear regression with set-identified response: We consider the partially identified regression model with set-identified responses, where the estimator is the set of the least squares estimators obtained for all possible choices of points sampled from set-identified observations. We address the issue of determining the optimal design for this case and show that, for objective functions mimicking those for several classical optimal designs, their set-identified analogues coincide with the optimal designs for point-identified real-valued responses.
    3. Depth and outliers for samples of sets and random sets distributions: We suggest several constructions suitable to define the depth of set-valued observations with respect to a sample of convex sets or with respect to the distribution of a random closed convex set. With the concept of a depth, it is possible to determine whether a given convex set should be regarded as an outlier with respect to a sample of convex closed sets. Some of our constructions are motivated by the known concepts of half-space depth and band depth for function-valued data. A novel construction derives the depth from a family of non-linear expectations of random sets. Furthermore, we address the role of the positions of sets in evaluating their depth. Two case studies concern interval regression for Greek wine data and detection of outliers in a sample of particles.
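    As a rough illustration of the set-identified regression setting in the second abstract, the sketch below assumes the simplest case: each response is only known to lie in an interval. Because the least squares estimate is linear in the responses, the range of each coefficient over all admissible selections can be computed in closed form. The function name `ols_coefficient_bounds` and the toy data are hypothetical and not taken from the dissertation, and the result is only a componentwise bounding box of the estimator set, not the set itself.

```python
import numpy as np

def ols_coefficient_bounds(X, y_lo, y_hi):
    """Componentwise bounds on OLS estimates when each y_i lies in [y_lo_i, y_hi_i].

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    y_lo = np.asarray(y_lo, dtype=float)
    y_hi = np.asarray(y_hi, dtype=float)
    # OLS is linear in y: beta = A @ y, where A is the pseudoinverse of X
    # (equal to (X'X)^{-1} X' when X has full column rank).
    A = np.linalg.pinv(X)
    A_pos = np.clip(A, 0.0, None)   # positive weights
    A_neg = np.clip(A, None, 0.0)   # negative weights
    # To minimize a coefficient, pair positive weights with lower endpoints
    # and negative weights with upper endpoints; swap them to maximize.
    lower = A_pos @ y_lo + A_neg @ y_hi
    upper = A_pos @ y_hi + A_neg @ y_lo
    return lower, upper

# Toy data (assumed): intercept plus one regressor, unit-width response intervals.
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
y_lo = np.array([0.0, 1.0, 2.0, 3.0])
y_hi = y_lo + 1.0
lower, upper = ols_coefficient_bounds(X, y_lo, y_hi)
print("intercept range:", lower[0], upper[0])
print("slope range:", lower[1], upper[1])
```

    When the intervals shrink to points (`y_lo == y_hi`), both bounds reduce to the ordinary least squares estimate, which is a quick sanity check on the construction.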

    Penalization of Barycenters in the Wasserstein Space

    A regularization of Wasserstein barycenters for random measures supported on R^d is introduced via convex penalization. The existence and uniqueness of such barycenters are proved for a large class of penalization functions. A stability result for regularized barycenters, in terms of the Bregman distance associated with the penalization term, is also given. This allows us to compare the case of data made of n probability measures with the more realistic setting where we only have access to a dataset of random variables sampled from unknown distributions. We also analyze the convergence of the regularized empirical barycenter of a set of n iid random probability measures towards its population counterpart, and we discuss its rate of convergence. This approach is shown to be appropriate for the statistical analysis of discrete or absolutely continuous random measures. In this setting, we propose efficient algorithms for the computation of penalized Wasserstein barycenters. This approach is finally illustrated with simulated and real data sets.

    Generalized Optimal Transport Models for Image processin