22 research outputs found

    Robustness properties of estimators in generalized Pareto Models

    We study global and local robustness properties of several estimators for shape and scale in a generalized Pareto model. The estimators considered in this paper cover maximum likelihood estimators, skipped maximum likelihood estimators, moment-based estimators, Cramér-von-Mises minimum distance estimators, and, as a special case of quantile-based estimators, the Pickands estimator as well as variants of the latter tuned for a higher finite sample breakdown point (FSBP) and lower variance. We further consider an estimator matching the population median and median of absolute deviations to the empirical ones (MedMad); again, to improve its FSBP, we propose a variant using a suitable asymmetric MAD as constituent, which may be tuned to achieve an expected FSBP of 34%. These estimators are compared to one-step estimators distinguished as optimal in the shrinking-neighborhood setting, i.e., the most bias-robust estimator minimizing the maximal (asymptotic) bias and the estimator minimizing the maximal (asymptotic) MSE. For each of these estimators, we determine the FSBP, the influence function, and the statistical accuracy measured by asymptotic bias, variance, and mean squared error, all evaluated uniformly on shrinking convex contamination neighborhoods. Finally, we check these asymptotic theoretical findings against finite sample behavior in an extensive simulation study.
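
    A quantile-based Pickands-type estimator of the kind mentioned above can be written down directly from the generalized Pareto quantile function Q(p) = (β/ξ)((1 − p)^(−ξ) − 1), which implies Q(3/4)/Q(1/2) = 2^ξ + 1. Below is a minimal sketch using empirical quantiles as plug-ins; the function name and the simulation are ours, and the paper's tuned variants are not reproduced:

```python
import numpy as np

def pickands_gpd(x):
    """Quantile-based Pickands-type estimates of GPD shape and scale.

    Uses Q(3/4)/Q(1/2) = 2**xi + 1, which follows from the GPD
    quantile function Q(p) = (beta/xi) * ((1 - p)**(-xi) - 1).
    """
    q50, q75 = np.quantile(x, [0.50, 0.75])
    xi = np.log2(q75 / q50 - 1.0)         # shape
    beta = xi * q50 / (2.0**xi - 1.0)     # scale
    return xi, beta

# Example: exceedances simulated from a GPD with xi = 0.2, beta = 1
rng = np.random.default_rng(0)
u = rng.uniform(size=10_000)
xi_true, beta_true = 0.2, 1.0
x = (beta_true / xi_true) * ((1 - u)**(-xi_true) - 1)
print(pickands_gpd(x))  # roughly (0.2, 1.0)
```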

    Approaches for Outlier Detection in Sparse High-Dimensional Regression Models

    Modern regression studies often encompass a very large number of potential predictors, possibly larger than the sample size, and sometimes growing with the sample size itself. This increases the chances that a substantial portion of the predictors is redundant, as well as the risk of data contamination. Tackling these problems is of utmost importance to facilitate scientific discoveries, since model estimates are highly sensitive both to the choice of predictors and to the presence of outliers. In this thesis, we contribute to this area by considering the problem of robust model selection in a variety of settings, where outliers may arise both in the response and in the predictors. Our proposals simplify model interpretation, guarantee predictive performance, and allow us to study and control the influence of outlying cases on the fit. First, we consider the co-occurrence of multiple mean-shift and variance-inflation outliers in low-dimensional linear models. We rely on robust estimation techniques to identify outliers of each type, exclude mean-shift outliers, and use restricted maximum likelihood estimation to down-weight variance-inflation outliers and accommodate them in the model fit. Second, we extend our setting to high-dimensional linear models. We show that mean-shift and variance-inflation outliers can be modeled as additional fixed and random components, respectively, and evaluated independently. Specifically, we perform feature selection and mean-shift outlier detection through a robust class of nonconcave penalization methods, and variance-inflation outlier detection through the penalization of the restricted posterior mode. The resulting approach satisfies a robust oracle property for feature selection in the presence of data contamination, which allows the number of features to increase exponentially with the sample size, and detects truly outlying cases of each type with asymptotic probability one. This provides an optimal trade-off between a high breakdown point and efficiency. Third, focusing on high-dimensional linear models affected by mean-shift outliers, we develop a general framework in which L0 constraints coupled with mixed-integer programming techniques are used to perform simultaneous feature selection and outlier detection with provably optimal guarantees. In particular, we provide necessary and sufficient conditions for a robustly strong oracle property, where again the number of features can increase exponentially with the sample size, and prove optimality for parameter estimation and the resulting breakdown point. Finally, we consider generalized linear models and rely on logistic slippage to perform outlier detection and removal in binary classification. Here we use L0 constraints and mixed-integer conic programming techniques to solve the underlying double combinatorial problem of feature selection and outlier detection, and the framework again allows us to pursue optimality guarantees. For all the proposed approaches, we also provide computationally lean heuristic algorithms, tuning procedures, and diagnostic tools which help to guide the analysis. We consider several real-world applications, including the study of the relationships between childhood obesity and the human microbiome, and of the main drivers of honey bee loss. All methods developed and data used, as well as the source code to replicate our analyses, are publicly available.
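
    To make the mean-shift formulation concrete: the working model is y = Xβ + φ + ε, where a nonzero φᵢ flags observation i as a mean-shift outlier, and L0 constraints are placed on both β and φ. The thesis attacks this problem with mixed-integer programming; the sketch below instead uses a simple alternating hard-thresholding heuristic, in the spirit of the "computationally lean heuristic algorithms" mentioned, with all names and the toy data invented for illustration:

```python
import numpy as np

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def l0_regression_with_outliers(X, y, k_beta, k_phi, n_iter=500):
    """Alternating hard-thresholding heuristic for
        min ||y - X beta - phi||^2  s.t.  ||beta||_0 <= k_beta,
                                          ||phi||_0  <= k_phi.
    A nonzero phi_i marks observation i as a mean-shift outlier.
    """
    n, p = X.shape
    step = 1.0 / np.linalg.norm(X, 2)**2  # gradient step for the beta update
    beta, phi = np.zeros(p), np.zeros(n)
    for _ in range(n_iter):
        r = y - X @ beta - phi
        beta = hard_threshold(beta + step * X.T @ r, k_beta)
        phi = hard_threshold(y - X @ beta, k_phi)  # exact update, then project
    return beta, phi

# Toy data: 5 true features, 10 mean-shift outliers
rng = np.random.default_rng(1)
n, p = 200, 1000
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 3.0
y = X @ beta_true + 0.5 * rng.standard_normal(n)
y[:10] += 8.0                                    # contaminated cases
beta_hat, phi_hat = l0_regression_with_outliers(X, y, k_beta=5, k_phi=10)
print(np.nonzero(beta_hat)[0], np.nonzero(phi_hat)[0])
```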

    Contributions based on signal processing and machine learning for heart rhythm analysis during cardiopulmonary resuscitation

    Out-of-hospital cardiac arrest (OHCA) is characterized by the sudden loss of cardiac function, and causes around 10% of total mortality in developed countries. Survival from OHCA depends largely on two factors: early defibrillation and early cardiopulmonary resuscitation (CPR). The electrical shock is delivered using a shock advice algorithm (SAA) implemented in defibrillators. Unfortunately, CPR must be stopped for a reliable SAA analysis because chest compressions introduce artefacts in the ECG. These interruptions in CPR have an adverse effect on OHCA survival. Since the early 1990s, many efforts have been made to reliably analyze the rhythm during CPR. Strategies have mainly focused on adaptive filters to suppress the CPR artefact, followed by the SAAs of commercial defibrillators. However, these solutions did not meet the American Heart Association's (AHA) accuracy requirements for shock/no-shock decisions. A recent approach, which replaces the commercial SAA by machine learning classifiers, has demonstrated that a reliable rhythm analysis during CPR is possible. However, defibrillation is not the only treatment needed during OHCA, and depending on the clinical context a finer rhythm classification is needed. Indeed, an optimal OHCA scenario would allow the classification of the five cardiac arrest rhythm types that may be present during resuscitation. Unfortunately, multiclass classifiers that allow a reliable rhythm analysis during CPR have not yet been demonstrated. In all of these studies, artefacts originate from manual compressions delivered by rescuers. Mechanical compression devices, such as the LUCAS or the AutoPulse, are increasingly used in resuscitation, so a reliable rhythm analysis during mechanical CPR is becoming critical. Unfortunately, no AHA-compliant algorithms have yet been demonstrated during mechanical CPR. The focus of this thesis is to provide new or improved solutions for rhythm analysis during CPR, including shock/no-shock decisions during manual and mechanical CPR and multiclass classification during manual CPR.
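
    The adaptive-filter strategy referred to above typically uses a reference channel correlated with the compressions (for example, compression depth or thoracic impedance) to estimate the artefact and subtract it from the corrupted ECG. A minimal least-mean-squares (LMS) sketch on synthetic signals, with all parameter values chosen purely for illustration:

```python
import numpy as np

def lms_artefact_filter(ecg_corrupted, reference, n_taps=32, mu=0.01):
    """LMS adaptive filter: estimate the CPR artefact from the
    reference channel and return the artefact-suppressed ECG."""
    w = np.zeros(n_taps)
    cleaned = np.zeros_like(ecg_corrupted)
    for n in range(n_taps, len(ecg_corrupted)):
        x = reference[n - n_taps:n][::-1]      # most recent samples first
        artefact_hat = w @ x                   # current artefact estimate
        e = ecg_corrupted[n] - artefact_hat    # error = cleaned ECG sample
        w += 2 * mu * e * x                    # LMS weight update
        cleaned[n] = e
    return cleaned

# Synthetic example: 2 Hz compression artefact on a 250 Hz recording
fs = 250
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(2)
ecg = 0.1 * rng.standard_normal(t.size)        # stand-in for the true rhythm
reference = np.sin(2 * np.pi * 2 * t)          # compression reference channel
artefact = 1.5 * np.sin(2 * np.pi * 2 * t + 0.3)
cleaned = lms_artefact_filter(ecg + artefact, reference)
```

    In the solutions discussed above, the shock/no-shock decision (by an SAA or a machine learning classifier) would then be made on the filtered ECG.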

    Robust and Regularized Algorithms for Vehicle Tractive Force Prediction and Mass Estimation

    This work provides novel robust and regularized algorithms for parameter estimation, with applications in vehicle tractive force prediction and mass estimation. Given a large record of real-world data from test runs on public roads, recursive algorithms adjusted the unknown vehicle parameters under a broad variation of statistical assumptions for two linear gray-box models.
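
    As a rough illustration of the setting: longitudinal vehicle dynamics are linear in the parameters once the regressors are fixed, e.g. F_trac ≈ m·(a + g·sin θ) + c·v² with unknown mass m and a lumped drag coefficient c. This model form is an assumed simplification; the thesis' two gray-box models and its robust and regularized variants are not reproduced here. A plain recursive least squares (RLS) sketch:

```python
import numpy as np

def rls_update(theta, P, phi, y, lam=0.999):
    """One recursive least squares step with forgetting factor lam.
    theta: parameter estimate, P: covariance, phi: regressor, y: measurement."""
    k = P @ phi / (lam + phi @ P @ phi)    # gain vector
    theta = theta + k * (y - phi @ theta)  # innovation update
    P = (P - np.outer(k, phi @ P)) / lam
    return theta, P

# Assumed gray-box model: F_trac = m*(a + g*sin(grade)) + c*v**2
g = 9.81
rng = np.random.default_rng(3)
m_true, c_true = 1500.0, 0.4
theta = np.zeros(2)
P = 1e6 * np.eye(2)
for _ in range(2000):
    a, v, grade = rng.normal(0, 1), rng.uniform(5, 30), rng.normal(0, 0.02)
    phi = np.array([a + g * np.sin(grade), v**2])
    F = m_true * phi[0] + c_true * phi[1] + 50 * rng.standard_normal()
    theta, P = rls_update(theta, P, phi, F)
print(theta)  # approaches [1500, 0.4]
```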

    Population based spatio-temporal probabilistic modelling of fMRI data

    High-dimensional functional magnetic resonance imaging (fMRI) data is characterized by complex spatial and temporal patterns related to neural activation. Mixture-based Bayesian spatio-temporal modelling is able to extract spatio-temporal components representing distinct haemodynamic response and activation patterns. A recent development of this approach to fMRI data analysis is the so-called spatially regularized mixture model of hidden process models (SMM-HPM). SMM-HPM can be used to reduce the four-dimensional fMRI data of a pre-determined region of interest (ROI) to a small number of spatio-temporal prototypes, sufficiently representing the spatio-temporal features of the underlying neural activation. Summary statistics derived from these features can be interpreted as quantifications of (1) the spatial extent of sub-ROI activation patterns, (2) how fast the brain responds to external stimuli, and (3) the heterogeneity within single ROIs. This thesis aims to extend the single-subject SMM-HPM to a multi-subject SMM-HPM so that such features can be extracted at group level, enabling more robust conclusions to be drawn.
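
    As a loose illustration of the prototype idea only (not the SMM-HPM model itself): voxel time courses from an ROI can be clustered so that each component mean acts as a temporal prototype and the component's member voxels give its spatial extent. A toy sketch on simulated data, with every name and number invented:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Simulated ROI: 300 voxels x 60 time points, two latent response shapes
rng = np.random.default_rng(4)
t = np.linspace(0, 30, 60)
proto_fast = np.exp(-(t - 5)**2 / 4)     # early, sharp response
proto_slow = np.exp(-(t - 12)**2 / 16)   # late, broad response
labels_true = rng.integers(0, 2, size=300)
ts = np.where(labels_true[:, None] == 0, proto_fast, proto_slow)
ts = ts + 0.2 * rng.standard_normal((300, 60))

# Each component mean = a temporal prototype; assignments = spatial extent
gm = GaussianMixture(n_components=2, covariance_type="diag",
                     random_state=0).fit(ts)
prototypes = gm.means_                   # (2, 60) temporal prototypes
extent = np.bincount(gm.predict(ts))     # voxels per prototype
print(extent)
```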

    Computer Vision Approaches to Liquid-Phase Transmission Electron Microscopy

    Electron microscopy (EM) is a technique that exploits the interaction between electrons and matter to produce high-resolution images down to the atomic level. To avoid undesired scattering in the electron path, EM samples are conventionally imaged in the solid state under vacuum conditions. Recently, this limit has been overcome by the realization of liquid-phase electron microscopy (LP EM), a technique that enables the analysis of samples in their native liquid state. LP EM paired with a direct detection camera acquiring at a high frame rate allows tracking the motion of particles in liquids, as well as their temporal dynamic processes. In this research work, LP EM is adopted to image the dynamics of particles undergoing Brownian motion, exploiting their natural rotation to access all the particle views, in order to reconstruct their 3D structure via tomographic techniques. Specific computer-vision tools were designed around the limitations of LP EM to process the results of the imaging: different deblurring and denoising approaches were adopted to improve the quality of the images, and the processed LP EM images were then used to reconstruct 3D models of the imaged samples. This task was performed by developing two different methods: Brownian tomography (BT) and Brownian particle analysis (BPA). The former tracks a single particle in time, capturing the evolution of its dynamics. The latter is an extension in time of the single particle analysis (SPA) technique, which is conventionally paired with cryo-EM to reconstruct 3D density maps from thousands of EM images capturing hundreds of particles of the same species frozen on a grid. By contrast, BPA can process image sequences that do not contain thousands of particles, monitoring individual particle views across consecutive frames rather than within a single frame.
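
    For the pre-processing and tracking steps, a minimal denoise-then-localize sketch on synthetic frames (generic computer-vision operations; not the pipeline developed in this work):

```python
import numpy as np
from scipy import ndimage

def track_particle(frames, sigma=2.0):
    """Denoise each frame and localize the particle as the centroid of
    the brightest blob; returns the (row, col) trajectory over time."""
    trajectory = []
    for frame in frames:
        smooth = ndimage.gaussian_filter(frame, sigma)      # denoise
        mask = smooth > smooth.mean() + 3 * smooth.std()    # threshold
        if mask.any():
            trajectory.append(ndimage.center_of_mass(mask))
        else:
            trajectory.append((np.nan, np.nan))
    return np.array(trajectory)

# Synthetic movie: a bright spot drifting on a noisy background
rng = np.random.default_rng(5)
pos = np.array([32.0, 32.0])
frames = []
for _ in range(50):
    pos += rng.normal(0, 1.0, size=2)                       # random walk
    yy, xx = np.mgrid[:64, :64]
    spot = np.exp(-((yy - pos[0])**2 + (xx - pos[1])**2) / 8)
    frames.append(spot + 0.3 * rng.standard_normal((64, 64)))
print(track_particle(np.array(frames))[:5])
```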

    Selected Papers from the 9th World Congress on Industrial Process Tomography

    Industrial process tomography (IPT) is becoming an important tool for Industry 4.0. It comprises multidimensional sensor technologies and methods that aim to provide unparalleled internal information on industrial processes used in many sectors. This book showcases a selection of papers at the forefront of the latest developments in such technologies.

    Robust and Regularized Algorithms for Vehicle Tractive Force Prediction and Mass Estimation

    This dissertation provides novel robust and regularized algorithms from linear system identification for parameter estimation, with applications in vehicle tractive force prediction and mass estimation.

    Intelligent Sensors for Human Motion Analysis

    The book "Intelligent Sensors for Human Motion Analysis" contains 17 articles published in a Special Issue of the journal Sensors. These articles deal with many aspects of the analysis of human movement. New techniques and methods for pose estimation, gait recognition, and fall detection have been proposed and verified. Some of them will trigger further research, and some may become the backbone of commercial systems.