5 research outputs found

    Robust regularized kernel regression

    Ministry of Education, Singapore under its Academic Research Funding Tier

    Kernel Methods for Classification with Irregularly Sampled and Contaminated Data.

    Design of a classifier consists of two stages: feature extraction and classifier learning. For better performance, the nature, characteristics, or underlying structure of the data should be taken into account at either stage. In this thesis, we present kernel methods for classification with irregularly sampled and contaminated data.

    First, we propose a feature extraction method for irregularly sampled data. Irregularly sampled data often arise in medical applications, where the vital signs of patients are monitored based on the severity of their condition and the availability of nursing staff. In particular, we consider an ICU (intensive care unit) admission prediction problem for post-operative patients with possible sepsis. The experimental results show that the proposed features, when paired with kernel methods, have more discriminating power than those used by clinicians.

    Second, we consider the one-class classification problem with contaminated data, where the majority of the data comes from a "nominal" distribution and a small fraction comes from an outlying distribution. We deal with this problem by robustly estimating the nominal density (or a level set thereof) from the contaminated data. Our proposed density estimator achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. The robustness of the density estimator is demonstrated with a representer theorem, the influence function, and experimental results.

    Third, we propose a kernel classifier that optimizes the L_2 distance between "differences of densities". Like a support vector machine (SVM), the classifier is sparse and results from solving a quadratic program. We also provide statistical performance guarantees for the proposed L_2 kernel classifier in the form of a finite-sample oracle inequality, and strong consistency in the sense of both ISE and probability of error.

    Ph.D., Electrical Engineering: Systems. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/89858/1/stannum_1.pd
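    The combination of a KDE with M-estimation mentioned above can be computed by a kernelized iteratively reweighted least squares (IRWLS) scheme, in which the uniform KDE weights 1/n are replaced by M-estimation weights. The Python sketch below is a minimal illustration of that idea under assumed choices (Gaussian kernel, Huber loss, and illustrative names and parameters such as sigma and a); it is not the thesis implementation.

        import numpy as np

        def gaussian_kernel(X, Y, sigma):
            # pairwise Gaussian kernel matrix between rows of X and Y
            d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2.0 * sigma ** 2))

        def robust_kde_weights(X, sigma=1.0, a=0.3, n_iter=30, tol=1e-6):
            # IRWLS for a robust KDE: returns weights w so that
            # f(x) = sum_i w_i k(x_i, x) is the M-estimate of the
            # kernel mean under a Huber loss with radius a (assumed value).
            n = len(X)
            K = gaussian_kernel(X, X, sigma)
            w = np.full(n, 1.0 / n)                  # start from the ordinary KDE
            for _ in range(n_iter):
                # squared RKHS distance from each Phi(x_i) to the current f
                d2 = np.diag(K) - 2.0 * K @ w + w @ K @ w
                d = np.sqrt(np.maximum(d2, 0.0))
                # Huber psi(d)/d: 1 inside the radius, a/d outside,
                # so distant (outlying) points receive smaller weight
                psi_over_d = np.where(d <= a, 1.0, a / np.maximum(d, 1e-12))
                w_new = psi_over_d / psi_over_d.sum()
                if np.abs(w_new - w).max() < tol:
                    return w_new
                w = w_new
            return w

        # usage: a contaminated sample; outliers are downweighted
        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(0, 1, (95, 2)),    # nominal points
                       rng.normal(8, 0.5, (5, 2))])  # contamination
        w = robust_kde_weights(X)
        print(w[:95].mean(), w[95:].mean())          # outlier weights are smaller

    This reweighting view also matches the representer theorem cited in the abstract: the M-estimate remains a weighted sum of kernels centered at the data points.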

    High dimensional data analysis for anomaly detection and quality improvement

    Analysis of large-scale, high-dimensional data with a complex, heterogeneous structure is vital for extracting information and useful features, supporting data fusion for the assessment of system performance, early detection of system anomalies, intelligent sampling and sensing for data collection, and decision making to achieve optimal system performance.

    Chapter 3 focuses on detecting anomalies in high-dimensional data. Traditionally, most image-based anomaly detection methods perform denoising and detection sequentially, which hurts both detection accuracy and efficiency. In this chapter, a novel methodology named smooth-sparse decomposition (SSD) is proposed, which exploits regularized high-dimensional regression to decompose an image and separate anomalous regions simultaneously by solving a large-scale optimization problem.

    Chapter 4 extends SSD to spatial-temporal functional data, yielding spatiotemporal smooth-sparse decomposition (ST-SSD), together with a likelihood ratio test that accurately detects the time of change based on the detected anomaly. To enable real-time implementation, recursive estimation procedures for ST-SSD are also developed. The methodology is applied to tonnage signals, rolling inspection data, and solar flare monitoring.

    Chapter 5 considers the adaptive sampling problem for high-dimensional data. A novel adaptive sampling framework, named Adaptive Kernelized Maximum-Minimum Distance, is proposed to adaptively estimate the sparse anomalous region. The method balances the sampling effort between space-filling sampling (exploration) and focused sampling near the anomalous region (exploitation). It is applied to a case study of anomaly detection in composite sheets using a guided wave test.

    Chapter 6 explores penalized tensor regression to model tensor response data with process variables. Regularized Tucker decomposition and regularized tensor regression methods are developed, which model structured point cloud data as tensors and link the point cloud data to the process variables. The performance of the proposed methods is evaluated through simulation and a real case study of turning process optimization.

    Ph.D.
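    To make the smooth-sparse decomposition idea concrete, here is a minimal one-dimensional Python sketch: the signal is split into a smooth background (ridge penalty on second differences) and a sparse anomaly (L1 penalty), estimated by alternating minimization. The penalties lam and gam and all names are illustrative assumptions; the thesis's SSD operates on images and spatiotemporal data with basis expansions and large-scale solvers.

        import numpy as np

        def soft_threshold(z, t):
            # proximal operator of t * ||.||_1
            return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

        def smooth_sparse_decompose(y, lam=1000.0, gam=0.5, n_iter=50):
            # alternating minimization of
            #   ||y - mu - a||^2 + lam * ||D mu||^2 + gam * ||a||_1
            # where D is the second-difference operator (smoothness penalty)
            n = len(y)
            D = np.diff(np.eye(n), n=2, axis=0)
            A = np.eye(n) + lam * D.T @ D            # normal equations for mu
            a = np.zeros(n)
            for _ in range(n_iter):
                mu = np.linalg.solve(A, y - a)       # smooth background update
                a = soft_threshold(y - mu, gam / 2)  # sparse anomaly update
            return mu, a

        # usage: recover a short spike riding on a smooth trend
        rng = np.random.default_rng(1)
        x = np.linspace(0, 1, 200)
        y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=200)
        y[90:95] += 3.0                              # anomalous region
        mu, a = smooth_sparse_decompose(y)
        print(np.nonzero(np.abs(a) > 0.5)[0])        # indices near 90..94

    Both subproblems have closed-form updates, which is what lets this style of decomposition scale: the smooth update is a single linear solve (banded in practice) and the sparse update is elementwise thresholding.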