
    Online Active Linear Regression via Thresholding

    We consider the problem of online active learning to collect data for regression modeling. Specifically, we consider a decision maker with a limited experimentation budget who must efficiently learn an underlying linear population model. Our main contribution is a novel threshold-based algorithm for selecting the most informative observations; we characterize its performance and fundamental lower bounds. We extend the algorithm and its guarantees to sparse linear regression in high-dimensional settings. Simulations suggest the algorithm is remarkably robust: it provides significant benefits over passive random sampling on real-world datasets that exhibit high nonlinearity and high dimensionality, significantly reducing both the mean and variance of the squared error.

    Comment: Published in AAAI 201
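
    A minimal sketch of the thresholding idea, in Python: observe covariates as they arrive, spend a label only when a point exceeds a norm cutoff (high-leverage points are the most informative for a linear fit), and run ordinary least squares on the collected sample. This is an illustration under assumptions, not the paper's exact algorithm; the function names and the fixed threshold are hypothetical, whereas the paper calibrates its threshold to the budget and the covariate distribution.

    ```python
    import numpy as np

    def active_threshold_regression(X_stream, label_fn, budget, threshold):
        """Sketch: online active data collection by norm thresholding.

        X_stream  -- iterable of covariate vectors arriving one at a time
        label_fn  -- costly oracle returning the response y for a chosen x
        budget    -- maximum number of observations we may label
        threshold -- norm cutoff; only sufficiently "large" points are kept
        """
        X_sel, y_sel = [], []
        for x in X_stream:
            if len(X_sel) >= budget:
                break  # experimentation budget exhausted
            # Selection rule: spend the budget on high-leverage observations.
            if np.linalg.norm(x) >= threshold:
                X_sel.append(x)
                y_sel.append(label_fn(x))
        # Ordinary least squares on the actively collected sample.
        beta_hat, *_ = np.linalg.lstsq(np.asarray(X_sel), np.asarray(y_sel),
                                       rcond=None)
        return beta_hat
    ```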

    Robust and Efficient Regression

    This dissertation addresses two problems in regression analysis. The first is model selection and robust parameter estimation in high-dimensional linear regression. The second concerns developing a robust and efficient estimator for nonparametric regression. In Chapter 1, we introduce robust and efficient regression analysis, discuss these two problems and our motivations, and summarize the main results. In Chapter 2, we propose a novel robust penalized method for high-dimensional linear regression. Asymptotic properties are established, and a data-driven procedure is developed to select adaptive penalties. We show it is the first estimator to achieve the desired oracle properties with certainty for high-dimensional linear regression. Extensive simulations demonstrate the usefulness of the new technique. In Chapter 3, we develop a new local polynomial nonparametric regression that minimizes a convex combination of several weighted loss functions simultaneously. The optimal weights are selected by a proposed procedure and adapt to the tails of the error distribution, resulting in a procedure that is both robust and resistant. We investigate the asymptotic properties and show that the resulting estimators are at least as efficient as those provided by existing procedures, and can be much more efficient for many distributions. Excellent finite-sample performance is demonstrated through simulations under a variety of settings, and a real data analysis illustrates the usefulness of the proposed methodology.
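
    As a rough Python illustration of the Chapter 3 idea, the sketch below fits a local linear model at a point by minimizing a kernel-weighted convex combination of squared and absolute losses. It assumes a Gaussian kernel, just two component losses, and a fixed combination weight lam; the dissertation instead selects the weights adaptively from the tails of the error distribution.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    def local_combined_fit(x, y, x_eval, bandwidth, lam=0.5):
        """Sketch: local linear fit at x_eval minimizing a kernel-weighted
        convex combination lam * squared loss + (1 - lam) * absolute loss."""
        # Gaussian kernel weights centered at the evaluation point.
        w = np.exp(-0.5 * ((x - x_eval) / bandwidth) ** 2)

        def objective(theta):
            a, b = theta                    # local intercept and slope
            r = y - (a + b * (x - x_eval))  # local linear residuals
            return np.sum(w * (lam * r ** 2 + (1 - lam) * np.abs(r)))

        res = minimize(objective, np.zeros(2), method="Nelder-Mead")
        return res.x[0]  # fitted regression function value at x_eval

    # Example with heavy-tailed noise, where the absolute-loss part helps.
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, 200)
    y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_t(df=2, size=200)
    m_hat = [local_combined_fit(x, y, g, 0.1) for g in np.linspace(0, 1, 50)]
    ```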

    Robust penalized regression for complex high-dimensional data

    Robust high-dimensional data analysis has become an important and challenging task in complex Big Data analysis due to high dimensionality and data contamination. One of the most popular approaches is robust penalized regression. In this dissertation, we address three typical robust ultra-high-dimensional regression problems via penalized regression approaches. The first problem concerns the linear model in the presence of outliers, where outlier detection, variable selection, and parameter estimation must be handled simultaneously. The second concerns robust high-dimensional mean regression under irregular settings such as data contamination, asymmetry, and heteroscedasticity. The third concerns robust bi-level variable selection for linear regression models with grouping structures in the covariates. In Chapter 1, we introduce the background and challenges through overviews of penalized least squares methods and robust regression techniques. In Chapter 2, we propose a novel approach in a penalized weighted least squares framework that performs simultaneous variable selection and outlier detection. We provide a unified link between the proposed framework and robust M-estimation in general settings, and we establish non-asymptotic oracle inequalities for the joint estimation of both the regression coefficients and the weight vectors. In Chapter 3, we establish a framework of robust estimators for high-dimensional regression models using Penalized Robust Approximated quadratic M-estimation (PRAM). This framework allows general settings in which the random errors lack symmetry and homogeneity or the covariates are not sub-Gaussian. Theoretically, we show that, in the ultra-high-dimensional setting and under mild conditions, the PRAM estimator achieves local estimation consistency at the minimax rate enjoyed by the LS-Lasso and possesses the local oracle property. In Chapter 4, we extend the study of Chapter 3 to robust high-dimensional data analysis with structured sparsity. In particular, we propose a framework of high-dimensional M-estimators for bi-level variable selection that encourages bi-level sparsity through a computationally efficient two-stage procedure and produces strongly robust parameter estimators when nonconvex redescending loss functions are applied. In theory, we provide sufficient conditions under which the proposed two-stage penalized M-estimator possesses simultaneous local estimation consistency and bi-level variable selection consistency when a certain nonconvex penalty function is used at the group level. The performance of the proposed estimators is demonstrated in both simulation studies and real examples. In Chapter 5, we discuss open issues and future work.
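
    One standard way to perform variable selection and outlier detection in a single penalized least squares fit is the mean-shift formulation y = Xβ + γ + ε, with an L1 penalty on both the coefficients β and the per-observation shifts γ; a nonzero γ_i flags observation i as a suspected outlier. The Python sketch below assumes this formulation with a single shared penalty level alpha; it is illustrative only, not necessarily the estimator developed in Chapter 2.

    ```python
    import numpy as np
    from sklearn.linear_model import Lasso

    def mean_shift_outlier_lasso(X, y, alpha=0.1):
        """Sketch: joint sparse regression and outlier detection via the
        mean-shift model, fit as a single Lasso on an augmented design."""
        n, p = X.shape
        # Augment the design with an identity block: one shift per observation.
        X_aug = np.hstack([X, np.eye(n)])
        fit = Lasso(alpha=alpha, fit_intercept=False).fit(X_aug, y)
        beta = fit.coef_[:p]    # sparse regression coefficients
        gamma = fit.coef_[p:]   # nonzero entries flag suspected outliers
        return beta, np.flatnonzero(gamma)
    ```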

    Robust and Sparse Regression via γ-divergence

    In high-dimensional data analysis, many sparse regression methods have been proposed, but they may not be robust against outliers. Recently, the use of the density power weight has been studied for robust parameter estimation, and the corresponding divergences have been discussed. One such divergence is the γ-divergence, and the robust estimator based on it is known to have strong robustness. In this paper, we consider robust and sparse regression based on the γ-divergence. We extend the γ-divergence to the regression problem and show that it retains strong robustness under heavy contamination even when the outliers are heterogeneous. The loss function is constructed from an empirical estimate of the γ-divergence with sparse regularization, and the parameter estimate is defined as the minimizer of this loss. To obtain the robust and sparse estimate, we propose an efficient update algorithm with a monotone decreasing property of the loss function. In particular, we discuss the linear regression problem with L1 regularization in detail. In numerical experiments and real data analyses, the proposed method outperforms existing robust and sparse methods.

    Comment: 25 pages
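
    The flavor of such a monotone update can be sketched in Python as an iteratively reweighted L1-penalized fit: each pass down-weights observations with large residuals via a density power weight, then solves a weighted Lasso. The weight formula, the fixed residual scale sigma2, and the penalty level alpha below are illustrative assumptions, not the paper's exact update.

    ```python
    import numpy as np
    from sklearn.linear_model import Lasso

    def gamma_divergence_sparse_fit(X, y, gamma=0.5, alpha=0.1, n_iter=20):
        """Sketch: robust sparse regression in the spirit of the
        gamma-divergence, via an iteratively reweighted Lasso."""
        n, p = X.shape
        beta = np.zeros(p)
        sigma2 = np.var(y)  # crude fixed scale; an assumption of this sketch
        for _ in range(n_iter):
            r = y - X @ beta
            # Density power weights: outliers get weights close to zero.
            w = np.exp(-gamma * r ** 2 / (2.0 * sigma2))
            w /= w.sum()
            sw = np.sqrt(w)
            # Weighted Lasso step on row-rescaled data.
            fit = Lasso(alpha=alpha, fit_intercept=False)
            fit.fit(X * sw[:, None], y * sw)
            beta = fit.coef_
        return beta
    ```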