    Multivariate trend comparisons between autocorrelated climate series with general trend regressors

    Inference regarding trends in climatic data series, including comparisons across different data sets as well as univariate trend significance tests, is complicated by the presence of serial correlation and step-changes in the mean. We review recent developments in the estimation of heteroskedasticity and autocorrelation robust (HAC) covariance estimators as they have been applied to linear trend inference, with focus on the Vogelsang-Franses (2005) nonparametric approach, which provides a unified framework for trend covariance estimation robust to unknown forms of autocorrelation up to but not including unit roots, making it especially useful for climatic data applications. We extend the Vogelsang-Franses approach to allow general deterministic regressors including the case where a step-change in the mean occurs at a known date. Additional regressors change the critical values of the Vogelsang-Franses statistic. We derive an asymptotic approximation that can be used to simulate critical values. We also outline a simple bootstrap procedure that generates valid critical values and p-values. The motivation for extending the Vogelsang-Franses approach is an application that compares climate model generated and observational global temperature data in the tropical lower- and mid-troposphere from 1958 to 2010. Inclusion of a mean shift regressor to capture the Pacific Climate Shift of 1977 causes apparently significant observed trends to become statistically insignificant, and rejection of the equivalence between model generated and observed data trends occurs for much smaller significance levels (i.e. is more strongly rejected).
    Keywords: Autocorrelation; trend estimation; HAC variance matrix; global warming; model comparisons
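The role of the step-change regressor in the abstract above can be illustrated with a minimal sketch. It fits an ordinary least squares trend with and without a mean-shift dummy at a known date. This is only the regression design, not the Vogelsang-Franses statistic or its critical values; the function name `fit_trend_with_step`, the synthetic series, and the noise level are assumptions made for illustration.

```python
import numpy as np

def fit_trend_with_step(y, years, break_year):
    """OLS fit of y = a + b*t + c*step + e, where step = 1 from break_year on.

    Hypothetical helper for illustration; it returns point estimates only
    and does no HAC-robust inference.
    """
    years = np.asarray(years, dtype=float)
    y = np.asarray(y, dtype=float)
    t = years - years[0]                        # time index starting at 0
    step = (years >= break_year).astype(float)  # mean-shift dummy at a known date
    X = np.column_stack([np.ones_like(t), t, step])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return {"intercept": coef[0], "trend": coef[1], "shift": coef[2]}

# Synthetic series (assumed values): no true trend, a 0.3 shift in 1977.
rng = np.random.default_rng(0)
years = np.arange(1958, 2011)
y = 0.3 * (years >= 1977) + rng.normal(0.0, 0.05, years.size)

with_step = fit_trend_with_step(y, years, 1977)

# Trend-only fit for comparison: the omitted shift is absorbed by the slope.
X0 = np.column_stack([np.ones(years.size), years - 1958.0])
trend_only = np.linalg.lstsq(X0, y, rcond=None)[0][1]

print(with_step, trend_only)
```

In this synthetic case the trend-only slope is positive purely because the shift is left out, which mirrors the abstract's point that including the mean-shift regressor can make an apparently significant trend disappear.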

    Robust penalized regression for complex high-dimensional data

    Robust high-dimensional data analysis has become an important and challenging task in complex Big Data analysis due to high dimensionality and data contamination. One of the most popular procedures is robust penalized regression. In this dissertation, we address three typical robust ultra-high dimensional regression problems via penalized regression approaches. The first problem concerns the linear model in the presence of outliers, and deals with outlier detection, variable selection and parameter estimation simultaneously. The second problem concerns robust high-dimensional mean regression under irregular settings such as data contamination, data asymmetry and heteroscedasticity. The third problem concerns robust bi-level variable selection for the linear regression model with grouping structures in covariates. In Chapter 1, we introduce the background and challenges through overviews of penalized least squares methods and robust regression techniques. In Chapter 2, we propose a novel approach in a penalized weighted least squares framework to perform simultaneous variable selection and outlier detection. We provide a unified link between the proposed framework and robust M-estimation in general settings. We also establish non-asymptotic oracle inequalities for the joint estimation of both the regression coefficients and weight vectors. In Chapter 3, we establish a framework of robust estimators in high-dimensional regression models using Penalized Robust Approximated quadratic M estimation (PRAM). This framework allows general settings such as random errors that lack symmetry and homogeneity, or covariates that are not sub-Gaussian. Theoretically, we show that, in the ultra-high dimension setting, the PRAM estimator has local estimation consistency at the minimax rate enjoyed by the LS-Lasso and possesses the local oracle property, under certain mild conditions. In Chapter 4, we extend the study in Chapter 3 to robust high-dimensional data analysis with structured sparsity. In particular, we propose a framework of high-dimensional M-estimators for bi-level variable selection. This framework encourages bi-level sparsity through a computationally efficient two-stage procedure. It produces strongly robust parameter estimators if certain nonconvex redescending loss functions are applied. In theory, we provide sufficient conditions under which our proposed two-stage penalized M-estimator possesses simultaneous local estimation consistency and bi-level variable selection consistency, if a certain nonconvex penalty function is used at the group level. The performance of the proposed estimators is demonstrated in both simulation studies and real examples. In Chapter 5, we provide some discussion and directions for future work.

    Pretest estimation in combining probability and non-probability samples

    Multiple heterogeneous data sources are becoming increasingly available for statistical analyses in the era of big data. As an important example in finite-population inference, we develop a unified framework of the test-and-pool approach to general parameter estimation by combining gold-standard probability and non-probability samples. We focus on the case when the study variable is observed in both datasets for estimating the target parameters, and each contains other auxiliary variables. Utilizing the probability design, we conduct a pretest procedure to determine the comparability of the non-probability data with the probability data and decide whether or not to leverage the non-probability data in a pooled analysis. When the probability and non-probability data are comparable, our approach combines both data for efficient estimation. Otherwise, we retain only the probability data for estimation. We also characterize the asymptotic distribution of the proposed test-and-pool estimator under a local alternative and provide a data-adaptive procedure to select the critical tuning parameters that target the smallest mean square error of the test-and-pool estimator. Lastly, to deal with the non-regularity of the test-and-pool estimator, we construct a robust confidence interval that has a good finite-sample coverage property. Comment: Accepted in Electronic Journal of Statistics
