
    Penalized single-index quantile regression

    This article is made available through the Brunel Open Access Publishing Fund. Copyright for this article is retained by the author(s), with first publication rights granted to the journal. This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/). The single-index (SI) regression and single-index quantile (SIQ) estimation methods produce linear combinations of all the original predictors. However, many of the original predictors may be unimportant. When the previous methods are used, the precision of parameter estimation and the accuracy of prediction are affected by those unimportant predictors. This article proposes an extension of the SIQ method of Wu et al. (2010) that uses the Lasso and the adaptive Lasso for estimation and variable selection. Computational algorithms are developed to calculate the penalized SIQ estimates. A simulation study and a real data application are used to assess the performance of the methods under consideration.
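As a rough illustration of the kind of objective a penalized quantile method minimizes, the sketch below combines the standard quantile check loss with a Lasso or adaptive Lasso penalty. It is not the authors' algorithm: the unknown link function of the single-index model is left out, the residuals are assumed to be supplied by some fitted model, and the names `check_loss`, `penalized_objective`, and `adaptive_weights` are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def penalized_objective(residuals, beta, tau, lam, weights=None):
    """Sum of check losses plus a (weighted) L1 penalty on beta.

    weights=None gives the plain Lasso penalty; passing data-driven
    weights gives an adaptive-Lasso-style penalty.
    """
    if weights is None:
        weights = [1.0] * len(beta)
    loss = sum(check_loss(r, tau) for r in residuals)
    penalty = lam * sum(w * abs(b) for w, b in zip(weights, beta))
    return loss + penalty


def adaptive_weights(beta_init, eps=1e-8):
    """Adaptive Lasso weights 1 / |beta_init_j| (eps avoids division by zero).

    beta_init is assumed to be an initial, unpenalized estimate.
    """
    return [1.0 / (abs(b) + eps) for b in beta_init]
```

Coefficients with a small initial estimate receive a large weight and are penalized more heavily, which is what lets the adaptive variant shrink unimportant predictors toward zero more aggressively than the plain Lasso.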

    Penalized Composite Quasi-Likelihood for Ultrahigh-Dimensional Variable Selection

    In high-dimensional model selection problems, penalized simple least-squares approaches have been extensively used. This paper addresses the question of both robustness and efficiency of penalized model selection methods, and proposes a data-driven weighted linear combination of convex loss functions, together with a weighted $L_1$-penalty. It is completely data-adaptive and does not require prior knowledge of the error distribution. The weighted $L_1$-penalty is used both to ensure the convexity of the penalty term and to ameliorate the bias caused by the $L_1$-penalty. In the setting with dimensionality much larger than the sample size, we establish a strong oracle property of the proposed method, which possesses both model selection consistency and estimation efficiency for the true non-zero coefficients. As specific examples, we introduce a robust composite L1-L2 method and an optimal composite quantile method, and evaluate their performance in both simulated and real data examples.
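To make the idea of a weighted combination of convex losses concrete, here is a minimal sketch of a composite L1-L2 objective with a weighted L1 penalty. The loss weights `w1`, `w2` and the penalty weights `pen_weights` are treated as given inputs here, whereas the abstract describes them as data-driven; the function names and the exact scaling of the penalty are assumptions for illustration only.

```python
def composite_l1_l2_loss(residual, w1, w2):
    """Weighted combination of absolute (L1) and squared (L2) loss."""
    return w1 * abs(residual) + w2 * residual ** 2


def penalized_composite_objective(residuals, beta, w1, w2, lam, pen_weights):
    """Composite loss summed over residuals plus a weighted L1 penalty.

    pen_weights[j] scales the penalty on beta[j]; smaller weights on
    large coefficients reduce the bias of a plain L1 penalty.
    """
    loss = sum(composite_l1_l2_loss(r, w1, w2) for r in residuals)
    penalty = lam * sum(d * abs(b) for d, b in zip(pen_weights, beta))
    return loss + penalty
```

Because both loss components and the weighted penalty are convex, the whole objective remains convex for non-negative weights, which is the property the abstract emphasizes.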

    Mathematical Statistics of Partially Identified Objects

    The workshop brought together leading experts in mathematical statistics, theoretical econometrics and bio-mathematics interested in mathematical objects occurring in the analysis of partially identified structures. The mathematical core of these ubiquitous structures has an impact on all three research areas and is expected to lead to the development of new algorithms for solving such problems.

    Uniform Bahadur Representation for Nonparametric Censored Quantile Regression: A Redistribution-of-Mass Approach

    Censored quantile regressions have received a great deal of attention in the literature. In a linear setup, recent research has found that an estimator based on the idea of “redistribution-of-mass” in Efron (1967, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 4, pp. 831–853, University of California Press) has better numerical performance than other available methods. In this paper, this idea is combined with local polynomial kernel smoothing for nonparametric quantile regression of censored data. We derive the uniform Bahadur representation for the estimator and, more importantly, give theoretical justification for its improved efficiency over existing estimation methods. We include an example to illustrate the usefulness of such a uniform representation in the context of sufficient dimension reduction in regression analysis. Finally, simulations are used to investigate the finite sample performance of the new estimator.

    Boosting Techniques for Nonlinear Time Series Models

    Many of the popular nonlinear time series models require the a priori choice of parametric functions which are assumed to be appropriate in specific applications. This approach is used mainly in financial applications, when sufficient knowledge is available about the nonlinear structure between the covariates and the response. One principal strategy to investigate a broader class of nonlinear time series is the Nonlinear Additive AutoRegressive (NAAR) model. The NAAR model estimates the lags of a time series as flexible functions in order to detect non-monotone relationships between current observations and past values. We consider linear and additive models for identifying nonlinear relationships. A componentwise boosting algorithm is applied to simultaneous model fitting, variable selection, and model choice. Thus, by applying boosting to fit potentially nonlinear models, we address the major issues in time series modelling: lag selection and nonlinearity. By means of simulation we compare the outcomes of boosting to the outcomes obtained through alternative nonparametric methods. Boosting shows an overall strong performance in terms of precise estimation of highly nonlinear lag functions. The forecasting potential of boosting is examined on real data where the target variable is the German industrial production (IP). In order to improve the model's forecasting quality we include additional exogenous variables. Thus we address the second major aspect of this paper, which concerns the issue of high dimensionality in models. Allowing additional inputs in the model extends the NAAR model to an even broader class of models, namely the NAARX model. We show that boosting can cope with large models which have many covariates compared to the number of observations.
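The lag-selection effect of componentwise boosting can be sketched in a few lines. The version below is a simplified assumption, not the paper's implementation: it uses linear base learners without intercept (so the series is assumed to be centered), a squared-error loss, and a fixed step length `nu`, whereas an additive NAAR fit would use smooth base learners such as splines. The names `lag_matrix` and `componentwise_l2_boost` are hypothetical.

```python
def lag_matrix(series, max_lag):
    """Return (X, y): each row of X holds [y_{t-1}, ..., y_{t-max_lag}], y holds y_t."""
    X = [[series[t - k] for k in range(1, max_lag + 1)]
         for t in range(max_lag, len(series))]
    y = list(series[max_lag:])
    return X, y


def componentwise_l2_boost(X, y, n_steps=50, nu=0.1):
    """Componentwise L2 boosting with one linear base learner per lag.

    At each step, every lag is fitted separately to the current residuals;
    only the best-fitting lag is updated, by a fraction nu of its fit.
    Returns the coefficient vector and the sequence of selected lag indices.
    """
    p = len(X[0])
    coef = [0.0] * p
    selected = []
    resid = list(y)
    for _ in range(n_steps):
        best = None  # (residual sum of squares, lag index, slope)
        for j in range(p):
            col = [row[j] for row in X]
            sxx = sum(v * v for v in col)
            if sxx == 0.0:
                continue  # a constant-zero column cannot be fitted
            slope = sum(v * r for v, r in zip(col, resid)) / sxx
            rss = sum((r - slope * v) ** 2 for v, r in zip(col, resid))
            if best is None or rss < best[0]:
                best = (rss, j, slope)
        if best is None:
            break
        _, j, slope = best
        coef[j] += nu * slope
        resid = [r - nu * slope * row[j] for r, row in zip(resid, X)]
        selected.append(j)
    return coef, selected
```

Lags that are never selected keep a coefficient of exactly zero, so stopping the algorithm after a moderate number of steps performs variable selection and model fitting at the same time.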

    Interpretable statistics for complex modelling: quantile and topological learning

    As the complexity of our data has increased exponentially in the last decades, so has our need for interpretable features. This thesis revolves around two paradigms to approach this quest for insights. In the first part we focus on parametric models, where the problem of interpretability can be seen as a “parametrization selection”. We introduce a quantile-centric parametrization and we show the advantages of our proposal in the context of regression, where it helps bridge the gap between classical generalized linear (mixed) models and increasingly popular quantile methods. The second part of the thesis, concerned with topological learning, tackles the problem from a non-parametric perspective. As topology can be thought of as a way of characterizing data in terms of their connectivity structure, it allows complex and possibly high-dimensional data to be represented through a few features, such as the number of connected components, loops and voids. We illustrate how the emerging branch of statistics devoted to recovering topological structures in the data, Topological Data Analysis, can be exploited for both exploratory and inferential purposes, with a special emphasis on kernels that preserve the topological information in the data. Finally, we show with an application how these two approaches can borrow strength from one another in the identification and description of brain activity through fMRI data from the ABIDE project.