Penalized single-index quantile regression
This article is made available through the Brunel Open Access Publishing Fund. Copyright for this article is retained by the author(s), with first publication rights granted to the journal. This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

The single-index (SI) regression and single-index quantile (SIQ) estimation methods produce linear combinations of all the original predictors. However, many of the original predictors may be unimportant, and the precision of parameter estimation as well as the accuracy of prediction will be affected by the presence of those unimportant predictors when the previous methods are used. In this article, an extension of the SIQ method of Wu et al. (2010) has been proposed, which considers the Lasso and the Adaptive Lasso for estimation and variable selection. Computational algorithms have been developed in order to calculate the penalized SIQ estimates. A simulation study and a real data application have been used to assess the performance of the methods under consideration.
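The core ingredient of such penalized quantile estimators, the check (pinball) loss combined with a Lasso penalty that shrinks unimportant coefficients to zero, can be illustrated in a plain linear setting (a minimal numpy sketch, not the full single-index procedure of the article; the step size, penalty level, and iteration count are illustrative choices):

```python
import numpy as np

def lasso_quantile(X, y, tau=0.5, lam=0.05, lr=0.1, iters=2000):
    """Proximal subgradient descent on the Lasso-penalized check loss."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        r = y - X @ beta
        # subgradient of the pinball loss: tau on positive residuals, tau - 1 on negative
        grad = -X.T @ (tau - (r < 0).astype(float)) / n
        beta = beta - lr * grad
        # soft-thresholding: the proximal step for the L1 (Lasso) penalty
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam, 0.0)
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=500)  # only 2 of 6 predictors matter
beta = lasso_quantile(X, y)
```

On such data the two active coefficients are recovered close to their true values while the four irrelevant ones are shrunk toward zero, which is the variable-selection effect the abstract refers to.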
Penalized Composite Quasi-Likelihood for Ultrahigh-Dimensional Variable Selection
In high-dimensional model selection problems, penalized least-squares approaches have been extensively used. This paper addresses the question of both robustness and efficiency of penalized model selection methods and proposes a data-driven weighted linear combination of convex loss functions, together with a weighted L1-penalty. It is completely data-adaptive and does not require prior knowledge of the error distribution. The weighted L1-penalty is used both to ensure the convexity of the penalty term and to ameliorate the bias caused by the L1-penalty. In the setting with dimensionality much larger than the sample size, we establish a strong oracle property of the proposed method that possesses both model selection consistency and estimation efficiency for the true non-zero coefficients. As specific examples, we introduce a robust composite L1-L2 method and an optimal composite quantile method, and evaluate their performance in both simulated and real data examples.
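A concrete instance of combining convex losses is composite quantile regression, where check losses at several quantile levels share a single slope vector while each level keeps its own intercept (an illustrative numpy/scipy sketch with equal weights, not the data-driven weighting scheme of the paper):

```python
import numpy as np
from scipy.optimize import minimize

TAUS = [0.25, 0.5, 0.75]

def cqr_loss(params, X, y):
    """Composite quantile loss: shared slope, one intercept per quantile level."""
    b = params[:len(TAUS)]       # quantile-specific intercepts
    beta = params[len(TAUS):]    # slope shared across all quantile levels
    total = 0.0
    for k, tau in enumerate(TAUS):
        r = y - b[k] - X @ beta
        total += np.mean(np.where(r >= 0, tau * r, (tau - 1) * r))
    return total

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 1))
y = 3 * X[:, 0] + rng.standard_t(df=3, size=300)  # heavy-tailed errors
res = minimize(cqr_loss, np.zeros(4), args=(X, y), method='Nelder-Mead',
               options={'maxiter': 5000, 'xatol': 1e-6, 'fatol': 1e-8})
beta_hat = res.x[3]
```

Pooling several quantile levels is what buys robustness here: the slope estimate remains accurate under the heavy-tailed t(3) errors, where a plain least-squares fit would be less efficient.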
Mathematical Statistics of Partially Identified Objects
The workshop brought together leading experts in mathematical statistics, theoretical econometrics and bio-mathematics interested in mathematical objects occurring in the analysis of partially identified structures. The mathematical core of these ubiquitous structures has an impact on all three research areas and is expected to lead to the development of new algorithms for solving such problems.
Uniform Bahadur Representation for Nonparametric Censored Quantile Regression: A Redistribution-of-Mass Approach
Censored quantile regressions have received a great deal of attention in the literature. In a linear setup, recent research has found that an estimator based on the idea of “redistribution-of-mass” in Efron (1967, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 4, pp. 831–853, University of California Press) has better numerical performance than other available methods. In this paper, this idea is combined with local polynomial kernel smoothing for nonparametric quantile regression of censored data. We derive the uniform Bahadur representation for the estimator and, more importantly, give theoretical justification for its improved efficiency over existing estimation methods. We include an example to illustrate the usefulness of such a uniform representation in the context of sufficient dimension reduction in regression analysis. Finally, simulations are used to investigate the finite sample performance of the new estimator.
Boosting Techniques for Nonlinear Time Series Models
Many of the popular nonlinear time series models require a priori the choice of parametric functions which are assumed to be appropriate in specific applications. This approach is used mainly in financial applications, when sufficient knowledge is available about the nonlinear structure between the covariates and the response. One principal strategy to investigate a broader class of nonlinear time series is the Nonlinear Additive AutoRegressive (NAAR) model. The NAAR model estimates the lags of a time series as flexible functions in order to detect non-monotone relationships between current observations and past values.
We consider linear and additive models for identifying nonlinear relationships. A componentwise boosting algorithm is applied to perform model fitting, variable selection, and model choice simultaneously. Thus, by applying boosting to fit potentially nonlinear models we address the major issues in time series modelling: lag selection and nonlinearity. By means of simulation we compare the outcomes of boosting to those obtained through alternative nonparametric methods. Boosting shows an overall strong performance in terms of precise estimation of highly nonlinear lag functions. The forecasting potential of boosting is examined on real data where the target variable is German industrial production (IP). In order to improve the model's forecasting quality we include additional exogenous variables. Thus we address the second major aspect of this paper, which concerns the issue of high-dimensionality in models. Allowing additional inputs extends the NAAR model to an even broader class of models, namely the NAARX model. We show that boosting can cope with large models which have many covariates compared to the number of observations.
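The componentwise idea behind such lag selection can be sketched as follows: at each boosting step, fit a simple least-squares base learner on every candidate lag, keep only the best-fitting one, and take a small damped step toward it (a toy numpy sketch with a linear base learner, not the flexible nonlinear learners of the NAAR model itself):

```python
import numpy as np

def componentwise_boost(X, y, steps=300, nu=0.1):
    """Componentwise L2-boosting: each step updates a single best predictor."""
    n, p = X.shape
    coefs, fit = np.zeros(p), np.zeros(n)
    for _ in range(steps):
        r = y - fit
        # least-squares slope of the residuals on each single column
        slopes = X.T @ r / np.sum(X**2, axis=0)
        sse = np.sum(r**2) - slopes**2 * np.sum(X**2, axis=0)  # fit quality per column
        j = int(np.argmin(sse))        # select the best-fitting lag
        coefs[j] += nu * slopes[j]     # damped (shrunken) update
        fit += nu * slopes[j] * X[:, j]
    return coefs

# toy AR series: y_t = 0.5 y_{t-1} - 0.3 y_{t-3} + noise; lags 2, 4, 5 are irrelevant
rng = np.random.default_rng(2)
y = np.zeros(600)
eps = rng.normal(size=600)
for t in range(3, 600):
    y[t] = 0.5 * y[t-1] - 0.3 * y[t-3] + eps[t]
lags = np.column_stack([y[5 - k:-k] for k in range(1, 6)])  # columns = lags 1..5
coefs = componentwise_boost(lags, y[5:])
```

Because only one lag is updated per step and updates are damped, irrelevant lags receive little or no weight, which is the built-in lag selection the abstract describes.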
Interpretable statistics for complex modelling: quantile and topological learning
As the complexity of our data has increased exponentially over the last decades, so has our need for interpretable features. This thesis revolves around two paradigms to approach this quest for insights.
In the first part we focus on parametric models, where the problem of interpretability can be seen as a “parametrization selection”. We introduce a quantile-centric parametrization and show the advantages of our proposal in the context of regression, where it allows us to bridge the gap between classical generalized linear (mixed) models and increasingly popular quantile methods.
The second part of the thesis, concerned with topological learning, tackles the problem from a non-parametric perspective. As topology can be thought of as a way of characterizing data in terms of their connectivity structure, it allows us to represent complex and possibly high-dimensional data through a few features, such as the number of connected components, loops and voids. We illustrate how the emerging branch of statistics devoted to recovering topological structures in the data, Topological Data Analysis, can be exploited for both exploratory and inferential purposes, with special emphasis on kernels that preserve the topological information in the data.
Finally, we show with an application how these two approaches can borrow strength from one another in the identification and description of brain activity through fMRI data from the ABIDE project.
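The simplest of the connectivity features mentioned above, the number of connected components (0-dimensional topology), can be computed directly with a union-find pass over pairwise distances at a given scale (a self-contained numpy sketch; real TDA work would use a dedicated persistent-homology library and track components across all scales):

```python
import numpy as np
from itertools import combinations

def n_components(points, eps):
    """Connected components of the eps-neighborhood graph (0-dim topology)."""
    n = len(points)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for i, j in combinations(range(n), 2):
        if np.linalg.norm(points[i] - points[j]) <= eps:
            parent[find(i)] = find(j)      # union: link the two components
    return len({find(i) for i in range(n)})

# two tight, well-separated clusters in the plane
rng = np.random.default_rng(3)
cloud = np.vstack([rng.normal([0, 0], 0.1, size=(30, 2)),
                   rng.normal([3, 3], 0.1, size=(30, 2))])
```

Sweeping `eps` from small to large traces how components are born and merge, which is exactly the information a 0-dimensional persistence diagram summarizes: at an intermediate scale the two clusters appear as two long-lived components.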