Semiparametric penalty function method in partially linear model selection
Model selection in nonparametric and semiparametric regression is of both theoretical and practical interest. Gao and Tong (2004) proposed a semiparametric leave-more-out cross-validation selection procedure for choosing both the parametric and the nonparametric regressors in a nonlinear time series regression model. As the authors recognized, implementing this procedure requires relatively large sample sizes. To address the model selection problem for small or medium sample sizes, we propose a model selection procedure for practical use. We extend the so-called penalty function method of Zheng and Loh (1995, 1997) by incorporating features of the leave-one-out cross-validation approach. The result is a semiparametric, consistent selection procedure for choosing optimal subsets in a partially linear model. The proposed method uses the full data set, and simulations show that it works well for both small and medium sample sizes.
Linear model; model selection; nonparametric method; partially linear model; semiparametric method
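The selection idea can be illustrated with a minimal Python sketch. It is not the authors' procedure. It assumes a partially linear model y = Xβ + g(t) + e with a scalar t and removes g(t) with Robinson-style double residuals. Those residuals are computed with a leave-one-out Nadaraya-Watson smoother using a Gaussian kernel and a fixed bandwidth `h`. Every subset of columns is then scored with a BIC-type penalized criterion. The function names, the kernel, the bandwidth and the default penalty log(n) are illustrative assumptions.

```python
import itertools

import numpy as np


def loo_nw(t, v, h):
    """Leave-one-out Nadaraya-Watson estimate of E[v | t] at every t_i (Gaussian kernel)."""
    u = (t[:, None] - t[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    np.fill_diagonal(w, 0.0)  # exclude observation i from its own fit
    w /= w.sum(axis=1, keepdims=True)
    return w @ v


def plm_rss(y, X, t, h):
    """RSS of a Robinson-type double-residual fit of y = X b + g(t) + e."""
    ey = y - loo_nw(t, y, h)
    if X.shape[1] == 0:  # no parametric regressors: only g(t) is fitted
        return float(ey @ ey)
    eX = X - loo_nw(t, X, h)
    b, *_ = np.linalg.lstsq(eX, ey, rcond=None)
    r = ey - eX @ b
    return float(r @ r)


def select_subset(y, X, t, h=0.1, penalty=None):
    """Exhaustive search over column subsets S of X, minimising
    log(RSS_S / n) + |S| * penalty / n (hypothetical helper)."""
    n, p = X.shape
    if penalty is None:
        penalty = np.log(n)  # assumption: BIC-type per-parameter penalty
    best_crit, best_S = np.inf, ()
    for k in range(p + 1):
        for S in itertools.combinations(range(p), k):
            rss = plm_rss(y, X[:, list(S)], t, h)
            crit = np.log(rss / n) + k * penalty / n
            if crit < best_crit:
                best_crit, best_S = crit, S
    return best_S
```

Exhaustive search is feasible only for a handful of candidate regressors. The leave-one-out smoother keeps each observation out of its own nonparametric fit while the criterion is still computed on the full data set.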
Gibrat's Law Revisited in a Transition Economy: The Hungarian Case
The paper investigates the validity of Gibrat's Law in Hungarian agriculture. We employ various specifications, including OLS, two-step Heckman models and quantile regressions. Our results strongly reject Gibrat's Law for the full sample, and the estimates suggest that small farms tend to grow faster than larger ones. However, splitting the sample into two subgroups (corporate and family farms) yields different results. For family farms, only the OLS regression results reject Gibrat's Law, whilst the two-step Heckman models and quantile regression estimates support it. Finally, for corporate farms our results support the Law regardless of the method or size measure used. Our results indicate that family farms and corporate farms do not differ in their growth trajectories.
Gibrat's Law, selection bias, quantile regression, transition agriculture, Farm Management, Research Methods/Statistical Methods
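The basic OLS version of the test can be sketched as a regression of log growth on log initial size. Under Gibrat's Law the slope is zero. A significantly negative slope means that small farms grow faster than large ones. This sketch covers only the OLS specification, not the Heckman or quantile regressions. The function name and the simulated sizes in the check are hypothetical.

```python
import numpy as np


def gibrat_ols(size_start, size_end):
    """OLS of log growth on log initial size; returns (slope, t_stat).

    Gibrat's Law implies slope == 0; slope < 0 means smaller units grow faster."""
    x = np.log(np.asarray(size_start, dtype=float))
    g = np.log(np.asarray(size_end, dtype=float)) - x  # log growth rate
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, g, rcond=None)
    resid = g - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)  # homoskedastic error variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)
    slope = coef[1]
    return float(slope), float(slope / np.sqrt(cov[1, 1]))
```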
Consistent Order Selection with Strongly Dependent Data and its Application to Efficient Estimation
Order selection based on criteria such as Akaike's (1974) AIC, Schwarz's (1978) BIC or Hannan and Quinn's (1979) HIC is often applied in empirical work. These criteria have been used for order selection in weakly dependent ARMA models and in AR models with unit or explosive roots. They have also been used in regression and distributed lag regression models for weakly dependent data. On the other hand, data in many areas have been observed to exhibit so-called strong dependence. Because of the interest in this type of data, our main objective in this paper is to examine order selection for a distributed lag regression model that covers weak and strong dependence in a unified form. The aforementioned criteria may have adverse properties here. We therefore propose a criterion function based on decomposing the variance of the model's innovations into their frequency components. Assuming that the order of the model is finite, say p₀, we show that the proposed criterion consistently estimates p₀. In addition, we show that adaptive estimation of the model parameters is possible without knowledge of p₀. Finally, a small Monte Carlo experiment illustrates the finite-sample performance of the proposed criterion.
Order selection, distributed lag models, strong dependence.
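For reference, the classical criteria that the paper takes as its starting point can be sketched for a distributed lag regression y_t = a + Σ_{j=0}^{p} b_j x_{t-j} + u_t. This is not the proposed frequency-domain criterion. It is a minimal illustration of how AIC, BIC and HIC trade residual variance against the number of parameters k. It assumes that every candidate order is fitted on a common estimation sample of length T, so the criteria are comparable. The function name is hypothetical.

```python
import numpy as np


def dl_order_select(y, x, max_p):
    """Choose p in y_t = a + sum_{j=0}^{p} b_j x_{t-j} + u_t by AIC, BIC and HIC."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    T = len(y) - max_p  # common sample: t = max_p, ..., n-1
    yy = y[max_p:]
    scores = {"AIC": [], "BIC": [], "HIC": []}
    for p in range(max_p + 1):
        # column j holds x_{t-j} for t = max_p, ..., n-1
        cols = [np.ones(T)] + [x[max_p - j: len(x) - j] for j in range(p + 1)]
        Z = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(Z, yy, rcond=None)
        r = yy - Z @ coef
        s2 = r @ r / T  # ML estimate of the innovation variance
        k = Z.shape[1]
        scores["AIC"].append(np.log(s2) + 2 * k / T)
        scores["BIC"].append(np.log(s2) + k * np.log(T) / T)
        scores["HIC"].append(np.log(s2) + 2 * k * np.log(np.log(T)) / T)
    return {name: int(np.argmin(v)) for name, v in scores.items()}
```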
Tight conditions for consistency of variable selection in the context of high dimensionality
We address the issue of variable selection in the regression model with very high ambient dimension, that is, when the number of variables is very large. The main focus is on the situation where the number of relevant variables, called the intrinsic dimension, is much smaller than the ambient dimension d. Without assuming any parametric form of the underlying regression function, we obtain tight conditions under which the set of relevant variables can be consistently estimated. These conditions relate the intrinsic dimension to the ambient dimension and to the sample size. The procedure that is provably consistent under these tight conditions compares quadratic functionals of the empirical Fourier coefficients with appropriately chosen threshold values. The asymptotic analysis reveals two quite different regimes. In the first regime, the intrinsic dimension is fixed. In this case nonparametric regression behaves like linear regression: consistent variable selection is possible if and only if log d is small compared to the sample size n. The picture is different in the second regime, where the number of relevant variables, denoted by s, tends to infinity as n → ∞. We then prove that consistent variable selection in the nonparametric set-up is possible only if s + log log d is small compared to log n. We apply these results to derive minimax separation rates for the problem of variable
Comment: arXiv admin note: text overlap with arXiv:1102.3616; published in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/12-AOS1046 by the Institute of Mathematical Statistics (http://www.imstat.org
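The thresholding idea can be sketched in a simplified form. The sketch assumes covariates uniform on [0, 1]^d and uses only single-coordinate cosine frequencies, so a variable that acts purely through interactions would be missed. For each coordinate j, the code computes a U-statistic that estimates Σ_k θ_{j,k}², where θ_{j,k} = E[y √2 cos(πk X_j)]. It keeps j when this statistic exceeds a threshold. The threshold is a rough chi-square union bound chosen for illustration, not the paper's threshold. The function name and `max_freq` are hypothetical.

```python
import numpy as np


def fourier_screen(X, y, max_freq=3):
    """Flag coordinate j when an estimate of sum_k theta_{j,k}^2 exceeds a threshold,
    with theta_{j,k} = E[y * sqrt(2) cos(pi k X_j)] and X uniform on [0, 1]^d."""
    n, d = X.shape
    yc = y - y.mean()
    Q = np.zeros(d)
    for j in range(d):
        for k in range(1, max_freq + 1):
            a = yc * np.sqrt(2.0) * np.cos(np.pi * k * X[:, j])
            # U-statistic: sum_{i != i'} a_i a_i' / (n (n-1)), approx. unbiased for theta^2
            Q[j] += (a.sum() ** 2 - (a ** 2).sum()) / (n * (n - 1))
    # assumption: under the null, n * Q_j / var(y) is roughly chi2(max_freq) - max_freq,
    # so use a Laurent-Massart-type tail bound with a union bound over d coordinates
    x = np.log(d) + 5.0
    threshold = np.var(y) * (2.0 * np.sqrt(max_freq * x) + 2.0 * x) / n
    return np.flatnonzero(Q > threshold), Q
```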
Model selection in regression under structural constraints
The paper considers model selection in regression under additional structural constraints on admissible models, where the number of potential predictors may even exceed the available sample size. We develop a Bayesian formalism as a natural tool for generating a wide class of model selection criteria. These criteria are based on penalized least squares estimation with various complexity penalties associated with a prior on the model size. The resulting criteria are adaptive to structural constraints. We establish an upper bound for the quadratic risk of the resulting MAP estimator and the corresponding lower bound for the minimax risk over a set of admissible models of a given size. We then specify the class of priors (and, therefore, the class of complexity penalties) for which, under a "nearly-orthogonal" design, the MAP estimator is asymptotically at least nearly minimax (up to a log factor) simultaneously over an entire range of sparse and dense setups. Moreover, the proposed estimator achieves the exact minimax rates in two cases. The first is when the number of admissible models is "small" (e.g., ordered variable selection). The second is the opposite extreme of complete variable selection.
Comment: arXiv admin note: text overlap with arXiv:0912.438
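Penalized least squares with a complexity penalty on the model size can be sketched by exhaustive search when the number of predictors is small. The penalty Pen(k) = c σ² k log(e p / k) is an assumed k log(p/k)-type form chosen for illustration. It is not the specific class of priors derived in the paper. In the paper's formalism such penalties arise from a prior on the model size, whereas here the penalty is simply fixed. The sketch also takes σ² as known, assumes centred data (no intercept), and imposes no structural constraints, so every subset is admissible. The function name is hypothetical.

```python
import itertools

import numpy as np


def pen_ls_select(X, y, sigma2, c=2.0):
    """Exhaustive penalized least squares over all column subsets M of X:
    minimise RSS(M) + Pen(|M|) with Pen(k) = c * sigma2 * k * log(e * p / k)."""
    n, p = X.shape
    best_val, best_M = float(y @ y), ()  # empty model, Pen(0) = 0
    for k in range(1, p + 1):
        pen = c * sigma2 * k * np.log(np.e * p / k)  # assumed complexity penalty
        for M in itertools.combinations(range(p), k):
            Xm = X[:, list(M)]
            coef, *_ = np.linalg.lstsq(Xm, y, rcond=None)
            r = y - Xm @ coef
            val = float(r @ r) + pen
            if val < best_val:
                best_val, best_M = val, M
    return best_M
```

The search costs 2^p least-squares fits. Structural constraints, such as ordered selection, would simply restrict which subsets M are enumerated.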
Effects of Influential Points and Sample Size on the Selection and Replicability of Multivariable Fractional Polynomial Models
The multivariable fractional polynomial (MFP) procedure combines variable selection with a function selection procedure (FSP). For continuous variables, a closed test procedure is used to decide between no effect, a linear function, an FP1 function or an FP2 function. Influential points (IPs) and a small sample size can both affect the selected fractional polynomial model. In this paper, we used simulated data with six continuous and four categorical predictors to illustrate approaches that can help to identify IPs influencing function selection and the MFP model. These approaches use leave-one-out and leave-two-out analyses and two related techniques for a multivariable assessment. In seven subsamples we also investigated the effects of sample size and model replicability. For better illustration, a structured profile was used to provide an overview of all analyses conducted. The results showed that one or more IPs can drive the functions and models selected. In addition, with a small sample size, MFP might not be able to detect non-linear functions, and the selected model might differ substantially from the true underlying model. However, if the sample size is sufficient and regression diagnostics are carefully conducted, MFP can be a suitable approach for selecting variables and functional forms for continuous variables.
Comment: Main paper and a supplementary combine
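The FSP closed test and a leave-one-out check for influential points can be sketched for a single continuous, strictly positive predictor in a Gaussian linear model. The sketch uses the standard FP power set {-2, -1, -0.5, 0, 0.5, 1, 2, 3} and a single 5% level for every step. It compares deviance differences with chi-square critical values. It omits the other predictors, the backfitting cycles of the full MFP algorithm, and the leave-two-out and multivariable techniques. All function names are hypothetical.

```python
import itertools

import numpy as np

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)
CRIT = {4: 9.488, 3: 7.815, 2: 5.991}  # chi-square 0.95 quantiles for 4, 3, 2 df


def fp(x, p):
    """FP transformation x^(p), with the convention x^(0) = log(x)."""
    return np.log(x) if p == 0 else x ** p


def deviance(y, cols):
    """Gaussian deviance n * log(RSS / n), up to an additive constant."""
    n = len(y)
    Z = np.column_stack([np.ones(n)] + list(cols))
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ coef
    return n * np.log(r @ r / n)


def best_fp1(x, y):
    return min(deviance(y, [fp(x, p)]) for p in POWERS)


def best_fp2(x, y):
    best = np.inf
    for p1, p2 in itertools.combinations_with_replacement(POWERS, 2):
        f1 = fp(x, p1)
        # repeated powers use x^(p) and x^(p) * log(x)
        f2 = f1 * np.log(x) if p1 == p2 else fp(x, p2)
        best = min(best, deviance(y, [f1, f2]))
    return best


def fsp(x, y):
    """Closed test: FP2 vs null (4 df), FP2 vs linear (3 df), FP2 vs FP1 (2 df)."""
    d_fp2 = best_fp2(x, y)
    if deviance(y, []) - d_fp2 < CRIT[4]:
        return "none"
    if deviance(y, [x]) - d_fp2 < CRIT[3]:
        return "linear"
    if best_fp1(x, y) - d_fp2 < CRIT[2]:
        return "FP1"
    return "FP2"


def loo_influence(x, y):
    """Indices whose single deletion changes the function selected by fsp."""
    full = fsp(x, y)
    changed = [i for i in range(len(y))
               if fsp(np.delete(x, i), np.delete(y, i)) != full]
    return full, changed
```

An observation whose deletion changes the selected function, for example from FP2 to FP1, is a candidate IP. Pairs of points can mask each other, which is why leave-two-out checks are also worth running.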