    Semiparametric penalty function method in partially linear model selection

    Model selection in nonparametric and semiparametric regression is of both theoretical and practical interest. Gao and Tong (2004) proposed a semiparametric leave-more-out cross-validation selection procedure for the choice of both the parametric and nonparametric regressors in a nonlinear time series regression model. As recognized by the authors, the implementation of the proposed procedure requires the availability of relatively large sample sizes. In order to address the model selection problem with small or medium sample sizes, we propose a model selection procedure for practical use. By extending the so-called penalty function method proposed in Zheng and Loh (1995, 1997) through the incorporation of features of the leave-one-out cross-validation approach, we develop a semiparametric, consistent selection procedure suitable for the choice of optimum subsets in a partially linear model. The newly proposed method is implemented using the full set of data, and simulations show that it works well for both small and medium sample sizes.
    Keywords: linear model; model selection; nonparametric method; partially linear model; semiparametric method

    GIBRAT'S LAW REVISITED IN A TRANSITION ECONOMY. THE HUNGARIAN CASE

    The paper investigates the validity of Gibrat's Law in Hungarian agriculture. Using various specifications, including OLS, the two-step Heckman model and quantile regressions, we strongly reject Gibrat's Law for the full sample. The estimates suggest that small farms tend to grow faster than larger ones. However, splitting the sample into two subgroups (corporate and family farms) yields different results. For family farms, only the OLS regression results reject Gibrat's Law, whilst the two-step Heckman models and quantile regression estimates support it. Finally, for corporate farms our results support the Law regardless of the method or size measure used. Our results indicate that there is no difference between family farms and corporate farms according to the growth trajectory.
    Keywords: Gibrat's Law, selection bias, quantile regression, transition agriculture, Farm Management, Research Methods/Statistical Methods
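One common way to frame the OLS check mentioned in this abstract is to regress log end-of-period size on log start-of-period size: a slope of 1 is consistent with Gibrat's Law, and a slope below 1 means small units grow proportionally faster. The sketch below is only an illustration of that idea, not the paper's specification. The farm sizes are made-up numbers, the function names are hypothetical, and a real test would also need standard errors and the selection correction the abstract refers to.

```python
import math


def ols_intercept_slope(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def gibrat_slope(size_start, size_end):
    """Slope from regressing log end-of-period size on log start-of-period size."""
    log_start = [math.log(s) for s in size_start]
    log_end = [math.log(s) for s in size_end]
    return ols_intercept_slope(log_start, log_end)[1]


# Hypothetical farm sizes (not from the paper), constructed so that
# log(end) = 0.5 + 0.8 * log(start), i.e. small farms grow faster.
size_start = [10.0, 20.0, 50.0, 100.0, 400.0]
size_end = [math.exp(0.5 + 0.8 * math.log(s)) for s in size_start]

beta = gibrat_slope(size_start, size_end)
print(round(beta, 3))
```

With the constructed data the estimated slope sits below 1, which is the pattern the abstract describes for the full sample.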

    Consistent Order Selection with Strongly Dependent Data and its Application to Efficient Estimation

    Order selection based on the criteria of Akaike (1974), AIC, Schwarz (1978), BIC, or Hannan and Quinn (1979), HIC, is often applied in empirical examples. These criteria have been used for order selection in weakly dependent ARMA models, in AR models with unit or explosive roots, and in regression or distributed lag regression models for weakly dependent data. On the other hand, it has been observed that data exhibit so-called strong dependence in many areas. Because of the interest in this type of data, our main objective in this paper is to examine order selection for a distributed lag regression model that covers weak and strong dependence in a unified form. To that end, and because of the possible adverse properties of the aforementioned criteria, we propose a criterion function based on the decomposition of the variance of the innovations of the model in terms of their frequency components. Assuming that the order of the model is finite, say p_0, we show that the proposed criterion consistently estimates p_0. In addition, we show that adaptive estimation of the parameters of the model is possible without knowledge of p_0. Finally, a small Monte Carlo experiment is included to illustrate the finite sample performance of the proposed criterion.
    Keywords: order selection, distributed lag models, strong dependence
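For readers unfamiliar with the classical criteria named at the start of this abstract, the sketch below shows how AIC, BIC and the Hannan-Quinn criterion rank candidate orders. It does not implement the frequency-domain criterion the paper proposes. It assumes the Gaussian forms n log(RSS/n) plus a penalty of 2k, k log n and 2k log log n respectively, treats the order as the parameter count k, and uses made-up residual sums of squares.

```python
import math


def criteria(rss, n, k):
    """Classical criteria (Gaussian form, constants dropped) for k parameters."""
    fit = n * math.log(rss / n)
    return {
        "AIC": fit + 2 * k,
        "BIC": fit + k * math.log(n),
        "HIC": fit + 2 * k * math.log(math.log(n)),  # Hannan-Quinn
    }


def select_order(rss_by_order, n):
    """Return, for each criterion, the candidate order with the smallest value."""
    scores = {order: criteria(rss, n, order) for order, rss in rss_by_order.items()}
    return {
        name: min(scores, key=lambda order: scores[order][name])
        for name in ("AIC", "BIC", "HIC")
    }


# Hypothetical residual sums of squares for orders 0..4 with n = 100 observations.
rss_by_order = {0: 250.0, 1: 120.0, 2: 100.0, 3: 97.5, 4: 97.2}
chosen = select_order(rss_by_order, n=100)
print(chosen)
```

In this toy input the lighter AIC penalty picks a larger order than BIC and Hannan-Quinn, which is the usual qualitative difference between the three.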

    Tight conditions for consistency of variable selection in the context of high dimensionality

    We address the issue of variable selection in the regression model with very high ambient dimension, that is, when the number of variables is very large. The main focus is on the situation where the number of relevant variables, called intrinsic dimension, is much smaller than the ambient dimension d. Without assuming any parametric form of the underlying regression function, we get tight conditions making it possible to consistently estimate the set of relevant variables. These conditions relate the intrinsic dimension to the ambient dimension and to the sample size. The procedure that is provably consistent under these tight conditions is based on comparing quadratic functionals of the empirical Fourier coefficients with appropriately chosen threshold values. The asymptotic analysis reveals the presence of two quite different regimes. The first regime is when the intrinsic dimension is fixed. In this case the situation in nonparametric regression is the same as in linear regression, that is, consistent variable selection is possible if and only if log d is small compared to the sample size n. The picture is different in the second regime, that is, when the number of relevant variables, denoted by s, tends to infinity as n → ∞. Then we prove that consistent variable selection in the nonparametric set-up is possible only if s + log log d is small compared to log n. We apply these results to derive minimax separation rates for the problem of variable
    Comment: arXiv admin note: text overlap with arXiv:1102.3616; Published in at http://dx.doi.org/10.1214/12-AOS1046 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Model selection in regression under structural constraints

    The paper considers model selection in regression under additional structural constraints on admissible models, where the number of potential predictors might be even larger than the available sample size. We develop a Bayesian formalism as a natural tool for generating a wide class of model selection criteria based on penalized least squares estimation with various complexity penalties associated with a prior on the model size. The resulting criteria are adaptive to structural constraints. We establish the upper bound for the quadratic risk of the resulting MAP estimator and the corresponding lower bound for the minimax risk over a set of admissible models of a given size. We then specify the class of priors (and, therefore, the class of complexity penalties) where for the "nearly-orthogonal" design the MAP estimator is asymptotically at least nearly-minimax (up to a log-factor) simultaneously over an entire range of sparse and dense setups. Moreover, when the number of admissible models is "small" (e.g., ordered variable selection) or, at the other extreme, in the case of complete variable selection, the proposed estimator achieves the exact minimax rates.
    Comment: arXiv admin note: text overlap with arXiv:0912.438
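To make "penalized least squares with a complexity penalty on the model size" concrete, here is a minimal sketch for the simplest case: complete variable selection with an orthonormal design, where minimizing RSS plus a size penalty reduces to keeping the k largest squared coefficients. The penalty 2σ²k(1 + log(p/k)) is an assumed example of a complexity penalty, not the one derived in the paper, and all names and numbers are hypothetical.

```python
import math


def complexity_penalty(k, p, sigma2):
    """Assumed size penalty 2 * sigma^2 * k * (1 + log(p / k)); zero for k = 0."""
    if k == 0:
        return 0.0
    return 2.0 * sigma2 * k * (1.0 + math.log(p / k))


def select_model(coeffs, sigma2):
    """Indices kept by minimizing RSS(k) + penalty(k) under an orthonormal design.

    With an orthonormal design, the best model of size k keeps the k largest
    squared coefficients, so RSS(k) is the sum of the remaining squares.
    """
    p = len(coeffs)
    order = sorted(range(p), key=lambda j: coeffs[j] ** 2, reverse=True)
    total = sum(c ** 2 for c in coeffs)
    best_k, best_value = 0, total  # empty model: nothing kept, no penalty
    kept = 0.0
    for k in range(1, p + 1):
        kept += coeffs[order[k - 1]] ** 2
        value = (total - kept) + complexity_penalty(k, p, sigma2)
        if value < best_value:
            best_k, best_value = k, value
    return sorted(order[:best_k])


# Hypothetical coefficients: two clear signals and four near-zero entries.
selected = select_model([5.0, -4.0, 0.3, 0.2, -0.1, 0.05], sigma2=1.0)
print(selected)
```

Structural constraints of the kind the abstract mentions would restrict which index sets are admissible for each k; this sketch ignores them.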

    Effects of Influential Points and Sample Size on the Selection and Replicability of Multivariable Fractional Polynomial Models

    The multivariable fractional polynomial (MFP) procedure combines variable selection with a function selection procedure (FSP). For continuous variables, a closed test procedure is used to decide between no effect, linear, FP1 or FP2 functions. Influential points (IPs) and small sample size can both have an impact on a selected fractional polynomial model. In this paper, we used simulated data with six continuous and four categorical predictors to illustrate approaches which can help to identify IPs with an influence on function selection and the MFP model. The approaches use leave-one-out or leave-two-out and two related techniques for a multivariable assessment. In seven subsamples we also investigated the effects of sample size and model replicability. For better illustration, a structured profile was used to provide an overview of all analyses conducted. The results showed that one or more IPs can drive the functions and models selected. In addition, with a small sample size, MFP might not be able to detect non-linear functions and the selected model might differ substantially from the true underlying model. However, if the sample size is sufficient and regression diagnostics are carefully conducted, MFP can be a suitable approach to select variables and functional forms for continuous variables.
    Comment: Main paper and a supplementary combine
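As a rough illustration of what selecting an FP1 function involves, the sketch below searches the conventional fractional polynomial power set (-2, -1, -0.5, 0, 0.5, 1, 2, 3, with power 0 read as log x) for the single power that gives the smallest residual sum of squares in a simple regression. It covers only this best-FP1 search for one positive predictor; the closed test procedure and the multivariable steps of MFP are not implemented, and the data are synthetic.

```python
import math

# Conventional FP power set; power 0 denotes the log transform.
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, power):
    """FP1 transform of a positive value x."""
    return math.log(x) if power == 0 else x ** power


def rss_simple_regression(u, y):
    """Residual sum of squares from regressing y on u with an intercept."""
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(y) / n
    suu = sum((ui - mean_u) ** 2 for ui in u)
    suy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    slope = suy / suu
    intercept = mean_y - slope * mean_u
    return sum((yi - intercept - slope * ui) ** 2 for ui, yi in zip(u, y))


def best_fp1_power(x, y):
    """Power in FP_POWERS whose transform of x fits y with the smallest RSS."""
    return min(
        FP_POWERS,
        key=lambda p: rss_simple_regression([fp_transform(xi, p) for xi in x], y),
    )


# Synthetic data (not from the paper): y is exactly linear in log(x).
x = [float(i) for i in range(1, 21)]
y = [2.0 + 3.0 * math.log(xi) for xi in x]
print(best_fp1_power(x, y))
```

Rerunning such a search with one observation left out at a time is one simple way to see whether a single point changes the selected power, which is the kind of sensitivity the abstract investigates.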