
    Soybean yield modeling using bootstrap methods for small samples

    One of the problems that arises when working with regression models concerns sample size: since the statistical methods used in inferential analyses are asymptotic, a small sample may compromise the analysis because the estimates will be biased. An alternative is the bootstrap methodology, which in its non-parametric version does not require assuming or knowing the probability distribution that generated the original sample. In this work we used a small set of soybean yield data and physical and chemical soil properties to build a multiple linear regression model. Bootstrap methods were used for variable selection, identification of influential points, and determination of confidence intervals for the model parameters. The results showed that the bootstrap methods enabled us to select the physical and chemical soil properties that were significant in the construction of the soybean yield regression model, construct the confidence intervals of the parameters, and identify the points that had great influence on the estimated parameters.
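
    The abstract does not give implementation details, but a minimal sketch of a non-parametric (case-resampling) bootstrap for the coefficients of a multiple linear regression, with hypothetical inputs `X` (soil properties) and `y` (yields), could look like this in Python:

```python
import numpy as np

def bootstrap_coef_ci(X, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence intervals for OLS coefficients.

    X : (n, p) matrix of predictors (an intercept column is added here)
    y : (n,) response vector
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])          # add intercept column
    boot = np.empty((n_boot, Xd.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)           # resample cases with replacement
        beta, *_ = np.linalg.lstsq(Xd[idx], y[idx], rcond=None)
        boot[b] = beta
    lo = np.percentile(boot, 100 * alpha / 2, axis=0)
    hi = np.percentile(boot, 100 * (1 - alpha / 2), axis=0)
    return lo, hi
```

    The percentile interval shown is only one of several bootstrap interval types; the abstract does not state which variant (percentile, BCa, etc.) the authors used.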


    Detecting outliers and/or leverage points: a robust two-stage procedure with bootstrap cut-off points

    This paper presents a robust two-stage procedure for the identification of outlying observations in regression analysis. The exploratory stage identifies leverage points and vertical outliers through a robust distance estimator based on the Minimum Covariance Determinant (MCD). After deletion of these points, the confirmatory stage carries out an Ordinary Least Squares (OLS) analysis on the remaining subset of data and investigates the effect of adding back the previously deleted observations. Cut-off points pertinent to the different diagnostics are generated by bootstrapping, and the cases are definitively labelled as good-leverage points, bad-leverage points, vertical outliers, or typical cases. The procedure is applied to four examples.
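
    A minimal sketch of the exploratory stage described above, using scikit-learn's MCD estimator for the robust distances, is given below. The chi-square and normal cut-offs are simple stand-ins for illustration only; the paper's cut-offs are generated by bootstrapping.

```python
import numpy as np
from scipy import stats
from sklearn.covariance import MinCovDet
from sklearn.linear_model import LinearRegression

def exploratory_stage(X, y, alpha=0.975):
    """Flag leverage points via MCD robust distances and vertical outliers
    via standardized residuals from a fit on the low-leverage cases.
    The chi-square / normal cut-offs below are simple stand-ins for the
    bootstrap cut-offs described in the paper."""
    mcd = MinCovDet().fit(X)
    d2 = mcd.mahalanobis(X)                        # squared robust distances
    lev_cut = stats.chi2.ppf(alpha, df=X.shape[1])
    leverage = d2 > lev_cut

    ols = LinearRegression().fit(X[~leverage], y[~leverage])
    resid = y - ols.predict(X)
    scale = np.std(resid[~leverage], ddof=X.shape[1] + 1)
    vertical = np.abs(resid / scale) > stats.norm.ppf(alpha)
    return leverage, vertical
```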

    Multivariate Generalization of Reduced Major Axis Regression

    A least total area of triangles method was proposed by Teissier (1948) for fitting a straight line to data from a pair of variables without treating either variable as the dependent variable, while allowing each of the variables to have measurement errors. This method is commonly called Reduced Major Axis (RMA) regression and is often used instead of Ordinary Least Squares (OLS) regression. Results for confidence intervals, hypothesis testing, and asymptotic distributions of coefficient estimates in the bivariate case are reviewed. A generalization of RMA to more than two variables, for fitting a plane to data, is obtained by minimizing the sum of a function of the volumes obtained by drawing, from each data point, lines parallel to each coordinate axis to the fitted plane (Draper and Yang 1997; Goodman and Tofallis 2003). Generalized RMA results for the multivariate case obtained by Draper and Yang (1997) are reviewed and some investigations of multivariate RMA are given. A linear model is proposed that does not specify a dependent variable and allows for errors in the measurement of each variable. Coefficients in the model are estimated by minimization of the function of the volumes previously mentioned. Methods for obtaining coefficient estimates are discussed, and simulations are used to investigate the distribution of coefficient estimates. The effects of sample size, sampling error, and correlation among variables on the estimates are studied. Bootstrap methods are used to obtain confidence intervals for model coefficients. Residual analysis is considered for assessing model assumptions. Outlier and influential case diagnostics are developed, and a forward selection method is proposed for subset selection of model variables. A real data example is provided that uses the methods developed. Topics for further research are discussed.
    Dissertation/Thesis, Ph.D. Statistics, 201
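
    The multivariate estimator is defined by the volume-minimization problem described above, which has no simple closed form. In the familiar bivariate case, however, the RMA slope reduces to sign(r) * s_y / s_x, and a percentile bootstrap interval can be attached to it. A minimal sketch (not the dissertation's code):

```python
import numpy as np

def rma_fit(x, y):
    """Bivariate reduced major axis (RMA) fit: slope = sign(r) * s_y / s_x."""
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)
    intercept = np.mean(y) - slope * np.mean(x)
    return slope, intercept

def rma_slope_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the RMA slope."""
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)           # resample (x, y) pairs
        slopes[b], _ = rma_fit(x[idx], y[idx])
    return np.percentile(slopes, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```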

    On the Sensitivity of Return to Schooling Estimates to Estimation Methods, Model Specification, and Influential Outliers If Identification Is Weak

    We provide a comparison of return to schooling estimates based on an influential study by Angrist and Krueger (1991), using two-stage least squares (TSLS), limited information maximum likelihood (LIML), jackknife instrumental variables (JIVE), and split-sample instrumental variables (SSIV) estimation. We find that the estimated return to education is quite sensitive to the age controls used in the models as well as to the estimation method used. In particular, we provide evidence that the standard errors of the JIVE coefficients are inflated by a group of extreme years-of-education observations for which identification is especially weak. We propose using Cook's Distance to identify influential outliers that have substantial influence on first-stage JIVE coefficients and fitted values.
    Keywords: Cook's Distance, heteroskedasticity, outliers, return to education, specification, weak instruments
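
    The paper's data are not reproduced here, but the diagnostic it proposes, Cook's Distance applied to a first-stage regression, is straightforward to compute with statsmodels. In this sketch, `schooling`, `instruments`, and `controls` are hypothetical array names, not the Angrist-Krueger variables, and the 4/n threshold is only a common rule of thumb:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

def first_stage_influence(schooling, instruments, controls, threshold=None):
    """Cook's distance for the first-stage regression of schooling on
    instruments and controls; cases above the threshold are flagged as
    influential for the first-stage coefficients and fitted values."""
    X = sm.add_constant(np.column_stack([instruments, controls]))
    first_stage = sm.OLS(schooling, X).fit()
    cooks_d, _ = OLSInfluence(first_stage).cooks_distance
    if threshold is None:
        threshold = 4.0 / len(schooling)           # common rule of thumb
    return cooks_d, cooks_d > threshold
```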

    Applying Bootstrap Robust Regression Method on Data with Outliers

    Identification and assessment of outliers play a key role in Ordinary Least Squares (OLS) regression analysis. This paper presents a robust two-stage procedure to identify outlying observations in regression analysis. The exploratory stage identifies leverage points and vertical outliers through a robust distance estimator based on the Minimum Covariance Determinant (MCD). After deletion of these points, the confirmatory stage carries out an OLS analysis on the remaining subset of data and investigates the effect of adding back the previously deleted observations. Cut-off points pertinent to the different diagnostics are generated by bootstrapping, and the cases are definitively labeled as good-leverage points, bad-leverage points, vertical outliers, or typical cases. The procedure is applied to four examples taken from the literature and is effective in correctly pinpointing outlying observations, even in the presence of substantial masking. It is able to identify and correctly classify vertical outliers and good and bad leverage points through the use of jackknife-after-bootstrap robust cut-off points. Moreover, its two-stage structure makes it interactive, which enables the user to reach a deeper understanding of the dataset's main features than an automatic procedure would allow.
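
    The key ingredient added here relative to fixed textbook thresholds is a data-driven cut-off obtained by resampling. The sketch below shows a simplified percentile device for generating such a cut-off from a presumed-clean subset; it is not the jackknife-after-bootstrap construction used in the paper, and `diagnostic_fn` is a hypothetical user-supplied function (e.g., one returning absolute standardized residuals).

```python
import numpy as np

def bootstrap_cutoff(diagnostic_fn, X, y, n_boot=1000, q=0.99, seed=0):
    """Data-driven cut-off for a regression diagnostic: bootstrap the
    presumed-clean subset (X, y) and take a high quantile of the largest
    diagnostic value seen in each resample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    maxima = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        maxima[b] = np.max(diagnostic_fn(X[idx], y[idx]))
    return np.quantile(maxima, q)
```

    Observations whose diagnostic exceeds the returned cut-off would then be flagged for the confirmatory stage, rather than being judged against a fixed theoretical threshold.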

    Weight Adjustment Methods and Their Impact on Sample-based Inference

    Weighting samples is important to reflect not only sample design decisions made at the planning stage, but also practical issues that arise during data collection and cleaning and that necessitate weighting adjustments. Adjustments to base weights are used to account for these planned and unplanned eventualities. Often these adjustments lead to variation in the survey weights relative to the original selection weights (i.e., the weights based solely on the sample units' probabilities of selection). Large variation in survey weights can cause inferential problems for data users. A few extremely large weights in a sample dataset can produce unreasonably large national- and domain-level estimates and variance estimates in particular samples, even when the estimators are unbiased over many samples. Design-based and model-based methods have been developed to adjust such extreme weights; both approaches aim to trim weights so that the overall mean square error (MSE) is lowered by decreasing the variance more than the squared bias is increased. Design-based methods tend to be ad hoc, while Bayesian model-based methods account for population structure but can be computationally demanding. I present three research papers that expand the current weight trimming approaches, with the goal of developing a broader framework that connects gaps in and improves on the existing alternatives. The first paper proposes more in-depth investigations of, and extensions to, a newly developed method called generalized design-based inference, in which we condition on the realized sample and model the survey weight as a function of the response variables. This method has potential for reducing the MSE of a finite population total estimator in certain circumstances; however, there may be instances where the approach is inappropriate, so this paper includes an in-depth examination of the related theory. The second paper incorporates Bayesian prior assumptions into model-assisted penalized estimators to produce a more efficient yet robust calibration-type estimator. I also evaluate existing variance estimators for the proposed estimator, and comparisons to other estimators in the literature are included. In the third paper, I develop summary- and unit-level diagnostic tools that measure the impact of the variation of weights and of extreme individual weights on survey-based inference. I propose design effects to summarize the impact of variable weights produced under calibration weighting adjustments under single-stage and cluster sampling. A new diagnostic for identifying influential individual points is also introduced in the third paper.
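
    The dissertation's own design effects for calibration weighting are not reproduced here, but the standard Kish design effect due to unequal weighting, deff = 1 + cv^2(w) = n * sum(w_i^2) / (sum(w_i))^2, is the usual summary of how much weight variation inflates variances and is easy to compute:

```python
import numpy as np

def kish_deff(weights):
    """Kish design effect from unequal weighting:
    deff = 1 + cv^2(w) = n * sum(w_i^2) / (sum(w_i))^2."""
    w = np.asarray(weights, dtype=float)
    n = w.size
    return n * np.sum(w**2) / np.sum(w)**2

# A few extreme weights inflate the design effect sharply.
print(kish_deff([1, 1, 1, 1, 1]))        # 1.0 (equal weights)
print(kish_deff([1, 1, 1, 1, 20]))       # roughly 3.5
```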