Survival Estimation Using Bootstrap, Jackknife and K-Repeated Jackknife Methods
Three re-sampling techniques are used to estimate survival probabilities from an exponential life-time distribution. The aim is to employ a technique that yields parameter estimates for a two-parameter exponential distribution. The re-sampling methods considered are the bootstrap estimation method (BE), the jackknife estimation method (JE) and the k-repeated jackknife estimation method (KJE). The methods were compared using the mean square error (MSE) and mean percentage error (MPE) computed from simulated data. The estimates of the two-parameter exponential distribution were substituted to estimate survival probabilities. Results show that the MSE value is reduced when the k-repeated jackknife method is used.
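The comparison described above can be sketched in a few lines. The following Python example is an illustrative reconstruction, not the authors' code: the sample size, parameter values and replication counts are arbitrary assumptions, and the closed-form fits (location = minimum, scale = mean minus minimum) are one standard way to estimate a two-parameter exponential.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_two_param_exp(x):
    """MLE-style fit for a two-parameter exponential:
    location mu = min(x), scale theta = mean(x) - min(x)."""
    mu = x.min()
    return mu, x.mean() - mu

def survival(t, mu, theta):
    """S(t) = exp(-(t - mu)/theta) for t > mu."""
    return np.exp(-(t - mu) / theta) if t > mu else 1.0

def bootstrap_survival(x, t, B=200):
    """Average of S-hat(t) over B bootstrap resamples (a BE-style estimate)."""
    return np.mean([survival(t, *fit_two_param_exp(rng.choice(x, x.size, replace=True)))
                    for _ in range(B)])

def jackknife_survival(x, t):
    """Bias-corrected jackknife estimate of S(t) (a JE-style estimate)."""
    n = x.size
    full = survival(t, *fit_two_param_exp(x))
    loo = [survival(t, *fit_two_param_exp(np.delete(x, i))) for i in range(n)]
    return n * full - (n - 1) * np.mean(loo)

# simulation settings (all arbitrary): true mu=1, theta=2, evaluate S at t0=3
mu0, theta0, t0, n, reps = 1.0, 2.0, 3.0, 30, 100
true_S = np.exp(-(t0 - mu0) / theta0)
samples = [mu0 + rng.exponential(theta0, n) for _ in range(reps)]
mse_be = np.mean([(bootstrap_survival(x, t0) - true_S) ** 2 for x in samples])
mse_je = np.mean([(jackknife_survival(x, t0) - true_S) ** 2 for x in samples])
print(f"true S(t0)={true_S:.3f}  MSE(BE)={mse_be:.5f}  MSE(JE)={mse_je:.5f}")
```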
Selection of the number of frequencies using bootstrap techniques in log-periodogram regression
The choice of the bandwidth in the local log-periodogram regression is of crucial importance for estimation of the memory parameter of a long memory time series. Different choices may give rise to completely different estimates, which may lead to contradictory conclusions, for example about the stationarity of the series. We propose here a data-driven bandwidth selection strategy that is based on minimizing a bootstrap approximation of the mean squared error, and compare its performance with other existing techniques for optimal bandwidth selection in a mean squared error sense, revealing its better performance in a wider class of models. The empirical applicability of the proposed strategy is shown with two examples: the Nile river annual minimum levels, widely analyzed in the long memory context, and the input gas rate series of Box and Jenkins.
Keywords: bootstrap, long memory, log-periodogram regression, bandwidth selection.
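The estimator whose bandwidth is at stake can be sketched as follows. This is a generic log-periodogram (GPH-style) regression in Python, not the paper's bootstrap selection procedure; it only illustrates how different choices of the number of frequencies m yield different estimates of the memory parameter d, which is the problem the bootstrap strategy addresses. The white-noise test series is an arbitrary assumption (true d = 0).

```python
import numpy as np

def gph_estimate(x, m):
    """Log-periodogram (GPH) regression estimate of the memory parameter d,
    using the first m Fourier frequencies (m is the bandwidth)."""
    n = len(x)
    lam = 2 * np.pi * np.arange(1, m + 1) / n
    # periodogram at the first m Fourier frequencies
    I = np.abs(np.fft.fft(x - np.mean(x))[1:m + 1]) ** 2 / (2 * np.pi * n)
    # regressor -2*log(2*sin(lam/2)); the slope of log I on it estimates d
    reg = -2 * np.log(2 * np.sin(lam / 2))
    reg_c = reg - reg.mean()
    return reg_c @ np.log(I) / (reg_c @ reg_c)

rng = np.random.default_rng(0)
x = rng.normal(size=2048)            # white noise: true d = 0
for m in (16, 32, 64, 128):
    print(f"m={m:4d}  d_hat={gph_estimate(x, m):+.3f}")
```

Each bandwidth produces a different estimate; the abstract's bootstrap criterion chooses m by minimizing an approximation of the resulting mean squared error.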
Comparing Binomial, Bootstrap and Bayesian Estimation Methods in Assessing the Agreement between Classified Images and Ground Truth Data
The degree of agreement between classification and ground truth in remotely sensed data is often quantified with an error matrix and summarized using agreement measures such as Cohen's kappa. In the case of ground truth, however, the kappa statistic can be shown to be a transformation of the marginal proportions commonly referred to as omissional and commissional error rates. A more meaningful statistical interpretation of remote sensing results and less ambiguous conclusions can be obtained via direct utilization of these measures. Several estimation techniques have been suggested for these marginal proportions. In this study, we develop the exact binomial, bootstrap and Bayesian estimation methods for omissional and commissional errors. Emphasis is placed on comparing the various estimation methods and their corresponding empirical distributions. Results are demonstrated with reference to a study designed to evaluate the detectability of yellow hawkweed and oxeye daisy using multispectral digital imagery in Northern Idaho.
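The bootstrap alternative for the marginal error rates can be illustrated with a toy example. All data below are simulated (there is no connection to the hawkweed/daisy imagery), the 85%-accurate classifier is an arbitrary assumption, and the percentile interval is one common bootstrap choice, not necessarily the construction developed in the study.

```python
import numpy as np

rng = np.random.default_rng(11)

def omission_rate(truth, pred, cls):
    """Omission error for class cls: the fraction of ground-truth
    pixels of that class which the classifier missed."""
    mask = truth == cls
    return np.mean(pred[mask] != cls)

# hypothetical per-pixel labels (0 = target species, 1 = other);
# the simulated classifier is right about 85% of the time
n = 400
truth = rng.integers(0, 2, n)
pred = np.where(rng.random(n) < 0.85, truth, 1 - truth)

B = 2000
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)          # resample pixels with replacement
    boot[b] = omission_rate(truth[idx], pred[idx], 0)

point = omission_rate(truth, pred, 0)
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"omission error = {point:.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```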
Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model
A bootstrap simulation approach was used to generate values for endogenous variables of a simultaneous equation model popularly known as the Keynesian model of income determination. Three sample sizes (20, 30 and 40), each replicated 10, 20 and 30 times, were considered. Four estimation techniques were employed to estimate the parameters of the model: ordinary least squares (OLS), indirect least squares (ILS), two-stage least squares (2SLS) and full information maximum likelihood (FIML). The estimators were then evaluated using the average parameter estimates, the absolute bias of the estimates and the root mean square error of the estimates. The results show that, in general, ILS provided the best estimates.
Keywords: bootstrap, endogenous, exogenous, least squares, maximum likelihood.
African Research Review Vol. 2 (3) 2008: pp. 51-6
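Why an instrumental method such as ILS can outperform OLS in this model is easy to demonstrate by simulation. The sketch below is not the paper's experiment (the parameter values, error distributions and replication count are arbitrary assumptions); it only reproduces the simultaneity bias of OLS and the consistency of ILS in a just-identified Keynesian income model.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n, a=10.0, b=0.6):
    """Keynesian income model: C = a + b*Y + u with identity Y = C + I,
    I exogenous, so the reduced form is Y = (a + I + u) / (1 - b)."""
    I = rng.uniform(5, 15, n)
    u = rng.normal(0, 2, n)
    Y = (a + I + u) / (1 - b)
    C = a + b * Y + u
    return C, Y, I

def ols_slope(C, Y):
    # direct regression of C on Y; Y is correlated with u (simultaneity bias)
    X = np.column_stack([np.ones_like(Y), Y])
    return np.linalg.lstsq(X, C, rcond=None)[0][1]

def ils_slope(C, Y, I):
    # indirect least squares: regress Y on I; reduced-form slope = 1/(1-b)
    X = np.column_stack([np.ones_like(I), I])
    pi1 = np.linalg.lstsq(X, Y, rcond=None)[0][1]
    return 1 - 1 / pi1

draws = [simulate(30) for _ in range(500)]
b_ols = np.mean([ols_slope(C, Y) for C, Y, _ in draws])
b_ils = np.mean([ils_slope(C, Y, I) for C, Y, I in draws])
print(f"true b = 0.600, mean OLS = {b_ols:.3f}, mean ILS = {b_ils:.3f}")
```

Because Y depends on u through the income identity, OLS overstates the marginal propensity to consume, while ILS recovers it from the reduced form (and coincides with 2SLS in this just-identified case).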
The bootstrap - A review
The bootstrap, extensively studied during the last decade, has become a powerful tool in different areas of statistical inference. In this work, we present the main ideas of bootstrap methodology in several contexts, citing the most relevant contributions and illustrating some interesting aspects with examples and simulation studies.
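The main idea that such reviews survey fits in a few lines of Python. This is a generic nonparametric bootstrap for a standard error and a percentile confidence interval; the statistic (the median) and the simulated data are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap(data, stat, B=2000):
    """Nonparametric bootstrap: draw B resamples with replacement and
    return the B replicated values of the statistic."""
    n = len(data)
    return np.array([stat(rng.choice(data, size=n, replace=True)) for _ in range(B)])

data = rng.exponential(scale=2.0, size=50)
reps = bootstrap(data, np.median)
se = reps.std(ddof=1)                    # bootstrap standard error
lo, hi = np.percentile(reps, [2.5, 97.5])  # percentile confidence interval
print(f"sample median = {np.median(data):.3f}, bootstrap SE = {se:.3f}, "
      f"95% percentile CI = ({lo:.3f}, {hi:.3f})")
```

The same resampling loop applies to essentially any statistic, which is what makes the method so widely usable.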
Linear regression for data having multicollinearity, heteroscedasticity and outliers
Evaluation of a regression model is strongly influenced by the choice of estimation method, since different methods can produce different conclusions from the same empirical results. It is therefore important to use an estimation method appropriate to the type of statistical data. Although reliable for a single or a few outliers, standard diagnostic techniques from a wild bootstrap fit can fail, while the existing robust wild bootstrap based on the MM-estimator is not resistant to high leverage points. The presence of high leverage points introduces multicollinearity, and the MM-estimator is also not resistant to multicollinearity in the data. This research proposes new methods that deal with heteroscedasticity, multicollinearity, outliers and high leverage points more effectively than currently published methods. The proposed methods are called modified robust wild bootstrap, modified robust principal component (PC) with wild bootstrap and modified robust partial least squares (PLS) with wild bootstrap estimation. These methods are based on weighted procedures that incorporate a generalized M-estimator (GM-estimator) with initial and scale estimates from the S-estimator and MM-estimator. In addition, the multicollinearity diagnostic procedures of PC and PLS were used together with the wild bootstrap sampling procedures of Wu and Liu. Empirical applications to national growth data, income per capita data of the Organisation for Economic Co-operation and Development (OECD) countries and tobacco data were used to compare the performance of the wild bootstrap, robust wild bootstrap, modified robust wild bootstrap, modified robust PC with wild bootstrap and modified robust PLS with wild bootstrap methods. A comprehensive simulation study evaluates the impacts of heteroscedasticity, multicollinearity, outliers and high leverage points on numerous existing methods.
A selection criterion is proposed based on the best model with low bias and root mean square error for the simulated data and low standard error for the real data. Results for both the real data and the simulation study suggest that the proposed criterion is effective for modified robust wild bootstrap estimation on heteroscedastic data with outliers and high leverage points, while the modified robust PC with wild bootstrap and modified robust PLS with wild bootstrap estimations are more effective in the presence of multicollinearity, heteroscedasticity, outliers and high leverage points. For both methods, the modified robust sampling procedure of Liu based on the Tukey biweight, with initial and scale estimates from the MM-estimator, tends to be the best, and the best method overall for data with multicollinearity, heteroscedasticity, outliers and high leverage points is the modified robust PC with wild bootstrap estimation. This research shows the ability of this computationally intensive method and the viability of combining three different weighting procedures, namely robust GM-estimation, the wild bootstrap and the multicollinearity diagnostic methods of PLS and PC, to achieve an accurate regression model. In conclusion, this study improves parameter estimation for linear regression by enhancing existing methods to account for multicollinearity, heteroscedasticity, outliers and high leverage points in the data set. This improvement will help the analyst choose the best estimation method in order to produce the most accurate regression model.
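The wild bootstrap that these methods build on can be sketched in its basic, non-robust form. The example below uses Rademacher sign flips, one of the weighting schemes associated with Wu and Liu in the literature, applied to plain OLS residuals; it omits the robust GM/PC/PLS modifications the thesis proposes, and the heteroscedastic data are an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

def wild_bootstrap_se(X, y, B=2000):
    """Wild bootstrap standard errors for OLS coefficients under
    heteroscedasticity: keep X fixed and perturb each residual by
    an independent random sign (Rademacher weights)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    fitted = Xd @ beta
    resid = y - fitted
    reps = []
    for _ in range(B):
        signs = rng.choice([-1.0, 1.0], size=len(y))
        y_b = fitted + resid * signs       # resampled response
        reps.append(np.linalg.lstsq(Xd, y_b, rcond=None)[0])
    return beta, np.std(reps, axis=0, ddof=1)

x = rng.uniform(0, 5, 200)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 + 0.5 * x)   # error variance grows with x
beta, se = wild_bootstrap_se(x, y)
print(f"slope = {beta[1]:.3f} ± {se[1]:.3f}")
```

Because each residual keeps its own magnitude, the scheme preserves the error-variance pattern, which is why it remains valid under heteroscedasticity where a naive residual bootstrap does not.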
Early Accurate Results for Advanced Analytics on MapReduce
Approximate results based on samples often provide the only way in which advanced analytical applications on very massive data sets can satisfy their time and resource constraints. Unfortunately, methods and tools for the computation of accurate early results are currently not supported in MapReduce-oriented systems, although these are intended for 'big data'. Therefore, we proposed and implemented a non-parametric extension of Hadoop which allows the incremental computation of early results for arbitrary work-flows, along with reliable on-line estimates of the degree of accuracy achieved so far in the computation. These estimates are based on a technique called bootstrapping that has been widely employed in statistics and can be applied to arbitrary functions and data distributions. In this paper, we describe our Early Accurate Result Library (EARL) for Hadoop that was designed to minimize the changes required to the MapReduce framework. Various tests of EARL of Hadoop are presented to characterize the frequent situations where EARL can provide major speed-ups over the current version of Hadoop.
Comment: VLDB201
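The core loop of such an incremental-bootstrap scheme can be sketched as follows. This is a single-machine Python illustration of the idea, not EARL's Hadoop implementation; the batch size, target accuracy, statistic and data are all arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def bootstrap_error(sample, stat, B=200):
    """Bootstrap estimate of the standard error of stat on the sample."""
    n = len(sample)
    reps = [stat(rng.choice(sample, size=n, replace=True)) for _ in range(B)]
    return np.std(reps, ddof=1)

def early_result(data, stat=np.mean, batch=200, target_rel_err=0.01):
    """Grow the sample batch by batch; after each batch, bootstrap the
    current sample to estimate the accuracy achieved so far, and stop
    early once the relative error falls below the target."""
    seen = np.empty(0)
    for start in range(0, len(data), batch):
        seen = np.concatenate([seen, data[start:start + batch]])
        est = stat(seen)
        rel = bootstrap_error(seen, stat) / abs(est)
        if rel < target_rel_err:
            return est, rel, len(seen)
    return est, rel, len(seen)

data = rng.lognormal(mean=1.0, sigma=0.5, size=100_000)
est, rel, used = early_result(data)
print(f"estimate = {est:.3f}, rel. error = {rel:.4f}, using {used} of {len(data)} rows")
```

The speed-up comes from the early stop: the estimate reaches the accuracy target after processing only a small fraction of the rows.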
VerdictDB: Universalizing Approximate Query Processing
Despite 25 years of research in academia, approximate query processing (AQP) has had little industrial adoption. One of the major causes of this slow adoption is the reluctance of traditional vendors to make radical changes to their legacy codebases, and the preoccupation of newer vendors (e.g., SQL-on-Hadoop products) with implementing standard features. Additionally, the few AQP engines that are available are each tied to a specific platform and require users to completely abandon their existing databases---an unrealistic expectation given the infancy of the AQP technology. Therefore, we argue that a universal solution is needed: a database-agnostic approximation engine that will widen the reach of this emerging technology across various platforms. Our proposal, called VerdictDB, uses a middleware architecture that requires no changes to the backend database, and thus, can work with all off-the-shelf engines. Operating at the driver-level, VerdictDB intercepts analytical queries issued to the database and rewrites them into another query that, if executed by any standard relational engine, will yield sufficient information for computing an approximate answer. VerdictDB uses the returned result set to compute an approximate answer and error estimates, which are then passed on to the user or application. However, lack of access to the query execution layer introduces significant challenges in terms of generality, correctness, and efficiency. This paper shows how VerdictDB overcomes these challenges and delivers up to 171× speedup (18.45× on average) for a variety of existing engines, such as Impala, Spark SQL, and Amazon Redshift, while incurring less than 2.6% relative error. VerdictDB is open-sourced under Apache License.
Comment: Extended technical report of the paper that appeared in Proceedings of the 2018 International Conference on Management of Data, pp. 1461-1476. ACM, 201
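The arithmetic behind sample-based approximation of an aggregate is simple to illustrate. The sketch below is not VerdictDB's rewriting logic (which operates on SQL at the driver level); it only shows how a uniform row sample yields a scaled estimate of SUM together with a CLT-style error bound. The sampling ratio, confidence level and data are arbitrary assumptions, and the finite-population correction is ignored.

```python
import numpy as np

rng = np.random.default_rng(3)

def approx_sum(table_col, ratio=0.01, z=1.96):
    """Draw a uniform sample of rows, scale the sample sum by 1/ratio,
    and attach a normal-approximation error bound."""
    n = len(table_col)
    k = max(1, int(n * ratio))
    sample = rng.choice(table_col, size=k, replace=False)
    est = sample.sum() / ratio
    # standard error of the scaled sum (no finite-population correction)
    se = n * sample.std(ddof=1) / np.sqrt(k)
    return est, z * se

col = rng.exponential(scale=10.0, size=1_000_000)
est, err = approx_sum(col)
true = col.sum()
print(f"approx SUM = {est:.0f} ± {err:.0f}, true SUM = {true:.0f}")
```

A middleware engine only needs the sample sum, sample variance and sample size from the backend to produce both the estimate and its error bound, which is why no change to the execution layer is required.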
Parametric bootstrap approximation to the distribution of EBLUP and related prediction intervals in linear mixed models
The empirical best linear unbiased prediction (EBLUP) method uses a linear mixed model to combine information from different sources. This method is particularly useful in small area problems. The variability of an EBLUP is traditionally measured by the mean squared prediction error (MSPE), and interval estimates are generally constructed using estimates of the MSPE. Such methods have shortcomings like under-coverage or over-coverage, excessive length and lack of interpretability. We propose a parametric bootstrap approach to estimate the entire distribution of a suitably centered and scaled EBLUP. The bootstrap histogram is highly accurate, and differs from the true EBLUP distribution by only O(d^3 n^{-3/2}), where d is the number of parameters and n the number of observations. This result is used to obtain highly accurate prediction intervals. Simulation results demonstrate the superiority of this method over existing techniques of constructing prediction intervals in linear mixed models.
Comment: Published at http://dx.doi.org/10.1214/07-AOS512 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
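The parametric bootstrap idea can be illustrated on the simplest area-level (Fay-Herriot) mixed model with a common mean. This toy Python sketch follows the general recipe (fit, regenerate data from the fitted model, collect centered prediction errors, invert their quantiles); it is not the paper's algorithm, and the method-of-moments variance estimate is one simple choice among several.

```python
import numpy as np

rng = np.random.default_rng(5)

def fit(y, D):
    """Method-of-moments fit of y_i = theta_i + e_i, theta_i = beta + v_i,
    with v_i ~ N(0, A), e_i ~ N(0, D_i), D_i known."""
    m = len(y)
    beta = y.mean()
    A = max(0.0, np.sum((y - beta) ** 2) / (m - 1) - D.mean())
    return beta, A

def eblup(y, D):
    beta, A = fit(y, D)
    gamma = A / (A + D)
    return gamma * y + (1 - gamma) * beta, beta, A

def bootstrap_interval(y, D, i, B=1000, alpha=0.05):
    """Parametric bootstrap distribution of (EBLUP_i - theta_i):
    regenerate data from the fitted model, recompute the EBLUP, and
    invert the quantiles of the centered errors into an interval."""
    pred, beta, A = eblup(y, D)
    m = len(y)
    errs = []
    for _ in range(B):
        theta_b = beta + rng.normal(0, np.sqrt(A), m)   # bootstrap true effects
        y_b = theta_b + rng.normal(0, np.sqrt(D))        # bootstrap data
        pred_b, _, _ = eblup(y_b, D)
        errs.append(pred_b[i] - theta_b[i])
    q_lo, q_hi = np.quantile(errs, [alpha / 2, 1 - alpha / 2])
    return pred[i] - q_hi, pred[i] - q_lo

# simulated small-area data (arbitrary values)
m = 20
D = rng.uniform(0.5, 2.0, m)
theta = 5.0 + rng.normal(0, 1.0, m)
y = theta + rng.normal(0, np.sqrt(D))
lo, hi = bootstrap_interval(y, D, i=0)
print(f"theta_0 = {theta[0]:.3f}, 95% bootstrap interval = ({lo:.3f}, {hi:.3f})")
```

Because the bootstrap reproduces the whole distribution of the centered EBLUP, the interval automatically reflects the extra uncertainty from estimating beta and A, which MSPE-based normal intervals handle only approximately.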