Survival Estimation Using Bootstrap, Jackknife and K-Repeated Jackknife Methods
Three re-sampling techniques are used to estimate survival probabilities from an exponential life-time distribution. The aim is to employ a technique that yields parameter estimates for a two-parameter exponential distribution. The re-sampling methods considered are the bootstrap estimation method (BE), the jackknife estimation method (JE) and the k-repeated jackknife estimation method (KJE). The methods were compared using the mean square error (MSE) and mean percentage error (MPE) computed from simulated data. The estimates of the two-parameter exponential distribution were substituted to estimate survival probabilities. Results show that the MSE value is reduced when the k-repeated jackknife method is used.
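The comparison described above can be sketched in a few lines. The following Python example is an illustrative reconstruction, not the authors' code: the sample size, parameter values and replication counts are arbitrary assumptions, and the closed-form fits (location = minimum, scale = mean minus minimum) are one standard way to estimate a two-parameter exponential.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_two_param_exp(x):
    """MLE-style fit for a two-parameter exponential:
    location mu = min(x), scale theta = mean(x) - min(x)."""
    mu = x.min()
    return mu, x.mean() - mu

def survival(t, mu, theta):
    """S(t) = exp(-(t - mu)/theta) for t > mu."""
    return np.exp(-(t - mu) / theta) if t > mu else 1.0

def bootstrap_survival(x, t, B=200):
    """Average of S-hat(t) over B bootstrap resamples (a BE-style estimate)."""
    return np.mean([survival(t, *fit_two_param_exp(rng.choice(x, x.size, replace=True)))
                    for _ in range(B)])

def jackknife_survival(x, t):
    """Bias-corrected jackknife estimate of S(t) (a JE-style estimate)."""
    n = x.size
    full = survival(t, *fit_two_param_exp(x))
    loo = [survival(t, *fit_two_param_exp(np.delete(x, i))) for i in range(n)]
    return n * full - (n - 1) * np.mean(loo)

# simulation settings (all arbitrary): true mu=1, theta=2, evaluate S at t0=3
mu0, theta0, t0, n, reps = 1.0, 2.0, 3.0, 30, 100
true_S = np.exp(-(t0 - mu0) / theta0)
samples = [mu0 + rng.exponential(theta0, n) for _ in range(reps)]
mse_be = np.mean([(bootstrap_survival(x, t0) - true_S) ** 2 for x in samples])
mse_je = np.mean([(jackknife_survival(x, t0) - true_S) ** 2 for x in samples])
print(f"true S(t0)={true_S:.3f}  MSE(BE)={mse_be:.5f}  MSE(JE)={mse_je:.5f}")
```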
Selection of the number of frequencies using bootstrap techniques in log-periodogram regression
The choice of the bandwidth in the local log-periodogram regression is of crucial importance for estimation of the memory parameter of a long memory time series. Different choices may give rise to completely different estimates, which may lead to contradictory conclusions, for example about the stationarity of the series. We propose here a data-driven bandwidth selection strategy that is based on minimizing a bootstrap approximation of the mean squared error, and compare its performance with other existing techniques for optimal bandwidth selection in a mean squared error sense, revealing its better performance in a wider class of models. The empirical applicability of the proposed strategy is shown with two examples: the Nile river annual minimum levels, widely analyzed in the long memory context, and the input gas rate series of Box and Jenkins.
Keywords: bootstrap, long memory, log-periodogram regression, bandwidth selection.
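The estimator whose bandwidth is at stake can be sketched as follows. This is a generic log-periodogram (GPH-style) regression in Python, not the paper's bootstrap selection procedure; it only illustrates how different choices of the number of frequencies m yield different estimates of the memory parameter d, which is the problem the bootstrap strategy addresses. The white-noise test series is an arbitrary assumption (true d = 0).

```python
import numpy as np

def gph_estimate(x, m):
    """Log-periodogram (GPH) regression estimate of the memory parameter d,
    using the first m Fourier frequencies (m is the bandwidth)."""
    n = len(x)
    lam = 2 * np.pi * np.arange(1, m + 1) / n
    # periodogram at the first m Fourier frequencies
    I = np.abs(np.fft.fft(x - np.mean(x))[1:m + 1]) ** 2 / (2 * np.pi * n)
    # regressor -2*log(2*sin(lam/2)); the slope of log I on it estimates d
    reg = -2 * np.log(2 * np.sin(lam / 2))
    reg_c = reg - reg.mean()
    return reg_c @ np.log(I) / (reg_c @ reg_c)

rng = np.random.default_rng(0)
x = rng.normal(size=2048)            # white noise: true d = 0
for m in (16, 32, 64, 128):
    print(f"m={m:4d}  d_hat={gph_estimate(x, m):+.3f}")
```

Each bandwidth produces a different estimate; the abstract's bootstrap criterion chooses m by minimizing an approximation of the resulting mean squared error.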
Comparing Binomial, Bootstrap and Bayesian Estimation Methods in Assessing the Agreement between Classified Images and Ground Truth Data
The degree of agreement between classification and ground truth in remotely sensed data is often quantified with an error matrix and summarized using agreement measures such as Cohen's kappa. In the case of ground truth, however, the kappa statistic can be shown to be a transformation of the marginal proportions commonly referred to as omissional and commissional error rates. A more meaningful statistical interpretation of remote sensing results and less ambiguous conclusions can be obtained via direct utilization of these measures. Several estimation techniques have been suggested for these marginal proportions. In this study, we develop the exact binomial, bootstrap and Bayesian estimation methods for omissional and commissional errors. Emphasis is placed on comparing the various estimation methods and their corresponding empirical distributions. Results are demonstrated with reference to a study designed to evaluate the detectability of yellow hawkweed and oxeye daisy using multispectral digital imagery in Northern Idaho.
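The bootstrap alternative for the marginal error rates can be illustrated with a toy example. All data below are simulated (there is no connection to the hawkweed/daisy imagery), the 85%-accurate classifier is an arbitrary assumption, and the percentile interval is one common bootstrap choice, not necessarily the construction developed in the study.

```python
import numpy as np

rng = np.random.default_rng(11)

def omission_rate(truth, pred, cls):
    """Omission error for class cls: the fraction of ground-truth
    pixels of that class which the classifier missed."""
    mask = truth == cls
    return np.mean(pred[mask] != cls)

# hypothetical per-pixel labels (0 = target species, 1 = other);
# the simulated classifier is right about 85% of the time
n = 400
truth = rng.integers(0, 2, n)
pred = np.where(rng.random(n) < 0.85, truth, 1 - truth)

B = 2000
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)          # resample pixels with replacement
    boot[b] = omission_rate(truth[idx], pred[idx], 0)

point = omission_rate(truth, pred, 0)
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"omission error = {point:.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```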
Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model
A bootstrap simulation approach was used to generate values for endogenous variables of a simultaneous equation model popularly known as the Keynesian model of income determination. Three sample sizes (20, 30 and 40), each replicated 10, 20 and 30 times, were considered. Four estimation techniques were employed to estimate the parameters of the model: ordinary least squares (OLS), indirect least squares (ILS), two-stage least squares (2SLS) and full information maximum likelihood (FIML). The estimators were then evaluated using the average parameter estimates, the absolute bias of the estimates and the root mean square error of the estimates. The results show that, in general, ILS provided the best estimates.
Keywords: bootstrap, endogenous, exogenous, least squares, maximum likelihood.
African Research Review Vol. 2 (3) 2008: pp. 51-6
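Why an instrumental method such as ILS can outperform OLS in this model is easy to demonstrate by simulation. The sketch below is not the paper's experiment (the parameter values, error distributions and replication count are arbitrary assumptions); it only reproduces the simultaneity bias of OLS and the consistency of ILS in a just-identified Keynesian income model.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n, a=10.0, b=0.6):
    """Keynesian income model: C = a + b*Y + u with identity Y = C + I,
    I exogenous, so the reduced form is Y = (a + I + u) / (1 - b)."""
    I = rng.uniform(5, 15, n)
    u = rng.normal(0, 2, n)
    Y = (a + I + u) / (1 - b)
    C = a + b * Y + u
    return C, Y, I

def ols_slope(C, Y):
    # direct regression of C on Y; Y is correlated with u (simultaneity bias)
    X = np.column_stack([np.ones_like(Y), Y])
    return np.linalg.lstsq(X, C, rcond=None)[0][1]

def ils_slope(C, Y, I):
    # indirect least squares: regress Y on I; reduced-form slope = 1/(1-b)
    X = np.column_stack([np.ones_like(I), I])
    pi1 = np.linalg.lstsq(X, Y, rcond=None)[0][1]
    return 1 - 1 / pi1

draws = [simulate(30) for _ in range(500)]
b_ols = np.mean([ols_slope(C, Y) for C, Y, _ in draws])
b_ils = np.mean([ils_slope(C, Y, I) for C, Y, I in draws])
print(f"true b = 0.600, mean OLS = {b_ols:.3f}, mean ILS = {b_ils:.3f}")
```

Because Y depends on u through the income identity, OLS overstates the marginal propensity to consume, while ILS recovers it from the reduced form (and coincides with 2SLS in this just-identified case).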
The bootstrap - A review
The bootstrap, extensively studied during the last decade, has become a powerful tool in different areas of statistical inference. In this work, we present the main ideas of bootstrap methodology in several contexts, citing the most relevant contributions and illustrating some interesting aspects with examples and simulation studies.
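The main idea that such reviews survey fits in a few lines of Python. This is a generic nonparametric bootstrap for a standard error and a percentile confidence interval; the statistic (the median) and the simulated data are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap(data, stat, B=2000):
    """Nonparametric bootstrap: draw B resamples with replacement and
    return the B replicated values of the statistic."""
    n = len(data)
    return np.array([stat(rng.choice(data, size=n, replace=True)) for _ in range(B)])

data = rng.exponential(scale=2.0, size=50)
reps = bootstrap(data, np.median)
se = reps.std(ddof=1)                    # bootstrap standard error
lo, hi = np.percentile(reps, [2.5, 97.5])  # percentile confidence interval
print(f"sample median = {np.median(data):.3f}, bootstrap SE = {se:.3f}, "
      f"95% percentile CI = ({lo:.3f}, {hi:.3f})")
```

The same resampling loop applies to essentially any statistic, which is what makes the method so widely usable.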
Linear regression for data having multicollinearity, heteroscedasticity and outliers
Evaluation of a regression model is strongly influenced by the choice of estimation method, since different methods can produce different conclusions from the same empirical results. It is therefore important to use an estimation method appropriate to the type of statistical data. Although reliable for a single or a few outliers, standard diagnostic techniques from a wild bootstrap fit can fail, while the existing robust wild bootstrap based on the MM-estimator is not resistant to high leverage points. The presence of high leverage points introduces multicollinearity, and the MM-estimator is also not resistant to multicollinearity in the data. This research proposes new methods that deal with heteroscedasticity, multicollinearity, outliers and high leverage points more effectively than currently published methods. The proposed methods are called modified robust wild bootstrap, modified robust principal component (PC) with wild bootstrap and modified robust partial least squares (PLS) with wild bootstrap estimation. These methods are based on weighted procedures that incorporate a generalized M-estimator (GM-estimator) with initial and scale estimates from the S-estimator and MM-estimator. In addition, the multicollinearity diagnostic procedures of PC and PLS were used together with the wild bootstrap sampling procedures of Wu and Liu. Empirical applications to national growth data, income per capita data of the Organisation for Economic Co-operation and Development (OECD) countries and tobacco data were used to compare the performance of the wild bootstrap, robust wild bootstrap, modified robust wild bootstrap, modified robust PC with wild bootstrap and modified robust PLS with wild bootstrap methods. A comprehensive simulation study evaluates the impacts of heteroscedasticity, multicollinearity, outliers and high leverage points on numerous existing methods.
A selection criterion is proposed based on the best model with low bias and root mean square error for the simulated data and low standard error for the real data. Results for both the real data and the simulation study suggest that the proposed criterion is effective for modified robust wild bootstrap estimation on heteroscedastic data with outliers and high leverage points, while the modified robust PC with wild bootstrap and modified robust PLS with wild bootstrap estimations are more effective in the presence of multicollinearity, heteroscedasticity, outliers and high leverage points. For both methods, the modified robust sampling procedure of Liu based on the Tukey biweight, with initial and scale estimates from the MM-estimator, tends to be the best, and the best method overall for data with multicollinearity, heteroscedasticity, outliers and high leverage points is the modified robust PC with wild bootstrap estimation. This research shows the ability of this computationally intensive method and the viability of combining three different weighting procedures, namely robust GM-estimation, the wild bootstrap and the multicollinearity diagnostic methods of PLS and PC, to achieve an accurate regression model. In conclusion, this study improves parameter estimation for linear regression by enhancing existing methods to account for multicollinearity, heteroscedasticity, outliers and high leverage points in the data set. This improvement will help the analyst choose the best estimation method in order to produce the most accurate regression model.
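The wild bootstrap that these methods build on can be sketched in its basic, non-robust form. The example below uses Rademacher sign flips, one of the weighting schemes associated with Wu and Liu in the literature, applied to plain OLS residuals; it omits the robust GM/PC/PLS modifications the thesis proposes, and the heteroscedastic data are an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

def wild_bootstrap_se(X, y, B=2000):
    """Wild bootstrap standard errors for OLS coefficients under
    heteroscedasticity: keep X fixed and perturb each residual by
    an independent random sign (Rademacher weights)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    fitted = Xd @ beta
    resid = y - fitted
    reps = []
    for _ in range(B):
        signs = rng.choice([-1.0, 1.0], size=len(y))
        y_b = fitted + resid * signs       # resampled response
        reps.append(np.linalg.lstsq(Xd, y_b, rcond=None)[0])
    return beta, np.std(reps, axis=0, ddof=1)

x = rng.uniform(0, 5, 200)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 + 0.5 * x)   # error variance grows with x
beta, se = wild_bootstrap_se(x, y)
print(f"slope = {beta[1]:.3f} ± {se[1]:.3f}")
```

Because each residual keeps its own magnitude, the scheme preserves the error-variance pattern, which is why it remains valid under heteroscedasticity where a naive residual bootstrap does not.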
Early Accurate Results for Advanced Analytics on MapReduce
Approximate results based on samples often provide the only way in which advanced analytical applications on very massive data sets can satisfy their time and resource constraints. Unfortunately, methods and tools for the computation of accurate early results are currently not supported in MapReduce-oriented systems, although these are intended for 'big data'. Therefore, we proposed and implemented a non-parametric extension of Hadoop which allows the incremental computation of early results for arbitrary work-flows, along with reliable on-line estimates of the degree of accuracy achieved so far in the computation. These estimates are based on a technique called bootstrapping that has been widely employed in statistics and can be applied to arbitrary functions and data distributions. In this paper, we describe our Early Accurate Result Library (EARL) for Hadoop that was designed to minimize the changes required to the MapReduce framework. Various tests of EARL of Hadoop are presented to characterize the frequent situations where EARL can provide major speed-ups over the current version of Hadoop.
Comment: VLDB201
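The core loop of such an incremental-bootstrap scheme can be sketched as follows. This is a single-machine Python illustration of the idea, not EARL's Hadoop implementation; the batch size, target accuracy, statistic and data are all arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def bootstrap_error(sample, stat, B=200):
    """Bootstrap estimate of the standard error of stat on the sample."""
    n = len(sample)
    reps = [stat(rng.choice(sample, size=n, replace=True)) for _ in range(B)]
    return np.std(reps, ddof=1)

def early_result(data, stat=np.mean, batch=200, target_rel_err=0.01):
    """Grow the sample batch by batch; after each batch, bootstrap the
    current sample to estimate the accuracy achieved so far, and stop
    early once the relative error falls below the target."""
    seen = np.empty(0)
    for start in range(0, len(data), batch):
        seen = np.concatenate([seen, data[start:start + batch]])
        est = stat(seen)
        rel = bootstrap_error(seen, stat) / abs(est)
        if rel < target_rel_err:
            return est, rel, len(seen)
    return est, rel, len(seen)

data = rng.lognormal(mean=1.0, sigma=0.5, size=100_000)
est, rel, used = early_result(data)
print(f"estimate = {est:.3f}, rel. error = {rel:.4f}, using {used} of {len(data)} rows")
```

The speed-up comes from the early stop: the estimate reaches the accuracy target after processing only a small fraction of the rows.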
VerdictDB: Universalizing Approximate Query Processing
Despite 25 years of research in academia, approximate query processing (AQP) has had little industrial adoption. One of the major causes of this slow adoption is the reluctance of traditional vendors to make radical changes to their legacy codebases, and the preoccupation of newer vendors (e.g., SQL-on-Hadoop products) with implementing standard features. Additionally, the few AQP engines that are available are each tied to a specific platform and require users to completely abandon their existing databases---an unrealistic expectation given the infancy of the AQP technology. Therefore, we argue that a universal solution is needed: a database-agnostic approximation engine that will widen the reach of this emerging technology across various platforms. Our proposal, called VerdictDB, uses a middleware architecture that requires no changes to the backend database, and thus, can work with all off-the-shelf engines. Operating at the driver-level, VerdictDB intercepts analytical queries issued to the database and rewrites them into another query that, if executed by any standard relational engine, will yield sufficient information for computing an approximate answer. VerdictDB uses the returned result set to compute an approximate answer and error estimates, which are then passed on to the user or application. However, lack of access to the query execution layer introduces significant challenges in terms of generality, correctness, and efficiency. This paper shows how VerdictDB overcomes these challenges and delivers up to 171× speedup (18.45× on average) for a variety of existing engines, such as Impala, Spark SQL, and Amazon Redshift, while incurring less than 2.6% relative error. VerdictDB is open-sourced under Apache License.
Comment: Extended technical report of the paper that appeared in Proceedings of the 2018 International Conference on Management of Data, pp. 1461-1476. ACM, 201
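The arithmetic behind sample-based approximation of an aggregate is simple to illustrate. The sketch below is not VerdictDB's rewriting logic (which operates on SQL at the driver level); it only shows how a uniform row sample yields a scaled estimate of SUM together with a CLT-style error bound. The sampling ratio, confidence level and data are arbitrary assumptions, and the finite-population correction is ignored.

```python
import numpy as np

rng = np.random.default_rng(3)

def approx_sum(table_col, ratio=0.01, z=1.96):
    """Draw a uniform sample of rows, scale the sample sum by 1/ratio,
    and attach a normal-approximation error bound."""
    n = len(table_col)
    k = max(1, int(n * ratio))
    sample = rng.choice(table_col, size=k, replace=False)
    est = sample.sum() / ratio
    # standard error of the scaled sum (no finite-population correction)
    se = n * sample.std(ddof=1) / np.sqrt(k)
    return est, z * se

col = rng.exponential(scale=10.0, size=1_000_000)
est, err = approx_sum(col)
true = col.sum()
print(f"approx SUM = {est:.0f} ± {err:.0f}, true SUM = {true:.0f}")
```

A middleware engine only needs the sample sum, sample variance and sample size from the backend to produce both the estimate and its error bound, which is why no change to the execution layer is required.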
Parametric bootstrap approximation to the distribution of EBLUP and related prediction intervals in linear mixed models
The empirical best linear unbiased prediction (EBLUP) method uses a linear mixed model to combine information from different sources. This method is particularly useful in small area problems. The variability of an EBLUP is traditionally measured by the mean squared prediction error (MSPE), and interval estimates are generally constructed using estimates of the MSPE. Such methods have shortcomings like under-coverage or over-coverage, excessive length and lack of interpretability. We propose a parametric bootstrap approach to estimate the entire distribution of a suitably centered and scaled EBLUP. The bootstrap histogram is highly accurate, and differs from the true EBLUP distribution by only O(d^3 n^{-3/2}), where d is the number of parameters and n the number of observations. This result is used to obtain highly accurate prediction intervals. Simulation results demonstrate the superiority of this method over existing techniques of constructing prediction intervals in linear mixed models.
Comment: Published at http://dx.doi.org/10.1214/07-AOS512 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
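The parametric bootstrap idea can be illustrated on the simplest area-level (Fay-Herriot) mixed model with a common mean. This toy Python sketch follows the general recipe (fit, regenerate data from the fitted model, collect centered prediction errors, invert their quantiles); it is not the paper's algorithm, and the method-of-moments variance estimate is one simple choice among several.

```python
import numpy as np

rng = np.random.default_rng(5)

def fit(y, D):
    """Method-of-moments fit of y_i = theta_i + e_i, theta_i = beta + v_i,
    with v_i ~ N(0, A), e_i ~ N(0, D_i), D_i known."""
    m = len(y)
    beta = y.mean()
    A = max(0.0, np.sum((y - beta) ** 2) / (m - 1) - D.mean())
    return beta, A

def eblup(y, D):
    beta, A = fit(y, D)
    gamma = A / (A + D)
    return gamma * y + (1 - gamma) * beta, beta, A

def bootstrap_interval(y, D, i, B=1000, alpha=0.05):
    """Parametric bootstrap distribution of (EBLUP_i - theta_i):
    regenerate data from the fitted model, recompute the EBLUP, and
    invert the quantiles of the centered errors into an interval."""
    pred, beta, A = eblup(y, D)
    m = len(y)
    errs = []
    for _ in range(B):
        theta_b = beta + rng.normal(0, np.sqrt(A), m)   # bootstrap true effects
        y_b = theta_b + rng.normal(0, np.sqrt(D))        # bootstrap data
        pred_b, _, _ = eblup(y_b, D)
        errs.append(pred_b[i] - theta_b[i])
    q_lo, q_hi = np.quantile(errs, [alpha / 2, 1 - alpha / 2])
    return pred[i] - q_hi, pred[i] - q_lo

# simulated small-area data (arbitrary values)
m = 20
D = rng.uniform(0.5, 2.0, m)
theta = 5.0 + rng.normal(0, 1.0, m)
y = theta + rng.normal(0, np.sqrt(D))
lo, hi = bootstrap_interval(y, D, i=0)
print(f"theta_0 = {theta[0]:.3f}, 95% bootstrap interval = ({lo:.3f}, {hi:.3f})")
```

Because the bootstrap reproduces the whole distribution of the centered EBLUP, the interval automatically reflects the extra uncertainty from estimating beta and A, which MSPE-based normal intervals handle only approximately.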