
    Futility Analysis in the Cross-Validation of Machine Learning Models

    Many machine learning models have important structural tuning parameters that cannot be directly estimated from the data. The common tactic for setting these parameters is to use resampling methods, such as cross-validation or the bootstrap, to evaluate a candidate set of values and choose the best based on some pre-defined criterion. Unfortunately, this process can be time-consuming. However, the model tuning process can be streamlined by adaptively resampling candidate values so that settings that are clearly sub-optimal can be discarded. The notion of futility analysis is introduced in this context. An example is shown that illustrates how adaptive resampling can be used to reduce training time. Simulation studies are used to understand how the potential speed-up is affected by parallel processing techniques.
    Comment: 22 pages, 5 figures
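
    A minimal sketch of the adaptive idea, assuming an SVM cost parameter as the tuning target and an ad hoc tolerance as the futility rule (the paper uses a formal statistical test, not this fixed margin):

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import StratifiedKFold
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=500, random_state=0)
        candidates = [10.0 ** k for k in range(-3, 4)]  # SVM cost grid (assumption)
        scores = {c: [] for c in candidates}
        active = set(candidates)

        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
        for train, test in cv.split(X, y):
            for c in sorted(active):
                model = SVC(C=c).fit(X[train], y[train])
                scores[c].append(model.score(X[test], y[test]))
            # futility check: drop settings clearly behind the current leader,
            # so later resamples are spent only on promising settings
            means = {c: np.mean(scores[c]) for c in active}
            best = max(means.values())
            active = {c for c in active if means[c] >= best - 0.05}  # ad hoc tolerance

        best_c = max(active, key=lambda c: np.mean(scores[c]))
        print(best_c)

    The saving comes from the inner loop shrinking as folds accumulate; with many candidates and expensive fits, most of the grid is abandoned after a few resamples.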

    minque: An R Package for Analyzing Various Linear Mixed Models

    Linear mixed model (LMM) approaches offer much more flexibility than ANOVA (analysis of variance) based methods. There are three commonly used LMM approaches: maximum likelihood, restricted maximum likelihood, and minimum norm quadratic unbiased estimation. These three approaches, however, can sometimes lead to low testing power compared to ANOVA methods. Integrating resampling techniques such as the jackknife can help improve testing power, based on our simulation studies. In this presentation, I will introduce an R package, minque, which integrates LMM approaches and resampling techniques, and demonstrate the use of this package in various linear mixed model analyses.
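
    A hedged sketch of the kind of LMM-plus-jackknife combination described above, written with statsmodels rather than minque (this is not minque's API; the data and the column names y, x, g are invented for illustration):

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(0)
        g = np.repeat(np.arange(10), 20)               # 10 groups of 20 observations
        x = rng.normal(size=200)
        y = 1.0 + 0.5 * x + rng.normal(size=10)[g] + rng.normal(size=200)
        data = pd.DataFrame({"y": y, "x": x, "g": g})

        full = smf.mixedlm("y ~ x", data, groups=data["g"]).fit()
        estimates = []
        for grp in data["g"].unique():
            sub = data[data["g"] != grp]               # delete-one-group jackknife
            fit = smf.mixedlm("y ~ x", sub, groups=sub["g"]).fit()
            estimates.append(fit.params["x"])

        theta = np.array(estimates)
        n = len(theta)
        jack_se = np.sqrt((n - 1) / n * np.sum((theta - theta.mean()) ** 2))
        print(full.params["x"], jack_se)               # slope and its jackknife SE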

    What can the Real World do for simulation studies? A comparison of exploratory methods

    Simulation studies on exploratory factor analysis (EFA) usually use rather simple population models without model errors. In the present study, real data characteristics are used for Monte Carlo simulation studies. Large real data sets are examined, and the results of EFA on them are taken as the population models. First we apply a resampling technique to these data sets with subsamples of different sizes. Then a Monte Carlo study is conducted based on the parameters of the population model and on some variations of them. Two data sets are analyzed as an illustration. Results suggest that outcomes of simulation studies are always highly influenced by the particular specification of the model and its violations. For example, once small residual correlations appeared in the data, the ranking of our methods changed completely. The analysis of real data set characteristics is therefore important for understanding the performance of different methods.
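
    A rough sketch of the resampling step, using sklearn's FactorAnalysis as a stand-in for the EFA methods compared in the paper, a small built-in data set as a placeholder for the large real data sets, and arbitrary subsample sizes:

        import numpy as np
        from sklearn.datasets import load_iris
        from sklearn.decomposition import FactorAnalysis

        X = load_iris().data                   # placeholder for a large real data set
        rng = np.random.default_rng(0)
        ref = FactorAnalysis(n_components=2).fit(X).components_

        for n_sub in (50, 100, 150):           # subsample sizes (assumption)
            loadings = []
            for _ in range(100):
                idx = rng.choice(len(X), size=n_sub, replace=False)
                L = FactorAnalysis(n_components=2).fit(X[idx]).components_
                for k in range(L.shape[0]):    # loadings are sign-indeterminate,
                    if np.dot(L[k], ref[k]) < 0:   # so align to the full-sample fit
                        L[k] = -L[k]
                loadings.append(L)
            print(n_sub, np.std(loadings, axis=0).mean())  # loading variability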

    Resampling Methods for the Change Analysis of Dependent Data

    The fundamental question in change-point analysis is whether an observed stochastic process follows one model or whether the underlying model changes at least once during the observational period. Most of the older works discuss independent observations, yet from a practical point of view cases of dependent data have become more and more important. In this dissertation we develop testing procedures for dependent models. In change-point analysis critical values for testing procedures are usually obtained by distributional asymptotics. These critical values, however, do not sufficiently reflect dependency. Moreover, it is well known that convergence rates, especially for extreme-value statistics, are very slow. Using resampling methods we obtain better approximations, which take possible dependency structures more efficiently into account. We prove that the original statistics and their resampling counterparts follow the same distributional asymptotics. First we obtain limit theorems for the corresponding rank statistics, which, combined with laws of large numbers, imply the resampling asymptotics conditionally on the given data. In the first part we consider abrupt and gradual changes in models of possibly dependent observations satisfying a strong invariance principle. The main part of this dissertation studies a location model with dependent errors that form a linear process. Different types of statistics are considered, such as maximum-type statistics (particularly different CUSUM procedures) or sum-type statistics. The resampling methods have to be adapted to allow for dependent errors. Thus, we analyze a block bootstrap as well as a bootstrap in the frequency domain. Finally, some simulation studies illustrate that the permutation tests usually behave better than the original tests if performance is measured by the type I and II errors, respectively.
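
    A compact sketch of one idea from this abstract: calibrating a CUSUM change-point statistic with a circular block bootstrap so the critical value reflects serial dependence. The block length, the AR(1) errors, and the statistic's normalisation are simplifying assumptions, not the dissertation's exact procedure:

        import numpy as np

        def cusum(x):
            s = np.cumsum(x - x.mean())
            return np.max(np.abs(s)) / np.sqrt(len(x))

        rng = np.random.default_rng(0)
        # AR(1) errors with a mean shift halfway through the series
        e = np.zeros(400)
        for t in range(1, 400):
            e[t] = 0.5 * e[t - 1] + rng.normal()
        x = e + np.r_[np.zeros(200), 0.8 * np.ones(200)]

        stat = cusum(x)
        centered = x - x.mean()                # impose the null before resampling
        block, n = 20, len(x)
        boot = []
        for _ in range(999):
            starts = rng.integers(0, n, size=n // block)
            idx = (starts[:, None] + np.arange(block)) % n  # circular blocks
            boot.append(cusum(centered[idx.ravel()]))
        crit = np.quantile(boot, 0.95)
        print(stat, crit, stat > crit)         # reject the no-change null if True

    Resampling whole blocks rather than single observations is what preserves the short-range dependence that asymptotic critical values ignore.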

    New resampling method for evaluating stability of clusters

    Background: Hierarchical clustering is a widely applied tool in the analysis of microarray gene expression data. The assessment of cluster stability is a major challenge in clustering procedures. Statistical methods are required to distinguish between real and random clusters. Several methods for assessing cluster stability have been published, including resampling methods such as the bootstrap. We propose a new resampling method based on continuous weights to assess the stability of clusters in hierarchical clustering. While in bootstrapping approximately one third of the original items is lost, continuous weights avoid zero elements and instead allow non-integer diagonal elements, which leads to retention of the full dimensionality of the space, i.e. each variable of the original data set is represented in the resampling sample.
    Results: Comparison of continuous weights and bootstrapping using real datasets and simulation studies reveals the advantage of continuous weights, especially when the dataset has only few observations, few differentially expressed genes, and the fold change of differentially expressed genes is low.
    Conclusion: We recommend the use of continuous weights in small as well as in large datasets, because according to our results they produce at least the same results as conventional bootstrapping and in some cases surpass it.
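
    A sketch of the continuous-weights idea: rather than bootstrap counts, which zero out roughly a third of the variables, draw strictly positive weights for every variable so the full dimensionality is retained, then recluster and compare with the original partition. The exponential weights, the two-cluster toy data, and the adjusted-Rand stability score are illustrative choices, not the paper's exact scheme:

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.spatial.distance import pdist
        from sklearn.metrics import adjusted_rand_score

        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(0, 1, (10, 50)), rng.normal(2, 1, (10, 50))])

        ref = fcluster(linkage(pdist(X), method="average"), t=2, criterion="maxclust")
        scores = []
        for _ in range(200):
            w = rng.exponential(1.0, size=X.shape[1])  # continuous, never zero
            w = w / w.sum() * X.shape[1]               # mean weight of one
            d = pdist(X, metric="euclidean", w=w)      # weighted distances
            lab = fcluster(linkage(d, method="average"), t=2, criterion="maxclust")
            scores.append(adjusted_rand_score(ref, lab))
        print(np.mean(scores))                         # cluster stability score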

    Particle Efficient Importance Sampling

    The efficient importance sampling (EIS) method is a general principle for the numerical evaluation of high-dimensional integrals that uses the sequential structure of target integrands to build variance-minimising importance samplers. Despite a number of successful applications in high dimensions, it is well known that importance sampling strategies are subject to an exponential growth in variance as the dimension of the integration increases. We solve this problem by recognising that the EIS framework has an offline sequential Monte Carlo interpretation. The particle EIS method is based on non-standard resampling weights that take into account the look-ahead construction of the importance sampler. We apply the method to a range of univariate and bivariate stochastic volatility specifications. We also develop a new application of the EIS approach to state space models with Student's t state innovations. Our results show that the particle EIS method strongly outperforms both the standard EIS method and particle filters for likelihood evaluation in high dimensions. Moreover, the ratio between the variances of the particle EIS and particle filter methods remains stable as the time series dimension increases. We illustrate the efficiency of the method for Bayesian inference using the particle marginal Metropolis-Hastings and importance sampling squared algorithms.
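
    For context, a minimal bootstrap particle filter for a basic stochastic volatility model, i.e. the baseline that particle EIS is reported to outperform; the EIS-based look-ahead proposal itself is more involved, and the parameter values here are arbitrary illustrations:

        import numpy as np

        rng = np.random.default_rng(0)
        mu, phi, sig, T = -1.0, 0.95, 0.2, 200
        h = np.zeros(T)
        for t in range(1, T):
            h[t] = mu + phi * (h[t - 1] - mu) + sig * rng.normal()
        y = np.exp(h / 2) * rng.normal(size=T)        # simulated returns

        N = 1000
        x = mu + sig / np.sqrt(1 - phi**2) * rng.normal(size=N)  # stationary start
        loglik = 0.0
        for t in range(T):
            x = mu + phi * (x - mu) + sig * rng.normal(size=N)   # propagate
            # log N(y_t; 0, exp(x)) measurement density
            logw = -0.5 * (np.log(2 * np.pi) + x + y[t] ** 2 * np.exp(-x))
            m = logw.max()
            w = np.exp(logw - m)
            loglik += m + np.log(w.mean())            # log p(y_t | y_1:t-1)
            x = x[rng.choice(N, size=N, p=w / w.sum())]  # multinomial resampling
        print(loglik)                                 # likelihood estimate

    The variance of this estimator is what degrades with dimension; particle EIS replaces the blind propagation and these resampling weights with look-ahead versions fitted offline.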