Futility Analysis in the Cross-Validation of Machine Learning Models
Many machine learning models have important structural tuning parameters that
cannot be directly estimated from the data. The common tactic for setting these
parameters is to use resampling methods, such as cross-validation or the
bootstrap, to evaluate a candidate set of values and choose the best based on
some pre-defined criterion. Unfortunately, this process can be time-consuming.
However, the model tuning process can be streamlined by adaptively resampling
candidate values so that settings that are clearly sub-optimal can be
discarded. The notion of futility analysis is introduced in this context. An
example is shown that illustrates how adaptive resampling can be used to reduce
training time. Simulation studies are used to understand how the potential
speed-up is affected by parallel processing techniques.
Comment: 22 pages, 5 figures
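The abstract describes adaptive resampling only in outline; the following is a minimal, hypothetical sketch of the idea, not the paper's actual procedure. The function name `adaptive_tune`, the burn-in count, and the fixed futility margin are all assumptions made for illustration:

```python
import random
import statistics

def adaptive_tune(candidates, evaluate, n_resamples=50, burn_in=10, margin=0.05):
    """Adaptively resample candidate tuning settings, discarding clearly
    sub-optimal ones (a simple 'futility' rule) instead of evaluating every
    candidate for all resamples.  `evaluate(c)` returns one resampled
    performance score (higher is better) for candidate `c`."""
    scores = {c: [] for c in candidates}
    active = set(candidates)
    for i in range(n_resamples):
        for c in list(active):
            scores[c].append(evaluate(c))
        if i + 1 >= burn_in and len(active) > 1:
            best = max(statistics.mean(scores[c]) for c in active)
            # Futility: drop candidates trailing the current leader by more
            # than `margin`; they no longer consume resampling effort.
            active = {c for c in active
                      if statistics.mean(scores[c]) >= best - margin}
    return max(active, key=lambda c: statistics.mean(scores[c]))
```

Because discarded candidates stop being evaluated, total training time shrinks roughly in proportion to how early poor settings are pruned.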
minque: An R Package for Analyzing Various Linear Mixed Models
Linear mixed model (LMM) approaches offer much more flexibility than ANOVA (analysis of variance) based methods. There are three commonly used LMM approaches: maximum likelihood, restricted maximum likelihood, and minimum norm quadratic unbiased estimation. These three approaches, however, can sometimes lead to lower testing power than ANOVA methods. Integrating resampling techniques such as the jackknife can help improve testing power, as shown by our simulation studies. In this presentation, I will introduce an R package, minque, which integrates LMM approaches with resampling techniques, and demonstrate the use of this package in various linear mixed model analyses
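The package's actual jackknife for mixed models is more involved than the abstract states; as a plain illustration of the underlying resampling idea, here is a delete-one jackknife standard error for an arbitrary estimator (function names are assumptions, not minque's API):

```python
import statistics

def jackknife_se(data, estimator):
    """Delete-one jackknife standard error of `estimator` on `data`.

    Recomputes the estimator n times, each time leaving out one
    observation, then scales the spread of the leave-one-out values.
    """
    n = len(data)
    theta = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    mean = statistics.mean(theta)
    return ((n - 1) / n * sum((t - mean) ** 2 for t in theta)) ** 0.5
```

For the sample mean this reproduces the textbook standard error s/sqrt(n) exactly, which makes it easy to sanity-check.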
What can the Real World do for simulation studies? A comparison of exploratory methods
In simulation studies on exploratory factor analysis (EFA), rather simple population models without model errors are usually used. In the present study, real data characteristics are used for Monte Carlo simulation studies. Real large data sets are examined, and the results of EFA on them are taken as the population models. First we apply a resampling technique to these data sets with subsamples of different sizes. Then a Monte Carlo study is conducted based on the parameters of the population model and on some variations of them. Two data sets are analyzed as an illustration. Results suggest that the outcomes of simulation studies are always highly influenced by the particular specification of the model and its violations. Once small residual correlations appeared in the data, for example, the ranking of our methods changed completely. The analysis of real data set characteristics is therefore important for understanding the performance of different methods
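The subsampling step described above can be sketched generically; `analyze` stands in for the EFA fit applied to each subsample, and all names here are illustrative assumptions rather than the authors' code:

```python
import random

def subsample_study(data, sizes, n_reps, analyze):
    """Draw `n_reps` subsamples of each size in `sizes` without
    replacement and apply `analyze` (e.g. an EFA fit) to each,
    collecting results per subsample size."""
    results = {}
    for n in sizes:
        results[n] = [analyze(random.sample(data, n)) for _ in range(n_reps)]
    return results
```

Comparing the distribution of results across sizes then shows how stable the factor solution is as sample size shrinks.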
Resampling Methods for the Change Analysis of Dependent Data
The fundamental question in change-point analysis is whether an observed stochastic process follows one model or whether the underlying model changes at least once during the observational period. Most of the older works discuss independent observations, yet from a practical point of view cases of dependent data have become more and more important. In this dissertation we develop testing procedures for dependent models. In change-point analysis critical values for testing procedures are usually obtained by distributional asymptotics. These critical values, however, do not sufficiently reflect dependency. Moreover, it is a well-known fact that convergence rates, especially for extreme-value statistics, are very slow. Using resampling methods we obtain better approximations, which take possible dependency structures more efficiently into account. We prove that the original statistics and their resampling counterparts follow the same distributional asymptotics. First we obtain limit theorems for the corresponding rank statistics, which then, combined with laws of large numbers, imply the resampling asymptotics conditionally on the given data. In the first part we consider abrupt and gradual changes in models of possibly dependent observations satisfying a strong invariance principle. The main part of this dissertation studies a location model with dependent errors that form a linear process. Different types of statistics are considered, such as maximum-type statistics (particularly different CUSUM procedures) or sum-type statistics. The resampling methods have to be adapted to allow for dependent errors. Thus, we analyze a block bootstrap as well as a bootstrap in the frequency domain. Finally, some simulation studies illustrate that the permutation tests usually behave better than the original tests if performance is measured by the type I and II errors, respectively
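The dissertation's procedures are far richer than an abstract can convey; the two core ingredients it names can nonetheless be sketched in simplified form. The following illustrates a moving-block bootstrap (which preserves short-range dependence by resampling whole blocks) and a max-type CUSUM statistic; names and the choice of block scheme are assumptions for illustration only:

```python
import random

def block_bootstrap(series, block_len):
    """Moving-block bootstrap: resample overlapping blocks of length
    `block_len` with replacement and concatenate them, so that
    dependence within each block is preserved in the bootstrap sample."""
    n = len(series)
    blocks = [series[i:i + block_len] for i in range(n - block_len + 1)]
    out = []
    while len(out) < n:
        out.extend(random.choice(blocks))
    return out[:n]

def cusum_stat(series):
    """Maximum absolute CUSUM deviation from the overall mean,
    a max-type statistic sensitive to a change in location."""
    n = len(series)
    mean = sum(series) / n
    partial, maximum = 0.0, 0.0
    for x in series:
        partial += x - mean
        maximum = max(maximum, abs(partial))
    return maximum / n ** 0.5
```

A bootstrap test then compares the observed `cusum_stat` with its distribution over many `block_bootstrap` replicates, giving critical values that reflect the dependence structure better than the asymptotic ones.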
New resampling method for evaluating stability of clusters
Background
Hierarchical clustering is a widely applied tool in the analysis of microarray gene expression data. The assessment of cluster stability is a major challenge in clustering procedures. Statistical methods are required to distinguish between real and random clusters. Several methods for assessing cluster stability have been published, including resampling methods such as the bootstrap.
We propose a new resampling method based on continuous weights to assess the stability of clusters in hierarchical clustering. While in bootstrapping approximately one-third of the original items is lost, continuous weights avoid zero elements and instead allow non-integer diagonal elements, which leads to retention of the full dimensionality of the space, i.e. each variable of the original data set is represented in the resampling sample.
Results
Comparison of continuous weights and bootstrapping using real datasets and simulation studies reveals the advantage of continuous weights, especially when the dataset has only few observations, few differentially expressed genes, and the fold change of differentially expressed genes is low.
Conclusion
We recommend the use of continuous weights in small as well as in large datasets, because according to our results they produce at least the same results as conventional bootstrapping and in some cases surpass it.
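The contrast the authors draw can be illustrated with a Bayesian-bootstrap-style scheme: strictly positive continuous weights keep every observation represented, whereas multinomial bootstrap counts leave roughly a third of items with weight zero. The exponential-draw construction below is a common way to generate such weights and is an assumption here, not necessarily the paper's exact scheme:

```python
import random

def continuous_weights(n):
    """Strictly positive resampling weights via normalised Exp(1)
    draws (Bayesian-bootstrap style): every observation keeps a
    non-zero weight, so no dimension of the data is lost."""
    w = [random.expovariate(1.0) for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def bootstrap_counts(n):
    """Ordinary bootstrap: multinomial counts; on average about
    a third of the items receive count zero and drop out."""
    counts = [0] * n
    for _ in range(n):
        counts[random.randrange(n)] += 1
    return counts
```

Running both for the same n makes the difference visible: the continuous weights are all positive, while the bootstrap counts typically contain many zeros.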
Particle Efficient Importance Sampling
The efficient importance sampling (EIS) method is a general principle for the
numerical evaluation of high-dimensional integrals that uses the sequential
structure of target integrands to build variance minimising importance
samplers. Despite a number of successful applications in high dimensions, it is
well known that importance sampling strategies are subject to exponential
growth in variance as the dimension of the integration problem increases. We solve this
problem by recognising that the EIS framework has an offline sequential Monte
Carlo interpretation. The particle EIS method is based on non-standard
resampling weights that take into account the look-ahead construction of the
importance sampler. We apply the method to a range of univariate and bivariate
stochastic volatility specifications. We also develop a new application of the
EIS approach to state space models with Student's t state innovations. Our
results show that the particle EIS method strongly outperforms both the
standard EIS method and particle filters for likelihood evaluation in high
dimensions. Moreover, the ratio between the variances of the particle EIS and
particle filter methods remains stable as the time series dimension increases.
We illustrate the efficiency of the method for Bayesian inference using the
particle marginal Metropolis-Hastings and importance sampling squared
algorithms
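The particle EIS weights themselves cannot be reconstructed from the abstract; what can be sketched are the generic particle-method building blocks it builds on: stable normalisation of log importance weights, multinomial resampling, and the effective sample size used to diagnose weight degeneracy. All function names below are illustrative assumptions:

```python
import math
import random

def normalised_weights(log_weights):
    """Exponentiate and normalise log importance weights, subtracting
    the maximum first for numerical stability."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [x / total for x in w]

def resample(particles, weights):
    """Multinomial resampling: draw particles in proportion to their
    weights, the basic step that particle methods (including schemes
    with non-standard, look-ahead weights) are built on."""
    return random.choices(particles, weights=weights, k=len(particles))

def ess(weights):
    """Effective sample size of normalised weights; values far below
    the particle count signal variance blow-up of the sampler."""
    return 1.0 / sum(w * w for w in weights)
```

Keeping the ESS stable as the time-series dimension grows is precisely the behaviour the abstract reports for particle EIS relative to plain importance sampling.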