Futility Analysis in the Cross-Validation of Machine Learning Models
Many machine learning models have important structural tuning parameters that
cannot be directly estimated from the data. The common tactic for setting these
parameters is to use resampling methods, such as cross-validation or the
bootstrap, to evaluate a candidate set of values and choose the best based on
some pre-defined criterion. Unfortunately, this process can be time-consuming.
However, the model tuning process can be streamlined by adaptively resampling
candidate values so that settings that are clearly sub-optimal can be
discarded. The notion of futility analysis is introduced in this context. An
example is shown that illustrates how adaptive resampling can be used to reduce
training time. Simulation studies are used to understand how the potential
speed-up is affected by parallel processing techniques.
Comment: 22 pages, 5 figures
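The abstract describes adaptive resampling only in outline; the following is a minimal, hypothetical sketch of the idea, not the paper's actual procedure. The function name `adaptive_tune`, the burn-in count, and the fixed futility margin are all assumptions made for illustration:

```python
import random
import statistics

def adaptive_tune(candidates, evaluate, n_resamples=50, burn_in=10, margin=0.05):
    """Adaptively resample candidate tuning settings, discarding clearly
    sub-optimal ones (a simple 'futility' rule) instead of evaluating every
    candidate for all resamples.  `evaluate(c)` returns one resampled
    performance score (higher is better) for candidate `c`."""
    scores = {c: [] for c in candidates}
    active = set(candidates)
    for i in range(n_resamples):
        for c in list(active):
            scores[c].append(evaluate(c))
        if i + 1 >= burn_in and len(active) > 1:
            best = max(statistics.mean(scores[c]) for c in active)
            # Futility: drop candidates trailing the current leader by more
            # than `margin`; they no longer consume resampling effort.
            active = {c for c in active
                      if statistics.mean(scores[c]) >= best - margin}
    return max(active, key=lambda c: statistics.mean(scores[c]))
```

Because discarded candidates stop being evaluated, total training time shrinks roughly in proportion to how early poor settings are pruned.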
minque: An R Package for Analyzing Various Linear Mixed Models
Linear mixed model (LMM) approaches offer much more flexibility than ANOVA (analysis of variance) based methods. There are three commonly used LMM approaches: maximum likelihood, restricted maximum likelihood, and minimum norm quadratic unbiased estimation. These three approaches, however, can sometimes lead to lower testing power than ANOVA methods. Integrating resampling techniques such as the jackknife can help improve testing power, as shown by our simulation studies. In this presentation, I will introduce an R package, minque, which integrates LMM approaches with resampling techniques, and demonstrate the use of this package in various linear mixed model analyses
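The package's actual jackknife for mixed models is more involved than the abstract states; as a plain illustration of the underlying resampling idea, here is a delete-one jackknife standard error for an arbitrary estimator (function names are assumptions, not minque's API):

```python
import statistics

def jackknife_se(data, estimator):
    """Delete-one jackknife standard error of `estimator` on `data`.

    Recomputes the estimator n times, each time leaving out one
    observation, then scales the spread of the leave-one-out values.
    """
    n = len(data)
    theta = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    mean = statistics.mean(theta)
    return ((n - 1) / n * sum((t - mean) ** 2 for t in theta)) ** 0.5
```

For the sample mean this reproduces the textbook standard error s/sqrt(n) exactly, which makes it easy to sanity-check.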
What can the Real World do for simulation studies? A comparison of exploratory methods
In simulation studies on exploratory factor analysis (EFA), rather simple population models without model errors are usually used. In the present study, real data characteristics are used for Monte Carlo simulation studies. Real large data sets are examined, and the results of EFA on them are taken as the population models. First we apply a resampling technique to these data sets with subsamples of different sizes. Then a Monte Carlo study is conducted based on the parameters of the population model and on some variations of them. Two data sets are analyzed as an illustration. Results suggest that the outcomes of simulation studies are always highly influenced by the particular specification of the model and its violations. Once small residual correlations appeared in the data, for example, the ranking of our methods changed completely. The analysis of real data set characteristics is therefore important for understanding the performance of different methods
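The subsampling step described above can be sketched generically; `analyze` stands in for the EFA fit applied to each subsample, and all names here are illustrative assumptions rather than the authors' code:

```python
import random

def subsample_study(data, sizes, n_reps, analyze):
    """Draw `n_reps` subsamples of each size in `sizes` without
    replacement and apply `analyze` (e.g. an EFA fit) to each,
    collecting results per subsample size."""
    results = {}
    for n in sizes:
        results[n] = [analyze(random.sample(data, n)) for _ in range(n_reps)]
    return results
```

Comparing the distribution of results across sizes then shows how stable the factor solution is as sample size shrinks.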
Resampling Methods for the Change Analysis of Dependent Data
The fundamental question in change-point analysis is whether an observed stochastic process follows one model or whether the underlying model changes at least once during the observational period. Most of the older works discuss independent observations, yet from a practical point of view cases of dependent data have become more and more important. In this dissertation we develop testing procedures for dependent models. In change-point analysis critical values for testing procedures are usually obtained by distributional asymptotics. These critical values, however, do not sufficiently reflect dependency. Moreover, it is a well-known fact that convergence rates, especially for extreme-value statistics, are very slow. Using resampling methods we obtain better approximations, which take possible dependency structures more efficiently into account. We prove that the original statistics and their resampling counterparts follow the same distributional asymptotics. First we obtain limit theorems for the corresponding rank statistics, which then, combined with laws of large numbers, imply the resampling asymptotics conditionally on the given data. In the first part we consider abrupt and gradual changes in models of possibly dependent observations satisfying a strong invariance principle. The main part of this dissertation studies a location model with dependent errors that form a linear process. Different types of statistics are considered, such as maximum-type statistics (particularly different CUSUM procedures) or sum-type statistics. The resampling methods have to be adapted to allow for dependent errors. Thus, we analyze a block bootstrap as well as a bootstrap in the frequency domain. Finally, some simulation studies illustrate that the permutation tests usually behave better than the original tests if performance is measured by the type I and II errors, respectively
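The dissertation's procedures are far richer than an abstract can convey; the two core ingredients it names can nonetheless be sketched in simplified form. The following illustrates a moving-block bootstrap (which preserves short-range dependence by resampling whole blocks) and a max-type CUSUM statistic; names and the choice of block scheme are assumptions for illustration only:

```python
import random

def block_bootstrap(series, block_len):
    """Moving-block bootstrap: resample overlapping blocks of length
    `block_len` with replacement and concatenate them, so that
    dependence within each block is preserved in the bootstrap sample."""
    n = len(series)
    blocks = [series[i:i + block_len] for i in range(n - block_len + 1)]
    out = []
    while len(out) < n:
        out.extend(random.choice(blocks))
    return out[:n]

def cusum_stat(series):
    """Maximum absolute CUSUM deviation from the overall mean,
    a max-type statistic sensitive to a change in location."""
    n = len(series)
    mean = sum(series) / n
    partial, maximum = 0.0, 0.0
    for x in series:
        partial += x - mean
        maximum = max(maximum, abs(partial))
    return maximum / n ** 0.5
```

A bootstrap test then compares the observed `cusum_stat` with its distribution over many `block_bootstrap` replicates, giving critical values that reflect the dependence structure better than the asymptotic ones.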
New resampling method for evaluating stability of clusters
Background
Hierarchical clustering is a widely applied tool in the analysis of microarray gene expression data. The assessment of cluster stability is a major challenge in clustering procedures. Statistical methods are required to distinguish between real and random clusters. Several methods for assessing cluster stability have been published, including resampling methods such as the bootstrap.
We propose a new resampling method based on continuous weights to assess the stability of clusters in hierarchical clustering. While in bootstrapping approximately one-third of the original items is lost, continuous weights avoid zero elements and instead allow non-integer diagonal elements, which leads to retention of the full dimensionality of the space, i.e. each variable of the original data set is represented in the resampling sample.
Results
Comparison of continuous weights and bootstrapping using real datasets and simulation studies reveals the advantage of continuous weights, especially when the dataset has only few observations, few differentially expressed genes, and the fold change of differentially expressed genes is low.
Conclusion
We recommend the use of continuous weights in small as well as in large datasets, because according to our results they produce at least the same results as conventional bootstrapping and in some cases surpass it.
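The contrast the authors draw can be illustrated with a Bayesian-bootstrap-style scheme: strictly positive continuous weights keep every observation represented, whereas multinomial bootstrap counts leave roughly a third of items with weight zero. The exponential-draw construction below is a common way to generate such weights and is an assumption here, not necessarily the paper's exact scheme:

```python
import random

def continuous_weights(n):
    """Strictly positive resampling weights via normalised Exp(1)
    draws (Bayesian-bootstrap style): every observation keeps a
    non-zero weight, so no dimension of the data is lost."""
    w = [random.expovariate(1.0) for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def bootstrap_counts(n):
    """Ordinary bootstrap: multinomial counts; on average about
    a third of the items receive count zero and drop out."""
    counts = [0] * n
    for _ in range(n):
        counts[random.randrange(n)] += 1
    return counts
```

Running both for the same n makes the difference visible: the continuous weights are all positive, while the bootstrap counts typically contain many zeros.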
Particle Efficient Importance Sampling
The efficient importance sampling (EIS) method is a general principle for the
numerical evaluation of high-dimensional integrals that uses the sequential
structure of target integrands to build variance minimising importance
samplers. Despite a number of successful applications in high dimensions, it is
well known that importance sampling strategies are subject to exponential
growth in variance as the dimension of the integration problem increases. We solve this
problem by recognising that the EIS framework has an offline sequential Monte
Carlo interpretation. The particle EIS method is based on non-standard
resampling weights that take into account the look-ahead construction of the
importance sampler. We apply the method to a range of univariate and bivariate
stochastic volatility specifications. We also develop a new application of the
EIS approach to state space models with Student's t state innovations. Our
results show that the particle EIS method strongly outperforms both the
standard EIS method and particle filters for likelihood evaluation in high
dimensions. Moreover, the ratio between the variances of the particle EIS and
particle filter methods remains stable as the time series dimension increases.
We illustrate the efficiency of the method for Bayesian inference using the
particle marginal Metropolis-Hastings and importance sampling squared
algorithms
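The particle EIS weights themselves cannot be reconstructed from the abstract; what can be sketched are the generic particle-method building blocks it builds on: stable normalisation of log importance weights, multinomial resampling, and the effective sample size used to diagnose weight degeneracy. All function names below are illustrative assumptions:

```python
import math
import random

def normalised_weights(log_weights):
    """Exponentiate and normalise log importance weights, subtracting
    the maximum first for numerical stability."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [x / total for x in w]

def resample(particles, weights):
    """Multinomial resampling: draw particles in proportion to their
    weights, the basic step that particle methods (including schemes
    with non-standard, look-ahead weights) are built on."""
    return random.choices(particles, weights=weights, k=len(particles))

def ess(weights):
    """Effective sample size of normalised weights; values far below
    the particle count signal variance blow-up of the sampler."""
    return 1.0 / sum(w * w for w in weights)
```

Keeping the ESS stable as the time-series dimension grows is precisely the behaviour the abstract reports for particle EIS relative to plain importance sampling.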