Estimating the prediction performance of spatial models via spatial k-fold cross validation
In machine learning, one often assumes that the data are independent when evaluating model performance. However, this assumption rarely holds in practice. Geographic information datasets are one example: data points depend more strongly on one another the closer they are geographically. This phenomenon, known as spatial autocorrelation (SAC), causes standard cross validation (CV) methods to produce optimistically biased prediction performance estimates for spatial models, which can result in increased costs and accidents in practical applications. To overcome this problem, we propose a modified version of the CV method called spatial k-fold cross validation (SKCV), which provides a useful estimate of model prediction performance without optimistic bias due to SAC. We test SKCV on three real-world cases involving open natural data and show that the estimates produced by ordinary CV are up to 40% more optimistic than those of SKCV. Both regression and classification cases are considered in our experiments. In addition, we show how the SKCV method can be applied as a criterion for selecting data sampling density for new research area
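The abstract does not spell out how SKCV removes the optimistic bias. One plausible reading, assumed here rather than taken from the text, is that each fold discards training points lying within some "dead zone" radius of any test point, so that the model is never evaluated on points with close spatial neighbours in its training set. The sketch below illustrates that idea in plain Python; the function name `skcv_splits` and the parameter `dead_zone_radius` are hypothetical, and the code expects `k` to be no larger than the number of points.

```python
import math
import random


def skcv_splits(coords, k, dead_zone_radius, seed=0):
    """Yield (train_idx, test_idx) pairs for a dead-zone style spatial k-fold CV.

    Assumption: training points within `dead_zone_radius` of any test point
    are dropped from that fold's training set. Requires k <= len(coords).
    """
    n = len(coords)
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Interleaved slicing gives k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = []
        for i in range(n):
            if i in test_set:
                continue
            # Distance from candidate training point to its nearest test point.
            nearest = min(math.dist(coords[i], coords[j]) for j in test_idx)
            if nearest > dead_zone_radius:
                train_idx.append(i)
        yield train_idx, test_idx


# Usage on a synthetic 6 x 6 grid of sampling locations.
coords = [(float(x), float(y)) for x in range(6) for y in range(6)]
for train_idx, test_idx in skcv_splits(coords, k=4, dead_zone_radius=1.5):
    print(len(train_idx), "training points,", len(test_idx), "test points")
```

With `dead_zone_radius=0.0` the splits reduce to ordinary k-fold CV on distinct points, which makes it easy to compare the two estimates on the same folds.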
Regularized brain reading with shrinkage and smoothing
Functional neuroimaging measures how the brain responds to complex stimuli.
However, sample sizes are modest, noise is substantial, and stimuli are high
dimensional. Hence, direct estimates are inherently imprecise and call for
regularization. We compare a suite of approaches which regularize via
shrinkage: ridge regression, the elastic net (a generalization of ridge
regression and the lasso), and a hierarchical Bayesian model based on small
area estimation (SAE). We contrast regularization with spatial smoothing and
combinations of smoothing and shrinkage. All methods are tested on functional
magnetic resonance imaging (fMRI) data from multiple subjects participating in
two different experiments related to reading, for both predicting neural
response to stimuli and decoding stimuli from responses. Interestingly, when
the regularization parameters are chosen by cross-validation independently for
every voxel, low/high regularization is chosen in voxels where the
classification accuracy is high/low, indicating that the regularization
intensity is a good tool for identification of relevant voxels for the
cognitive task. Surprisingly, all the regularization methods work about equally
well, suggesting that beating basic smoothing and shrinkage will take not only
clever methods, but also careful modeling.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS837 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
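The observation that per-voxel cross-validation picks low regularization for informative voxels and high regularization for uninformative ones can be illustrated with a toy example. The sketch below is not the authors' pipeline: it assumes a single stimulus feature, ridge regression without an intercept, leave-one-out folds, and synthetic "voxel" responses invented for illustration. The names `ridge_fit_1d`, `cv_error`, and `select_lambda` are hypothetical.

```python
def ridge_fit_1d(x, y, lam):
    """Closed-form ridge slope for one predictor and no intercept."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def cv_error(x, y, lam, k):
    """Mean squared prediction error over k interleaved folds."""
    n = len(x)
    total = 0.0
    for fold in range(k):
        test = list(range(fold, n, k))
        test_set = set(test)
        x_train = [x[i] for i in range(n) if i not in test_set]
        y_train = [y[i] for i in range(n) if i not in test_set]
        w = ridge_fit_1d(x_train, y_train, lam)
        total += sum((y[i] - w * x[i]) ** 2 for i in test)
    return total / n


def select_lambda(x, y, lambdas, k):
    """Pick the penalty with the lowest cross-validated error."""
    return min(lambdas, key=lambda lam: cv_error(x, y, lam, k))


# Synthetic data (an assumption for illustration): one stimulus feature,
# one voxel that tracks it exactly, one voxel unrelated to it.
stimulus = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
voxels = {
    "responsive": [2.0 * s for s in stimulus],
    "unrelated": [1.0, -1.0, -1.0, 1.0, 1.0, 1.0],
}
lambdas = [0.01, 1.0, 100.0, 10000.0]
chosen = {
    name: select_lambda(stimulus, response, lambdas, k=len(stimulus))
    for name, response in voxels.items()
}
print(chosen)
```

In this toy setting the responsive voxel receives the smallest penalty in the grid and the unrelated voxel the largest, which mirrors the pattern the abstract describes; real fMRI data would need multivariate features and a much finer grid.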