Bayesian Approximate Kernel Regression with Variable Selection
Nonlinear kernel regression models are often used in statistics and machine
learning because they are more accurate than linear models. Variable selection
for kernel regression models is a challenge partly because, unlike the linear
regression setting, there is no clear concept of an effect size for regression
coefficients. In this paper, we propose a novel framework that provides an
effect size analog of each explanatory variable for Bayesian kernel regression
models when the kernel is shift-invariant --- for example, the Gaussian kernel.
We use function analytic properties of shift-invariant reproducing kernel
Hilbert spaces (RKHS) to define a linear vector space that: (i) captures
nonlinear structure, and (ii) can be projected onto the original explanatory
variables. The projection onto the original explanatory variables serves as an
analog of effect sizes. The specific function analytic property we use is that
shift-invariant kernel functions can be approximated via random Fourier bases.
Based on the random Fourier expansion we propose a computationally efficient
class of Bayesian approximate kernel regression (BAKR) models for both
nonlinear regression and binary classification for which one can compute an
analog of effect sizes. We illustrate the utility of BAKR by examining two
important problems in statistical genetics: genomic selection (i.e. phenotypic
prediction) and association mapping (i.e. inference of significant variants or
loci). State-of-the-art methods for genomic selection and association mapping
are based on kernel regression and linear models, respectively. BAKR is the
first method that is competitive in both settings.
Comment: 22 pages, 3 figures, 3 tables; theory added; new simulations
presented; references added
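The random Fourier approximation that the abstract relies on can be illustrated with a minimal sketch. It assumes a Gaussian kernel with bandwidth `sigma`; the function names (`gaussian_kernel`, `make_rff_map`, `dot`) and the example vectors are hypothetical, and this is not the BAKR implementation, only an illustration of how a shift-invariant kernel may be approximated by an inner product of random cosine features.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def make_rff_map(dim, n_features, sigma=1.0, seed=0):
    """Build a random Fourier feature map for the Gaussian kernel.

    Frequencies are drawn from N(0, sigma^-2 I) and phases from
    Uniform[0, 2*pi], so that E[phi(x) . phi(y)] equals the kernel value.
    """
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
             for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        return [
            scale * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
            for w, b in zip(freqs, phases)
        ]

    return feature_map


def dot(u, v):
    """Plain inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


# Illustrative inputs (assumed, not taken from the paper).
x = [0.5, -1.0, 0.2]
y = [0.1, -0.7, 0.9]

phi = make_rff_map(dim=3, n_features=5000, sigma=1.0, seed=0)
exact = gaussian_kernel(x, y)
approx = dot(phi(x), phi(y))  # Monte Carlo estimate of the kernel value
```

Because the features enter linearly, a regression on `phi(x)` is a linear model in the random basis, which is the kind of structure the abstract describes projecting back onto the original explanatory variables. The approximation error shrinks roughly as one over the square root of `n_features`.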
Technical Note: The impact of spatial scale in bias correction of climate model output for hydrologic impact studies
Statistical downscaling is a commonly used technique for translating large-scale climate model output to a scale appropriate for assessing impacts. To ensure downscaled meteorology can be used in climate impact studies, downscaling must correct biases in the large-scale signal. A simple and generally effective method for accommodating systematic biases in large-scale model output is quantile mapping, which has been applied to many variables and shown to reduce biases on average, even in the presence of non-stationarity. Quantile-mapping bias correction has been applied at spatial scales ranging from hundreds of kilometers to individual points, such as weather station locations. Since water resources and other models used to simulate climate impacts are sensitive to biases in input meteorology, there is motivation to apply bias correction at a scale fine enough that the downscaled data closely resemble historically observed data, though past work has identified undesirable consequences of applying quantile mapping at too fine a scale. This study explores the role of the spatial scale at which the quantile-mapping bias correction is applied, in the context of estimating high and low daily streamflows across the western United States. We vary the spatial scale at which quantile-mapping bias correction is performed from 2° (∼200 km) to 1/8° (∼12 km) within a statistical downscaling procedure, and use the downscaled daily precipitation and temperature to drive a hydrology model. We find that little additional benefit is obtained, and some skill is degraded, when using quantile mapping at scales finer than approximately 0.5° (∼50 km). This can provide guidance to those applying the quantile-mapping bias correction method for hydrologic impacts analysis.
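The quantile-mapping idea named in the abstract can be sketched as follows. This is a minimal empirical version under stated assumptions, not the study's procedure: a model value is assigned its non-exceedance probability in the model's historical sample and replaced by the observed value at the same probability. The function names and the toy samples are hypothetical.

```python
import bisect


def empirical_quantile(sorted_values, p):
    """Linearly interpolated quantile of a sorted sample, for p in [0, 1]."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def quantile_map(value, model_hist, obs_hist):
    """Bias-correct one model value by matching empirical quantiles."""
    if len(model_hist) < 2:
        raise ValueError("need at least two historical model values")
    model_sorted = sorted(model_hist)
    obs_sorted = sorted(obs_hist)
    n = len(model_sorted)
    # Mid-rank position of the value within the model's historical sample.
    left = bisect.bisect_left(model_sorted, value)
    right = bisect.bisect_right(model_sorted, value)
    p = (left + right - 1) / (2.0 * (n - 1))
    p = min(1.0, max(0.0, p))  # values outside the sample map to the extremes
    return empirical_quantile(obs_sorted, p)


# Toy data (assumed): the model is uniformly twice as wet as observations.
model_hist = [2.0, 4.0, 6.0, 8.0, 10.0]
obs_hist = [1.0, 2.0, 3.0, 4.0, 5.0]

corrected_median = quantile_map(6.0, model_hist, obs_hist)
corrected_between = quantile_map(5.0, model_hist, obs_hist)
corrected_extreme = quantile_map(100.0, model_hist, obs_hist)
```

In this sketch, values beyond the historical model range are clamped to the observed extremes; operational schemes often extrapolate the tails instead. The spatial-scale question studied in the paper corresponds to the choice of area over which `model_hist` and `obs_hist` are aggregated before the mapping is built.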
Sorting Between and Within Industries: A Testable Model of Assortative Matching
We test Shimer's (2005) theory of the sorting of workers between and within industrial sectors based on directed search with coordination frictions, deliberately maintaining its static general equilibrium framework. We fit the model to sector-specific wage, vacancy and output data, including publicly available statistics that characterize the distribution of worker and employer wage heterogeneity across sectors. Our empirical method is general and can be applied to a broad class of assignment models. The results indicate that industries are the loci of sorting: more productive workers are employed in more productive industries. The evidence confirms that strong assortative matching can be present even when worker and employer components of wage heterogeneity are weakly correlated.