New important developments in small area estimation
The purpose of this paper is to review and discuss some of the important new developments in small area estimation (SAE) methods. Rao (2003) wrote a very comprehensive book covering all the main developments in this topic up to that time, so the focus of this review is on developments of the last 7 years. However, to make the review more self-contained, I also briefly revisit some of the older developments. The review covers both design-based and model-dependent methods, with emphasis on the prediction of the area target quantities and the assessment of the prediction error. The style of the paper is similar to that of my previous review on SAE published in 2002: I explain the new problems investigated and describe the proposed solutions, without dwelling on theoretical details, which can be found in the original articles. I hope that this paper will be useful both to researchers who would like to learn more about the research carried out in SAE and to practitioners who might be interested in applying the new methods.
ROC-Based Model Estimation for Forecasting Large Changes in Demand
Forecasting large changes in demand should benefit from estimation methods different from those used to estimate mean behavior. We develop a multivariate forecasting model designed to detect the largest changes across many time series. The model is fit with a penalty function that maximizes true positive rates along a relevant false positive rate range, and can be used by managers wishing to take action on the small percentage of products likely to change the most in the next time period. We apply the model to a crime dataset, using OLS as the baseline for comparison, along with models that are promising for exceptional demand forecasting, such as quantile regression, synthetic data from a Bayesian model, and a power loss model. Using the Partial Area Under the Curve (PAUC) metric, our results show a statistically significant 35 percent improvement over OLS and at least a 20 percent improvement over competing methods. We suggest that managers with a growing number of products use our method for forecasting large changes in conjunction with typical magnitude-based methods for forecasting expected demand.
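The PAUC criterion above restricts the usual ROC area to a managerially relevant false-positive-rate range. A minimal numpy sketch of such a metric follows; the normalization by `max_fpr` and the linear interpolation at the cutoff are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def partial_auc(y_true, scores, max_fpr=0.2):
    """Area under the ROC curve restricted to FPR in [0, max_fpr],
    normalized by max_fpr so a perfect ranker scores 1.0."""
    order = np.argsort(-np.asarray(scores))      # sweep thresholds by descending score
    y = np.asarray(y_true)[order]
    pos, neg = y.sum(), len(y) - y.sum()
    tpr = np.concatenate(([0.0], np.cumsum(y) / pos))
    fpr = np.concatenate(([0.0], np.cumsum(1 - y) / neg))
    # Truncate the curve at max_fpr, interpolating the final segment.
    keep = fpr <= max_fpr
    fx = np.concatenate((fpr[keep], [max_fpr]))
    fy = np.concatenate((tpr[keep], [np.interp(max_fpr, fpr, tpr)]))
    area = np.sum((fx[1:] - fx[:-1]) * (fy[1:] + fy[:-1]) / 2)
    return area / max_fpr
```

A ranker that scores every large-change case above every other case attains a normalized PAUC of 1.0 at any `max_fpr`, which is what the penalty function in the paper pushes the fitted model toward over the chosen FPR range.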
B-tests: Low Variance Kernel Two-Sample Tests
A family of maximum mean discrepancy (MMD) kernel two-sample tests is
introduced. Members of the test family are called Block-tests or B-tests, since
the test statistic is an average over MMDs computed on subsets of the samples.
The choice of block size allows control over the tradeoff between test power
and computation time. In this respect, the B-test family combines favorable
properties of previously proposed MMD two-sample tests: B-tests are more
powerful than a linear time test where blocks are just pairs of samples, yet
they are more computationally efficient than a quadratic time test where a
single large block incorporating all the samples is used to compute a
U-statistic. A further important advantage of the B-tests is their
asymptotically Normal null distribution: this is by contrast with the
U-statistic, which is degenerate under the null hypothesis, and for which
estimates of the null distribution are computationally demanding. Recent
results on kernel selection for hypothesis testing transfer seamlessly to the
B-tests, yielding a means to optimize test power via kernel choice.
Comment: Neural Information Processing Systems (2013)
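The block construction described above can be sketched in a few lines. This is a minimal illustration assuming a Gaussian RBF kernel, equal sample sizes, and an unbiased within-block MMD² estimate; the kernel, bandwidth, and block size are all choices the paper leaves open.

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    """Gaussian RBF kernel matrix between row-samples a and b."""
    d = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d / (2 * sigma ** 2))

def mmd2_unbiased(x, y, sigma=1.0):
    """Unbiased estimate of squared MMD on one block (equal block sizes)."""
    m = len(x)
    kxx, kyy, kxy = rbf(x, x, sigma), rbf(y, y, sigma), rbf(x, y, sigma)
    np.fill_diagonal(kxx, 0.0)   # drop diagonal terms for unbiasedness
    np.fill_diagonal(kyy, 0.0)
    return (kxx.sum() + kyy.sum()) / (m * (m - 1)) - 2 * kxy.mean()

def b_test_statistic(x, y, block_size=50, sigma=1.0):
    """B-test statistic: average of per-block MMD^2 estimates.
    The block average is asymptotically normal, so its mean and the
    standard error across blocks yield a simple z-test."""
    n = (len(x) // block_size) * block_size
    vals = np.array([mmd2_unbiased(x[i:i + block_size],
                                   y[i:i + block_size], sigma)
                     for i in range(0, n, block_size)])
    return vals.mean(), vals.std(ddof=1) / np.sqrt(len(vals))
```

Setting `block_size=2` recovers a linear-time pair statistic, while one block containing all samples recovers the quadratic-time U-statistic, matching the tradeoff described above.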
Simultaneous Selection of Multiple Important Single Nucleotide Polymorphisms in Familial Genome Wide Association Studies Data
We propose a resampling-based fast variable selection technique for selecting
important Single Nucleotide Polymorphisms (SNP) in multi-marker mixed effect
models used in twin studies. Due to computational complexity, current practice
includes testing the effect of one SNP at a time, commonly termed `single
SNP association analysis'. Joint modeling of genetic variants within a gene or
pathway may have better power to detect the relevant genetic variants, hence we
adapt our recently proposed framework of -values to address this. In this
paper, we propose a computationally efficient approach for single SNP detection
in families while utilizing information on multiple SNPs simultaneously. We
achieve this through improvements in two aspects. First, unlike other model
selection techniques, our method requires training only a single model with all
possible predictors. Second, we utilize a fast and scalable bootstrap procedure
that only requires Monte Carlo sampling to obtain bootstrapped copies of the
estimated vector of coefficients. Using this bootstrap sample, we obtain the
-value for each SNP, and select SNPs having -values below a threshold. We
illustrate through numerical studies that our method is more effective in
detecting SNPs associated with a trait than either single-marker analysis using
family data or model selection methods that ignore the familial dependency
structure. We also use the -values to perform gene-level analysis in nuclear
families and detect several SNPs that have previously been implicated in
association with alcohol consumption.
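The abstract does not spell out the bootstrap procedure. As a generic illustration of the refit-free idea, one can draw Monte Carlo copies of a coefficient vector from its estimated Gaussian sampling distribution rather than refitting the model on each resample. The OLS setting below is a stand-in of my own and ignores the familial dependency structure that the paper's mixed-effect method accounts for.

```python
import numpy as np

def gaussian_coef_bootstrap(X, y, n_boot=1000, seed=0):
    """Sketch: bootstrap copies of the OLS coefficient vector sampled from
    N(beta_hat, sigma2 * (X'X)^{-1}) instead of refitting per resample,
    so the model is trained once and resampling is pure Monte Carlo."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2 = resid @ resid / (n - p)          # residual variance estimate
    L = np.linalg.cholesky(sigma2 * XtX_inv)  # factor of the coef covariance
    # Each row of the returned array is one bootstrapped coefficient vector.
    return beta_hat, beta_hat + rng.standard_normal((n_boot, p)) @ L.T
```

Selection then amounts to ranking predictors by a statistic computed over the columns of the bootstrap sample, in the spirit of the per-SNP thresholding described above.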
A One-Sample Test for Normality with Kernel Methods
We propose a new one-sample test for normality in a Reproducing Kernel
Hilbert Space (RKHS). Namely, we test the null hypothesis of belonging to a
given family of Gaussian distributions. Hence our procedure may be applied
either to test data for normality or to test parameters (mean and covariance)
if data are assumed Gaussian. Our test is based on the same principle as the
MMD (Maximum Mean Discrepancy) which is usually used for two-sample tests such
as homogeneity or independence testing. Our method makes use of a special kind
of parametric bootstrap (typical of goodness-of-fit tests) which is
computationally more efficient than standard parametric bootstrap. Moreover, an
upper bound for the Type-II error highlights the dependence on influential
quantities. Experiments illustrate the practical improvement allowed by our
test in high-dimensional settings where common normality tests are known to
fail. We also consider an application to covariance rank selection through a
sequential procedure.
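The parametric-bootstrap calibration mentioned above can be sketched crudely: compute an MMD-type statistic between the data and the fitted Gaussian, then recalibrate it on datasets simulated under that fitted null. This sketch uses a finite reference sample where the paper exploits closed-form kernel embeddings, and it keeps the fitted parameters fixed across bootstrap draws, both simplifications of my own.

```python
import numpy as np

def rbf_mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) squared-MMD estimate with a Gaussian kernel."""
    def k(a, b):
        d = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def normality_pvalue(x, n_boot=200, seed=0):
    """Parametric bootstrap: the observed discrepancy between x and its
    fitted Gaussian is compared against discrepancies of datasets drawn
    from that same fitted Gaussian."""
    rng = np.random.default_rng(seed)
    mu, cov = x.mean(axis=0), np.cov(x.T)
    def stat(sample):
        ref = rng.multivariate_normal(mu, cov, size=len(sample))
        return rbf_mmd2(sample, ref)
    t_obs = stat(x)
    t_null = [stat(rng.multivariate_normal(mu, cov, size=len(x)))
              for _ in range(n_boot)]
    return (1 + sum(t >= t_obs for t in t_null)) / (1 + n_boot)
```

Because the null statistics are simulated under the fitted Gaussian rather than tabulated, the calibration adapts to the dimension and sample size at hand, which is the setting where the abstract reports classical normality tests failing.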
Partial Linear Quantile Regression and Bootstrap Confidence Bands
In this paper uniform confidence bands are constructed for nonparametric quantile estimates of regression functions. The method is based on the bootstrap, where resampling is done from a suitably estimated empirical density function (edf) for residuals. It is known that the approximation error for the uniform confidence band by the asymptotic Gumbel distribution is logarithmically slow. It is proved that the bootstrap approximation provides a substantial improvement. The case of multidimensional and discrete regressor variables is dealt with using a partial linear model. Comparison to classic asymptotic uniform bands is presented through a simulation study. An economic application considers the labour market differential effect with respect to different education levels.
Keywords: Bootstrap, Quantile Regression, Confidence Bands, Nonparametric Fitting, Kernel Smoothing, Partial Linear Model
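A toy version of the construction described above: fit a linear quantile regression by subgradient descent on the pinball loss, then resample residuals to form percentile bands. The paper resamples from a smoothed residual density and derives uniform bands in a partial linear model; the linear fit, naive residual resampling, and pointwise bands here are simplifications for illustration.

```python
import numpy as np

def fit_quantile_line(x, y, tau=0.5, lr=0.05, steps=2000):
    """Linear tau-quantile regression via subgradient descent on the
    pinball loss (a simple numpy stand-in for an LP solver)."""
    a, b = 0.0, np.quantile(y, tau)
    for _ in range(steps):
        u = y - (a * x + b)
        # Subgradient of the pinball loss w.r.t. the prediction.
        g = np.where(u > 0, -tau, 1 - tau)
        a -= lr * np.mean(g * x)
        b -= lr * np.mean(g)
    return a, b

def bootstrap_band(x, y, tau=0.5, n_boot=100, grid=None, seed=0):
    """Pointwise percentile band for the fitted quantile line, obtained
    by resampling residuals and refitting."""
    rng = np.random.default_rng(seed)
    a, b = fit_quantile_line(x, y, tau)
    resid = y - (a * x + b)
    grid = np.linspace(x.min(), x.max(), 20) if grid is None else grid
    fits = []
    for _ in range(n_boot):
        yb = a * x + b + rng.choice(resid, size=len(x), replace=True)
        ab, bb = fit_quantile_line(x, yb, tau)
        fits.append(ab * grid + bb)
    lo, hi = np.percentile(fits, [2.5, 97.5], axis=0)
    return a, b, grid, lo, hi
```

A uniform band of the kind studied in the paper would instead widen these pointwise limits by the distribution of the maximal deviation over the grid, which is where the bootstrap's improvement over the slow Gumbel approximation matters.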