Model selection in High-Dimensions: A Quadratic-risk based approach
In this article we propose a general class of risk measures which can be used
for data based evaluation of parametric models. The loss function is defined as
generalized quadratic distance between the true density and the proposed model.
These distances are characterized by a simple quadratic form structure that is
adaptable through the choice of a nonnegative definite kernel and a bandwidth
parameter. Using asymptotic results for the quadratic distances we build a
quick-to-compute approximation for the risk function. Its derivation is
analogous to the Akaike Information Criterion (AIC), but unlike AIC, the
quadratic risk is a global comparison tool. The method does not require
resampling, a great advantage when point estimators are expensive to compute.
The method is illustrated using the problem of selecting the number of
components in a mixture model, where it is shown that, by using an appropriate
kernel, the method is computationally straightforward in arbitrarily high data
dimensions. In this same context it is shown that the method has some clear
advantages over AIC and BIC.
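The kernel-and-bandwidth quadratic distance described above can be sketched as a simple V-statistic between a data sample and a sample from the candidate model. This is a minimal illustration assuming a Gaussian kernel; the function name, the bandwidth choice, and the use of a model sample (rather than the paper's exact risk estimator) are all illustrative assumptions.

```python
import numpy as np

def quadratic_distance(x, y, bandwidth=1.0):
    """Kernel-based quadratic distance between two 1-D samples:
    mean K(x,x) + mean K(y,y) - 2 mean K(x,y), with a Gaussian kernel.
    An illustrative sketch, not the paper's exact risk estimator."""
    def kmat(a, b):
        diff = a[:, None] - b[None, :]          # pairwise differences
        return np.exp(-0.5 * (diff / bandwidth) ** 2)
    return kmat(x, x).mean() + kmat(y, y).mean() - 2.0 * kmat(x, y).mean()

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=500)           # "true" density sample
model_good = rng.normal(0.0, 1.0, size=500)     # well-specified model
model_bad = rng.normal(3.0, 1.0, size=500)      # badly misspecified model

# The distance to the well-specified model should be much smaller.
print(quadratic_distance(data, model_good), quadratic_distance(data, model_bad))
```

Because the distance reduces to kernel averages over pairs of points, it scales naturally with dimension once a suitable product kernel is chosen, which is the computational advantage the abstract alludes to.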
Building Combined Classifiers
This chapter covers different approaches that may be taken when building an
ensemble method, through studying specific examples of each approach from research
conducted by the authors. A method called Negative Correlation Learning illustrates a
decision-level combination approach with individual classifiers trained co-operatively. The
model-level combination paradigm is illustrated via a tree combination method. Finally,
another variant of the decision-level paradigm, with individuals trained independently
rather than co-operatively, is discussed as applied to churn prediction in the
telecommunications industry.
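The decision-level combination with independently trained individuals can be illustrated with a toy majority vote: several noisy classifiers, each wrong independently with some probability, combined at the decision level. The synthetic task, the `noisy_classifier` helper, and the flip probability are all hypothetical; this is not Negative Correlation Learning itself, which trains the individuals co-operatively.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical toy task: 1-D feature, true label 1 when the feature is positive.
x = rng.normal(size=200)
y = (x > 0).astype(int)

def noisy_classifier(labels, flip_prob, rng):
    """Simulate an independently trained base classifier that is correct
    on each example with probability 1 - flip_prob."""
    flips = rng.random(labels.size) < flip_prob
    return np.where(flips, 1 - labels, labels)

# Decision-level combination: majority vote over independently trained individuals.
preds = np.array([noisy_classifier(y, 0.3, rng) for _ in range(5)])
vote = (preds.sum(axis=0) >= 3).astype(int)

acc_individual = np.array([(p == y).mean() for p in preds])
acc_vote = (vote == y).mean()
print(acc_individual.mean(), acc_vote)
```

When the individual errors are (approximately) independent, the majority vote is noticeably more accurate than the average individual, which is the basic motivation for decision-level combination.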
A Modern Take on the Bias-Variance Tradeoff in Neural Networks
The bias-variance tradeoff tells us that as model complexity increases, bias
falls and variance increases, leading to a U-shaped test error curve. However,
recent empirical results with over-parameterized neural networks are marked by
a striking absence of the classic U-shaped test error curve: test error keeps
decreasing in wider networks. This suggests that there might not be a
bias-variance tradeoff in neural networks with respect to network width, contrary
to what was originally claimed by, e.g., Geman et al. (1992). Motivated by the shaky
evidence used to support this claim in neural networks, we measure bias and
variance in the modern setting. We find that both bias and variance can
decrease as the number of parameters grows. To better understand this, we
introduce a new decomposition of the variance to disentangle the effects of
optimization and data sampling. We also provide theoretical analysis in a
simplified setting that is consistent with our empirical findings.
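The measurement the abstract describes rests on the classical decomposition MSE = bias² + variance, estimated by Monte Carlo over resampled training sets. The sketch below uses a deliberately biased estimator of a mean rather than a neural network; the estimator, shrinkage factor, and sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 2.0

def estimate(sample, shrink=0.5):
    """A deliberately biased estimator (shrunken sample mean),
    standing in for a trained model's prediction."""
    return shrink * sample.mean()

# Monte Carlo over many independent "training sets" (data-sampling variability).
estimates = np.array([estimate(rng.normal(true_mean, 1.0, size=30))
                      for _ in range(5000)])

bias_sq = (estimates.mean() - true_mean) ** 2    # squared bias across datasets
variance = estimates.var()                        # variance across datasets
mse = ((estimates - true_mean) ** 2).mean()       # mean squared error

# The decomposition MSE = bias^2 + variance holds exactly (up to rounding).
print(bias_sq, variance, mse)
```

The paper's finer decomposition further splits `variance` into optimization- and data-sampling components, which this one-estimator sketch does not attempt.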
Small Area Shrinkage Estimation
The need for small area estimates is increasingly felt in both the public and
private sectors in order to formulate their strategic plans. It is now widely
recognized that direct small area survey estimates are highly unreliable owing
to large standard errors and coefficients of variation. The reason behind this
is that a survey is usually designed to achieve a specified level of accuracy
at a higher level of geography than that of small areas. Lack of additional
resources makes it almost imperative to use the same data to produce small area
estimates. For example, if a survey is designed to estimate per capita income
for a state, the same survey data need to be used to produce similar estimates
for counties, subcounties and census divisions within that state. Thus, by
necessity, small area estimation needs explicit, or at least implicit, use of
models to link these areas. Improved small area estimates are found by
"borrowing strength" from similar neighboring areas. Comment: Published in
Statistical Science (http://www.imstat.org/sts/) at
http://dx.doi.org/10.1214/11-STS374 by the Institute of Mathematical Statistics
(http://www.imstat.org).
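The "borrowing strength" idea can be sketched as a composite (shrinkage) estimator that weights each area's noisy direct estimate against a reliable state-level estimate by their relative variances, in the spirit of Fay-Herriot-type area-level models. All numbers below (county incomes, variances, the between-area variance) are hypothetical, and `composite_estimate` is an illustrative name, not a method from the paper.

```python
import numpy as np

def composite_estimate(direct, direct_var, synthetic, model_var):
    """Variance-weighted compromise between a noisy direct small-area
    estimate and a stable higher-level (synthetic) estimate."""
    w = model_var / (model_var + direct_var)   # trust the direct estimate less when it is noisy
    return w * direct + (1.0 - w) * synthetic

# Hypothetical county per-capita incomes (thousands of dollars).
direct = np.array([31.0, 45.0, 22.0])        # direct survey estimates
direct_var = np.array([16.0, 25.0, 36.0])    # large sampling variances
state_mean = 33.0                            # reliable state-level estimate
model_var = 4.0                              # assumed between-area variance

print(composite_estimate(direct, direct_var, state_mean, model_var))
```

Each composite estimate lies between the county's direct estimate and the state mean, shrinking hardest where the direct estimate is least reliable, which is exactly the stabilization the abstract describes.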
Desiderata for a Predictive Theory of Statistics
In many contexts the predictive validation of models, or of their associated prediction strategies, is of greater importance than model identification, which may be practically impossible. This is particularly so in fields involving complex or high-dimensional data, where model selection, or more generally predictor selection, is the main focus of effort. This paper suggests a unified treatment for predictive analyses based on six 'desiderata'. These desiderata are an effort to clarify what criteria a good predictive theory of statistics should satisfy.