Replica analysis of overfitting in regression models for time-to-event data
Overfitting, which happens when the number of parameters in a model is too
large compared to the number of data points available for determining these
parameters, is a serious and growing problem in survival analysis. While modern
medicine presents us with data of unprecedented dimensionality, these data
cannot yet be used effectively for clinical outcome prediction. Standard error
measures in maximum likelihood regression, such as p-values and z-scores, are
blind to overfitting, and even for Cox's proportional hazards model (the main
tool of medical statisticians), one finds in the literature only rules of thumb on
the number of samples required to avoid overfitting. In this paper we present a
mathematical theory of overfitting in regression models for time-to-event data,
which aims to increase our quantitative understanding of the problem and
provide practical tools with which to correct regression outcomes for the
impact of overfitting. It is based on the replica method, a statistical
mechanical technique for the analysis of heterogeneous many-variable systems
that has been used successfully for several decades in physics, biology, and
computer science, but not yet in medical statistics. We develop the theory
initially for arbitrary regression models for time-to-event data, and verify
its predictions in detail for the popular Cox model.
Comment: 37 pages, 9 figures
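The abstract's definition of overfitting (too many parameters for the number of data points) can be illustrated with a minimal sketch. This is not the paper's replica calculation and does not use a Cox model: it swaps in ordinary least squares on synthetic Gaussian data, and the function name `ols_train_test_mse`, the sample sizes, and the noise level are all assumptions chosen for illustration. It only shows that the gap between training and test error grows as the ratio of parameters to samples increases.

```python
import numpy as np


def ols_train_test_mse(n_samples, n_params, noise=1.0, seed=0):
    """Fit ordinary least squares on synthetic data; return (train MSE, test MSE).

    Hypothetical helper for illustration only: Gaussian covariates, a random
    true coefficient vector, and additive Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    # True coefficients, scaled so the signal variance stays near 1.
    beta = rng.normal(size=n_params) / np.sqrt(n_params)
    X_train = rng.normal(size=(n_samples, n_params))
    X_test = rng.normal(size=(n_samples, n_params))
    y_train = X_train @ beta + noise * rng.normal(size=n_samples)
    y_test = X_test @ beta + noise * rng.normal(size=n_samples)
    # Maximum likelihood estimate under Gaussian noise is the least squares fit.
    beta_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    train_mse = float(np.mean((y_train - X_train @ beta_hat) ** 2))
    test_mse = float(np.mean((y_test - X_test @ beta_hat) ** 2))
    return train_mse, test_mse


if __name__ == "__main__":
    n = 200
    for p in (5, 50, 150):
        train_mse, test_mse = ols_train_test_mse(n, p)
        print(f"p/n = {p / n:.3f}  train MSE = {train_mse:.3f}  test MSE = {test_mse:.3f}")
```

In this toy setting the training error shrinks while the test error grows as p/n approaches 1, which is the qualitative effect that standard in-sample error measures fail to reveal.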
Low-Rank Discriminative Least Squares Regression for Image Classification
Recent least squares regression (LSR) methods mainly try to learn slack
regression targets to replace strict zero-one labels. However, the differences
among intra-class targets can also be amplified when enlarging the distance
between different classes, and crudely pursuing relaxed targets may lead to
overfitting. To solve these problems, we propose a low-rank
discriminative least squares regression model (LRDLSR) for multi-class image
classification. Specifically, LRDLSR imposes a class-wise low-rank constraint
on the intra-class regression targets to encourage their compactness and
similarity. Moreover, LRDLSR introduces an additional regularization term on
the learned targets to avoid the problem of overfitting. These two improvements
help to learn a more discriminative projection for regression and thus
achieve better classification performance. Experimental results over a range
of image databases demonstrate the effectiveness of the proposed LRDLSR method
- …
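As a rough illustration of the ingredients named in this abstract, the sketch below fits the baseline it starts from (least squares regression onto strict zero-one labels, here with a ridge penalty) and evaluates a class-wise nuclear norm of the regressed targets, which is one common convex surrogate for a low-rank constraint. It is not the LRDLSR objective or its optimizer, neither of which is given in the abstract. The helper names (`one_hot`, `fit_ridge_lsr`, `classwise_nuclear_norm`), the appended bias column, the synthetic data, and the value of `lam` are all assumptions.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Strict zero-one target matrix: one row per sample, one column per class."""
    Y = np.zeros((labels.size, n_classes))
    Y[np.arange(labels.size), labels] = 1.0
    return Y


def add_bias(X):
    """Append a constant column so the linear map can learn an offset."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_ridge_lsr(X, labels, n_classes, lam=1.0):
    """Closed-form ridge solution W = (X^T X + lam I)^{-1} X^T Y for zero-one targets."""
    Xb = add_bias(X)
    Y = one_hot(labels, n_classes)
    gram = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(gram, Xb.T @ Y)


def predict(X, W):
    """Assign each sample to the class with the largest regressed target."""
    return np.argmax(add_bias(X) @ W, axis=1)


def classwise_nuclear_norm(T, labels, n_classes):
    """Sum over classes of the nuclear norm of that class's rows of T.

    Assumed stand-in for a class-wise low-rank penalty: a smaller value means
    the targets within each class are closer to a low-rank (compact) set.
    """
    return float(sum(np.linalg.norm(T[labels == c], ord="nuc") for c in range(n_classes)))


# Synthetic three-class data (hypothetical; stands in for image features).
rng = np.random.default_rng(0)
n_classes, per_class, dim = 3, 40, 5
centers = 4.0 * rng.normal(size=(n_classes, dim))
labels = np.repeat(np.arange(n_classes), per_class)
X = centers[labels] + rng.normal(size=(labels.size, dim))

W = fit_ridge_lsr(X, labels, n_classes, lam=1.0)
accuracy = float(np.mean(predict(X, W) == labels))
penalty = classwise_nuclear_norm(add_bias(X) @ W, labels, n_classes)
print(f"training accuracy = {accuracy:.3f}, class-wise nuclear norm = {penalty:.3f}")
```

A method in the spirit of the abstract would treat the targets as variables to be learned and minimize a penalty of this kind jointly with the regression loss, rather than only measuring it after the fit as done here.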