Early stopping and non-parametric regression: An optimal data-dependent stopping rule
The strategy of early stopping is a regularization technique based on
choosing a stopping time for an iterative algorithm. Focusing on non-parametric
regression in a reproducing kernel Hilbert space, we analyze the early stopping
strategy for a form of gradient-descent applied to the least-squares loss
function. We propose a data-dependent stopping rule that does not involve
hold-out or cross-validation data, and we prove upper bounds on the squared
error of the resulting function estimate, measured in either the $L^2(P)$ or
$L^2(P_n)$ norm. These upper bounds lead to minimax-optimal rates for various
kernel classes, including Sobolev smoothness classes and other forms of
reproducing kernel Hilbert spaces. We show through simulation that our stopping
rule compares favorably to two other stopping rules, one based on hold-out data
and the other based on Stein's unbiased risk estimate. We also establish a
tight connection between our early stopping strategy and the solution path of a
kernel ridge regression estimator.
Comment: 29 pages, 4 figures
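To make the setting concrete, here is a minimal sketch of gradient descent on the least-squares loss in an RKHS, stopped early using hold-out data (one of the two comparison rules the abstract mentions). It is not the paper's data-dependent rule, which the abstract does not specify. The Gaussian kernel, its bandwidth, the step size, the `patience` parameter, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=0.5):
    # Pairwise squared distances between rows of A and rows of B.
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def kernel_gd_early_stopping(X_tr, y_tr, X_ho, y_ho, max_iter=500, patience=20):
    """Functional gradient descent f_t = sum_i alpha_i k(x_i, .) on the
    least-squares loss, returning the iterate with the lowest hold-out error."""
    n = X_tr.shape[0]
    K = gaussian_kernel(X_tr, X_tr)
    K_ho = gaussian_kernel(X_ho, X_tr)
    # Assumed step size: inverse of the largest eigenvalue of K / n,
    # which keeps the iteration stable.
    eta = 1.0 / np.linalg.eigvalsh(K / n)[-1]
    alpha = np.zeros(n)
    best_err = np.mean((K_ho @ alpha - y_ho) ** 2)
    best_alpha, best_t = alpha.copy(), 0
    for t in range(1, max_iter + 1):
        # Gradient step on (1 / 2n) * sum_i (f(x_i) - y_i)^2.
        alpha = alpha - (eta / n) * (K @ alpha - y_tr)
        err = np.mean((K_ho @ alpha - y_ho) ** 2)
        if err < best_err:
            best_err, best_alpha, best_t = err, alpha.copy(), t
        elif t - best_t >= patience:
            break  # hold-out error has stopped improving
    return best_alpha, best_t, best_err

# Synthetic one-dimensional regression problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(120, 1))
y = np.sin(3.0 * X[:, 0]) + 0.3 * rng.normal(size=120)
X_tr, y_tr, X_ho, y_ho = X[:80], y[:80], X[80:], y[80:]
alpha_hat, stop_t, ho_err = kernel_gd_early_stopping(X_tr, y_tr, X_ho, y_ho)
```

The number of iterations plays the role of the regularization parameter here: stopping sooner gives a smoother estimate, loosely analogous to a larger penalty in kernel ridge regression.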
Kernel Logistic Regression-linear for Leukemia Classification Using High Dimensional Data
Kernel Logistic Regression (KLR) is a statistical model for classification proposed in the machine learning and data mining communities, and one of the effective kernel–machine techniques. In essence, KLR is a kernelized version of linear Logistic Regression (LR). Unlike LR, KLR can classify data with a non-linear decision boundary and can accommodate data with very high dimensionality and very few instances. In this research, we study the use of a linear kernel in KLR to increase the accuracy of leukemia classification. Leukemia is a type of cancer associated with mortality, and improving the accuracy of leukemia classification is essential for more effective diagnosis and treatment of the disease. The leukemia data set consists of DNA microarray data with 7120 features (very high dimensional) for 72 patient samples (very few instances), labeled by leukemia type. In leukemia classification based on gene expression, monitoring data from DNA microarrays offers hope of achieving an objective and highly accurate classification. We demonstrate that using a linear kernel in Kernel Logistic Regression (KLR–Linear) improves performance in classifying leukemia patient samples, and that KLR–Linear has better accuracy than KLR–Polynomial and Penalized Logistic Regression.
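As an illustration of the model class described above, the following is a minimal sketch of kernel logistic regression with a linear kernel, fitted by plain gradient descent on a ridge-penalized logistic loss. It is not the authors' implementation: the function names, the penalty `lam`, the step size, and the synthetic data (72 samples, 500 features, standing in for the "few instances, many features" regime) are all assumptions.

```python
import numpy as np

def fit_klr_linear(X, y, lam=1e-2, n_iter=500):
    """Fit f(x) = sum_i alpha_i <x_i, x> for labels y in {0, 1}."""
    n = X.shape[0]
    K = X @ X.T  # linear kernel Gram matrix
    # Assumed step size: inverse of an upper bound on the curvature of the loss.
    eta = 1.0 / (np.linalg.eigvalsh(K)[-1] / (4.0 * n) + lam)
    alpha = np.zeros(n)
    for _ in range(n_iter):
        f = K @ alpha
        p = 1.0 / (1.0 + np.exp(-np.clip(f, -30.0, 30.0)))  # predicted P(y = 1)
        # Functional gradient of the mean logistic loss plus (lam / 2) * ||f||^2.
        alpha -= eta * ((p - y) / n + lam * alpha)
    return alpha

def predict_klr_linear(alpha, X_train, X_new):
    # Classify by the sign of the decision function f(x).
    return ((X_new @ X_train.T) @ alpha > 0.0).astype(int)

# Synthetic stand-in data, not the leukemia data set.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(72, 500))
w_demo = rng.normal(size=500)
y_demo = (X_demo @ w_demo > 0).astype(float)
alpha_demo = fit_klr_linear(X_demo, y_demo)
train_acc = np.mean(predict_klr_linear(alpha_demo, X_demo, X_demo) == y_demo)
```

Working with the 72 × 72 Gram matrix rather than the full feature dimension is what makes the kernel formulation convenient when features far outnumber samples. Training accuracy alone says nothing about generalization; a real evaluation would use held-out or cross-validated data.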
- …