On Degrees of Freedom of Projection Estimators with Applications to Multivariate Nonparametric Regression
In this paper, we consider the nonparametric regression problem with
multivariate predictors. We provide a characterization of the degrees of
freedom and divergence for estimators of the unknown regression function, which
are obtained as outputs of linearly constrained quadratic optimization
procedures, namely, minimizers of the least squares criterion with linear
constraints and/or quadratic penalties. As special cases of our results, we
derive explicit expressions for the degrees of freedom in many nonparametric
regression problems, e.g., bounded isotonic regression, multivariate
(penalized) convex regression, and additive total variation regularization. Our
theory also yields, as special cases, known results on the degrees of freedom
of many well-studied estimators in the statistics literature, such as ridge
regression, Lasso and generalized Lasso. Our results can be readily used to
choose the tuning parameter(s) involved in the estimation procedure by
minimizing Stein's unbiased risk estimate. As a by-product of our analysis, we
derive an interesting connection between bounded isotonic regression and
isotonic regression on a general partially ordered set, which is of independent
interest.
Comment: 72 pages, 7 figures, Journal of the American Statistical Association
(Theory and Methods), 201
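As an illustration of how such degrees-of-freedom formulas feed into tuning-parameter selection, the ridge special case has the closed form df(lam) = tr(X (X'X + lam I)^{-1} X'), which can be plugged into SURE. The sketch below is our own minimal illustration, not the paper's procedure; the function names `ridge_df` and `sure` are ours, and the noise variance sigma^2 is assumed known:

```python
import numpy as np

def ridge_df(X, lam):
    # Degrees of freedom of ridge regression: trace of the linear smoother
    # matrix H = X (X'X + lam I)^{-1} X'.
    n, p = X.shape
    G = X.T @ X + lam * np.eye(p)
    return np.trace(X @ np.linalg.solve(G, X.T))

def sure(y, y_hat, df, sigma2):
    # Stein's unbiased risk estimate for a linear smoother, sigma^2 known:
    #   SURE = ||y - y_hat||^2 - n*sigma2 + 2*sigma2*df
    n = len(y)
    return float(np.sum((y - y_hat) ** 2) - n * sigma2 + 2.0 * sigma2 * df)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
beta = np.array([1.0, 0.5, 0.0, 0.0, -1.0])
y = X @ beta + rng.standard_normal(50)

# choose lambda by minimizing SURE over a small candidate grid
lams = [0.01, 0.1, 1.0, 10.0]
scores = []
for lam in lams:
    y_hat = X @ np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
    scores.append(sure(y, y_hat, ridge_df(X, lam), sigma2=1.0))
best_lam = lams[int(np.argmin(scores))]
```

The same recipe applies to any estimator whose divergence (hence df) is available; only the `ridge_df` term changes.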
The asymptotic distribution of the isotonic regression estimator over a general countable pre-ordered set
We study the isotonic regression estimator over a general countable
pre-ordered set. We obtain the limiting distribution of the estimator and study
its properties. It is proved that, under some general assumptions, the limiting
distribution of the isotonized estimator is given by the concatenation of
separate isotonic regressions of certain subvectors of an unrestricted
estimator's asymptotic distribution. We also show that isotonization preserves
the rate of convergence of the underlying estimator. We apply these results to
the problems of estimating a bimonotone regression function and a bimonotone
probability mass function.
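In the totally ordered special case of such pre-ordered sets, the isotonic least-squares fit is computed by the classical Pool Adjacent Violators Algorithm (PAVA). A minimal sketch (the function name `pava` is our own; general pre-orders require more general solvers):

```python
def pava(y):
    # Pool Adjacent Violators Algorithm: least-squares isotonic fit on a
    # totally ordered index set. Maintain a stack of blocks (mean, size)
    # and merge adjacent blocks while they violate monotonicity.
    level, weight = [], []
    for v in y:
        level.append(float(v))
        weight.append(1.0)
        while len(level) > 1 and level[-2] > level[-1]:
            w = weight[-2] + weight[-1]
            m = (level[-2] * weight[-2] + level[-1] * weight[-1]) / w
            level.pop(); weight.pop()
            level[-1], weight[-1] = m, w
    # expand each pooled block back to its original positions
    return [m for m, w in zip(level, weight) for _ in range(int(w))]
```

Each input value is pushed and merged at most once, so the algorithm runs in linear time.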
A dynamic programming approach for generalized nearly isotonic optimization
Shape restricted statistical estimation problems have been extensively
studied, with many important practical applications in signal processing,
bioinformatics, and machine learning. In this paper, we propose and study a
generalized nearly isotonic optimization (GNIO) model, which recovers, as
special cases, many classic problems in shape constrained statistical
regression, such as isotonic regression, nearly isotonic regression and
unimodal regression problems. We develop an efficient and easy-to-implement
dynamic programming algorithm for solving the proposed model, whose recursive
structure is carefully uncovered and exploited. For special -GNIO problems, we
discuss implementation details and analyze the optimal running time of our
algorithm. Numerical experiments on both simulated and real data sets,
including comparisons between our approach and the powerful commercial solver
Gurobi for solving -GNIO and -GNIO problems, demonstrate the high efficiency
and robustness of our proposed algorithm in solving large-scale GNIO problems.
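The flavor of the dynamic-programming recursion can be illustrated on the hard-constrained (isotonic) special case, where the nearly isotonic penalties become exact order constraints. The sketch below is our own construction on a discretized level grid, not the authors' algorithm (which exploits the exact piecewise structure of the value functions); `isotonic_dp` and the grid are assumptions for illustration:

```python
import numpy as np

def isotonic_dp(y, grid):
    # Value-function recursion on a discretized level grid:
    #   f_1(x) = (y_1 - x)^2
    #   f_i(x) = (y_i - x)^2 + min_{x' <= x} f_{i-1}(x')
    # The running minimum enforces the order constraint x_1 <= ... <= x_n,
    # i.e. the hard-constrained (isotonic) special case of the GNIO recursion.
    grid = np.sort(np.asarray(grid, dtype=float))
    f = (y[0] - grid) ** 2
    for yi in y[1:]:
        f = (yi - grid) ** 2 + np.minimum.accumulate(f)
    return float(f.min())  # optimal objective value on the grid
```

For y = [1, 3, 2, 4] the optimal isotonic fit pools the middle pair at 2.5, giving an objective of 0.5, which the recursion recovers whenever the grid contains that level.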
Efficient regularized isotonic regression with application to gene--gene interaction search
Isotonic regression is a nonparametric approach for fitting monotonic models
to data that has been widely studied from both theoretical and practical
perspectives. However, this approach encounters computational and statistical
overfitting issues in higher dimensions. To address both concerns, we present
an algorithm, which we term Isotonic Recursive Partitioning (IRP), for isotonic
regression based on recursively partitioning the covariate space through
solution of progressively smaller "best cut" subproblems. This creates a
regularized sequence of isotonic models of increasing model complexity that
converges to the global isotonic regression solution. The models along the
sequence are often more accurate than the unregularized isotonic regression
model because of the complexity control they offer. We quantify this complexity
control through estimation of degrees of freedom along the path. The success of
the regularized models in prediction and IRP's favorable computational
properties are demonstrated through a series of simulated and real data
experiments. We discuss the application of IRP to the problem of searching for
gene--gene interactions and epistasis, and demonstrate it on data from
genome-wide association studies of three common diseases.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS504 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
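The recursive "best cut" idea can be caricatured in one dimension. Everything below (`irp_1d`, the `depth` parameter) is a hypothetical toy of ours, not the actual multivariate IRP algorithm: it greedily splits a segment at the cut that most reduces squared error while keeping the left block mean below the right one, and it need not reproduce the exact global isotonic solution.

```python
import numpy as np

def irp_1d(y, depth):
    # Toy recursive partitioning sketch (NOT the actual IRP algorithm):
    # split each block at the cut minimizing within-block squared error,
    # subject to left mean <= right mean; recurse up to `depth` levels.
    def split(seg, d):
        a, b = seg
        y_seg = y[a:b]
        if d == 0 or len(y_seg) < 2:
            return [(seg, y_seg.mean())]
        best, best_err = None, np.sum((y_seg - y_seg.mean()) ** 2)
        for c in range(1, len(y_seg)):
            lm, rm = y_seg[:c].mean(), y_seg[c:].mean()
            if lm <= rm:  # only cuts respecting the isotonic direction
                err = np.sum((y_seg[:c] - lm) ** 2) + np.sum((y_seg[c:] - rm) ** 2)
                if err < best_err - 1e-12:
                    best, best_err = c, err
        if best is None:  # no improving monotone cut: stop refining
            return [(seg, y_seg.mean())]
        return split((a, a + best), d - 1) + split((a + best, b), d - 1)

    fit = np.empty(len(y))
    for (a, b), m in split((0, len(y)), depth):
        fit[a:b] = m
    return fit
```

Each level of recursion adds model complexity, mirroring the regularization path described in the abstract.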
Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models
Learning accurate probabilistic models from data is crucial in many practical
tasks in data mining. In this paper we present a new non-parametric calibration
method called \textit{ensemble of near isotonic regression} (ENIR). The method
can be considered as an extension of BBQ, a recently proposed calibration
method, as well as the commonly used calibration method based on isotonic
regression. ENIR is designed to address the key limitation of isotonic
regression, namely its monotonicity assumption on the predictions. Like BBQ,
the method post-processes the output of a binary classifier to obtain
calibrated probabilities, so it can be combined with many existing
classification models. We demonstrate the performance of ENIR on synthetic and
real datasets for the commonly used binary classification models. Experimental
results show that the method outperforms several common binary classifier
calibration methods. In particular on the real data, ENIR commonly performs
statistically significantly better than the other methods, and never worse. It
is able to improve the calibration power of classifiers, while retaining their
discrimination power. The method is also computationally tractable for
large-scale datasets, as it runs in time, where is the number of samples.
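For context, the baseline that ENIR extends, calibration by isotonic regression, amounts to sorting samples by classifier score and running PAVA on the binary labels. The helper `isotonic_calibrate` below is our own illustrative sketch; ENIR itself fits a whole path of near-isotonic solutions and averages them, which is not shown here.

```python
def isotonic_calibrate(scores, labels):
    # Baseline isotonic-regression calibration: sort by raw classifier
    # score, then run PAVA on the 0/1 labels. The pooled block means are
    # the calibrated probabilities at the sorted scores.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    y = [labels[i] for i in order]
    level, weight = [], []
    for v in y:
        level.append(float(v))
        weight.append(1.0)
        while len(level) > 1 and level[-2] > level[-1]:
            w = weight[-2] + weight[-1]
            m = (level[-2] * weight[-2] + level[-1] * weight[-1]) / w
            level.pop(); weight.pop()
            level[-1], weight[-1] = m, w
    prob = [m for m, w in zip(level, weight) for _ in range(int(w))]
    # returns sorted scores and their calibrated probabilities; a new
    # score would be calibrated by interpolating between these pairs
    return [scores[i] for i in order], prob
```

Because the fitted probabilities are forced to be nondecreasing in the score, any genuine non-monotonicity in the classifier's miscalibration is lost, which is exactly the limitation ENIR relaxes.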