Rejoinder: One-step sparse estimates in nonconcave penalized likelihood models
We would like to take this opportunity to thank the discussants for their
thoughtful comments and encouragements on our work [arXiv:0808.1012]. The
discussants raised a number of issues from theoretical as well as computational
perspectives. Our rejoinder will try to provide some insights into these issues
and address specific questions asked by the discussants.
Comment: Published at http://dx.doi.org/10.1214/07-AOS0316REJ in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
Monitoring Networked Applications With Incremental Quantile Estimation
Networked applications have software components that reside on different
computers. Email, for example, has database, processing, and user interface
components that can be distributed across a network and shared by users in
different locations or work groups. End-to-end performance and reliability
metrics describe the software quality experienced by these groups of users,
taking into account all the software components in the pipeline. Each user
produces only some of the data needed to understand the quality of the
application for the group, so group performance metrics are obtained by
combining summary statistics that each end computer periodically (and
automatically) sends to a central server. The group quality metrics usually
focus on medians and tail quantiles rather than on averages. Distributed
quantile estimation is challenging, though, especially when passing large
amounts of data around the network solely to compute quality metrics is
undesirable. This paper describes an Incremental Quantile (IQ) estimation
method that is designed for performance monitoring at arbitrary levels of
network aggregation and time resolution when only a limited amount of data can
be transferred. Applications to both real and simulated data are provided.
Comment: This paper is commented on in [arXiv:0708.0317], [arXiv:0708.0336],
and [arXiv:0708.0338]; rejoinder in [arXiv:0708.0339]. Published at
http://dx.doi.org/10.1214/088342306000000583 in Statistical Science
(http://www.imstat.org/sts/) by the Institute of Mathematical Statistics
(http://www.imstat.org).
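The distributed setup described above, where each end computer periodically sends a small summary and a central server combines the summaries into group quantiles, can be illustrated with a minimal sketch. This is not the paper's IQ algorithm; the evenly spaced order-statistic summary and all function names here are illustrative assumptions:

```python
import random

def local_summary(values, k=50):
    """Each end computer sends a small sorted sample (at most k points)
    plus its total observation count."""
    values = sorted(values)
    if len(values) <= k:
        return values, len(values)
    # keep k evenly spaced order statistics as the summary
    idx = [round(i * (len(values) - 1) / (k - 1)) for i in range(k)]
    return [values[i] for i in idx], len(values)

def merged_quantile(summaries, q):
    """Central server: weight each summary point by the number of raw
    observations it represents, then read off the q-quantile."""
    points = []
    for sample, n in summaries:
        weight = n / len(sample)
        points.extend((v, weight) for v in sample)
    points.sort()
    total = sum(w for _, w in points)
    cum = 0.0
    for v, w in points:
        cum += w
        if cum >= q * total:
            return v
    return points[-1][0]
```

The key property is that each host transmits only k values and a count, so the network cost per reporting period is fixed no matter how much raw data each host observed.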
Machine Learning Framework to Identify Individuals at Risk of Rapid Progression of Coronary Atherosclerosis: From the PARADIGM Registry.
Background Rapid coronary plaque progression (RPP) is associated with incident cardiovascular events. To date, no method exists for identifying individuals at risk of RPP at a single point in time. This study integrated coronary computed tomography angiography-determined qualitative and quantitative plaque features within a machine learning (ML) framework to determine its performance for predicting RPP.
Methods and Results Qualitative and quantitative coronary computed tomography angiography plaque characterization was performed in 1083 patients who underwent serial coronary computed tomography angiography from the PARADIGM (Progression of Atherosclerotic Plaque Determined by Computed Tomographic Angiography Imaging) registry. RPP was defined as an annual progression of percentage atheroma volume ≥1.0%. We employed the following ML models: model 1, clinical variables; model 2, model 1 plus qualitative plaque features; model 3, model 2 plus quantitative plaque features. ML models were compared with the atherosclerotic cardiovascular disease risk score, the Duke coronary artery disease score, and a logistic regression statistical model. 224 patients (21%) were identified as having RPP. Feature selection within the ML framework identified quantitative computed tomography variables as the highest-ranking features, followed by qualitative computed tomography variables and clinical/laboratory variables. ML model 3 exhibited the highest discriminatory performance for identifying individuals who would experience RPP when compared with the atherosclerotic cardiovascular disease risk score, the other ML models, and the statistical model (area under the receiver operating characteristic curve in ML model 3, 0.83 [95% CI 0.78-0.89], versus atherosclerotic cardiovascular disease risk score, 0.60 [0.52-0.67]; Duke coronary artery disease score, 0.74 [0.68-0.79]; ML model 1, 0.62 [0.55-0.69]; ML model 2, 0.73 [0.67-0.80]; all P<0.001; statistical model, 0.81 [0.75-0.87], P=0.128).
Conclusions Based on an ML framework, quantitative atherosclerosis characterization was shown to be the most important feature, compared with clinical, laboratory, and qualitative measures, for identifying patients at risk of RPP.
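The nested-model design above (clinical features alone versus clinical plus imaging features, scored by held-out AUC) can be mimicked on synthetic data. This is purely illustrative: the simulated data, the plain logistic model, and all names below are assumptions, not the PARADIGM analysis:

```python
import numpy as np

def fit_logistic(X, y, n_iter=1000, lr=0.5):
    """Plain gradient-ascent logistic regression with an intercept."""
    Xa = np.hstack([np.ones((len(X), 1)), X])
    b = np.zeros(Xa.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-Xa @ b))
        b += lr * Xa.T @ (y - p) / len(y)
    return b

def predict_score(b, X):
    return np.hstack([np.ones((len(X), 1)), X]) @ b

def auc(scores, y):
    """Area under the ROC curve via the rank (Mann-Whitney) statistic."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n1 = y.sum()
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * (len(y) - n1))
```

When the extra columns carry most of the signal, the richer model's held-out AUC exceeds the clinical-only model's, mirroring the model 1 versus model 3 comparison in the abstract.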
High-dimensional regression adjustments in randomized experiments
We study the problem of treatment effect estimation in randomized experiments
with high-dimensional covariate information, and show that essentially any
risk-consistent regression adjustment can be used to obtain efficient estimates
of the average treatment effect. Our results considerably extend the range of
settings where high-dimensional regression adjustments are guaranteed to
provide valid inference about the population average treatment effect. We then
propose cross-estimation, a simple method for obtaining finite-sample-unbiased
treatment effect estimates that leverages high-dimensional regression
adjustments. Our method can be used when the regression model is estimated
using the lasso, the elastic net, subset selection, etc. Finally, we extend our
analysis to allow for adaptive specification search via cross-validation, and
flexible non-parametric regression adjustments with machine learning methods
such as random forests or neural networks.
Comment: To appear in the Proceedings of the National Academy of Sciences. The
present draft does not reflect final copyediting by the PNAS staff.
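The idea of combining a regression adjustment with sample splitting can be sketched as follows. This is a generic cross-fitting estimator for a randomized experiment with known treatment probability p, not the authors' exact cross-estimation procedure; the ridge adjustment and all function names are illustrative assumptions:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    Xa = np.hstack([np.ones((len(X), 1)), X])
    R = lam * np.eye(Xa.shape[1])
    R[0, 0] = 0.0
    return np.linalg.solve(Xa.T @ Xa + R, Xa.T @ y)

def ridge_predict(beta, X):
    return np.hstack([np.ones((len(X), 1)), X]) @ beta

def cross_fit_ate(X, y, w, p=0.5, n_folds=2, lam=1.0, seed=0):
    """Average treatment effect with cross-fitted regression adjustments:
    each unit's outcome models are fit on the other fold, so a unit's own
    data never enters its adjustment."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = rng.permutation(n) % n_folds
    psi = np.zeros(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        b1 = ridge_fit(X[train & (w == 1)], y[train & (w == 1)], lam)
        b0 = ridge_fit(X[train & (w == 0)], y[train & (w == 0)], lam)
        m1 = ridge_predict(b1, X[test])
        m0 = ridge_predict(b0, X[test])
        yt, wt = y[test], w[test]
        psi[test] = (m1 - m0
                     + wt * (yt - m1) / p
                     - (1 - wt) * (yt - m0) / (1 - p))
    return psi.mean()
```

Because the residual terms use the known randomization probability, any reasonable choice of regression (here ridge; the abstract mentions lasso, elastic net, subset selection, and nonparametric learners) plugs into the same template.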
Variable selection using MM algorithms
Variable selection is fundamental to high-dimensional statistical modeling.
Many variable selection techniques may be implemented by maximum penalized
likelihood using various penalty functions. Optimizing the penalized likelihood
function is often challenging because it may be nondifferentiable and/or
nonconcave. This article proposes a new class of algorithms for finding a
maximizer of the penalized likelihood for a broad class of penalty functions.
These algorithms operate by perturbing the penalty function slightly to render
it differentiable, then optimizing this differentiable function using a
minorize-maximize (MM) algorithm. MM algorithms are useful extensions of the
well-known class of EM algorithms, a fact that allows us to analyze the local
and global convergence of the proposed algorithm using some of the techniques
employed for EM algorithms. In particular, we prove that when our MM algorithms
converge, they must converge to a desirable point; we also discuss conditions
under which this convergence may be guaranteed. We exploit the
Newton-Raphson-like aspect of these algorithms to propose a sandwich estimator
for the standard errors of the estimators. Our method performs well in
numerical tests.
Comment: Published at http://dx.doi.org/10.1214/009053605000000200 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
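The perturb-then-majorize recipe can be sketched for penalized least squares with an l1-type penalty: smooth |beta_j| to sqrt(beta_j^2 + eps) so the penalty is differentiable, then majorize the smoothed penalty by a quadratic so each MM step reduces to a ridge-like linear system. This is a minimal illustration under those assumptions, not the article's general algorithm class:

```python
import numpy as np

def mm_penalized_ls(X, y, lam=1.0, eps=1e-6, n_iter=200):
    """MM iterations for (1/2)||y - X b||^2 + lam * sum_j sqrt(b_j^2 + eps).
    At each step the concave penalty term sqrt(b^2 + eps) is majorized by
    the tangent quadratic sqrt(a^2 + eps) + (b^2 - a^2)/(2 sqrt(a^2 + eps)),
    so the update is a weighted ridge solve."""
    d = X.shape[1]
    beta = np.zeros(d)
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        # quadratic-majorizer weights from the current iterate
        D = lam / np.sqrt(beta**2 + eps)
        beta_new = np.linalg.solve(XtX + np.diag(D), Xty)
        if np.max(np.abs(beta_new - beta)) < 1e-8:
            beta = beta_new
            break
        beta = beta_new
    return beta
```

Each iteration can only decrease the smoothed objective, which is the descent property MM inherits from EM; coefficients that are truly zero are driven toward zero geometrically as their majorizer weights grow.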