A Generic Path Algorithm for Regularized Statistical Estimation
Regularization is widely used in statistics and machine learning to prevent
overfitting and steer the solution toward prior information. In general, a
regularized estimation problem minimizes the sum of a loss function and a
penalty term. The penalty term is usually weighted by a tuning parameter and
encourages certain constraints on the parameters to be estimated. Particular
choices of constraints lead to the popular lasso, fused-lasso, and other
generalized $\ell_1$ penalized regression methods. Although there has been a lot
of research in this area, developing efficient optimization methods for many
nonseparable penalties remains a challenge. In this article we propose an exact
path solver based on ordinary differential equations (EPSODE) that works for
any convex loss function and can deal with generalized $\ell_1$ penalties as well
as more complicated regularization such as inequality constraints encountered
in shape-restricted regressions and nonparametric density estimation. In the
path following process, the solution path hits, exits, and slides along the
various constraints and vividly illustrates the tradeoffs between goodness of
fit and model parsimony. In practice, the EPSODE can be coupled with AIC, BIC,
or cross-validation to select an optimal tuning parameter. Our
applications to generalized $\ell_1$ regularized generalized linear models,
shape-restricted regressions, Gaussian graphical models, and nonparametric
density estimation showcase the potential of the EPSODE algorithm.
Comment: 28 pages, 5 figures
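As a minimal illustration of "loss plus weighted penalty" and of a solution path hitting a constraint, the sketch below traces the lasso path in the special case of an orthonormal design, where each coefficient is obtained by soft-thresholding. This is not the EPSODE algorithm from the abstract; the function names `soft_threshold` and `lasso_path_orthonormal`, the vector `z`, and the grid `rhos` are assumptions made for this example only.

```python
def soft_threshold(z, rho):
    """Minimizer over b of 0.5 * (b - z)**2 + rho * abs(b)."""
    if z > rho:
        return z - rho
    if z < -rho:
        return z + rho
    return 0.0  # the coefficient has hit the constraint b = 0


def lasso_path_orthonormal(z, rhos):
    """Lasso solutions on a grid of tuning parameters (orthonormal design only)."""
    return [[soft_threshold(zj, rho) for zj in z] for rho in rhos]


z = [3.0, -1.5, 0.5]          # hypothetical least-squares coefficients
rhos = [0.0, 1.0, 2.0, 3.0]   # hypothetical grid of tuning parameters
path = lasso_path_orthonormal(z, rhos)
for rho, beta in zip(rhos, path):
    print(rho, beta)
```

As the tuning parameter grows, coefficients reach zero one at a time and stay there, which is the simplest instance of the tradeoff between goodness of fit and parsimony that the abstract describes.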
Computational Protein Design Using AND/OR Branch-and-Bound Search
The computation of the global minimum energy conformation (GMEC) is an
important and challenging topic in structure-based computational protein
design. In this paper, we propose a new protein design algorithm based on the
AND/OR branch-and-bound (AOBB) search, which is a variant of the traditional
branch-and-bound search algorithm, to solve this combinatorial optimization
problem. By integrating with a powerful heuristic function, AOBB is able to
fully exploit the graph structure of the underlying residue interaction network
of a backbone template to significantly accelerate the design process. Tests on
real protein data show that our new protein design algorithm is able to solve
many problems that were previously unsolvable by the traditional exact search
algorithms, and for the problems that can be solved with traditional provable
algorithms, our new method can provide a speedup of several orders of
magnitude while still guaranteeing to find the global minimum energy
conformation (GMEC) solution.
Comment: RECOMB 201
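To make the search problem concrete, here is a small sketch of a plain depth-first branch-and-bound over rotamer assignments with self and pairwise energies. It is the traditional variant, not the AND/OR search proposed in the abstract, and the energy tables `self_e` and `pair_e`, the bound, and all function names are assumptions chosen for illustration.

```python
def pair(pair_e, i, ri, j, rj):
    """Pairwise energy of rotamer ri at position i and rj at position j (0 if absent)."""
    if i > j:
        i, ri, j, rj = j, rj, i, ri
    table = pair_e.get((i, j))
    return table[ri][rj] if table is not None else 0.0


def lower_bound(self_e, pair_e, assigned, partial):
    """Partial energy plus, for each open position, its cheapest possible contribution."""
    n, k = len(self_e), len(assigned)
    bound = partial
    for i in range(k, n):
        best = float("inf")
        for r in range(len(self_e[i])):
            e = self_e[i][r]
            e += sum(pair(pair_e, j, assigned[j], i, r) for j in range(k))
            e += sum(min(pair(pair_e, i, r, m, s) for s in range(len(self_e[m])))
                     for m in range(i + 1, n))
            best = min(best, e)
        bound += best
    return bound


def branch_and_bound(self_e, pair_e):
    """Return (minimum energy, rotamer index per position)."""
    n = len(self_e)
    best = {"energy": float("inf"), "assignment": None}

    def visit(assigned, partial):
        if len(assigned) == n:
            if partial < best["energy"]:
                best["energy"], best["assignment"] = partial, list(assigned)
            return
        if lower_bound(self_e, pair_e, assigned, partial) >= best["energy"]:
            return  # prune: no completion can beat the incumbent
        i = len(assigned)
        for r in range(len(self_e[i])):
            step = self_e[i][r] + sum(pair(pair_e, j, assigned[j], i, r) for j in range(i))
            visit(assigned + [r], partial + step)

    visit([], 0.0)
    return best["energy"], best["assignment"]


# Hypothetical toy instance: 3 positions with 2, 2, and 3 rotamers.
self_e = [[1.0, 2.0], [0.5, 0.0], [3.0, 1.0, 2.0]]
pair_e = {
    (0, 1): [[0.0, 2.0], [-1.0, 0.5]],
    (1, 2): [[0.5, -2.0, 0.0], [1.0, 0.0, -0.5]],
    (0, 2): [[0.0, 1.0, -1.0], [0.0, 0.0, 0.0]],
}
energy, assignment = branch_and_bound(self_e, pair_e)
print(energy, assignment)
```

Because the bound never overestimates the energy of any completion, pruning cannot discard the global minimum, which is the sense in which such exact searches remain provable.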
Nonparametric Independence Screening via Favored Smoothing Bandwidth
We propose a flexible nonparametric regression method for
ultrahigh-dimensional data. As a first step, we propose a fast screening method
based on the favored smoothing bandwidth of the marginal local constant
regression. Then, an iterative procedure is developed to recover both the
important covariates and the regression function. Theoretically, we prove that
the favored smoothing bandwidth based screening possesses the model selection
consistency property. Simulation studies as well as real data analysis show the
competitive performance of the new procedure.
Comment: 22 pages
Varying-coefficient functional linear regression
Functional linear regression analysis aims to model regression relations
which include a functional predictor. The analog of the regression parameter
vector or matrix in conventional multivariate or multiple-response linear
regression models is a regression parameter function in one or two arguments.
If, in addition, one has scalar predictors, as is often the case in
applications to longitudinal studies, the question arises how to incorporate
these into a functional regression model. We study a varying-coefficient
approach where the scalar covariates are modeled as additional arguments of the
regression parameter function. This extension of the functional linear
regression model is analogous to the extension of conventional linear
regression models to varying-coefficient models and shares its advantages, such
as increased flexibility; however, the details of this extension are more
challenging in the functional case. Our methodology combines smoothing methods
with regularization by truncation at a finite number of functional principal
components. A practical version is developed and is shown to perform better
than functional linear regression for longitudinal data. We investigate the
asymptotic properties of varying-coefficient functional linear regression and
establish consistency properties.
Comment: Published in Bernoulli (http://isi.cbs.nl/bernoulli/) at
http://dx.doi.org/10.3150/09-BEJ231 by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
A backward procedure for change-point detection with applications to copy number variation detection
Change-point detection has recently regained much attention for analyzing array or
sequencing data for copy number variation (CNV) detection. In such
applications, the true signals are typically very short and buried in the long
data sequence, which makes it challenging to identify the variations
efficiently and accurately. In this article, we propose a new change-point
detection method, a backward procedure, which is not only fast and simple
enough to exploit high-dimensional data but also performs very well for
detecting short signals. Although motivated by CNV detection, the backward
procedure is generally applicable to assorted change-point problems that arise
in a variety of scientific applications. It is illustrated by both simulated
and real CNV data that the backward detection has clear advantages over other
competing methods, especially when the true signal is short.
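One way to picture a backward (bottom-up) scheme is sketched below: every observation starts as its own segment, and the two most similar neighboring segments are merged repeatedly until every remaining merge would be too costly. The merge cost (increase in within-segment sum of squares), the stopping rule `threshold`, and the function name are assumptions for illustration and are not taken from the paper.

```python
def backward_changepoints(x, threshold):
    """Bottom-up merging; returns the start indices of all segments after the first."""
    # Each segment is stored as (start index, count, sum of values).
    segs = [(i, 1, float(v)) for i, v in enumerate(x)]

    def cost(a, b):
        # Increase in the within-segment sum of squares if a and b are merged.
        na, nb = a[1], b[1]
        diff = a[2] / na - b[2] / nb
        return na * nb / (na + nb) * diff * diff

    while len(segs) > 1:
        costs = [cost(segs[i], segs[i + 1]) for i in range(len(segs) - 1)]
        i = min(range(len(costs)), key=costs.__getitem__)
        if costs[i] > threshold:
            break  # every remaining boundary is kept as a change point
        a, b = segs[i], segs[i + 1]
        segs[i:i + 2] = [(a[0], a[1] + b[1], a[2] + b[2])]
    return [s[0] for s in segs[1:]]


# Hypothetical sequence with a short signal of length 2 in a flat background.
x = [0, 0, 0, 0, 5, 5, 0, 0, 0, 0]
print(backward_changepoints(x, threshold=1.0))
```

Because merging begins at the finest resolution, a short segment survives as long as its boundaries stay expensive to remove, which is the intuition behind the advantage for short signals.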
Marginal empirical likelihood and sure independence feature screening
We study a marginal empirical likelihood approach in scenarios where the
number of variables grows exponentially with the sample size. The marginal
empirical likelihood ratios as functions of the parameters of interest are
systematically examined, and we find that the marginal empirical likelihood
ratio evaluated at zero can be used to differentiate whether an explanatory
variable is contributing to a response variable or not. Based on this finding,
we propose a unified feature screening procedure for linear models and the
generalized linear models. Unlike most existing feature screening
approaches that rely on the magnitudes of some marginal estimators to identify
true signals, the proposed screening approach is capable of further
incorporating the level of uncertainty of such estimators. This merit is
inherited from the self-studentization property of the empirical likelihood approach,
and extends the insights of existing feature screening methods. Moreover, we
show that our screening approach is less restrictive in its distributional
assumptions, and can be conveniently adapted to be applied in a broad range of
scenarios such as models specified using general moment conditions. Our
theoretical results and extensive numerical examples by simulations and data
analysis demonstrate the merits of the marginal empirical likelihood approach.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at
http://dx.doi.org/10.1214/13-AOS1139 by the Institute of Mathematical
Statistics (http://www.imstat.org)
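The screening idea can be sketched as follows, assuming standardized covariates and response and assuming the marginal quantity for feature j is g_i = x_ij * y_i. The empirical likelihood ratio at zero is 2 * sum(log(1 + lam * g_i)), where lam solves sum(g_i / (1 + lam * g_i)) = 0; features are then ranked by this statistic. The choice of g_i, the bisection solver, and the names `el_ratio_at_zero` and `screen` are illustrative assumptions, not the authors' implementation.

```python
import math


def el_ratio_at_zero(g, iters=200):
    """Empirical likelihood ratio statistic for the hypothesis mean(g) = 0."""
    lo_g, hi_g = min(g), max(g)
    if lo_g >= 0 or hi_g <= 0:
        return float("inf")  # zero is not strictly inside the range of g
    # All weights stay positive only for lam in (-1/max(g), -1/min(g)).
    lo, hi = -1.0 / hi_g, -1.0 / lo_g
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # The score is strictly decreasing in lam, so bisection finds its root.
        score = sum(gi / (1.0 + mid * gi) for gi in g)
        if score > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * gi) for gi in g)


def screen(X, y, keep):
    """Rank features by the marginal statistic and keep the largest ones."""
    stats = []
    for j in range(len(X[0])):
        g = [row[j] * yi for row, yi in zip(X, y)]
        stats.append(el_ratio_at_zero(g))
    order = sorted(range(len(stats)), key=lambda j: -stats[j])
    return order[:keep], stats


# Hypothetical data: feature 0 equals the response, feature 1 is uncorrelated with it.
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [-2, -1, 0, 1, 2]
selected, stats = screen(X, y, keep=1)
print(selected, stats)
```

A large statistic means zero is an implausible value for the marginal moment, so the feature is retained; because the ratio is self-normalized, no separate variance estimate is needed.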
- …