Longitudinal LASSO: Jointly Learning Features and Temporal Contingency for Outcome Prediction
Longitudinal analysis is important in many disciplines, such as the study of
behavioral transitions in social science. Only recently has feature selection
drawn adequate attention in the context of longitudinal modeling. Standard
techniques, such as generalized estimating equations, have been modified to
select features by imposing sparsity-inducing regularizers. However, they do
not explicitly model how a dependent variable relies on features measured at
proximal time points. Recent graphical Granger modeling can select features in
lagged time points but ignores the temporal correlations within an individual's
repeated measurements. We propose an approach to automatically and
simultaneously determine both the relevant features and the relevant temporal
points that impact the current outcome of the dependent variable. Meanwhile,
the proposed model takes into account the non-i.i.d. nature of the data by
estimating the within-individual correlations. This approach decomposes model
parameters into a summation of two components and imposes separate block-wise
LASSO penalties to each component when building a linear model in terms of the
past measurements of features. One component is used to select features
whereas the other is used to select temporal contingent points. An accelerated
gradient descent algorithm is developed to efficiently solve the related
optimization problem with detailed convergence analysis and asymptotic
analysis. Computational results on both synthetic and real world problems
demonstrate the superior performance of the proposed approach over existing
techniques. Comment: Proceedings of the 21st ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining. ACM, 2015.
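The decomposition described above can be sketched with a plain proximal-gradient loop: the coefficient matrix over lagged time points and features is written as U + V, with a row-wise (per-time-point) group penalty on U and a column-wise (per-feature) group penalty on V. All names, shapes, step sizes, and the simple ISTA-style solver below are illustrative assumptions, not the paper's accelerated algorithm:

```python
import numpy as np

def group_soft_threshold(M, lam, axis):
    # Block-wise soft-thresholding: shrink each row (axis=1) or each
    # column (axis=0) of M toward zero by its Euclidean norm.
    norms = np.linalg.norm(M, axis=axis, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return M * scale

def fit_longitudinal_lasso(X, y, lam_time, lam_feat, step=0.05, n_iter=500):
    # X: (n, T, p) past measurements (T lagged time points, p features);
    # y: (n,) current outcomes.  Coefficients B = U + V, with a
    # per-time-point group penalty on the rows of U and a per-feature
    # group penalty on the columns of V (a sketch, not the paper's
    # accelerated gradient method).
    n, T, p = X.shape
    Xf = X.reshape(n, T * p)
    U = np.zeros((T, p))
    V = np.zeros((T, p))
    for _ in range(n_iter):
        grad = (Xf.T @ (Xf @ (U + V).ravel() - y) / n).reshape(T, p)
        U = group_soft_threshold(U - step * grad, step * lam_time, axis=1)
        V = group_soft_threshold(V - step * grad, step * lam_feat, axis=0)
    return U, V
```

Zero rows of U then mark irrelevant time points and zero columns of V mark irrelevant features; the paper's accelerated algorithm would replace the plain gradient step with a momentum step.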
An Aggregation Method for Sparse Logistic Regression
L1-regularized logistic regression has now become a workhorse of data
mining and bioinformatics: it is widely used for many classification problems,
particularly ones with many features. However, L1 regularization typically
selects too many features, so that so-called false positives are unavoidable.
In this paper, we demonstrate and analyze an aggregation method for sparse
logistic regression in high dimensions. This approach linearly combines the
estimators from a suitable set of logistic models with different underlying
sparsity patterns and can balance the predictive ability and model
interpretability. Numerical performance of our proposed aggregation method is
then investigated using simulation studies. We also analyze a published
genome-wide case-control dataset to further evaluate the usefulness of the
aggregation method in multilocus association mapping.
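A generic version of such an aggregation can be sketched in NumPy: fit l1-penalized logistic models over a small grid of penalty levels (here via plain ISTA) and linearly combine them with exponential weights computed from held-out log-likelihood. The weighting rule and all tuning constants are assumptions for illustration, not necessarily the paper's estimator:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_l1_logistic(X, y, lam, step=0.2, n_iter=1000):
    # ISTA for l1-penalized logistic regression: a gradient step on the
    # logistic loss followed by soft-thresholding.
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta = beta - step * X.T @ (sigmoid(X @ beta) - y) / n
        beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam, 0.0)
    return beta

def aggregate(Xtr, ytr, Xval, yval, lams=(0.001, 0.01, 0.1)):
    # Combine fits along the penalty grid with exponential weights based
    # on held-out log-likelihood (an illustrative aggregation rule).
    betas = [fit_l1_logistic(Xtr, ytr, lam) for lam in lams]
    ll = np.array([
        np.sum(yval * np.log(np.clip(sigmoid(Xval @ b), 1e-12, 1.0)) +
               (1.0 - yval) * np.log(np.clip(1.0 - sigmoid(Xval @ b), 1e-12, 1.0)))
        for b in betas
    ])
    w = np.exp(ll - ll.max())
    w /= w.sum()
    beta_agg = sum(wi * b for wi, b in zip(w, betas))
    return beta_agg, w
```

The combined coefficient vector trades off the sparser, more interpretable fits against the denser, more predictive ones, which is the balance the abstract describes.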
Market Structure and Entry: Where's the Beef?
We study the effects of market structure on entry using data from the UK fast
food (counter-service burger) industry over the years 1991-1995. Over this
period, the market can be characterized as a duopoly. We find that market
structure matters greatly: for both firms, rival presence increases the
probability of entry. We control for market-specific, time-invariant
unobservables and their correlation with existing outlets of both firms
through a variety of methods. Keywords: learning; market; duopoly.
Oracle Properties and Finite Sample Inference of the Adaptive Lasso for Time Series Regression Models
We derive new theoretical results on the properties of the adaptive least
absolute shrinkage and selection operator (adaptive lasso) for time series
regression models. In particular, we investigate the question of how to conduct
finite sample inference on the parameters given an adaptive lasso model for
some fixed value of the shrinkage parameter. Central in this study is the test
of the hypothesis that a given adaptive lasso parameter equals zero, which
therefore tests for a false positive. To this end we construct a simple testing
procedure and show, theoretically and empirically through extensive Monte Carlo
simulations, that the adaptive lasso combines efficient parameter estimation,
variable selection, and valid finite sample inference in one step. Moreover, we
analytically derive a bias correction factor that is able to significantly
improve the empirical coverage of the test on the active variables. Finally, we
apply the introduced testing procedure to investigate the relation between the
short rate dynamics and the economy, thereby providing a statistical foundation
(from a model choice perspective) to the classic Taylor rule monetary policy
model.
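The adaptive lasso itself is simple to sketch: an initial OLS fit supplies coefficient-specific weights, and each coefficient then receives its own soft-threshold level. The ISTA-style solver and constants below are illustrative; the paper's finite-sample test and bias-correction factor are not reproduced here:

```python
import numpy as np

def adaptive_lasso(X, y, lam, gamma=1.0, step=None, n_iter=2000):
    # Adaptive lasso via ISTA.  The OLS fit gives data-driven weights
    # w_j = 1 / |beta_ols_j|^gamma, so coefficients whose OLS estimates
    # are near zero are penalized much more heavily.
    n, p = X.shape
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    w = 1.0 / np.maximum(np.abs(beta_ols), 1e-8) ** gamma
    if step is None:
        step = n / np.linalg.norm(X, 2) ** 2  # 1/L for the least-squares loss
    beta = beta_ols.copy()
    for _ in range(n_iter):
        beta = beta - step * X.T @ (X @ beta - y) / n
        beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam * w, 0.0)
    return beta
```

On a lagged-regressor design this tends to zero out irrelevant lags exactly while only mildly shrinking the active ones, which is the oracle-type behavior the abstract refers to.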
Technical Report: Compressive Temporal Higher Order Cyclostationary Statistics
The application of nonlinear transformations to a cyclostationary signal for
the purpose of revealing hidden periodicities has proven to be useful for
applications requiring signal selectivity and noise tolerance. The fact that
the hidden periodicities, referred to as cyclic moments, are often compressible
in the Fourier domain motivates the use of compressive sensing (CS) as an
efficient acquisition protocol for capturing such signals. In this work, we
consider the class of Temporal Higher Order Cyclostationary Statistics (THOCS)
estimators when CS is used to acquire the cyclostationary signal assuming
compressible cyclic moments in the Fourier domain. We develop a theoretical
framework for estimating THOCS using the low-rate nonuniform sampling protocol
from CS and illustrate the performance of this framework using simulated data.
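The compressibility assumption can be illustrated without the full THOCS machinery: for a sinusoid in noise, the second-order lag product concentrates its energy on very few cycle frequencies, so its Fourier representation is sparse. The signal model and constants below are illustrative only and do not implement the compressive acquisition itself:

```python
import numpy as np

# For a sinusoid in noise, the lag product x[n] * x[n + tau] oscillates at
# twice the carrier frequency, so its Fourier transform concentrates on a
# few cycle frequencies -- the compressibility that motivates compressive
# acquisition of cyclostationary signals.
rng = np.random.default_rng(0)
N, f0, tau = 4096, 0.05, 3
n = np.arange(N)
x = np.cos(2.0 * np.pi * f0 * n) + 0.5 * rng.standard_normal(N)
lag_prod = x[:-tau] * x[tau:]                 # quadratic nonlinear transform
spec = np.abs(np.fft.rfft(lag_prod)) / len(lag_prod)
peak_bin = int(np.argmax(spec[1:])) + 1       # skip the alpha = 0 (DC) bin
alpha_hat = peak_bin / len(lag_prod)          # dominant cycle frequency, ~2*f0
```

Higher-order statistics apply the same idea with higher-degree lag products, producing spectra that remain sparse in the cycle-frequency domain.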
Quantile calculus and censored regression
Quantile regression has been advocated in survival analysis to assess
evolving covariate effects. However, challenges arise when the censoring time
is not always observed and may be covariate-dependent, particularly in the
presence of continuously-distributed covariates. In spite of several recent
advances, existing methods either involve algorithmic complications or impose a
probability grid. The former leads to difficulties in the implementation and
asymptotics, whereas the latter introduces undesirable grid dependence. To
resolve these issues, we develop fundamental and general quantile calculus on
cumulative probability scale in this article, upon recognizing that probability
and time scales do not always have a one-to-one mapping given a survival
distribution. These results give rise to a novel estimation procedure for
censored quantile regression, based on estimating integral equations. A
numerically reliable and efficient Progressive Localized Minimization (PLMIN)
algorithm is proposed for the computation. This procedure reduces exactly to
the Kaplan--Meier method in the k-sample problem, and to standard uncensored
quantile regression in the absence of censoring. Under regularity conditions,
the proposed quantile coefficient estimator is uniformly consistent and
converges weakly to a Gaussian process. Simulations show good statistical and
algorithmic performance. The proposal is illustrated in the application to a
clinical study. Comment: Published in the Annals of Statistics
(http://www.imstat.org/aos/) by the Institute of Mathematical Statistics
(http://www.imstat.org) at http://dx.doi.org/10.1214/09-AOS771.