Sparse Volterra and Polynomial Regression Models: Recoverability and Estimation
Volterra and polynomial regression models play a major role in nonlinear
system identification and inference tasks. Exciting applications ranging from
neuroscience to genome-wide association analysis build on these models with the
additional requirement of parsimony. This requirement has high interpretative
value, but unfortunately cannot be met by least-squares based or kernel
regression methods. To this end, compressed sampling (CS) approaches, already
successful in linear regression settings, can offer a viable alternative. The
viability of CS for sparse Volterra and polynomial models is the core theme of
this work. A common sparse regression task is initially posed for the two
models. Building on (weighted) Lasso-based schemes, an adaptive RLS-type
algorithm is developed for sparse polynomial regressions. The identifiability
of polynomial models is critically challenged by dimensionality. However,
following the CS principle, when these models are sparse, they could be
recovered by far fewer measurements. To quantify the sufficient number of
measurements for a given level of sparsity, restricted isometry properties
(RIP) are investigated in commonly met polynomial regression settings,
generalizing known results for their linear counterparts. The merits of the
novel (weighted) adaptive CS algorithms to sparse polynomial modeling are
verified through synthetic as well as real data tests for genotype-phenotype
analysis. Comment: 20 pages, to appear in IEEE Trans. on Signal Processing
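As a rough illustration of the sparse polynomial regression idea in the abstract above, the sketch below expands the inputs into linear and second-order monomials and fits a plain Lasso by proximal gradient (ISTA). It is not the paper's weighted or adaptive RLS-type algorithm. The helper names (`poly2_features`, `lasso_ista`), the synthetic data, and the penalty `lam=0.1` are assumptions made for this example.

```python
import itertools
import numpy as np

def poly2_features(X):
    """Stack linear terms and all degree-2 monomials x_i * x_j (i <= j)."""
    p = X.shape[1]
    pairs = list(itertools.combinations_with_replacement(range(p), 2))
    quad = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    return np.hstack([X, quad]), pairs

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, n_iter=2000):
    """Minimize (0.5 / n) * ||y - A w||^2 + lam * ||w||_1 with ISTA."""
    n = A.shape[0]
    step = n / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Hypothetical data: 60 samples, 12 inputs, 90 candidate polynomial terms,
# of which only x_0 and x_1 * x_2 are active.
rng = np.random.default_rng(0)
n, p = 60, 12
X = rng.standard_normal((n, p))
A, pairs = poly2_features(X)
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] * X[:, 2] + 0.01 * rng.standard_normal(n)

w = lasso_ista(A, y, lam=0.1)
top2 = sorted(np.argsort(-np.abs(w))[:2].tolist())  # two largest coefficients
```

Here there are fewer measurements (60) than candidate terms (90), which is the regime the abstract refers to when it says sparse models could be recovered from far fewer measurements.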
Sparse Bilinear Logistic Regression
In this paper, we introduce the concept of sparse bilinear logistic
regression for decision problems involving explanatory variables that are
two-dimensional matrices. Such problems are common in computer vision,
brain-computer interfaces, style/content factorization, and parallel factor
analysis. The underlying optimization problem is bi-convex; we study its
solution and develop an efficient algorithm based on block coordinate descent.
We provide a theoretical guarantee for global convergence and estimate the
asymptotical convergence rate using the Kurdyka-Łojasiewicz inequality. A
range of experiments with simulated and real data demonstrate that sparse
bilinear logistic regression outperforms current techniques in several
important applications. Comment: 27 pages, 5 figures
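The bi-convex structure mentioned above can be sketched as follows: with the model logit u^T X v, fixing v makes the problem an ordinary sparse logistic regression in u (with features X v), and vice versa. The code alternates one proximal-gradient step per block. This is a minimal sketch under assumptions, not the authors' algorithm: the function names, the l1 penalty weight `lam`, the step-size rule, and the synthetic data are all chosen here for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(Xs, y, u, v, lam):
    """Mean logistic loss of the bilinear logit u^T X v plus l1 penalties."""
    s = np.einsum("i,nij,j->n", u, Xs, v)
    return np.mean(np.logaddexp(0.0, -y * s)) + lam * (np.abs(u).sum() + np.abs(v).sum())

def block_step(Z, y, w, lam):
    """One proximal-gradient step on the convex subproblem with features Z."""
    n = Z.shape[0]
    # Logistic loss curvature is at most 1/4, so L <= ||Z||_2^2 / (4 n).
    step = 4.0 * n / (np.linalg.norm(Z, 2) ** 2 + 1e-12)
    s = Z @ w
    sigma_neg = 0.5 * (1.0 - np.tanh(0.5 * y * s))  # 1 / (1 + exp(y * s)), stable form
    grad = -(Z.T @ (y * sigma_neg)) / n
    return soft_threshold(w - step * grad, step * lam)

def fit(Xs, y, lam=0.01, n_iter=100, seed=0):
    """Alternate block updates of u and v; return both and the objective history."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(Xs.shape[1])
    v = rng.standard_normal(Xs.shape[2])
    history = [objective(Xs, y, u, v, lam)]
    for _ in range(n_iter):
        u = block_step(Xs @ v, y, u, lam)                         # features X_n v
        v = block_step(np.einsum("i,nij->nj", u, Xs), y, v, lam)  # features X_n^T u
        history.append(objective(Xs, y, u, v, lam))
    return u, v, history

# Hypothetical data: 200 matrix-valued samples of size 5 x 4, labels in {-1, +1}.
rng = np.random.default_rng(0)
Xs = rng.standard_normal((200, 5, 4))
u_true = np.array([1.0, 0.0, -1.0, 0.0, 0.0])
v_true = np.array([1.0, 0.0, 0.0, 1.0])
y = np.sign(np.einsum("i,nij,j->n", u_true, Xs, v_true) + 0.1 * rng.standard_normal(200))

u, v, history = fit(Xs, y)
```

Because each block update uses a step no larger than the inverse Lipschitz constant of its convex subproblem, the objective in `history` should not increase from one sweep to the next.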
Learning Word Representations with Hierarchical Sparse Coding
We propose a new method for learning word representations using hierarchical
regularization in sparse coding inspired by the linguistic study of word
meanings. We show an efficient learning algorithm based on stochastic proximal
methods that is significantly faster than previous approaches, making it
possible to perform hierarchical sparse coding on a corpus of billions of word
tokens. Experiments on various benchmark tasks---word similarity ranking,
analogies, sentence completion, and sentiment analysis---demonstrate that the
method outperforms or is competitive with state-of-the-art methods. Our word
representations are available at
http://www.ark.cs.cmu.edu/dyogatam/wordvecs/
An Efficient Primal-Dual Prox Method for Non-Smooth Optimization
We study the non-smooth optimization problems in machine learning, where both
the loss function and the regularizer are non-smooth functions. Previous
studies on efficient empirical loss minimization assume either a smooth loss
function or a strongly convex regularizer, making them unsuitable for
non-smooth optimization. We develop a simple yet efficient method for a family
of non-smooth optimization problems where the dual form of the loss function is
bilinear in primal and dual variables. We cast a non-smooth optimization
problem into a minimax optimization problem, and develop a primal dual prox
method that solves the minimax optimization problem at a rate of $O(1/T)$
(assuming that the proximal step can be efficiently solved), significantly
faster than a standard subgradient descent method that has an
$O(1/\sqrt{T})$ convergence rate. Our empirical study verifies the efficiency of the proposed
method for various non-smooth optimization problems that arise ubiquitously in
machine learning by comparing it to the state-of-the-art first-order methods.
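To make "the dual form of the loss function is bilinear in primal and dual variables" concrete, the sketch below uses the hinge loss as an assumed example: max(0, 1 - m) equals the maximum over alpha in [0, 1] of alpha * (1 - m), where the margin m = y * w^T x is linear in the primal variable w. The code only checks this identity numerically; it does not implement the primal-dual prox method itself, and the function names are hypothetical.

```python
import numpy as np

def hinge(margin):
    """Primal form of the hinge loss: max(0, 1 - margin)."""
    return np.maximum(0.0, 1.0 - margin)

def hinge_dual(margin, n_grid=101):
    """Dual form: max over alpha in [0, 1] of alpha * (1 - margin).

    The inner expression is linear in alpha, so the maximum sits at an
    endpoint; a grid that includes 0 and 1 therefore finds it exactly.
    """
    alpha = np.linspace(0.0, 1.0, n_grid)
    return np.max(alpha[:, None] * (1.0 - margin)[None, :], axis=0)

margins = np.array([-2.0, 0.0, 0.5, 1.0, 3.0])
primal_values = hinge(margins)
dual_values = hinge_dual(margins)
```

Substituting the dual form into the regularized empirical loss turns the minimization over w into a minimax problem over (w, alpha), which is the kind of problem the abstract describes.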
Variable Screening for High Dimensional Time Series
Variable selection is a widely studied problem in high dimensional
statistics, primarily since estimating the precise relationship between the
covariates and the response is of great importance in many scientific
disciplines. However, most of the theory and methods developed towards this goal
for the linear model invoke the assumption of iid sub-Gaussian covariates and
errors. This paper analyzes the theoretical properties of Sure Independence
Screening (SIS) (Fan and Lv [J. R. Stat. Soc. Ser. B Stat. Methodol. 70 (2008)
849-911]) for high dimensional linear models with dependent and/or heavy tailed
covariates and errors. We also introduce a generalized least squares screening
(GLSS) procedure which utilizes the serial correlation present in the data. By
utilizing this serial correlation when estimating our marginal effects, GLSS is
shown to outperform SIS in many cases. For both procedures we prove sure
screening properties, which depend on the moment conditions, and the strength
of dependence in the error and covariate processes, amongst other factors.
Additionally, combining these screening procedures with the adaptive Lasso is
analyzed. Dependence is quantified by functional dependence measures (Wu [Proc.
Natl. Acad. Sci. USA 102 (2005) 14150-14154]), and the results rely on the use
of Nagaev-type and exponential inequalities for dependent random variables. We
also conduct simulations to demonstrate the finite sample performance of these
procedures, and include a real data application of forecasting the US inflation
rate. Comment: Published in the Electronic Journal of Statistics
(https://projecteuclid.org/euclid.ejs/1519700498)
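For readers unfamiliar with Sure Independence Screening, the basic step can be sketched as ranking covariates by the absolute value of their marginal correlation with the response and keeping the top `d`. The sketch below covers only plain SIS on i.i.d. Gaussian data; it does not implement GLSS or the dependent, heavy-tailed setting the paper analyzes. The function name `sis`, the cutoff `d=20`, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def sis(X, y, d):
    """Keep the d covariates with the largest absolute marginal correlation."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = (y - y.mean()) / y.std()
    corr = np.abs(Xc.T @ yc) / len(y)  # sample correlations, one per column
    return np.sort(np.argsort(-corr)[:d])

# Hypothetical high dimensional setup: 200 samples, 500 covariates,
# only covariates 0 and 5 enter the linear model.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + rng.standard_normal(n)

kept = sis(X, y, d=20)
```

The screened set `kept` would then be passed to a second-stage estimator such as the adaptive Lasso, as the abstract describes.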
Experimental design trade-offs for gene regulatory network inference: an in silico study of the yeast Saccharomyces cerevisiae cell cycle
Time-series of high throughput gene sequencing data intended for gene
regulatory network (GRN) inference are often short due to the high costs of
sampling cell systems. Moreover, experimentalists lack a set of quantitative
guidelines that prescribe the minimal number of samples required to infer a
reliable GRN model. We study the temporal resolution of data vs quality of GRN
inference in order to ultimately overcome this deficit. The evolution of a
Markovian jump process model for the Ras/cAMP/PKA pathway of proteins and
metabolites in the G1 phase of the Saccharomyces cerevisiae cell cycle is
sampled at a number of different rates. For each time-series we infer a linear
regression model of the GRN using the LASSO method. The inferred network
topology is evaluated in terms of the area under the precision-recall curve
(AUPR). By plotting the AUPR against the number of samples, we show that the
trade-off has a roughly sigmoid shape. An optimal number of samples
corresponds to values on the ridge of the sigmoid.
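The evaluation step described above can be sketched as follows: score each candidate edge (for example by the absolute value of its LASSO coefficient), rank the edges, and summarize the precision-recall trade-off in one number. The sketch computes average precision, a common step-wise estimate of the AUPR; whether the paper uses this estimator or another interpolation is not stated in the abstract. The function name and the toy scores are hypothetical.

```python
import numpy as np

def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve.

    scores: confidence for each candidate edge (higher = more likely present)
    labels: 1 if the edge exists in the reference network, else 0
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(-scores, kind="stable")   # rank edges by decreasing score
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision over the ranks at which a true edge is recovered.
    return float(np.sum(precision_at_k * hits) / np.sum(hits))

# Toy example: four candidate edges, two of which are true.
aupr = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

In the toy example the true edges are found at ranks 1 and 3, so the value is (1/1 + 2/3) / 2 = 5/6.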