Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping
We consider the problem of estimating a sparse multi-response regression
function, with an application to expression quantitative trait locus (eQTL)
mapping, where the goal is to discover genetic variations that influence
gene-expression levels. In particular, we investigate a shrinkage technique
capable of capturing a given hierarchical structure over the responses, such as
a hierarchical clustering tree with leaf nodes for responses and internal nodes
for clusters of related responses at multiple granularity, and we seek to
leverage this structure to recover covariates relevant to each
hierarchically-defined cluster of responses. We propose a tree-guided group
lasso, or tree lasso, for estimating such structured sparsity under
multi-response regression by employing a novel penalty function constructed
from the tree. We describe a systematic weighting scheme for the overlapping
groups in the tree-penalty such that each regression coefficient is penalized
in a balanced manner despite the inhomogeneous multiplicity of group
memberships of the regression coefficients due to overlaps among groups. For
efficient optimization, we employ a smoothing proximal gradient method that was
originally developed for a general class of structured-sparsity-inducing
penalties. Using simulated and yeast data sets, we demonstrate that our method
shows superior performance in terms of both prediction error and recovery of
true sparsity patterns, compared to other methods for learning a
multivariate-response regression.
Comment: Published at http://dx.doi.org/10.1214/12-AOAS549 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
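To make the tree-structured penalty in the abstract above concrete, here is a minimal sketch that evaluates an overlapping group penalty of the form: sum over covariates j and tree nodes v of w_v times the L2 norm of the coefficients of covariate j restricted to the responses under node v. The function name `tree_lasso_penalty`, the toy tree, and the uniform weights are assumptions made for illustration; in particular, the weights are placeholders and do not reproduce the balanced weighting scheme the abstract describes.

```python
import math

def tree_lasso_penalty(B, groups, weights):
    """Sum over covariates j and tree nodes v of w_v * ||B[j][G_v]||_2."""
    total = 0.0
    for row in B:  # one row of coefficients per covariate
        for group, w in zip(groups, weights):
            total += w * math.sqrt(sum(row[k] ** 2 for k in group))
    return total

# Hypothetical tree over three responses: leaves {0}, {1}, {2},
# an internal node {0, 1}, and the root {0, 1, 2}.
groups = [[0], [1], [2], [0, 1], [0, 1, 2]]
weights = [1.0] * len(groups)  # placeholder weights, not the paper's scheme

B = [[3.0, 4.0, 0.0],   # covariate 0 affects responses 0 and 1
     [0.0, 0.0, 0.0]]   # covariate 1 is irrelevant to every response
penalty = tree_lasso_penalty(B, groups, weights)
```

Because each coefficient belongs to every group on the path from its leaf to the root, coefficients are penalized several times over, which is the multiplicity of group memberships that a weighting scheme has to balance.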
Learning the optimal scale for GWAS through hierarchical SNP aggregation
Motivation: Genome-Wide Association Studies (GWAS) seek to identify causal
genomic variants associated with rare human diseases. The classical statistical
approach for detecting these variants is based on univariate hypothesis
testing, with healthy individuals being tested against affected individuals at
each locus. Given that an individual's genotype is characterized by up to one
million SNPs, this approach lacks precision, since it may yield a large number
of false positives that can lead to erroneous conclusions about genetic
associations with the disease. One way to improve the detection of true genetic
associations is to reduce the number of hypotheses to be tested by grouping
SNPs. Results: We propose a dimension-reduction approach which can be applied
in the context of GWAS by making use of the haplotype structure of the human
genome. We compare our method with standard univariate and multivariate
approaches on both synthetic and real GWAS data, and we show that reducing the
dimension of the predictor matrix by aggregating SNPs gives greater precision
in the detection of associations between the phenotype and genomic regions.
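The dimension-reduction idea in the abstract above can be illustrated with a small sketch that collapses columns of a genotype matrix into one feature per SNP group. The abstract does not say how groups are built or how SNPs are combined, so everything here is an assumption: the function name `aggregate_snps`, the hand-picked groups of adjacent SNPs standing in for haplotype blocks, and the choice to sum minor-allele counts within a group.

```python
def aggregate_snps(genotypes, groups):
    """Sum minor-allele counts within each SNP group, per individual."""
    return [[sum(person[j] for j in group) for group in groups]
            for person in genotypes]

# Toy data: 3 individuals x 5 SNPs, coded as minor-allele counts (0, 1, 2).
genotypes = [
    [0, 1, 2, 0, 0],
    [1, 1, 0, 2, 1],
    [0, 0, 0, 1, 2],
]
# Hypothetical grouping of adjacent SNPs (an assumption, not the paper's method).
groups = [[0, 1], [2], [3, 4]]

reduced = aggregate_snps(genotypes, groups)  # 3 individuals x 3 group features
```

With fewer columns in the predictor matrix, fewer hypotheses need to be tested, which is the mechanism the abstract credits for the gain in precision.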
Correlating neural and symbolic representations of language
Analysis methods which enable us to better understand the representations and
functioning of neural models of language are increasingly needed as deep
learning becomes the dominant approach in NLP. Here we present two methods
based on Representational Similarity Analysis (RSA) and Tree Kernels (TK) which
allow us to directly quantify how strongly the information encoded in neural
activation patterns corresponds to information represented by symbolic
structures such as syntax trees. We first validate our methods on the case of a
simple synthetic language for arithmetic expressions with clearly defined
syntax and semantics, and show that they exhibit the expected pattern of
results. We then apply our methods to correlate neural representations of
English sentences with their constituency parse trees.
Comment: ACL 201
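As a rough illustration of the Representational Similarity Analysis idea mentioned in the abstract above, the sketch below computes pairwise similarities between items in a neural space, reads the matching pairwise similarities from a symbolic space, and correlates the two lists. The function names, the use of cosine similarity and Pearson correlation, and the toy `symbolic_sim` matrix (a stand-in for something like tree-kernel values) are all assumptions, not details taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two activation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rsa_score(activations, symbolic_sim):
    """Correlate neural and symbolic similarities over all item pairs i < j."""
    n = len(activations)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    neural = [cosine(activations[i], activations[j]) for i, j in pairs]
    symbolic = [symbolic_sim[i][j] for i, j in pairs]
    return pearson(neural, symbolic)

# Toy inputs: three sentences with 2-d activations and a made-up
# symbolic similarity matrix (hypothetical values, for illustration only).
activations = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
symbolic_sim = [[1.0, 0.8, 0.1],
                [0.8, 1.0, 0.6],
                [0.1, 0.6, 1.0]]
score = rsa_score(activations, symbolic_sim)
```

A score near 1 would indicate that items that are close in the neural space are also close in the symbolic space; only the off-diagonal pairs are used, since self-similarity carries no information.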
Deep Tree Transductions - A Short Survey
The paper surveys recent extensions of Long Short-Term Memory networks to
handle tree structures from the perspective of learning non-trivial forms of
isomorphic structured transductions. It provides a discussion of modern TreeLSTM
models, showing the effect of the bias induced by the direction of tree
processing. An empirical analysis is performed on real-world benchmarks,
highlighting how there is no single model adequate to effectively approach all
transduction problems.
Comment: To appear in the Proceedings of the 2019 INNS Big Data and Deep
Learning (INNSBDDL 2019). arXiv admin note: text overlap with
arXiv:1809.0909