A Greedy Homotopy Method for Regression with Nonconvex Constraints
Constrained least squares regression is an essential tool for
high-dimensional data analysis. Given a partition of input
variables, this paper considers a particular class of nonconvex constraint
functions that encourage the linear model to select a small number of variables
from a small number of groups of the partition. Such constraints are relevant
in many practical applications, such as Genome-Wide Association Studies (GWAS).
Motivated by the efficiency of the Lasso homotopy method, we present RepLasso,
a greedy homotopy algorithm that tries to solve the induced sequence of
nonconvex problems by solving a sequence of suitably adapted convex surrogate
problems. We prove that in some situations RepLasso recovers the global minima
of the nonconvex problem. Moreover, even if it does not recover global minima,
we prove that in relevant cases it will still do no worse than the Lasso in
terms of support and signed support recovery, while in practice outperforming
it. We show empirically that the strategy can also be used to improve over
other Lasso-style algorithms. Finally, a GWAS of ankylosing spondylitis
highlights our method's practical utility.
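The abstract motivates RepLasso by the efficiency of the standard Lasso homotopy, which traces the entire regularization path through a sequence of breakpoints. A minimal sketch of that baseline homotopy (the LARS-lasso path, as implemented in scikit-learn) is shown below; this illustrates only the convex path RepLasso adapts, not RepLasso's nonconvex group-sparsity surrogates, and the data and dimensions are illustrative assumptions.

```python
# Sketch of the standard Lasso homotopy (LARS-lasso) path that motivates
# RepLasso. NOT the RepLasso algorithm itself; toy data for illustration.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]              # sparse ground truth
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# lars_path traces the whole Lasso regularization path in one homotopy sweep
alphas, active, coefs = lars_path(X, y, method="lasso")

# Variables enter the active set one at a time as the penalty decreases
print("order in which variables enter the path:", active)
print("number of path breakpoints:", len(alphas))
```

Because the homotopy visits the path breakpoints directly rather than re-solving the problem on a grid of penalties, it is an attractive skeleton for the greedy surrogate scheme the paper proposes.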
Neural Ranking Models with Weak Supervision
Despite the impressive improvements achieved by unsupervised deep neural
networks in computer vision and NLP tasks, such improvements have not yet been
observed in ranking for information retrieval. The reason may be the complexity
of the ranking problem, as it is not obvious how to learn from queries and
documents when no supervised signal is available. Hence, in this paper, we
propose to train a neural ranking model using weak supervision, where labels
are obtained automatically without human annotators or any external resources
(e.g., click data). To this aim, we use the output of an unsupervised ranking
model, such as BM25, as a weak supervision signal. We further train a set of
simple yet effective ranking models based on feed-forward neural networks. We
study their effectiveness under various learning scenarios (point-wise and
pair-wise models) and using different input representations (i.e., from
encoding query-document pairs into dense/sparse vectors to using word embedding
representation). We train our networks using tens of millions of training
instances and evaluate them on two standard collections: a homogeneous news
collection (Robust) and a heterogeneous large-scale web collection (ClueWeb).
Our experiments indicate that employing proper objective functions and letting
the networks learn the input representation from weakly supervised data
leads to impressive performance, with over 13% and 35% MAP improvements over
the BM25 model on the Robust and ClueWeb collections, respectively. Our findings also
suggest that supervised neural ranking models can greatly benefit from
pre-training on large amounts of weakly labeled data that can be easily
obtained from unsupervised IR models.

Comment: In proceedings of The 40th International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR 2017).
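The core idea above is to use scores from an unsupervised ranker such as BM25 as noisy training labels for a feed-forward scorer. The sketch below illustrates that point-wise variant with a tiny one-hidden-layer network in plain NumPy; the feature vectors, network size, and training loop are illustrative assumptions, not the paper's actual architecture or data.

```python
# Weak supervision sketch: fit a small feed-forward scorer to BM25-like
# scores used as noisy point-wise labels. All dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 1000, 20, 32          # query-doc pairs, feature dim, hidden units

X = rng.standard_normal((n, d))                       # dense query-doc features
w_bm25 = rng.standard_normal(d)                       # stand-in for BM25 scoring
y_weak = X @ w_bm25 + 0.5 * rng.standard_normal(n)    # noisy weak labels

# One-hidden-layer network trained point-wise with squared loss via SGD
W1 = 0.1 * rng.standard_normal((d, h)); b1 = np.zeros(h)
W2 = 0.1 * rng.standard_normal(h);      b2 = 0.0

def forward(x):
    a = np.tanh(x @ W1 + b1)
    return a @ W2 + b2, a

lr = 0.01
for epoch in range(20):
    for i in rng.permutation(n):
        s, a = forward(X[i])
        g = s - y_weak[i]                    # d(loss)/d(score)
        W2 -= lr * g * a; b2 -= lr * g
        da = g * W2 * (1 - a**2)             # backprop through tanh
        W1 -= lr * np.outer(X[i], da); b1 -= lr * da

pred, _ = forward(X)
print("correlation with weak labels:", np.corrcoef(pred, y_weak)[0, 1])
```

A pair-wise variant would instead apply a hinge or cross-entropy loss to score differences of document pairs ordered by the weak labels, which is the second learning scenario the abstract mentions.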
Identifying loci affecting trait variability and detecting interactions in genome-wide association studies
Identification of genetic variants with effects on trait variability can provide insights into the biological mechanisms that control variation and can identify potential interactions. We propose a two-degree-of-freedom test for jointly testing mean and variance effects to identify such variants. We implement the test in a linear mixed model, for which we provide an efficient algorithm and software. To focus on biologically interesting settings, we develop a test for dispersion effects, that is, variance effects not driven solely by mean effects when the trait distribution is non-normal. We apply our approach to body mass index in the subsample of the UK Biobank population with British ancestry (n ≈ 408,000) and show that it can increase the power to detect associated loci. We identify and replicate novel associations with significant variance effects that cannot be explained by the non-normality of body mass index, and we provide suggestive evidence for a connection between leptin levels and body mass index variability.
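To make the two-degree-of-freedom idea concrete, the toy below tests a simulated variant for a mean effect (ordinary regression of the trait on genotype) and a variance effect (Brown-Forsythe-style regression of absolute residuals on genotype), then combines the two 1-df statistics into a 2-df chi-square. This is a hedged illustration only: it assumes approximate independence of the two components and omits the linear mixed model and dispersion-effect refinements the paper actually develops.

```python
# Toy 2-df joint mean+variance test for one SNP on a quantitative trait.
# Illustrative only: no mixed model, independence of components assumed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5000
g = rng.binomial(2, 0.3, n).astype(float)            # genotypes 0/1/2
# Simulated variant with both a mean effect and a variance effect
y = 0.1 * g + rng.standard_normal(n) * (1.0 + 0.1 * g)

# 1 df: mean effect via ordinary regression of y on g
slope, intercept, _, p_mean, _ = stats.linregress(g, y)

# 1 df: variance effect via Brown-Forsythe-style regression of
# absolute residuals (from the mean model) on g
resid = y - (intercept + slope * g)
_, _, _, p_var, _ = stats.linregress(g, np.abs(resid))

# Combine the two 1-df statistics into a 2-df chi-square,
# assuming (approximate) independence of the components
chi2_joint = stats.chi2.isf(p_mean, 1) + stats.chi2.isf(p_var, 1)
p_joint = stats.chi2.sf(chi2_joint, df=2)
print(f"p_mean={p_mean:.3g}  p_var={p_var:.3g}  p_joint={p_joint:.3g}")
```

Using absolute residuals rather than squared residuals makes the variance component robust to mild non-normality, which is the same concern that motivates the paper's separate dispersion-effect test.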