4 research outputs found

The xyz algorithm for fast interaction search in high-dimensional data

Author: Meinshausen N
Shah RD
Thanei GA
Publication venue: Journal of Machine Learning Research
Publication date: 01/01/2018

When performing regression on a data set with $p$ variables, it is often of interest to go beyond using main linear effects and include interactions as products between individual variables. For small-scale problems, these interactions can be computed explicitly, but this leads to a computational complexity of at least $O(p^2)$ if done naively. This cost can be prohibitive if $p$ is very large. We introduce a new randomised algorithm that is able to discover interactions with high probability and under mild conditions has a runtime that is subquadratic in $p$. We show that strong interactions can be discovered in almost linear time, whilst finding weaker interactions requires $O(p^\alpha)$ operations for $1 < \alpha < 2$ depending on their strength. The underlying idea is to transform interaction search into a closest pair problem which can be solved efficiently in subquadratic time. The algorithm is called xyz and is implemented in the language R. We demonstrate its efficiency for application to genome-wide association studies, where more than $10^{11}$ interactions can be screened in under 280 seconds with a single-core 1.2 GHz CPU. Isaac Newton Trust Early Career Support Schem
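The "closest pair" idea in the abstract can be illustrated with a small sketch. This is not the xyz implementation (which is in R); it is a simplified Python illustration under strong assumptions: all variables and the response are coded as +1/-1, and the planted interaction is noise-free. Under that coding, y = X_j * X_k implies y * X_j = X_k, so a pair search becomes a lookup of matching columns. The function names `make_planted_data`, `naive_interaction_search` and `hashed_candidate_pairs`, and the parameters `subset_size` and `repeats`, are hypothetical names chosen for this example.

```python
import random
from collections import defaultdict


def make_planted_data(n, p, pair, rng):
    """Random +/-1 design; y is the product of two chosen columns (noise-free assumption)."""
    X = [[rng.choice((-1, 1)) for _ in range(p)] for _ in range(n)]
    j, k = pair
    y = [row[j] * row[k] for row in X]
    return X, y


def naive_interaction_search(X, y):
    """Baseline: score every pair (j, k), j < k, which costs O(n * p^2) operations."""
    n, p = len(X), len(X[0])
    best_score, best_pair = -1.0, None
    for j in range(p):
        for k in range(j + 1, p):
            # Fraction of rows where the product of the two columns equals y.
            agree = sum(1 for i in range(n) if y[i] == X[i][j] * X[i][k]) / n
            if agree > best_score:
                best_score, best_pair = agree, (j, k)
    return best_pair, best_score


def hashed_candidate_pairs(X, y, subset_size, repeats, rng):
    """Find columns of Z = y * X that coincide with columns of X on random row subsets.

    Each repeat hashes all p columns once instead of comparing all pairs.
    Exact matching on a subset is a simplification made for this sketch.
    """
    n, p = len(X), len(X[0])
    candidates = set()
    for _ in range(repeats):
        rows = rng.sample(range(n), subset_size)
        buckets = defaultdict(list)
        for k in range(p):
            buckets[tuple(X[i][k] for i in rows)].append(k)
        for j in range(p):
            key = tuple(y[i] * X[i][j] for i in rows)
            for k in buckets.get(key, ()):
                if j < k:  # report each unordered pair once and skip j == k
                    candidates.add((j, k))
    return candidates


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_planted_data(200, 30, (3, 17), rng)
    print(naive_interaction_search(X, y))
    print(sorted(hashed_candidate_pairs(X, y, subset_size=12, repeats=3, rng=rng)))
```

In the noise-free case the planted pair collides on every subset, so it is always among the candidates, and the few candidates returned can then be scored exactly. With noisy or weaker interactions a match on a given subset is only probabilistic, which is consistent with the abstract's statement that weaker interactions need more work.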

arXiv.org e-Print Archive

Repository for Publications and Research Data

Apollo (Cambridge)

Right singular vector projection graphs: fast high dimensional covariance matrix estimation under latent confounding

Author: Frot B
Meinshausen N
Shah RD
Thanei GA
Publication venue: Journal of the Royal Statistical Society. Series B: Statistical Methodology
Publication date: 01/04/2020

In this work we consider the problem of estimating a high-dimensional $p \times p$ covariance matrix $\Sigma$, given $n$ observations of confounded data with covariance $\Sigma + \Gamma \Gamma^T$, where $\Gamma$ is an unknown $p \times q$ matrix of latent factor loadings. We propose a simple and scalable estimator based on the projection on to the right singular vectors of the observed data matrix, which we call RSVP. Our theoretical analysis of this method reveals that in contrast to PCA-based approaches, RSVP is able to cope well with settings where the smallest eigenvalue of $\Gamma^T \Gamma$ is close to the largest eigenvalue of $\Sigma$, as well as settings where the eigenvalues of $\Gamma^T \Gamma$ are diverging fast. It is also able to handle data that may have heavy tails and only requires that the data has an elliptical distribution. RSVP does not require knowledge or estimation of the number of latent factors $q$, but only recovers $\Sigma$ up to an unknown positive scale factor. We argue this suffices in many applications, for example if an estimate of the correlation matrix is desired. We also show that by using subsampling, we can further improve the performance of the method. We demonstrate the favourable performance of RSVP through simulation experiments and an analysis of gene expression datasets collated by the GTEX consortium. Supported by an EPSRC First Grant and the Alan Turing Institute under the EPSRC grant EP/N510129/1
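As a sketch of what a "projection on to the right singular vectors" looks like algebraically, assume the observed data form an $n \times p$ matrix $X$ with $n < p$ and full row rank (the symbol $X$ and these conditions are assumptions for illustration, not taken from the abstract). The derivation below shows that this projection can be written without computing the singular vectors explicitly. Whether the published estimator is exactly this projection or a rescaled or subsampled variant of it is not stated in the abstract.

```latex
\begin{align*}
X &= U D V^T, \qquad U \in \mathbb{R}^{n \times n},\; V \in \mathbb{R}^{p \times n},\;
     U^T U = U U^T = I_n,\; V^T V = I_n,\; D = \operatorname{diag}(d_1, \dots, d_n),\; d_i > 0, \\
X X^T &= U D^2 U^T \quad\Longrightarrow\quad (X X^T)^{-1} = U D^{-2} U^T, \\
X^T (X X^T)^{-1} X &= V D U^T \, U D^{-2} U^T \, U D V^T = V V^T .
\end{align*}
```

The matrix $V V^T$ is unchanged if $X$ is multiplied by a non-zero constant, so any estimator built only from it carries no information about overall scale.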

Repository for Publications and Research Data

Apollo (Cambridge)

The xyz algorithm for fast interaction search in high-dimensional data

Author: Meinshausen N
Shah Rajen
Thanei GA
Publication venue: Journal of Machine Learning Research
Publication date: 14/09/2018

Apollo (Cambridge)

Protein phosphatase 2A (PP2A): a key phosphatase in the progression of chronic obstructive pulmonary disease (COPD) to lung cancer

Author: A Churg
A Collison
A Kaur
A Khanna
A Mantovani
A Ruettger
A Soltani
A Spira
A Trockenbacher
AA Sablina
AL Durham
AM Collison
AM Houghton
AM Smith
AM Wallace
B Kuznar-Kaminska
B Peng
B Stallberg
BK Velmurugan
C Gialeli
C Habrukowich
C Tiedje
CY Wang
D Kubota
DA Bergin
DM Carrick
DM Skillrud
E Barreiro
E Bousquet
E Kawahara
E Wandzioch
E Yeh
EA Van Tubergen
ED Esplin
F Dubois
F Ogushi
G Caramori
G Houge
G Walter
GA Calin
H Azuma
H Liu
H Takahashi
H Yao
HC Yu
HD Toop
HH Lee
HH Li
HJ Shen
HK Arnold
I Cristobal
I Petrache
IM Adcock
J Brognard
J Didkowska
J Gong
J Guo
J Li
J Mazieres
J Mendelsohn
J Sangodkar
J Wang
JD Arroyo
JD O'Neil
JE Gadek
JK Burgess
JK Stoller
JL Dean
JP de Torres
JR Molina
K Dabbagh
K Deng
KF Chen
KS Thress
KV Rupanagudi
L Hatchwell
L Kramer
L Ma
L Sun
L Xu
L Zhang
LB Lucas da Silva
LL Carr
LP Chen
M Fallahi
M Janghorban
M Kong
M Tamaki
M Toda-Ishii
MA Ginos
MC Lavigne
MC Turner
MCS Wong
MH Hung
MK Jolly
MM Rahman
MO Kim
MR Junttila
NA Kalsheker
NG Hepper
NR Salinas
NS Gudmann
O David
O Kauko
O Ojo
P Geraghty
P Kalev
P Prabhala
PM Nair
PN Black
Q Wang
QZ Dong
R Pippa
R Ruediger
R Yabe
RE Russell
RF Foronjy
S Bozinovski
S Buddaseth
S Kobayashi
S Mochida
S Nath
S Reynhout
S Sakashita
S Sanduja
S Shen
S Thanei
SA Brooks
SA Quaderi
SA Saddoughi
SD Shapiro
SE Brennan
SG Carlson
SL Clement
SR Soofiyani
SS Kadkol
SS Sohal
SS Wang
T Brabletz
T Ito
T Nakajima
T Renda
TA Santa-Coloma
TT Chao
TW Fitzgerald
V Galbiati
V Janssens
W Chen
W Sents
W Yan
WJ Cao
X Sun
X Zhou
XK Zhao
XT Zheng
Y Khew-Goodall
Y Matsuoka
Y Takiguchi
Y Xing
Y Zhao
YH Gao
Z Liu
Z Navratilova
Z Yang
Publication venue: Springer Science and Business Media LLC

Crossref