Finding Statistically Significant Interactions between Continuous Features

Author: Borgwardt Karsten
Sugiyama Mahito
Publication date: 10/05/2019

The search for higher-order feature interactions that are statistically significantly associated with a class variable is of high relevance in fields such as genetics and healthcare, but the combinatorial explosion of the candidate space makes this problem extremely challenging, both in computational efficiency and in proper correction for multiple testing. While recent progress has been made on this challenge for binary features, we here present the first solution for continuous features. We propose an algorithm that overcomes the combinatorial explosion of the search space of higher-order interactions by deriving a lower bound on the p-value of each interaction. This bound enables us to prune, on a massive scale, interactions that can never reach significance, and thereby to gain statistical power. In our experiments, our approach efficiently detects all significant interactions in a variety of synthetic and real-world datasets. Comment: 13 pages, 5 figures, 2 tables, accepted to the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019)

arXiv.org e-Print Archive

Crossref

Similarity-based virtual screening using 2D fingerprints

Author: Bajorath, Belkin, Bender, Brown, Carhart, Charifson, Chen, Clark, Cramer, Cruciani, Dixon, Downs, Everitt, Fligner, Flower, Ginn, Godden, Gower, Hall, Harper, He, Hert, Holliday, Hsu, Hubálek, Jenkins, Kearsley, Klein, Kubinyi, Lajiness, Leach, Makara, Martin, Matter, Nikolova, Patel, Peter Willett, Salim, Schuffenhauer, Shanmugasundaram, Sheridan, Stahura, Walters, Wang, Warr, Whittle, Willett, Xue, Zhang
Publication venue: Elsevier BV
Publication date: 01/12/2006

This paper summarises recent work at the University of Sheffield on virtual screening methods that use 2D fingerprint measures of structural similarity. A detailed comparison of a large number of similarity coefficients demonstrates that the well-known Tanimoto coefficient remains the method of choice for computing fingerprint-based similarity, despite some inherent biases related to the sizes of the molecules being sought. Group fusion combines the results of similarity searches based on multiple reference structures and a single similarity measure. We demonstrate the effectiveness of this approach to screening, and also describe an approximate form of group fusion, turbo similarity searching, that can be used when just a single reference structure is available.
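The two ideas named in this abstract, the Tanimoto coefficient and group fusion, can be illustrated with a minimal sketch. It assumes fingerprints are represented as sets of "on" bit positions and that the fused score is the maximum similarity over the reference structures. The abstract does not state which fusion rule is used, so `fuse=max`, the function names, and the toy molecules are all assumptions made for illustration, not the authors' implementation.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared 'on' bits divided by bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def group_fusion(references, database, fuse=max):
    """Rank database molecules by a fused similarity to several reference structures.

    `fuse` combines the per-reference similarities into one score; max is an
    assumed choice here, not something the abstract specifies.
    """
    scores = {
        name: fuse(tanimoto(fp, ref) for ref in references)
        for name, fp in database.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy fingerprints (sets of bit positions), for illustration only.
references = [frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6})]
database = {
    "mol_a": frozenset({1, 2, 3, 4, 7}),
    "mol_b": frozenset({5, 6, 8}),
    "mol_c": frozenset({9, 10}),
}

ranking = group_fusion(references, database)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy run `mol_b` shares no bits with the first reference and would score zero against it alone; it is ranked above `mol_c` only because the second reference contributes, which is the practical motivation for searching with several references.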

Crossref

White Rose Research Online

Multiple testing for SNP-SNP interactions

Author: Boulesteix Anne-Laure
Strobl Carolin
Wagenpfeil S.
Weidinger S.
Wichmann Heinz-Erich
Publication date: 01/01/2007

Most genetic diseases are complex, i.e. associated with combinations of SNPs rather than individual SNPs. In the last few years, this topic has often been addressed in terms of SNP-SNP interaction patterns given as expressions linked by logical operators. Methods for multiple testing in high-dimensional settings can be applied when many SNPs are considered simultaneously. However, another, less well-known multiple testing problem arises within a fixed subset of SNPs when the logic expression is chosen optimally. In this article, we propose a general asymptotic approach for deriving the distribution of the maximally selected chi-square statistic in various situations. We show how this result can be used for testing logic expressions, in particular SNP-SNP interaction patterns, while controlling for multiple comparisons. Simulations show that our method provides multiple testing adjustment when the logic expression is chosen so as to maximize the statistic. Its benefit is demonstrated through an application to a real dataset from a large population-based study considering allergy and asthma in KORA. An implementation of our method is available from the Comprehensive R Archive Network (CRAN) as the R package 'SNPmaxsel'.
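The selection problem described in this abstract can be made concrete with a small sketch. It assumes binary carrier indicators for two SNPs, a binary disease status, and a hypothetical candidate set of three logic expressions (AND, OR, XOR). It computes the Pearson chi-square statistic for each expression and reports the maximum. It does not implement the asymptotic distribution proposed in the article, nor the 'SNPmaxsel' package; all names and data below are illustrative.

```python
def chi_square_2x2(x, y):
    """Pearson chi-square statistic (no continuity correction) for two binary sequences."""
    n = len(x)
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Hypothetical candidate logic expressions over two binary SNP indicators.
EXPRESSIONS = {
    "s1 AND s2": lambda s1, s2: s1 and s2,
    "s1 OR s2": lambda s1, s2: s1 or s2,
    "s1 XOR s2": lambda s1, s2: s1 != s2,
}


def max_selected(snp1, snp2, status):
    """Return the expression with the largest chi-square statistic, plus all statistics."""
    stats = {
        name: chi_square_2x2(
            [int(bool(f(u, v))) for u, v in zip(snp1, snp2)], status
        )
        for name, f in EXPRESSIONS.items()
    }
    best = max(stats, key=stats.get)
    return best, stats[best], stats


# Toy data, invented for illustration: status equals snp1 XOR snp2.
snp1 = [0, 0, 1, 1, 0, 0, 1, 1]
snp2 = [0, 1, 0, 1, 0, 1, 0, 1]
status = [0, 1, 1, 0, 0, 1, 1, 0]

best, stat, all_stats = max_selected(snp1, snp2, status)
print(best, stat)
```

Because the reported statistic is the maximum over several candidates, comparing it with the critical value of a single chi-square test with one degree of freedom would overstate significance; that is the within-subset multiple testing problem the abstract refers to.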

CiteSeerX

Crossref

Open Access LMU