Finding Statistically Significant Interactions between Continuous Features
The search for higher-order feature interactions that are statistically
significantly associated with a class variable is of high relevance in fields
such as Genetics or Healthcare, but the combinatorial explosion of the
candidate space makes this problem extremely challenging in terms of
computational efficiency and proper correction for multiple testing. While
recent progress has been made regarding this challenge for binary features, we
here present the first solution for continuous features. We propose an
algorithm which overcomes the combinatorial explosion of the search space of
higher-order interactions by deriving a lower bound on the p-value for each
interaction, which enables us to massively prune interactions that can never
reach significance and to thereby gain more statistical power. In our
experiments, our approach efficiently detects all significant interactions in a
variety of synthetic and real-world datasets. Comment: 13 pages, 5 figures, 2 tables, accepted to the 28th International
Joint Conference on Artificial Intelligence (IJCAI 2019)
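The pruning idea in this abstract can be illustrated with the simpler binary-feature setting it mentions as prior work. The sketch below is not the paper's bound for continuous features; it assumes a one-sided Fisher exact test and uses the standard minimum attainable p-value for a binary pattern. The function names and the threshold are hypothetical. For support at most the number of positive samples, the bound shrinks as support grows, so a pattern whose bound already exceeds the corrected threshold can be discarded together with every more specific (lower-support) pattern.

```python
from math import comb


def min_attainable_pvalue(support, n_pos, n):
    """Smallest one-sided Fisher exact p-value reachable by a binary pattern.

    `support` is the number of samples in which the pattern occurs,
    `n_pos` the number of positive-class samples, `n` the total sample size.
    Assumes support <= n_pos; the minimum is reached when every occurrence
    of the pattern falls in the positive class.
    """
    if not 0 <= support <= n_pos <= n:
        raise ValueError("expected 0 <= support <= n_pos <= n")
    return comb(n_pos, support) / comb(n, support)


def is_testable(support, n_pos, n, threshold):
    """A pattern can only reach significance if its bound is below the threshold."""
    return min_attainable_pvalue(support, n_pos, n) <= threshold


# Illustrative values only: 10 samples, 5 of them positive.
low_support = min_attainable_pvalue(2, 5, 10)   # 10 / 45
high_support = min_attainable_pvalue(5, 5, 10)  # 1 / 252
```

Patterns that fail `is_testable` need not be counted in the multiple-testing correction, which is where the gain in statistical power comes from.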
Similarity-based virtual screening using 2D fingerprints
This paper summarises recent work at the University of Sheffield on virtual screening methods that use 2D fingerprint measures of structural similarity. A detailed comparison of a large number of similarity coefficients demonstrates that the well-known Tanimoto coefficient remains the method of choice for the computation of fingerprint-based similarity, despite possessing some inherent biases related to the sizes of the molecules that are being sought. Group fusion involves combining the results of similarity searches based on multiple reference structures and a single similarity measure. We demonstrate the effectiveness of this approach to screening, and also describe an approximate form of group fusion, turbo similarity searching, that can be used when just a single reference structure is available.
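As a rough illustration of the two ideas named in this abstract, the sketch below computes the Tanimoto coefficient c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number set in both, and then ranks a small database by group fusion. The abstract does not say which fusion rule is used, so taking the maximum similarity over the reference structures is an assumption here, and the fingerprints and names are made up.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)


def group_fusion(references, database, fuse=max):
    """Rank database molecules by fusing their similarities to several references.

    `references` is a list of fingerprints, `database` maps a molecule name to
    its fingerprint, and `fuse` combines the per-reference scores (MAX rule
    assumed by default).
    """
    scores = {
        name: fuse(tanimoto(ref, fp) for ref in references)
        for name, fp in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: bit positions that are set in each fingerprint.
references = [{1, 2, 3}, {7, 8}]
database = {"mol_a": {2, 3, 4}, "mol_b": {7, 8}, "mol_c": {9}}
ranking = group_fusion(references, database)
```

Passing a different `fuse` callable (for example `sum`) changes the fusion rule without touching the similarity measure, which mirrors the "multiple reference structures, single similarity measure" description.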
Multiple testing for SNP-SNP interactions
Most genetic diseases are complex, i.e. associated with combinations of SNPs rather than individual SNPs. In the last few years, this topic has often been addressed in terms of SNP-SNP interaction patterns given as expressions linked by logical operators. Methods for multiple testing in high-dimensional settings can be applied when many SNPs are considered simultaneously. However, another less well-known multiple testing problem arises within a fixed subset of SNPs when the logic expression is chosen optimally. In this article, we propose a general asymptotic approach for deriving the distribution of the maximally selected chi-square statistic in various situations. We show how this result can be used for testing logic expressions - in particular SNP-SNP interaction patterns - while controlling for multiple comparisons. Simulations show that our method provides multiple testing adjustment when the logic expression is chosen so as to maximize the statistic. Its benefit is demonstrated through an application to a real dataset from a large population-based study considering allergy and asthma in KORA. An implementation of our method is available from the Comprehensive R Archive Network (CRAN) as R package 'SNPmaxsel'.
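The selection problem described here can be made concrete with a small sketch. It is not the SNPmaxsel implementation and does not derive the asymptotic distribution; it only shows, under the assumption of binary SNP indicators and three hypothetical candidate expressions (AND, OR, XOR), how a maximally selected chi-square statistic arises. Because the maximum is taken over several expressions, comparing it against an ordinary chi-square distribution with one degree of freedom would overstate significance, which is the adjustment the abstract addresses.

```python
def chi_square_2x2(x, y):
    """Pearson chi-square statistic for two binary vectors of equal length."""
    n = len(x)
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate margin carries no evidence of association
    return n * (a * d - b * c) ** 2 / denom


# Candidate logic expressions over two binary SNP indicators (illustrative choice).
LOGIC_EXPRESSIONS = {
    "AND": lambda s1, s2: s1 and s2,
    "OR": lambda s1, s2: s1 or s2,
    "XOR": lambda s1, s2: s1 != s2,
}


def max_selected_chi_square(snp1, snp2, case):
    """Return the best expression, its statistic, and all per-expression statistics."""
    stats = {}
    for name, op in LOGIC_EXPRESSIONS.items():
        expr = [int(bool(op(u, v))) for u, v in zip(snp1, snp2)]
        stats[name] = chi_square_2x2(expr, case)
    best = max(stats, key=stats.get)
    return best, stats[best], stats


# Made-up toy data: 8 individuals, disease status equals snp1 AND snp2.
snp1 = [1, 1, 0, 0, 1, 1, 0, 0]
snp2 = [1, 0, 1, 0, 1, 0, 1, 0]
case = [1, 0, 0, 0, 1, 0, 0, 0]
best, best_stat, all_stats = max_selected_chi_square(snp1, snp2, case)
```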