Learning from the collective wisdom of crowds enhances the transparency of
scientific findings by incorporating diverse perspectives into the
decision-making process. Synthesizing such collective wisdom is related to the
statistical notion of fusion learning from multiple data sources or studies.
However, fusing inferences from diverse sources is challenging because
cross-source heterogeneity and potential data-sharing restrictions complicate
statistical inference. Moreover, studies may rely on disparate designs and
employ widely different modeling techniques for inference, and prevailing data
privacy norms may forbid sharing even summary statistics across studies for an
overall analysis. In this paper, we propose an Integrative Ranking and Thresholding
(IRT) framework for fusion learning in multiple testing. IRT operates in a
setting where each study provides a triplet: the vector of binary
accept-reject decisions on the tested hypotheses, the study-specific False
Discovery Rate (FDR) level, and the set of hypotheses tested by the study. Under this
setting, IRT constructs an aggregated, nonparametric, and discriminatory
measure of evidence against each null hypothesis, which facilitates ranking the
hypotheses in the order of their likelihood of being rejected. We show that IRT
guarantees an overall FDR control under arbitrary dependence between the
evidence measures as long as the studies control their respective FDR at the
desired levels. Furthermore, IRT synthesizes inferences from diverse studies
irrespective of the underlying multiple testing algorithms employed by them.
While the proofs of our theoretical statements are elementary, IRT is extremely
flexible, and a comprehensive numerical study demonstrates that it is a
powerful framework for pooling inferences.

Comment: 29 pages and 10 figures. Under review at a journa