The age of big data has fueled expectations of accelerated learning. The
availability of large data sets enables researchers to perform more powerful
statistical analyses and enhances the reliability of conclusions, which can be
based on a broad collection of subjects. Often such data sets can be assembled
only by drawing on diverse sources, as in medical research that combines data
from multiple centers in a federated analysis. However, these hopes must be
balanced against data privacy concerns, which hinder sharing raw data among
centers. Consequently, federated analyses typically resort to sharing data
summaries from each center. This restriction to summaries risks impairing the
efficiency of statistical analysis procedures.
In this work we take a close look at the effects of federated analysis on two
basic problems: nonparametric comparison of two groups, and quantile
estimation to describe the corresponding distributions. We also propose a
specific privacy-preserving data release policy for federated analysis based on
the K-anonymity criterion, which has been adopted by the Medical Informatics
Platform of the European Human Brain Project. Our results show that, for these
tasks, federated analysis incurs only a modest loss of statistical efficiency.