Scaling Survival Analysis in Healthcare with Federated Survival Forests: A Comparative Study on Heart Failure and Breast Cancer Genomics
Survival analysis is a fundamental tool in medicine, modeling the time until
an event of interest occurs in a population. However, in real-world
applications, survival data are often incomplete, censored, distributed, and
confidential, especially in healthcare settings where privacy is critical. The
scarcity of data can severely limit the scalability of survival models to
distributed applications that rely on large data pools. Federated learning is a
promising technique that enables machine learning models to be trained on
multiple datasets without compromising user privacy, making it particularly
well-suited for addressing the challenges of survival data and large-scale
survival applications. Despite significant developments in federated learning
for classification and regression, many directions remain unexplored in the
context of survival analysis. In this work, we propose an extension of the
Federated Survival Forest algorithm, called FedSurF++. This federated ensemble
method constructs random survival forests in heterogeneous federations.
Specifically, we investigate several new tree sampling methods from client
forests and compare the results with state-of-the-art survival models based on
neural networks. The key advantage of FedSurF++ is its ability to achieve
comparable performance to existing methods while requiring only a single
communication round to complete. Our extensive empirical investigation yields
significant improvements from both the algorithmic and the privacy-preservation
perspectives, making the original FedSurF algorithm more efficient, robust, and
private. We also present results on two real-world datasets demonstrating the
success of FedSurF++ in real-world healthcare studies. Our results underscore
the potential of FedSurF++ to improve the scalability and effectiveness of
survival analysis in distributed settings while preserving user privacy.
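The single-round aggregation idea behind FedSurF can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: size-proportional sampling is just one possible tree sampling strategy (the paper investigates several), and names like `fedsurf_aggregate` are ours.

```python
import random

def fedsurf_aggregate(client_forests, client_sizes, ensemble_size, seed=0):
    """Build a global forest in a single communication round by sampling
    trees from locally trained client forests. Clients are weighted by
    local dataset size here; this is one possible sampling strategy."""
    rng = random.Random(seed)
    global_forest = []
    for _ in range(ensemble_size):
        # pick a client proportionally to its data size, then one of its trees
        client = rng.choices(range(len(client_forests)), weights=client_sizes)[0]
        global_forest.append(rng.choice(client_forests[client]))
    return global_forest

# toy federation: 3 clients, each holding 10 locally trained trees (placeholders)
forests = [[f"client{c}_tree{t}" for t in range(10)] for c in range(3)]
global_forest = fedsurf_aggregate(forests, client_sizes=[100, 300, 50],
                                  ensemble_size=20)
print(len(global_forest))  # 20
```

Because only already-trained trees are exchanged, the whole aggregation fits in one communication round, which is where the efficiency gain over iterative federated protocols comes from.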
Client Selection for Federated Learning with Heterogeneous Resources in Mobile Edge
We envision a mobile edge computing (MEC) framework for machine learning (ML)
technologies, which leverages distributed client data and computation resources
for training high-performance ML models while preserving client privacy. Toward
this future goal, this work aims to extend Federated Learning (FL), a
decentralized learning framework that enables privacy-preserving training of
models, to work with heterogeneous clients in a practical cellular network. The
FL protocol iteratively asks random clients to download a trainable model from
a server, update it with their own data, and upload the updated model to the
server, while asking the server to aggregate multiple client updates to further
improve the model. While clients in this protocol need not disclose their own
private data, the overall training process can become inefficient when some
clients have limited computational resources (requiring longer update times) or
poor wireless channel conditions (longer upload times). Our new FL
protocol, which we refer to as FedCS, mitigates this problem and performs FL
efficiently while actively managing clients based on their resource conditions.
Specifically, FedCS solves a client selection problem with resource
constraints, which allows the server to aggregate as many client updates as
possible and to accelerate performance improvement in ML models. We conducted
an experimental evaluation using publicly available large-scale image datasets
to train deep neural networks in simulated MEC environments. The experimental
results show that FedCS is able to complete its training process in a
significantly shorter time than the original FL protocol.
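The client selection step can be sketched as a greedy loop under a toy round-time model; this simplification (updates run in parallel, uploads happen sequentially) and the names below are ours, not the paper's exact formulation.

```python
def fedcs_select(clients, deadline):
    """Greedy client selection (simplified sketch of the FedCS idea):
    admit as many clients per round as fit within a deadline, under a
    toy round-time model where updates run in parallel (max) and
    uploads happen sequentially (sum).
    clients: id -> (estimated_update_time, estimated_upload_time)."""
    selected = []
    remaining = set(clients)
    while remaining:
        def round_time(extra):
            chosen = selected + [extra]
            return (max(clients[c][0] for c in chosen)     # parallel updates
                    + sum(clients[c][1] for c in chosen))  # sequential uploads
        best = min(remaining, key=round_time)
        if round_time(best) > deadline:
            break  # no further client fits within the deadline
        selected.append(best)
        remaining.remove(best)
    return selected

# clients report estimated (update_time, upload_time) in seconds
clients = {"a": (5.0, 1.0), "b": (2.0, 0.5), "c": (9.0, 4.0)}
print(fedcs_select(clients, deadline=8.0))  # ['b', 'a']
```

The slow client "c" is skipped rather than stalling the round, which is the mechanism by which resource-aware selection shortens overall training time.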
Harnessing spatial homogeneity of neuroimaging data: patch individual filter layers for CNNs
Neuroimaging data, e.g. obtained from magnetic resonance imaging (MRI), is
comparably homogeneous due to (1) the uniform structure of the brain and (2)
additional efforts to spatially normalize the data to a standard template using
linear and non-linear transformations. Convolutional neural networks (CNNs), in
contrast, have been specifically designed for highly heterogeneous data, such
as natural images, by sliding convolutional filters over different positions in
an image. Here, we suggest a new CNN architecture that combines the idea of
hierarchical abstraction in neural networks with a prior on the spatial
homogeneity of neuroimaging data: whereas early layers are trained globally
using standard convolutional layers, we introduce patch individual filters
(PIF) for higher, more abstract layers. By learning filters in individual image
regions (patches) without sharing weights, PIF layers can learn abstract
features faster and with fewer samples. We thoroughly evaluated PIF layers for
three different tasks and data sets, namely sex classification on UK Biobank
data, Alzheimer's disease detection on ADNI data and multiple sclerosis
detection on private hospital data. We demonstrate that CNNs using PIF layers
result in higher accuracies, especially in low sample size settings, and need
fewer training epochs to converge. To the best of our knowledge, this is the
first study that introduces a prior on brain MRI for CNN learning.
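The core PIF mechanism, separate filters per image region with no weight sharing across positions, can be sketched as a locally connected operation. This is a minimal NumPy illustration under simplifying assumptions of ours (single-channel 2D input, non-overlapping patches, one filter per patch), not the authors' architecture.

```python
import numpy as np

def pif_layer(x, patch_filters):
    """Patch-individual filters (sketch): split the input into
    non-overlapping patches and apply a *separate* filter to each
    patch, i.e., no weight sharing across spatial positions.
    x: (H, W) feature map; patch_filters: (nH, nW, ph, pw) weights."""
    nH, nW, ph, pw = patch_filters.shape
    out = np.empty((nH, nW))
    for i in range(nH):
        for j in range(nW):
            patch = x[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            out[i, j] = np.sum(patch * patch_filters[i, j])  # per-patch filter
    return out

x = np.ones((8, 8))                                      # toy feature map
w = np.random.default_rng(0).normal(size=(2, 2, 4, 4))   # one 4x4 filter per patch
print(pif_layer(x, w).shape)  # (2, 2)
```

Dropping weight sharing only makes sense because spatially normalized brain MRIs place the same anatomy at the same positions, which is exactly the spatial-homogeneity prior the abstract describes.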
Adversarial Attack and Defense on Graph Data: A Survey
Deep neural networks (DNNs) have been widely applied to various applications
including image classification, text generation, audio recognition, and graph
data analysis. However, recent studies have shown that DNNs are vulnerable to
adversarial attacks. Though there are several works studying adversarial attack
and defense strategies on domains such as images and natural language
processing, it is still difficult to directly transfer the learned knowledge to
graph structure data due to its representation challenges. Given the importance
of graph analysis, an increasing number of works start to analyze the
robustness of machine learning models on graph data. Nevertheless, current
studies considering adversarial behaviors on graph data usually focus on
specific types of attacks with certain assumptions. In addition, each work
proposes its own mathematical formulation which makes the comparison among
different methods difficult. Therefore, in this paper, we aim to survey
existing adversarial learning strategies on graph data, first providing a
unified formulation that covers most adversarial learning studies on graphs.
Moreover, we also compare different
attacks and defenses on graph data and discuss their corresponding
contributions and limitations. In this work, we systemically organize the
considered works based on the features of each topic. This survey not only
serves as a reference for the research community but also offers a clear
picture to researchers outside this domain. In addition, we have created an
online resource and have kept it updated with relevant papers over the last
two years. More
details of the comparisons of various studies based on this survey are
open-sourced at
https://github.com/YingtongDou/graph-adversarial-learning-literature.
Comment: In submission to a journal.
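A perturbation model shared by many of the surveyed structure attacks is flipping a budget-limited set of edges in the adjacency matrix. The toy sketch below illustrates only that common formulation; the names and budget semantics are ours, not from any specific attack.

```python
import numpy as np

def flip_edges(adj, flips, budget):
    """Toy structure perturbation: flip at most `budget` edges of a
    symmetric (undirected) adjacency matrix, keeping it symmetric.
    Many graph adversarial attacks search for the flips that most
    degrade a target model's predictions under such a budget."""
    adj = adj.copy()
    for i, j in flips[:budget]:
        adj[i, j] = adj[j, i] = 1 - adj[i, j]  # add or remove the edge
    return adj

adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
perturbed = flip_edges(adj, flips=[(0, 2), (0, 1)], budget=1)
print(perturbed[0, 2], perturbed[2, 0])  # 1 1
```

An attack method then differs mainly in how it scores candidate flips; the unified formulation the survey proposes makes exactly this comparison possible.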
Federated Neural Architecture Search
To preserve user privacy while enabling mobile intelligence, techniques have
been proposed to train deep neural networks on decentralized data. However,
training over decentralized data makes neural architecture design, which was
already difficult, even harder. The difficulty is further amplified when
designing and deploying different neural architectures for heterogeneous mobile
platforms. In this work, we incorporate automatic neural architecture search
into decentralized training as a new DNN training paradigm, called Federated
Neural Architecture Search (federated NAS). To deal with the
primary challenge of limited on-client computational and communication
resources, we present FedNAS, a highly optimized framework for efficient
federated NAS. FedNAS fully exploits the key opportunity of insufficient model
candidate re-training during the architecture search process, and incorporates
three key optimizations: parallel candidates training on partial clients, early
dropping candidates with inferior performance, and dynamic round numbers.
Tested on large-scale datasets and typical CNN architectures, FedNAS achieves
model accuracy comparable to a state-of-the-art NAS algorithm that trains
models on centralized data, and reduces the client cost by up to two orders of
magnitude compared to a straightforward design of federated NAS.
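Of the three optimizations, early dropping is the easiest to sketch: a successive-halving-style loop that prunes weak architecture candidates between federated rounds. This is our own minimal illustration, with a hypothetical `eval_round` callback standing in for one federated evaluation round; it is not the FedNAS implementation.

```python
def early_drop_search(candidates, eval_round, num_rounds=3, drop_frac=0.5):
    """Sketch of early dropping in architecture search: after each
    federated round, drop the worst-performing fraction of candidates
    so later rounds spend client compute only on promising ones."""
    survivors = list(candidates)
    for rnd in range(num_rounds):
        # one federated round of (partial) training/evaluation per candidate
        scores = {c: eval_round(c, rnd) for c in survivors}
        survivors.sort(key=lambda c: scores[c], reverse=True)
        survivors = survivors[:max(1, int(len(survivors) * (1 - drop_frac)))]
        if len(survivors) == 1:
            break
    return survivors

# toy run: candidate "quality" is just its id, so the highest id survives
best = early_drop_search(range(8), eval_round=lambda c, rnd: c)
print(best)  # [7]
```

Combined with training candidates on only partial clients in parallel and adapting the number of rounds dynamically, this pruning is what drives the reported client-cost reduction.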