Gradient Assisted Learning
In distributed settings, collaborations between different entities, such as
financial institutions, medical centers, and retail markets, are crucial to
providing improved service and performance. However, the underlying entities
may have little interest in sharing their private data, proprietary models, and
objective functions. These privacy requirements have created new challenges for
collaboration. In this work, we propose Gradient Assisted Learning (GAL), a new
method for various entities to assist each other in supervised learning tasks
without sharing data, models, and objective functions. In this framework, all
participants collaboratively optimize the aggregate of local loss functions,
and each participant autonomously builds its own model by iteratively fitting
the gradients of the objective function. Experimental studies demonstrate that
Gradient Assisted Learning can achieve performance close to that of centralized
learning, in which all data, models, and objective functions are fully
disclosed.
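The iterative procedure described above resembles gradient boosting spread across organizations. The following is a minimal sketch of that idea, not the paper's actual protocol: a hypothetical two-party linear-regression setup in plain NumPy, where the label-holding party broadcasts only the residuals (the gradients of a squared loss) and each party fits its own private model to them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two parties hold disjoint feature sets for the same
# samples; only the label party holds y. Shapes and weights are illustrative.
n = 200
X_a = rng.normal(size=(n, 3))   # party A's private features
X_b = rng.normal(size=(n, 2))   # party B's private features
y = (X_a @ np.array([1.0, -2.0, 0.5])
     + X_b @ np.array([3.0, 1.5])
     + 0.1 * rng.normal(size=n))

def fit_local(X, residual):
    """A party fits its own model (here, least squares) to the shared residuals.
    Its features X never leave the party."""
    w, *_ = np.linalg.lstsq(X, residual, rcond=None)
    return w

pred = np.zeros(n)
lr = 0.5  # step size on each party's contribution
for _ in range(30):
    for X in (X_a, X_b):
        # The label party shares only the current residuals, which are the
        # (negative) gradients of the squared loss w.r.t. the predictions.
        residual = y - pred
        w = fit_local(X, residual)
        pred = pred + lr * (X @ w)

mse = np.mean((y - pred) ** 2)
```

Only gradients and predictions cross party boundaries here; raw features, local model weights, and the loss function itself stay private, which is the property the abstract emphasizes.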
SoK: Training Machine Learning Models over Multiple Sources with Privacy Preservation
Nowadays, gathering high-quality training data from multiple data controllers
while preserving privacy is a key challenge in training high-quality machine
learning models. Potential solutions could dramatically break down the barriers
among isolated data corpora and consequently enlarge the range of data
available for processing. To this end, both academic researchers and industry
vendors have recently been strongly motivated to propose two mainstream
families of solutions: 1) Secure Multi-party Learning (MPL for short); and 2)
Federated Learning (FL for short). These two solutions have their own
advantages and limitations when evaluated in terms of privacy preservation,
communication methods, communication overhead, data format, accuracy of the
trained models, and application scenarios.
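A core primitive underlying many MPL protocols is additive secret sharing, which illustrates how MPL differs from FL in its privacy model: inputs are split so that no single party learns anything. The sketch below is a generic illustration, not any surveyed framework's API; the modulus and party count are arbitrary choices for the example.

```python
import random

MOD = 2 ** 61 - 1  # a large prime modulus (illustrative choice)

def share(x, n_parties=3):
    """Split x into n additive shares that sum to x mod MOD.
    Any n-1 shares are uniformly random and reveal nothing about x."""
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares mod MOD."""
    return sum(shares) % MOD

# Two data owners secret-share their private values among three compute parties.
a_shares = share(41)
b_shares = share(1)

# Each compute party adds the shares it holds locally; addition of secrets
# requires no communication and no party ever sees a or b in the clear.
sum_shares = [(sa + sb) % MOD for sa, sb in zip(a_shares, b_shares)]
total = reconstruct(sum_shares)
```

In contrast, FL keeps raw data local but exchanges model updates in the clear (or with added protections), which is one axis of the privacy-preservation comparison the survey draws.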
To demonstrate the research progress and discuss insights into future
directions, we thoroughly investigate the protocols and frameworks of both MPL
and FL. First, we define the problem of training machine learning models over
multiple data sources with privacy preservation (TMMPP for short). Then, we
compare recent studies of TMMPP in terms of technical routes, number of parties
supported, data partitioning, threat model, and supported machine learning
models, to show their advantages and limitations. Next, we introduce
state-of-the-art platforms that support online training over multiple data
sources. Finally, we discuss potential directions for resolving the problem of
TMMPP.
Comment: 17 pages, 4 figures
Software Architecture Design for Federated Learning Systems
Advances in deep learning and machine learning, as subdomains of AI, have been demonstrated across multiple industries. However, the data requirements of deep machine learning models have raised data privacy concerns. For instance, the EU's General Data Protection Regulation (GDPR) stipulates a range of data protection measures, restricting the data available for training. Furthermore, trustworthy and responsible AI have recently emerged as hot topics, thanks to the new ethical, legal, social, and technological challenges brought on by the technology. These developments have led to the need for decentralised machine learning approaches.
Federated learning is an emerging privacy-preserving AI technique that trains models locally and formulates a global model without transferring local data externally. Being widely distributed with different components and stakeholders, federated learning requires software system design thinking and software engineering considerations. Nonetheless, the different software engineering challenges and the software architectural approaches of federated learning have not previously been conceptualised systematically in the software architecture literature. This thesis aims to address the software engineering research gap of federated learning systems and to provide system-level solutions to achieve trustworthy and responsible federated learning by design.
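The "train locally, formulate a global model" loop can be sketched with a FedAvg-style round, the canonical federated learning algorithm. The setup below is hypothetical (a toy linear-regression task split across three clients; all names and shapes are illustrative), but it shows the system shape that the architectural discussion addresses: clients, local training, and a server-side aggregation step, with raw data never leaving a client.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: three clients each hold a private data shard for the
# same linear-regression task.
true_w = np.array([2.0, -1.0])
shards = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.05 * rng.normal(size=50)
    shards.append((X, y))

def local_update(w, X, y, lr=0.1, epochs=5):
    """A client's local training: gradient descent on its own shard only."""
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# FedAvg rounds: the server distributes the global weights, clients train
# locally, and the server averages the returned weights. It only ever sees
# model parameters, never the clients' data.
w_global = np.zeros(2)
for _ in range(20):
    local_ws = [local_update(w_global, X, y) for X, y in shards]
    w_global = np.mean(local_ws, axis=0)
```

Each distinct role in this loop (client registry, model aggregator, local trainer) is a component that a federated learning architecture must place and coordinate, which is where the software engineering concerns discussed in this thesis arise.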
We first report the findings of a systematic literature review on federated learning from a software engineering perspective. The study shows that software architecture design concerns in building federated learning systems have been largely ignored. Thus, we present a collection of architectural patterns for the design challenges of federated learning systems, together with a set of decision models to assist software architects in pattern selection and architecture validation. The evaluation results show that these approaches are feasible and useful as a guideline for federated learning software architecture design. We then propose FLRA, a reference architecture for federated learning systems, and adopt FLRA as the design basis for enhancing trust in federated learning software architectures. Finally, we evaluate the designed federated learning architecture. The evaluation results show that the approach is feasible for enabling accountability and improving fairness. Ultimately, the proposed system-level solutions can achieve trustworthy and responsible federated learning.