6 research outputs found

    A Generalized Look at Federated Learning: Survey and Perspectives

    Federated learning (FL) refers to a distributed machine learning framework in which a model is learned from several decentralized edge clients without sharing their local datasets. This distributed strategy prevents data leakage and enables on-device training, as the global model is updated from the local model updates. Despite offering several advantages, including data privacy and scalability, FL poses challenges such as statistical and system heterogeneity of data in federated networks, communication bottlenecks, and privacy and security issues. This survey contains a systematic summarization of previous work, studies, and experiments on FL and presents a list of possibilities for FL across a range of applications and use cases. In addition, various challenges of implementing FL and promising directions for addressing them are discussed. Comment: 9 pages, 2 figures
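    The abstract above describes FL as updating a global model from local client updates without moving raw data. The following is a minimal, hedged sketch of that idea in the style of federated averaging; the function names, toy data, and size-weighted aggregation are illustrative assumptions, not the method of any paper listed here.

```python
import numpy as np

def local_update(global_weights, X, y, lr=0.1, epochs=1):
    """Hypothetical local step: a few epochs of gradient descent
    on a linear model, starting from the current global weights."""
    w = global_weights.copy()
    for _ in range(epochs):
        preds = X @ w
        grad = X.T @ (preds - y) / len(y)   # mean-squared-error gradient
        w -= lr * grad
    return w

def federated_round(global_weights, client_data):
    """One communication round: each client trains locally, the server
    aggregates the returned weights, weighted by local dataset size."""
    updates, sizes = [], []
    for X, y in client_data:                # raw data never leaves the client
        updates.append(local_update(global_weights, X, y))
        sizes.append(len(y))
    sizes = np.array(sizes, dtype=float)
    return np.average(updates, axis=0, weights=sizes / sizes.sum())

# Toy usage: three clients whose local data are shifted differently,
# mimicking the statistical heterogeneity mentioned in the survey.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for shift in (0.0, 1.0, 2.0):
    X = rng.normal(shift, 1.0, size=(50, 2))
    y = X @ true_w + rng.normal(0, 0.1, size=50)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(20):
    w = federated_round(w, clients)
print("estimated global weights:", w)
```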

    Data Analytics for Uncovering Fraudulent Behaviour in Elite Sports

    Sports officials around the world face societal challenges due to fraudulent practices performed by unscrupulous athletes. Recently, sample swapping has been raised as a potential practice in which some athletes exchange their doped sample with a clean one to evade a positive test. Current detection methods for such cases rely on laboratory testing such as DNA analysis. However, these methods are costly and time-consuming, often exceeding the budgetary limits of anti-doping organisations. Therefore, there is a need to explore alternative methods to improve decision-making. We present a data analytical methodology that supports anti-doping decision-makers in the task of athlete disambiguation. Our proposed model helps identify swapped samples and outperforms the current state-of-the-art method and several baseline models. The evaluation on real-world sample-swapping cases shows promising results that help advance research on the application of data analytics in the context of anti-doping analysis.

    An Introduction to Federated Learning and Its Analysis

    With the onset of the digital era, data privacy has become one of the most prominent issues. Decentralized learning is becoming popular because the data can remain within local entities, preserving privacy. Federated Learning is a decentralized machine learning approach in which multiple clients collaboratively learn a model without sharing raw data. There are many practical challenges in implementing Federated Learning, including communication setup, data heterogeneity, and the computational capacity of clients. In this thesis, I explore recent Federated Learning methods under various settings, such as data distributions and data variability, used in several applications. In addition, I specifically examine the design of a systematic network topology in a federated framework through computational experiments.

    Federated Learning for Generalization, Robustness, Fairness: A Survey and Benchmark

    Federated learning has emerged as a promising paradigm for privacy-preserving collaboration among different parties. Recently, with the growing popularity of federated learning, an influx of approaches has been proposed to address different realistic challenges. In this survey, we provide a systematic overview of important and recent developments in federated learning research. Firstly, we introduce the study history and terminology of this area. Then, we comprehensively review three basic lines of research: generalization, robustness, and fairness, by introducing their respective background concepts, task settings, and main challenges. We also offer a detailed overview of representative literature on both methods and datasets. We further benchmark the reviewed methods on several well-known datasets. Finally, we point out several open issues in this field and suggest opportunities for further research. We also provide a public website to continuously track developments in this fast-advancing field: https://github.com/WenkeHuang/MarsFL. Comment: 22 pages, 4 figures

    Heterogeneous Federated Learning: State-of-the-art and Research Challenges

    Federated learning (FL) has drawn increasing attention owing to its potential use in large-scale industrial applications. Existing federated learning works mainly focus on model-homogeneous settings. However, practical federated learning typically faces heterogeneity of data distributions, model architectures, network environments, and hardware devices among participating clients. Heterogeneous Federated Learning (HFL) is much more challenging, and the corresponding solutions are diverse and complex. Therefore, a systematic survey of the research challenges and state-of-the-art in this area is essential. In this survey, we first summarize the various research challenges in HFL from five aspects: statistical heterogeneity, model heterogeneity, communication heterogeneity, device heterogeneity, and additional challenges. In addition, recent advances in HFL are reviewed and a new taxonomy of existing HFL methods is proposed, with an in-depth analysis of their pros and cons. We classify existing methods into three levels according to the HFL procedure: data-level, model-level, and server-level. Finally, several critical and promising future research directions in HFL are discussed, which may facilitate further developments in this field. A periodically updated collection on HFL is available at https://github.com/marswhu/HFL_Survey. Comment: 42 pages, 11 figures, and 4 tables

    A New Generative Adversarial Network for Improving Classification Performance for Imbalanced Data

    Imbalanced data is a common issue in many industries, particularly in fields such as fraud detection and medical diagnosis. Imbalanced data refers to datasets where the distribution of classes is not equal, resulting in an over-representation of one class and an under-representation of another. This can lead to biased and inaccurate machine learning models, as the algorithm may be inclined to favour the majority class and overlook important patterns in the minority class. Various sectors have utilised deep neural networks for data synthesis, and research in these fields indicates that balanced data outperforms imbalanced data when training deep neural networks. Although deep generative approaches such as Generative Adversarial Networks (GANs) are an efficient way to augment high-dimensional data, there is a lack of research on their effectiveness with credit card or breast cancer data, and current methods demonstrate limitations. Our research focuses on generating large quantities of valid synthetic samples that resemble the minority class, in this case fraudulent or malignant samples. Such additional data can be used to train a binary classifier so that it is effective for fraud detection or cancer diagnosis. To overcome the limitations of existing methods, we have developed a novel GAN-based method called K-CGAN, which has been tested on credit card fraud and breast cancer data. K-CGAN is designed to generate synthetic data that resembles the minority class, effectively balancing the dataset and improving the performance of binary classifiers. Our research demonstrates the effectiveness of K-CGAN in handling complex data imbalance problems often encountered in practical applications, and the experiments performed on different datasets indicate that K-CGAN can be used for various purposes. The application of machine learning algorithms in various industries has become increasingly popular in recent years. However, the quality and quantity of available data are crucial factors that directly impact the accuracy and reliability of these models. The scarcity and imbalance of datasets in certain domains pose challenges for researchers and practitioners, and the need for effective solutions is more pressing than ever. In this context, K-CGAN provides a promising approach to address data imbalance and improve the performance of machine learning models. Our results show that K-CGAN can be applied to datasets with different characteristics, making it a valuable tool for data scientists and practitioners in various fields.
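    The abstract describes generating synthetic minority-class samples with a class-conditioned GAN to rebalance a dataset. Below is a minimal, hedged sketch of a generic conditional GAN for tabular oversampling in PyTorch; it is not the authors' K-CGAN, and the network sizes, feature dimension, and hyperparameters are assumptions chosen only for illustration.

```python
# Generic conditional GAN for tabular oversampling (PyTorch sketch).
# Architecture and hyperparameters are illustrative assumptions, not K-CGAN.
import torch
import torch.nn as nn

N_FEATURES, NOISE_DIM, N_CLASSES = 30, 16, 2   # e.g. credit-card-style tabular data

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + N_CLASSES, 64), nn.ReLU(),
            nn.Linear(64, N_FEATURES),
        )
    def forward(self, z, labels):
        onehot = torch.nn.functional.one_hot(labels, N_CLASSES).float()
        return self.net(torch.cat([z, onehot], dim=1))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_FEATURES + N_CLASSES, 64), nn.LeakyReLU(0.2),
            nn.Linear(64, 1),
        )
    def forward(self, x, labels):
        onehot = torch.nn.functional.one_hot(labels, N_CLASSES).float()
        return self.net(torch.cat([x, onehot], dim=1))

def train_step(G, D, real_x, real_y, opt_g, opt_d, loss=nn.BCEWithLogitsLoss()):
    """One adversarial update: D learns to separate real rows from fake rows,
    G learns to fool D while conditioning on the class label."""
    batch = real_x.size(0)
    z = torch.randn(batch, NOISE_DIM)
    fake_x = G(z, real_y)

    # Discriminator update.
    opt_d.zero_grad()
    d_loss = loss(D(real_x, real_y), torch.ones(batch, 1)) + \
             loss(D(fake_x.detach(), real_y), torch.zeros(batch, 1))
    d_loss.backward()
    opt_d.step()

    # Generator update.
    opt_g.zero_grad()
    g_loss = loss(D(fake_x, real_y), torch.ones(batch, 1))
    g_loss.backward()
    opt_g.step()

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
# ... run train_step over mini-batches of the real (scaled) training data ...

# After training, oversample the minority class (label 1 here):
n_synthetic = 1000
synthetic_minority = G(torch.randn(n_synthetic, NOISE_DIM),
                       torch.ones(n_synthetic, dtype=torch.long))
```

    In an oversampling workflow of this kind, the generated minority rows would be concatenated with the real training set before fitting the binary classifier.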