11 research outputs found

    Federated Learning for 5G Base Station Traffic Forecasting

    Full text link
    Mobile traffic prediction is of great importance in enabling 5G mobile networks to perform smart and efficient infrastructure planning and management. However, available data are limited to base station logging information. Hence, training methods that generate high-quality predictions and generalize to new observations from different parties are in demand. Traditional approaches require collecting measurements from different base stations and sending them to a central entity, which then performs machine learning operations on the received data. The dissemination of local observations raises privacy, confidentiality, and performance concerns, hindering the applicability of machine learning techniques. Various distributed learning methods have been proposed to address this issue, but their application to traffic prediction has yet to be explored. In this work, we study the effectiveness of federated learning applied to raw base station aggregated LTE data for time-series forecasting. We evaluate one-step predictions using five different neural network architectures trained in a federated setting on non-iid data. The presented algorithms have been submitted to the Global Federated Traffic Prediction for 5G and Beyond Challenge. Our results show that the learning architectures adapted to the federated setting achieve prediction error equivalent to the centralized setting, that pre-processing techniques on base stations lead to higher forecasting accuracy, and that state-of-the-art aggregators do not outperform simple approaches.
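
    The abstract above contrasts state-of-the-art aggregators with simple ones but does not name them. As a minimal sketch, assuming the simple baseline resembles federated averaging (a sample-size-weighted mean of client parameters), the server-side step could look like the following. The function name `federated_average` and the flat-list parameter layout are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Return the sample-size-weighted mean of flattened client parameters.

    Hypothetical helper: each client (e.g. a base station) is assumed to
    send its model parameters as a flat list of floats, together with the
    number of local training samples it used.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all client updates must have the same length")

    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        share = size / total  # contribution proportional to local data size
        for i, value in enumerate(weights):
            averaged[i] += share * value
    return averaged


# Two clients: the second holds three times as much data as the first.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

    Weighting by local sample count is only one possible choice; with non-iid data, other weightings may behave differently.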

    Federated Learning for Early Dropout Prediction on Healthy Ageing Applications

    Full text link
    The provision of social care applications is crucial for elderly people to improve their quality of life and enables operators to provide early interventions. Accurate predictions of user dropouts in healthy ageing applications are essential since they are directly related to individual health statuses. Machine Learning (ML) algorithms have enabled highly accurate predictions, outperforming traditional statistical methods that struggle to cope with individual patterns. However, ML requires a substantial amount of data for training, which is challenging due to the presence of personally identifiable information (PII) and the fragmentation posed by regulations. In this paper, we present a federated machine learning (FML) approach that minimizes privacy concerns and enables distributed training without transferring individual data. We employ collaborative training by considering individuals and organizations under FML, which models both cross-device and cross-silo learning scenarios. Our approach is evaluated on a real-world dataset with non-independent and identically distributed (non-iid) data among clients, class imbalance and label ambiguity. Our results show that data selection and class imbalance handling techniques significantly improve the predictive accuracy of models trained under FML, yielding predictive performance comparable or superior to that of traditional ML models.

    Intelligent Client Selection for Federated Learning using Cellular Automata

    Full text link
    Federated Learning (FL) has emerged as a promising solution for privacy enhancement and latency minimization in various real-world applications, such as transportation, communications, and healthcare. FL endeavors to bring Machine Learning (ML) down to the edge by harnessing data from millions of devices and IoT sensors, thus enabling rapid responses to dynamic environments and yielding highly personalized results. However, the growing number of sensors across diverse applications poses challenges in terms of communication and resource allocation, hindering the participation of all devices in the federated process and prompting the need for effective FL client selection. To address this issue, we propose Cellular Automaton-based Client Selection (CA-CS), a novel client selection algorithm, which leverages Cellular Automata (CA) as models to effectively capture spatio-temporal changes in a fast-evolving environment. CA-CS considers the computational resources and communication capacity of each participating client, while also accounting for inter-client interactions between neighbors during the client selection process, enabling intelligent client selection for online FL processes on data streams that closely resemble real-world scenarios. In this paper, we present a thorough evaluation of the proposed CA-CS algorithm using the MNIST and CIFAR-10 datasets, with a direct comparison against a uniformly random client selection scheme. Our results demonstrate that CA-CS achieves accuracy comparable to the random selection approach, while effectively avoiding high-latency clients. Comment: 18th IEEE International Workshop on Cellular Nanoscale Networks and their Application

    Towards Energy-Aware Federated Traffic Prediction for Cellular Networks

    Full text link
    Cellular traffic prediction is crucial for optimizing fifth-generation (5G) networks and beyond, as accurate forecasting is essential for intelligent network design, resource allocation and anomaly mitigation. Although machine learning (ML) is a promising approach to effectively predict network traffic, the centralization of massive data in a single data center raises issues regarding confidentiality, privacy and data transfer demands. To address these challenges, federated learning (FL) emerges as an appealing ML training framework which offers highly accurate predictions through parallel distributed computations. However, the environmental impact of these methods is often overlooked, which calls into question their sustainability. In this paper, we address the trade-off between accuracy and energy consumption in FL by proposing a novel sustainability indicator that allows assessing the feasibility of ML models. Then, we comprehensively evaluate state-of-the-art deep learning (DL) architectures in a federated scenario using real-world measurements from base station (BS) sites in the area of Barcelona, Spain. Our findings indicate that larger ML models achieve marginally improved performance but have a significant environmental impact in terms of carbon footprint, which makes them impractical for real-world applications. Comment: International Symposium on Federated Learning Technologies and Applications (FLTA), 202

    FLAMENCO Learning Disabilities Dataset

    No full text
    In the context of the FLAMENCO project, we have released a dataset designed for predicting potential deficiencies in children's communication skills, tailored for Federated Learning. This dataset specifically focuses on two prevalent deficiencies in communication skill development in children: autism and intellectual disability. For each deficiency, two CSV files are provided: one for training machine learning models and another for testing them. Each entry in these CSV files includes the following details:

    - case_id: An anonymized identifier used to distinguish cases.
    - client_id: Identifies the client to which the case belongs, useful for dataset splitting in federated settings.
    - A series of scores measuring specific communication skills: These scores, such as Verbalization, Voicing, Syntax, etc., are derived from the child's performance in specialised gamified exercises and have been computed with the assistance of expert clinicians.
    - target: Can be -1 (no clinician's diagnosis available for the case), 0 (no diagnosed deficiency in the case), or 1 (a positive diagnosis of communication deficiency by a clinician).

    This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreements No. 957406 (TERMINET).
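
    Since the dataset description notes that client_id is useful for dataset splitting in federated settings, a per-client partitioning step might be sketched as follows. This is a minimal illustration using only the Python standard library. The exact header spellings in the released CSV files, the helper name `split_by_client`, and the choice to drop rows with target -1 are assumptions made for the example, not details confirmed by the dataset record.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def split_by_client(csv_text: str, drop_unlabeled: bool = True) -> Dict[str, List[dict]]:
    """Group CSV rows by their client_id column.

    Hypothetical helper: assumes header names 'client_id' and 'target'
    match the description above. Rows with target == -1 (no clinician
    diagnosis available) are optionally skipped.
    """
    partitions: Dict[str, List[dict]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if drop_unlabeled and int(row["target"]) == -1:
            continue  # unlabeled case, not usable for supervised training
        partitions[row["client_id"]].append(row)
    return dict(partitions)


# Tiny made-up sample in the assumed layout (not real data).
sample = (
    "case_id,client_id,Verbalization,target\n"
    "c1,A,0.5,1\n"
    "c2,A,0.7,-1\n"
    "c3,B,0.2,0\n"
)
parts = split_by_client(sample)
```

    Each resulting partition can then serve as the local training set of one simulated federated client.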

    Privacy protection in computations and machine learning on decentralized data (Προστασία ιδιωτικότητας σε υπολογισμούς και μηχανική μάθηση σε αποκεντρωμένα δεδομένα)

    No full text
    In the current era, Machine Learning (ML) applications are ubiquitous in our daily lives, triggering major milestones in our interaction with technology. Artificial Intelligence (AI) and ML have reshaped our approach to productivity by enhancing a wide range of activities and contributing to several societal domains. The rapid integration of these advancements into diverse sectors has activated unprecedented growth in research, exploring numerous facets of these technologies, from domain applicability to algorithms improving both model accuracy and efficiency. Similarly, commercial entities leverage these advancements to offer real-world applications, providing users with interactive platforms that benefit from these systems’ output. As the AI landscape evolves, issues surrounding ML privacy, security, scalability, collaboration, sustainability and barriers posed by regulations and laws such as the General Data Protection Regulation (GDPR) remain at the forefront regarding technical, ethical and regulatory concerns. The challenges associated with data acquisition and utilization represent a critical bottleneck for the large-scale implementation of ML technologies in multiple domains. Datasets serve as one of the most critical components in ML pipelines and are required for models to learn, adapt and perform. Nevertheless, the uneven attention and development across sectors, stemming mainly from data collection issues, highlights significant disparities in AI progress. While generative AI methods offer promising solutions to mitigate some of these challenges by synthesizing data, their primary focus is on image generation and Natural Language Processing (NLP) domains, underscoring a gap in addressing the needs of other areas. As such, the intersection of ML applicability with the multifaceted problems of data scarcity, privacy concerns and domain-specific needs presents a complex landscape for intelligent application development. 
In response to data scarcity and privacy concerns, distributed and edge AI strategies have received much attention. Federated Learning (FL) emerges as a paradigm-shifting approach that facilitates collaborative ML model training across distributed participants, without sharing their data. FL directly enhances privacy and supports data minimization, a key requirement for compliance with regulations and laws. While FL can significantly minimize domain-specific barriers by enabling access to advanced ML capabilities, issues like model convergence, handling statistically heterogeneous data as well as operational challenges regarding data security and ensuring user privacy should be addressed. As we push the boundaries of AI capabilities, it becomes imperative to integrate sustainability measures into the system lifecycle, from training to deployment and operation. The energy consumption and carbon footprint associated with training complex AI models, especially those requiring extensive computational resources, pose significant environmental issues. In most cases, the focus on predictive accuracy over efficacy contradicts global efforts to reduce carbon emissions. FL offers a promising alternative to reduce computational and communication resource costs, promoting sustainable practices. Yet, a universal indicator for quantifying AI system sustainability, for models trained either using traditional ML pipelines or distributed approaches, remains absent. Even when systems are designed with a focus on privacy and sustainability, the post-deployment phase of a model’s lifecycle introduces additional complexities. Open problems like removing the influence of noisy and adversarial data or managing users’ requests for data deletion after model deployment pose significant challenges. To this end, Machine Unlearning emerged as a concept that directly aligns with privacy directives, facilitating the removal of specific data from a model’s acquired knowledge. 
Nevertheless, unlearning algorithms are still in their nascent stages, with additional research required to fully understand their implications, effectiveness and practical applications. The work presented in this thesis offers a holistic view of recent advancements in ML and AI applications, using both traditional and novel training paradigms to address a variety of real-world challenges. Our exploration begins with an approach to predictive modeling within the social care domain, followed by an exploration into generative AI on temporal graphs. The core of this thesis is dedicated to FL, which is examined from four perspectives: (1) privacy, (2) personalization, (3) collaboration and (4) sustainability. Each of these aspects is critical to the effective implementation of distributed learning, offering insights into the capabilities and limitations of FL applications. Finally, this work takes a turn in the ML landscape by addressing the post-learning removal of the influence of specific samples seen during training. The exploration in this thesis includes the application of methods in realistic use cases and on raw data, promoting generalized AI applications, privacy, personalized experience, operational synergies and sustainable practices. The first two studies concern traditional ML pipelines. First, in the social care domain, ML training with oversampling techniques improved predictive accuracy by 10% in a real-world imbalanced classification task. The system aims at alerting experts to intervene and improve the quality of life of elderly people. Second, a novel Deep Learning (DL) architecture based on an Encoder-Decoder structure is applied for modeling transitions in temporal graphs. Besides synthetic temporal graphs, the architecture is applied to a realistic evolving graph collected from a social network, capturing the connections among the members of the Greek parliament over a period spanning four years. 
In all cases, the DL model was able to capture transition dynamics, effectively leading to high generative capabilities. After generalized AI use cases, the thesis focuses on privacy-preserving FL in recommender systems. A privacy-preserving framework is introduced and the results confirm that FL achieves predictive accuracy comparable to traditional settings, while the integrated privacy-enhancing mechanism incurs low computational overhead. In addition, a novel privacy-preserving information fusion among users, applied after the training of federated recommender systems, is presented. The results confirm that integrating additional information effectively enhances the predictive accuracy, while the privacy-preserving protocol incurs low computational costs with high privacy and security guarantees. After privacy-preserving FL, the thesis studies the collaborative and scalability aspects in a real-world use case, where network operators collaborate to predict future traffic demand and improve the network’s experience. The results show that FL improves scalability, incurs low communication costs and can lead to higher predictive accuracy than traditional settings, ultimately leading to operational synergies. In response to the environmental aspects of AI pipelines, a sustainability indicator for universal ML methods is introduced. This indicator integrates accuracy, energy consumption and communication costs into a single metric, offering a novel tool for measuring and promoting environmentally friendly practices. The results show that complex models lead to higher predictive accuracy at the expense of energy consumption. Lastly, the thesis shifts its focus to the novel area of unlearning, presenting a new machine unlearning algorithm. 
The introduced algorithm is applied to three diverse datasets and the results confirm that it surpasses current approaches by effectively removing data influence from a model’s training while maintaining high predictive accuracy.

In the modern era, the growing fields of machine learning and artificial intelligence have brought rapid technological progress, changing the way complex problems are approached and solved. Widespread machine learning applications highlight the transformation of a multitude of sectors, from supporting simple tasks to strengthening critical infrastructure and public health systems. The spread of these technologies is not merely proof of human ingenuity, but also a reflection of society's ever-increasing dependence on effective decision-making and process automation. Artificial intelligence applications have permeated various aspects of daily life and industry. Among others, these include recommender systems that tailor user experiences across various application domains (retail, news content, movies, music, points of interest, etc.), word and text prediction algorithms for improving communication, customer analytics for optimizing business strategies, network traffic prediction algorithms for network optimization, and algorithms that predict the physical and psychological state of individuals to enable timely intervention by experts and public health management. In addition, the integration of these technologies into smart homes, cities and infrastructure in general underlines their importance for urban development and sustainability. The problems solved by machine learning algorithms are varied and multifaceted. 
The most popular types of problems include classification, which assigns data to predefined labels; clustering, which identifies inherent groupings; regression, which is used to predict continuous values; and time-series forecasting, which serves to understand and predict trends over time. In recent years, there has been a surge of interest in creating and developing data synthesis applications (generative Artificial Intelligence), with algorithms that promise to redefine innovation and creativity in any field. At the core of these technologies lies the intricate process of analyzing data, which range from simple sensor readings obtained through interaction with the physical environment to content generated by application users. Data sources feed central collection systems, where the information is pre-processed and used to train machine learning algorithms and build suitable and effective applications. Although datasets are one of the most fundamental ingredients of the success of machine learning algorithms, the collection, analysis and storage of vast amounts of personal and sensitive data raise significant privacy protection issues. Moreover, as intelligent applications become ever more complex and require ever more data, the ability to process and analyze data effectively at large scale becomes a critical challenge. For these reasons, Federated Learning has emerged as a very promising solution with regard to data protection, enabling collaborative and scalable solutions. 
The research conducted in the context of this thesis is oriented towards investigating realistic applications of machine learning, focusing on data diversity, on collaboration for producing effective predictions, on the sustainability of algorithms and on strengthening the security of data and user preferences. The analysis has the following characteristics: (1) it is based on realistic data, which can be used to solve real-world problems, and (2) it is based on distributed machine learning computation techniques combined with privacy-enhancing technologies, techniques that strengthen scalability and data security. Specifically, a realistic classification problem and a generative application on temporally evolving graphs are examined using learning techniques with a traditional information flow. In addition, distributed techniques are applied by training algorithms with federated learning, with integrated technologies for privacy enhancement and prediction personalization in recommender systems. Furthermore, collaboration among service providers is highlighted in the use case of base station network traffic prediction, and an attempt is made to create a universal sustainability indicator for machine learning algorithms. Finally, an emerging field is studied, machine unlearning, which aims to erase prior knowledge from trained machine learning algorithms.

    The “sweet” relations between diabetes and platelets

    No full text
    Atherosclerosis is the most important factor leading to the high risk of atherothrombotic events in patients with diabetes mellitus (DM). High morbidity and mortality in these patients are primarily caused by cardiovascular disease, mostly coronary artery disease (CAD) along with acute coronary syndrome (ACS) [1].

    Adversarial machine learning: evaluation of attack models and defense mechanisms (Κακόβουλη μηχανική μάθηση: αξιολόγηση μοντέλων επίθεσης και μηχανισμών άμυνας)

    No full text
    In recent years, there has been a sharp increase in the use of mobile platforms and particularly devices based on the Android operating system. This rapid growth in the use of mobile devices has fueled cybercriminals’ interest in developing and sharing malicious software. Machine learning algorithms can be used to detect malware with extremely high performance. However, many of these algorithms, and mainly neural network models, are vulnerable to changes in the input data, known as adversarial examples, capable of leading a model to produce misclassifications. This weakness is one of the major problems that the research community is called upon to solve. This thesis presents the evolution of malicious software for Android-based devices over time and discusses the extraction of an application’s features to detect malicious activity. In addition, it describes ways of detecting malware through machine learning models, as well as ways in which an attacker can deceive these models. This work focuses on the experimental demonstration of the efficiency of machine learning models for malware detection and the weakness of these models against small changes in the input data. Finally, methods for defending models are evaluated and special features of adversarial examples are discussed.

    In recent years, a rapid increase has been observed in the use of mobile platforms, particularly devices based on the Android operating system. This rapid uptake of mobile devices has sparked cybercriminals' interest in developing and distributing malicious software. Machine learning algorithms can be used to detect malware, achieving extremely high performance. However, many of these algorithms, and especially neural network models, are vulnerable to changes in the input data, known as adversarial examples, which are capable of leading a model to produce misclassifications. 
This weakness is one of the most important problems the research community is called upon to solve. This diploma thesis presents the evolution over time of malware for Android-based mobile devices and discusses the ways of extracting application features in order to detect malicious activity. In addition, it elaborates on the ways malware can be detected through machine learning models, as well as the ways in which an attacker can deceive these models. The work focuses on experimentally demonstrating the accuracy of machine learning models for malware detection and the weakness of the algorithms against small changes in the input data. Finally, methods for protecting the models are evaluated, and interesting properties of adversarial examples are discussed.

    FedPOIRec: Privacy-preserving federated POI recommendation with social influence

    No full text
    With the growing number of Location-Based Social Networks, privacy-preserving point-of-interest (POI) recommendation has become a critical challenge when helping users discover potentially interesting new places. Traditional systems take a centralized approach that requires the transmission and collection of private user data. In this work, we present FedPOIRec, a privacy-preserving federated learning approach enhanced with features from user social circles to generate top-N POI recommendations. First, the FedPOIRec framework is built on the principle that local data never leave the owner’s device, while a parameter server blindly aggregates the local updates. Second, the local recommender results are personalized by allowing users to exchange their learned parameters, enabling knowledge transfer among friends. To this end, we propose a privacy-preserving protocol for integrating the preferences of the user’s friends, after the federated computation, by exploiting the properties of the Cheon-Kim-Kim-Song (CKKS) fully homomorphic encryption scheme. To evaluate FedPOIRec, we apply our approach to five real-world datasets using two recommendation models. Extensive experiments demonstrate that FedPOIRec achieves comparable recommendation quality to centralized approaches, while the social integration protocol incurs low computation and communication overhead on the user device.