218 research outputs found

    Federated Survival Analysis: Ensemble and Neural Methods for Distributed Time-to-Event Data

    The abstract is in the attachment.

    Neural weighted A* : learning graph costs and heuristics with differentiable anytime A*

    Master's degree thesis (Laurea Magistrale). Recently, the trend of incorporating differentiable algorithms into deep learning architectures arose in machine learning research. The fusion of neural layers and algorithmic layers has been beneficial for handling combinatorial data, enabling the learning procedure to converge faster with fewer data samples. However, these algorithms comprise discrete operations and, therefore, require the definition of a smooth derivative in order to be compatible with backpropagation. By relying on automatic differentiation and gradient smoothing techniques, many differentiable algorithmic layers have been developed, leading to the birth of hybrid architectures that are trainable end-to-end on combinatorial data. We refer to this research trend as structured learning. Among the application domains that benefit from structured learning, we find shortest-path planning on graphs. Many studies aim at labeling graphs in the form of cost functions by supervised, end-to-end learning on shortest-path examples. The advantage of this approach is that it enables planning on raw, unlabeled images without exploiting any domain knowledge other than planning examples. However, none of these studies aims at learning heuristic functions as well. Hence, we propose Neural Weighted A*, the first differentiable anytime planner able to learn cost functions and heuristic functions in a principled way. Learning occurs end-to-end on image-path pairs thanks to a differentiable A* module included in the system. The second contribution of Neural Weighted A* is that it lets the user smoothly control the balance between planning accuracy and efficiency using a single, real-valued parameter. In this way, the user can choose, even at runtime, to compute either the shortest path or a close approximation, drastically accelerating the planning procedure. In the latter case, since our method is rooted in the Weighted A* algorithm, the solution suboptimality is constrained within a linear bound proportional to the tradeoff parameter. We experimentally show the validity of our claims by testing Neural Weighted A* against several state-of-the-art baselines, outperforming all of them in planning accuracy and efficiency. The experiments are conducted on two datasets comprising planar navigation examples from videogame maps. The first dataset is an adaptation of a standard benchmark used for testing differentiable planning systems, while the second is novel. Both datasets are publicly available.
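
    The accuracy/efficiency tradeoff mentioned in the abstract comes from classical Weighted A*, which inflates the heuristic by a factor epsilon >= 1. The sketch below illustrates only that classical algorithm on a hand-written grid; it is not the thesis' differentiable solver and learns nothing. The function name, the 4-connected grid, and the Manhattan heuristic are assumptions made for illustration.

```python
import heapq


def weighted_a_star(costs, start, goal, epsilon=1.0):
    """Classical Weighted A* on a 4-connected grid (illustrative sketch).

    costs[r][c] is the positive cost of entering cell (r, c).
    epsilon = 1 gives plain A*; epsilon > 1 inflates the heuristic, which
    usually expands fewer cells and, with a consistent heuristic, returns a
    path costing at most epsilon times the optimum.
    """
    rows, cols = len(costs), len(costs[0])
    min_cost = min(min(row) for row in costs)

    def h(cell):
        # Manhattan distance scaled by the cheapest cell cost (consistent).
        return min_cost * (abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]))

    g = {start: 0.0}          # best known cost from start to each cell
    parent = {start: None}    # back-pointers for path reconstruction
    frontier = [(epsilon * h(start), start)]
    closed = set()
    expanded = 0

    while frontier:
        _, cell = heapq.heappop(frontier)
        if cell in closed:
            continue  # stale queue entry
        closed.add(cell)
        expanded += 1
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], g[goal], expanded
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and nxt not in closed:
                new_g = g[cell] + costs[nr][nc]
                if new_g < g.get(nxt, float("inf")):
                    g[nxt] = new_g
                    parent[nxt] = cell
                    heapq.heappush(frontier, (new_g + epsilon * h(nxt), nxt))
    return None, float("inf"), expanded


if __name__ == "__main__":
    grid = [[1, 1, 1], [1, 5, 1], [1, 1, 1]]
    for eps in (1.0, 2.0):
        path, cost, expanded = weighted_a_star(grid, (0, 0), (2, 2), eps)
        print(f"epsilon={eps}: cost={cost}, expanded={expanded}")
```

    Raising epsilon trades path optimality for fewer expansions, which mirrors the single control parameter described in the abstract.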

    Federated Survival Forests

    Survival analysis is a subfield of statistics concerned with modeling the occurrence time of a particular event of interest for a population. Survival analysis has found widespread applications in healthcare, engineering, and social sciences. However, real-world applications involve survival datasets that are distributed, incomplete, censored, and confidential. In this context, federated learning can tremendously improve the performance of survival analysis applications. Federated learning provides a set of privacy-preserving techniques to jointly train machine learning models on multiple datasets without compromising user privacy, leading to better generalization performance. However, despite the widespread development of federated learning in recent AI research, few studies focus on federated survival analysis. In this work, we present a novel federated algorithm for survival analysis based on one of the most successful survival models, the random survival forest. We call the proposed method Federated Survival Forest (FedSurF). With a single communication round, FedSurF obtains a discriminative power comparable to deep-learning-based federated models trained over hundreds of federated iterations. Moreover, FedSurF retains all the advantages of random forests, namely low computational cost and natural handling of missing values and incomplete datasets. These advantages are especially desirable in real-world federated environments with multiple small datasets stored on devices with low computational capabilities. Numerical experiments compare FedSurF with state-of-the-art survival models in federated networks, showing that FedSurF outperforms deep-learning-based federated algorithms in realistic environments with non-identically distributed data.

    Deep Survival Analysis for Healthcare: An Empirical Study on Post-Processing Techniques

    Survival analysis is a crucial tool in healthcare, allowing us to understand and predict time-to-event occurrences using statistical and machine-learning techniques. As deep learning gains traction in this domain, a specific challenge emerges: neural network-based survival models often produce discrete-time outputs, with far fewer discretization points than unique time points in the dataset, leading to potentially inaccurate survival functions. To address this issue, our study explores post-processing techniques for survival functions. Specifically, interpolation and smoothing can act as effective regularization, enhancing performance metrics integrated over time, such as the Integrated Brier Score and the Cumulative Area-Under-the-Curve. We employed various regularization techniques on diverse real-world healthcare datasets to validate this claim. Empirical results suggest a significant performance improvement when using these post-processing techniques, underscoring their potential as a robust enhancement for neural network-based survival models. These findings suggest that integrating the strengths of neural networks with the non-discrete nature of survival tasks can yield more accurate and reliable survival predictions in clinical scenarios.
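
    To make the idea of post-processing a discrete-time survival curve concrete, here is a minimal sketch of plain linear interpolation between discretization points. It is one simple instance of interpolation, not necessarily any of the techniques evaluated in the study. The function name, the assumption that S(0) = 1, and the constant extrapolation beyond the last grid point are illustrative choices.

```python
from bisect import bisect_right


def interpolate_survival(grid_times, grid_surv, query_times):
    """Linearly interpolate a discrete-time survival curve (sketch).

    grid_times:  strictly increasing discretization points of the model.
    grid_surv:   predicted survival probabilities S(t) at those points.
    query_times: arbitrary times at which S(t) is needed.
    Assumes S(0) = 1 when the grid starts after time 0, and keeps the last
    predicted value constant beyond the final grid point.
    """
    times, surv = list(grid_times), list(grid_surv)
    if times[0] > 0:
        times.insert(0, 0.0)
        surv.insert(0, 1.0)

    result = []
    for t in query_times:
        if t <= times[0]:
            result.append(surv[0])
        elif t >= times[-1]:
            result.append(surv[-1])
        else:
            i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
            w = (t - times[i - 1]) / (times[i] - times[i - 1])
            result.append((1 - w) * surv[i - 1] + w * surv[i])
    return result


if __name__ == "__main__":
    # Two discretization points, five query times.
    print(interpolate_survival([10, 20], [0.8, 0.5], [0, 5, 10, 15, 25]))
```

    A step function would instead hold each value until the next grid point; interpolating gives the curve a value at every observed time, so time-integrated metrics are not computed on a coarse staircase.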

    Scaling Survival Analysis in Healthcare with Federated Survival Forests: A Comparative Study on Heart Failure and Breast Cancer Genomics

    Survival analysis is a fundamental tool in medicine, modeling the time until an event of interest occurs in a population. However, in real-world applications, survival data are often incomplete, censored, distributed, and confidential, especially in healthcare settings where privacy is critical. The scarcity of data can severely limit the scalability of survival models to distributed applications that rely on large data pools. Federated learning is a promising technique that enables machine learning models to be trained on multiple datasets without compromising user privacy, making it particularly well-suited for addressing the challenges of survival data and large-scale survival applications. Despite significant developments in federated learning for classification and regression, many directions remain unexplored in the context of survival analysis. In this work, we propose an extension of the Federated Survival Forest algorithm, called FedSurF++. This federated ensemble method constructs random survival forests in heterogeneous federations. Specifically, we investigate several new methods for sampling trees from client forests and compare the results with state-of-the-art survival models based on neural networks. The key advantage of FedSurF++ is its ability to achieve performance comparable to existing methods while requiring only a single communication round. The extensive empirical investigation results in a significant improvement from the algorithmic and privacy preservation perspectives, making the original FedSurF algorithm more efficient, robust, and private. We also present results on two real-world datasets demonstrating the success of FedSurF++ in real-world healthcare studies. Our results underscore the potential of FedSurF++ to improve the scalability and effectiveness of survival analysis in distributed settings while preserving user privacy.

    Heterogeneous Datasets for Federated Survival Analysis Simulation

    This repo contains three algorithms for constructing realistic federated datasets for survival analysis. Each algorithm starts from an existing non-federated dataset and assigns each sample to a specific client in the federation. The algorithms are:

    - uniform_split: assigns each sample to a random client with uniform probability;
    - quantity_skewed_split: assigns each sample to a random client according to the Dirichlet distribution [3, 4];
    - label_skewed_split: assigns each sample to a time bin, then assigns a set of samples from each bin to the clients according to the Dirichlet distribution [3, 4].

    For more information, please take a look at our paper at https://arxiv.org/abs/2301.12166 [1].

    Content

    - federated_survival_datasets.zip: the content of the repository at https://github.com/archettialberto/federated_survival_datasets
    - Heterogheneous_Datasets_for_Federated_Survival_Analysis_Simulation.pdf: the conference paper describing the work.

    Installation

    Federated Survival Datasets is built on top of numpy and scikit-learn. To install those libraries you can run pip install -r requirements.txt. To import survival datasets into your project, we strongly recommend SurvSet (https://github.com/ErikinBC/SurvSet) [2], a comprehensive collection of more than 70 survival datasets.

    Usage

```python
import numpy as np
import pandas as pd
from federated_survival_datasets import label_skewed_split

# import a survival dataset and extract the input array X and the output array y
df = pd.read_csv("metabric.csv")
X = df[[f"x{i}" for i in range(9)]].to_numpy()
y = np.array(
    [(e, t) for e, t in zip(df["event"], df["time"])],
    dtype=[("event", bool), ("time", float)],
)

# run the splitting algorithm
client_data = label_skewed_split(num_clients=8, X=X, y=y)

# check the number of samples assigned to each client
for i, (X_c, y_c) in enumerate(client_data):
    print(f"Client {i} - X: {X_c.shape}, y: {y_c.shape}")
```

    We provide an example notebook in the zipped folder to illustrate the proposed algorithms. It requires scikit-survival, seaborn, and pandas.

    References

    [1] Archetti, A., Lomurno, E., Lattari, F., Martin, A., & Matteucci, M. (2023). Heterogeneous Datasets for Federated Survival Analysis Simulation. arXiv preprint arXiv:2301.12166.
    [2] Drysdale, E. (2022). SurvSet: An open-source time-to-event dataset repository. arXiv preprint arXiv:2203.03094.
    [3] Hsu, T. M. H., Qi, H., & Brown, M. (2019). Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335.
    [4] Li, Q., Diao, Y., Chen, Q., & He, B. (2022, May). Federated learning on non-iid data silos: An experimental study. In 2022 IEEE 38th International Conference on Data Engineering (ICDE) (pp. 965-978). IEEE.
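
    The quantity-skewed idea (client proportions drawn from a Dirichlet distribution, then samples assigned at random according to those proportions) can be sketched with the standard library alone. This is an illustrative reimplementation under stated assumptions, not the repository's code: the function names, the alpha concentration parameter, and the seed argument are hypothetical, and the real quantity_skewed_split may differ in signature and details.

```python
import random


def sample_dirichlet(num_clients, alpha, rng):
    """Draw one symmetric Dirichlet(alpha) vector.

    Uses the standard construction: independent Gamma(alpha, 1) draws
    divided by their sum.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
    total = sum(draws)
    return [d / total for d in draws]


def sketch_quantity_skewed_split(num_samples, num_clients, alpha=1.0, seed=0):
    """Assign sample indices to clients with Dirichlet-skewed proportions.

    Hypothetical helper: a small alpha gives very unequal client sizes,
    a large alpha approaches a uniform split.
    Returns one list of sample indices per client.
    """
    rng = random.Random(seed)
    proportions = sample_dirichlet(num_clients, alpha, rng)
    clients = [[] for _ in range(num_clients)]
    for idx in range(num_samples):
        # Pick a client with probability equal to its Dirichlet proportion.
        chosen = rng.choices(range(num_clients), weights=proportions, k=1)[0]
        clients[chosen].append(idx)
    return clients


if __name__ == "__main__":
    split = sketch_quantity_skewed_split(num_samples=1000, num_clients=8, alpha=0.5)
    for i, indices in enumerate(split):
        print(f"Client {i}: {len(indices)} samples")
```

    The returned index lists can be used to slice X and y arrays such as those built in the usage example.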

    Drug Inventory Control: Human Decisions Versus Deep Reinforcement Learning

    We investigate whether and how deep reinforcement learning (DRL) can be exploited for managing inventory systems, with specific reference to perishable pharmaceutical products. A real-world case study is formulated as a Markov decision process, where states, actions, and rewards are defined. We then develop a DRL agent based on the Proximal Policy Optimization algorithm and compare its performance with that of a human decision-maker with several years of experience. Our findings reveal that the DRL agent outperforms the human policy by 11%, optimizing storage space and increasing profitability. Such incremental improvements can translate into substantial value for pharmaceutical companies operating in complex scenarios, and patients also stand to benefit. Finally, the study highlights the strategic advantage of integrating DRL into inventory management business operations, particularly for its ability to estimate uncertainty and manage corresponding supply chain risks.

    The Bi-objective Long-haul Transportation Problem on a Road Network

    In this paper, we study a long-haul truck scheduling problem where a path has to be determined for a vehicle traveling from a specified origin to a specified destination. We consider refueling decisions along the path, while accounting for heterogeneous fuel prices in a road network. Furthermore, the path has to comply with Hours of Service (HoS) regulations. Therefore, a path is defined by the actual road trajectory traveled by the vehicle, as well as the locations where the vehicle stops due to refueling, compliance with HoS regulations, or a combination of the two. This setting is cast as a bi-objective optimization problem, considering the minimization of fuel cost and the minimization of path duration. An algorithm is proposed to solve the problem on a road network. The algorithm builds a set of non-dominated paths with respect to the two objectives. Given the enormous theoretical size of the road network, the algorithm follows an interactive path construction mechanism. Specifically, the algorithm dynamically interacts with a geographic information system to identify the relevant potential paths and stop locations. Computational tests are run on real-sized instances where the distance covered ranges from 500 to 1500 km. The algorithm is compared with solutions obtained from a policy mimicking the current practice of a logistics company. The results show that the non-dominated solutions produced by the algorithm significantly outperform the ones generated by the current practice in terms of fuel cost, while achieving similar path durations. The average number of non-dominated paths is 2.7, small enough for decision makers to visually inspect the proposed alternatives.
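
    As a small illustration of what "non-dominated with respect to the two objectives" means, the sketch below filters a list of candidate paths, each summarized by a (fuel cost, duration) pair, down to its Pareto front. It covers only the dominance filter. The paper's path construction, refueling logic, and HoS handling are not reproduced, and the function name and example numbers are made up for illustration.

```python
def non_dominated(paths):
    """Return the Pareto front of (fuel_cost, duration) pairs, both minimized.

    A pair is dominated if another pair is no worse on both objectives and
    strictly better on at least one. After sorting by cost (ties broken by
    duration), a pair is non-dominated exactly when its duration is strictly
    lower than every duration seen so far.
    """
    front = []
    best_duration = float("inf")
    for cost, duration in sorted(set(paths)):
        if duration < best_duration:
            front.append((cost, duration))
            best_duration = duration
    return front


if __name__ == "__main__":
    # Hypothetical candidates: (fuel cost, duration in hours).
    candidates = [(100, 20), (90, 25), (120, 18), (110, 22), (90, 30)]
    for cost, duration in non_dominated(candidates):
        print(f"fuel cost {cost}, duration {duration} h")
```

    A front with only a few entries, as reported in the abstract, is small enough to present every alternative to a planner directly.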

    SGDE: Secure Generative Data Exchange for Cross-Silo Federated Learning

    Privacy regulation laws, such as GDPR, impose transparency and security as design pillars for data processing algorithms. In this context, federated learning is one of the most influential frameworks for privacy-preserving distributed machine learning, achieving astounding results in many natural language processing and computer vision tasks. Several federated learning frameworks employ differential privacy to prevent private data leakage to unauthorized parties and malicious attackers. Many studies, however, highlight the vulnerabilities of standard federated learning to poisoning and inference attacks, thus raising concerns about potential risks for sensitive data. To address this issue, we present SGDE, a generative data exchange protocol that improves user security and machine learning performance in a cross-silo federation. The core idea of SGDE is to share data generators with strong differential privacy guarantees, trained on private data, instead of communicating explicit gradient information. These generators synthesize an arbitrarily large amount of data that retain the distinctive features of private samples but differ substantially from them. In this work, SGDE is tested in a cross-silo federated network on image and tabular datasets, exploiting beta-variational autoencoders as data generators. The results show that SGDE improves task accuracy and fairness, as well as resilience to the most influential attacks on federated learning.