
    Personalising lung cancer screening with machine learning

    Personalised screening is based on a straightforward concept: repeated risk assessment linked to tailored management. However, delivering such programmes at scale is complex. In this work, I aimed to contribute to two areas: the simplification of risk assessment, to facilitate the implementation of personalised screening for lung cancer; and the use of synthetic data, to support privacy-preserving analytics in the absence of access to patient records. I first present parsimonious machine learning models for lung cancer screening, demonstrating an approach that couples the performance of model-based risk prediction with the simplicity of risk-factor-based criteria. I trained models to predict the five-year risk of developing or dying from lung cancer using UK Biobank and US National Lung Screening Trial participants, before external validation amongst temporally and geographically distinct ever-smokers in the US Prostate, Lung, Colorectal and Ovarian Screening Trial. I found that three predictors (age, smoking duration, and pack-years) within an ensemble machine learning framework achieved or exceeded parity with comparators in discrimination, calibration, and net benefit. Furthermore, I show that these models are more sensitive than risk-factor-based criteria, such as those currently recommended by the US Preventive Services Task Force.

    For the implementation of more personalised healthcare, researchers and developers require ready access to high-quality datasets. As such data are sensitive, their use is subject to tight control, and the majority of data held in electronic records are not available for research use. Synthetic data are algorithmically generated but can maintain the statistical relationships present within an original dataset. In this work, I used explicitly privacy-preserving generators to create synthetic versions of the UK Biobank, before performing exploratory data analysis and prognostic model development. Comparing results obtained using the synthetic and real datasets, I show the potential of synthetic data to facilitate prognostic modelling.
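
    The abstract does not specify the components of the ensemble framework; the following is a minimal sketch of a three-predictor risk model, assuming a single scikit-learn gradient-boosted ensemble as a stand-in and entirely synthetic example data:

        # Minimal sketch of a parsimonious risk model. The three predictors
        # match the abstract; the model choice and data are assumptions.
        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        n = 5000
        X = np.column_stack([
            rng.uniform(50, 80, n),   # age (years)
            rng.uniform(10, 50, n),   # smoking duration (years)
            rng.uniform(5, 100, n),   # pack-years
        ])
        # Toy outcome: risk increases with all three predictors.
        logit = -8 + 0.04 * X[:, 0] + 0.03 * X[:, 1] + 0.02 * X[:, 2]
        y = rng.random(n) < 1 / (1 + np.exp(-logit))

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        model = GradientBoostingClassifier().fit(X_tr, y_tr)
        risk = model.predict_proba(X_te)[:, 1]   # estimated five-year risk
        print("AUC:", roc_auc_score(y_te, risk))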

    Efficient Off-Policy Q-Learning for Data-Based Discrete-Time LQR Problems

    This paper introduces and analyzes an improved Q-learning algorithm for discrete-time linear time-invariant systems. The proposed method does not require any knowledge of the system dynamics, and it enjoys significant efficiency advantages over other data-based optimal control methods in the literature. The algorithm can be executed fully off-line, as it does not require applying the current estimate of the optimal input to the system, as in on-policy algorithms. It is shown that a persistently exciting input, defined via an easily tested matrix rank condition, guarantees the convergence of the algorithm. A data-based method is proposed to design the initial stabilizing feedback gain that the algorithm requires. Robustness of the algorithm in the presence of noisy measurements is analyzed. We compare the proposed algorithm in simulation to different direct and indirect data-based control design methods.
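
    The abstract does not reproduce the paper's specific rank condition; the standard test of this kind, from the data-driven control literature, checks that a block Hankel matrix of the input sequence has full row rank (persistency of excitation of a given order). A minimal sketch of that standard check, not necessarily the paper's exact condition:

        import numpy as np

        def hankel_blocks(u, L):
            """Block Hankel matrix with L block rows built from input samples.

            u : array of shape (T, m) -- input sequence of length T, m channels
            L : number of block rows
            """
            T, m = u.shape
            cols = T - L + 1
            H = np.zeros((L * m, cols))
            for i in range(L):
                H[i * m:(i + 1) * m, :] = u[i:i + cols, :].T
            return H

        def is_persistently_exciting(u, L, tol=1e-9):
            """True if u is persistently exciting of order L (row rank m*L)."""
            H = hankel_blocks(u, L)
            return np.linalg.matrix_rank(H, tol=tol) == H.shape[0]

        # Example: a random input is almost surely persistently exciting.
        rng = np.random.default_rng(0)
        u = rng.standard_normal((50, 1))
        print(is_persistently_exciting(u, L=6))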

    Introduction to Riemannian Geometry and Geometric Statistics: from basic theory to implementation with Geomstats

    As data have become a predominant resource in applications, Riemannian geometry is a natural framework to model and unify complex nonlinear sources of data. However, the development of computational tools from the basic theory of Riemannian geometry is laborious. The work presented here forms one of the main contributions to the open-source project geomstats, a Python package providing efficient implementations of the concepts of Riemannian geometry and geometric statistics, both for mathematicians and for applied scientists, for whom most of the difficulties are hidden behind high-level functions. The goal of this monograph is two-fold. First, we aim to give a self-contained exposition of the basic concepts of Riemannian geometry, providing illustrations and examples at each step and adopting a computational point of view. The second goal is to demonstrate how these concepts are implemented in Geomstats, explaining the choices that were made and the conventions adopted. The general concepts are exposed and specific examples are detailed throughout the text. The culmination of this implementation is the ability to perform statistics and machine learning on manifolds with as few lines of code as in the widespread machine learning tool scikit-learn. We exemplify this with an introduction to geometric statistics.
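
    As an illustration of the scikit-learn-style interface the monograph describes, here is a minimal sketch computing a Fréchet mean on the 2-sphere with geomstats; the estimator's constructor signature has varied across geomstats versions, so treat this as indicative:

        from geomstats.geometry.hypersphere import Hypersphere
        from geomstats.learning.frechet_mean import FrechetMean

        # Sample points on the unit 2-sphere embedded in R^3.
        sphere = Hypersphere(dim=2)
        data = sphere.random_uniform(n_samples=10)

        # Fit the Fréchet mean with the familiar estimator.fit() pattern.
        # (Older geomstats versions use FrechetMean(metric=sphere.metric).)
        mean = FrechetMean(sphere)
        mean.fit(data)
        print(mean.estimate_)  # a point on the sphere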

    A Distributed Computation Model Based on Federated Learning Integrates Heterogeneous Models and Consortium Blockchain for Solving Time-Varying Problems

    Recurrent neural networks have been greatly developed for effectively solving time-varying problems arising in complex environments. However, because processing is centralized, model performance is strongly affected by factors such as the siloing of models and data in practice. The emergence of distributed artificial intelligence such as federated learning (FL) therefore makes dynamic aggregation among models possible. However, the integration process of FL remains server-dependent, which poses a significant risk to the overall model; moreover, FL only allows collaboration between homogeneous models and offers no good solution for interaction between heterogeneous models. We therefore propose a Distributed Computation Model (DCM) based on a consortium blockchain network to improve the credibility of the overall model and enable effective coordination among heterogeneous models. In addition, a Distributed Hierarchical Integration (DHI) algorithm is designed for the global solution process. Within a group, permissioned nodes collect the local models' results from different permissionless nodes and then send the aggregated results back to all the permissionless nodes to regularize the processing of the local models. After the iterations are completed, a secondary integration of the local results is performed between permissioned nodes to obtain the global results. In our experiments, we verify the efficiency of DCM; the results show that the proposed model outperforms many state-of-the-art models based on a federated learning framework.
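
    The abstract describes a two-level aggregation: permissioned nodes aggregate results from the permissionless nodes in their group, then a secondary integration combines the group aggregates. The paper's exact aggregation rule is not given; the following is a minimal sketch assuming plain averaging at the first level and size-weighted averaging at the second:

        import numpy as np

        def group_aggregate(local_results):
            """First level: a permissioned node averages the results
            submitted by the permissionless nodes in its group."""
            return np.mean(local_results, axis=0)

        def secondary_integration(group_results, group_sizes):
            """Second level: permissioned nodes combine group aggregates,
            here weighted by group size (an assumption, not the paper's rule)."""
            weights = np.asarray(group_sizes, dtype=float)
            weights /= weights.sum()
            return np.tensordot(weights, np.stack(group_results), axes=1)

        # Example: three groups of permissionless nodes, each producing
        # a local solution vector for the same time-varying problem.
        rng = np.random.default_rng(0)
        groups = [rng.standard_normal((n, 4)) for n in (5, 3, 8)]
        group_results = [group_aggregate(g) for g in groups]
        print(secondary_integration(group_results, [5, 3, 8]))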

    Integrated widely tunable laser systems at 1300 and 1550 nm as swept sources for optical coherence tomography

    Exponential integrators: tensor structured problems and applications

    The solution of stiff systems of Ordinary Differential Equations (ODEs), which typically arise after the spatial discretization of many important evolutionary Partial Differential Equations (PDEs), is a topic of wide interest in numerical analysis. A prominent way to numerically integrate such systems is with exponential integrators. In general, these schemes do not require the solution of (non)linear systems but rather the action of the matrix exponential and of some specific exponential-like functions (known in the literature as phi-functions). In this PhD thesis we present efficient tensor-based tools to approximate such actions, from both a theoretical and a practical point of view, when the problem has an underlying Kronecker sum structure. Moreover, we investigate the application of exponential integrators to compute numerical solutions of important equations in various fields, such as plasma physics, mean-field optimal control, and computational chemistry. Throughout, we provide several numerical examples and perform extensive simulations, exploiting modern hardware architectures such as multi-core Central Processing Units (CPUs) and Graphics Processing Units (GPUs). The results show the effectiveness and superiority of the proposed approaches.
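
    For the Kronecker sum structure the thesis exploits, the key identity is exp(A ⊕ B) vec(V) = vec(exp(B) V exp(A)^T), where A ⊕ B = A ⊗ I + I ⊗ B, so the action of the large matrix exponential reduces to two small dense exponentials. A minimal sketch of this standard trick (not the thesis's specific algorithms, which also cover phi-functions):

        import numpy as np
        from scipy.linalg import expm

        rng = np.random.default_rng(0)
        n, m = 4, 3
        A = rng.standard_normal((n, n))
        B = rng.standard_normal((m, m))
        V = rng.standard_normal((m, n))          # vec(V) has length n*m

        # Naive reference: form A ⊕ B = A ⊗ I + I ⊗ B explicitly.
        K = np.kron(A, np.eye(m)) + np.kron(np.eye(n), B)
        ref = expm(K) @ V.reshape(-1, order="F")  # column-major vec(V)

        # Structured: exp(A ⊕ B) vec(V) = vec(exp(B) V exp(A)^T),
        # using only small n-by-n and m-by-m exponentials.
        out = (expm(B) @ V @ expm(A).T).reshape(-1, order="F")

        print(np.allclose(ref, out))  # True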

    Conditional Invertible Generative Models for Supervised Problems

    Invertible neural networks (INNs), in the setting of normalizing flows, are a type of unconditional generative likelihood model. Despite various attractive properties compared to other common types of generative model, they are rarely useful for supervised tasks or real applications because their outputs are unguided. In this work, we therefore present three new methods that extend the standard INN setting, falling under the broader category we term generative invertible models. These methods allow the theoretical and practical benefits of INNs to be leveraged to solve supervised problems in new ways, including real-world applications from different branches of science. The key finding is that our approaches enhance many aspects of trustworthiness compared with conventional feed-forward networks, such as uncertainty estimation and quantification, explainability, and the proper handling of outlier data.
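
    The abstract does not detail the three methods; the standard building block that conditional extensions of INNs modify is the affine coupling layer, made conditional by feeding the condition (e.g. a label or measurement) into the subnetwork that produces the scale and shift. A minimal PyTorch sketch of that generic construction, with all sizes and names hypothetical:

        import torch
        import torch.nn as nn

        class ConditionalAffineCoupling(nn.Module):
            """One invertible coupling block: x2 is scaled/shifted by
            functions of (x1, condition c), so the inverse is closed-form."""

            def __init__(self, dim, cond_dim, hidden=64):
                super().__init__()
                self.d = dim // 2
                self.net = nn.Sequential(
                    nn.Linear(self.d + cond_dim, hidden), nn.ReLU(),
                    nn.Linear(hidden, 2 * (dim - self.d)),
                )

            def forward(self, x, c):
                x1, x2 = x[:, :self.d], x[:, self.d:]
                s, t = self.net(torch.cat([x1, c], dim=1)).chunk(2, dim=1)
                s = torch.tanh(s)                  # keep scales bounded
                y2 = x2 * torch.exp(s) + t
                log_det = s.sum(dim=1)             # log|det J|, for the likelihood
                return torch.cat([x1, y2], dim=1), log_det

            def inverse(self, y, c):
                y1, y2 = y[:, :self.d], y[:, self.d:]
                s, t = self.net(torch.cat([y1, c], dim=1)).chunk(2, dim=1)
                s = torch.tanh(s)
                x2 = (y2 - t) * torch.exp(-s)
                return torch.cat([y1, x2], dim=1)

        # Round-trip check with hypothetical sizes.
        layer = ConditionalAffineCoupling(dim=6, cond_dim=3)
        x, c = torch.randn(8, 6), torch.randn(8, 3)
        y, _ = layer(x, c)
        print(torch.allclose(layer.inverse(y, c), x, atol=1e-6))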