36 research outputs found

    Machine learning in Huntington’s disease: exploring the Enroll-HD dataset for prognosis and driving capability prediction

    Background: In biomedicine, machine learning (ML) has proven beneficial for the prognosis and diagnosis of different diseases, including cancer and neurodegenerative disorders. For rare diseases, however, the requirement for large datasets often prevents this approach. Huntington’s disease (HD) is a rare neurodegenerative disorder caused by a CAG repeat expansion in the coding region of the huntingtin gene. The world’s largest observational study for HD, Enroll-HD, describes over 21,000 participants. As such, Enroll-HD is amenable to ML methods. In this study, we pre-processed and imputed Enroll-HD with ML methods to maximise the inclusion of participants and variables. With this dataset we developed models to improve the prediction of the age at onset (AAO) and compared them to the well-established Langbehn formula. In addition, we used recurrent neural networks (RNNs) to demonstrate the utility of ML methods for longitudinal datasets, assessing driving capabilities by learning from previous participant assessments. Results: Simple pre-processing imputed around 42% of missing values in Enroll-HD. Also, 167 variables were retained as a result of imputing with ML. We found that multiple ML models were able to outperform the Langbehn formula. The best ML model (light gradient boosting machine) improved the prognosis of AAO compared to the Langbehn formula by 9.2%, based on root mean squared error in the test set. In addition, our ML model provides more accurate prognosis for a wider CAG repeat range compared to the Langbehn formula. Driving capability was predicted with an accuracy of 85.2%. The resulting pre-processing workflow and code to train the ML models are available to be used for related HD predictions at: https://github.com/JasperO98/hdml/tree/main. Conclusions: Our pre-processing workflow made it possible to resolve the missing values and include most participants and variables in Enroll-HD. We show the added value of an ML approach, which improved AAO predictions and allowed for the development of an advisory model that can assist clinicians and participants in estimating future driving capability.

    High Speed Simulation Analytics

    Simulation, especially Discrete-event simulation (DES) and Agent-based simulation (ABS), is widely used in industry to support decision making. It is used to create predictive models or Digital Twins of systems, which are used to analyse what-if scenarios, perform sensitivity analytics on data and decisions, and even optimise the impact of decisions. Simulation-based Analytics, or just Simulation Analytics, therefore has a major role to play in Industry 4.0. However, a major issue in Simulation Analytics is speed. The extensive, continuous experimentation demanded by Industry 4.0 can take a significant time, especially if many replications are required. This is compounded by detailed models, as these can take a long time to simulate. Distributed Simulation (DS) techniques use multiple computers to speed up the simulation of a single model by splitting it across the computers, to speed up experimentation by running experiments across multiple computers in parallel, or both. This chapter discusses how DS and Simulation Analytics, as well as concepts from contemporary e-Science, can be combined to help address the speed problem by creating a new approach called High Speed Simulation Analytics. We present a vision of High Speed Simulation Analytics to show how this might be integrated with the future of Industry 4.0.

    Systems Biology in ELIXIR: modelling in the spotlight

    In this white paper, we describe the founding of a new ELIXIR Community - the Systems Biology Community - and its proposed future contributions to both ELIXIR and the broader community of systems biologists in Europe and worldwide. The Community believes that the infrastructure aspects of systems biology - databases, (modelling) tools and standards development, as well as training and access to cloud infrastructure - are not only appropriate components of the ELIXIR infrastructure, but will prove key components of ELIXIR's future support of advanced biological applications and personalised medicine. By way of a series of meetings, the Community identified seven key areas for its future activities, reflecting both future needs and previous and current activities within ELIXIR Platforms and Communities. These are: overcoming barriers to the wider uptake of systems biology; linking new and existing data to systems biology models; interoperability of systems biology resources; further development and embedding of systems medicine; provisioning of modelling as a service; building and coordinating capacity building and training resources; and supporting industrial embedding of systems biology. A set of objectives for the Community has been identified under four main headline areas: Standardisation and Interoperability, Technology, Capacity Building and Training, and Industrial Embedding. These are grouped into short-term (3-year), mid-term (6-year) and long-term (10-year) objectives.