
    Machine learning in Huntington’s disease: exploring the Enroll-HD dataset for prognosis and driving capability prediction

    Background: In biomedicine, machine learning (ML) has proven beneficial for the prognosis and diagnosis of different diseases, including cancer and neurodegenerative disorders. For rare diseases, however, the requirement for large datasets often prevents this approach. Huntington’s disease (HD) is a rare neurodegenerative disorder caused by a CAG repeat expansion in the coding region of the huntingtin gene. The world’s largest observational study for HD, Enroll-HD, describes over 21,000 participants. As such, Enroll-HD is amenable to ML methods. In this study, we pre-processed and imputed Enroll-HD with ML methods to maximise the inclusion of participants and variables. With this dataset we developed models to improve the prediction of the age at onset (AAO) and compared them to the well-established Langbehn formula. In addition, we used recurrent neural networks (RNNs) to demonstrate the utility of ML methods for longitudinal datasets, assessing driving capability by learning from previous participant assessments. Results: Simple pre-processing imputed around 42% of the missing values in Enroll-HD, and imputing with ML allowed 167 variables to be retained. We found that multiple ML models were able to outperform the Langbehn formula. The best ML model (light gradient boosting machine) improved the prognosis of AAO compared to the Langbehn formula by 9.2%, based on root mean squared error in the test set. In addition, our ML model provides more accurate prognoses over a wider CAG repeat range than the Langbehn formula. Driving capability was predicted with an accuracy of 85.2%. The resulting pre-processing workflow and code to train the ML models are available for related HD predictions at https://github.com/JasperO98/hdml/tree/main. Conclusions: Our pre-processing workflow made it possible to resolve the missing values and include most participants and variables in Enroll-HD. We show the added value of an ML approach, which improved AAO predictions and allowed for the development of an advisory model that can assist clinicians and participants in estimating future driving capability.
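
    As an illustration of the comparison the abstract describes, the minimal sketch below trains a LightGBM regressor on synthetic data and scores it against the commonly cited parametric mean of the Langbehn formula, E[AAO] = 21.54 + exp(9.556 - 0.1460 * CAG). The covariates and data here are invented placeholders, not the real Enroll-HD variables or the authors' pipeline (see their repository for that).

        import numpy as np
        import lightgbm as lgb
        from sklearn.model_selection import train_test_split

        def langbehn_expected_aao(cag):
            # Parametric mean age at onset from Langbehn et al. (2004):
            # E[AAO] = 21.54 + exp(9.556 - 0.1460 * CAG)
            return 21.54 + np.exp(9.556 - 0.1460 * np.asarray(cag, dtype=float))

        # Synthetic stand-in for the imputed dataset: CAG repeat length plus
        # a few invented covariates (NOT the real Enroll-HD variables).
        rng = np.random.default_rng(0)
        n = 2000
        cag = rng.integers(40, 56, size=n).astype(float)
        extra = rng.normal(size=(n, 5))
        aao = langbehn_expected_aao(cag) + 2.0 * extra[:, 0] + rng.normal(0, 5, size=n)

        X = np.column_stack([cag, extra])
        X_tr, X_te, y_tr, y_te, cag_tr, cag_te = train_test_split(
            X, aao, cag, test_size=0.2, random_state=0)

        # Gradient-boosted trees: the model family the study found best.
        model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
        model.fit(X_tr, y_tr)

        rmse_ml = np.sqrt(np.mean((model.predict(X_te) - y_te) ** 2))
        rmse_lb = np.sqrt(np.mean((langbehn_expected_aao(cag_te) - y_te) ** 2))
        print(f"test RMSE  LightGBM: {rmse_ml:.2f}  Langbehn: {rmse_lb:.2f}")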

    High Speed Simulation Analytics

    Simulation, especially discrete-event simulation (DES) and agent-based simulation (ABS), is widely used in industry to support decision making. It is used to create predictive models, or Digital Twins, of systems in order to analyse what-if scenarios, perform sensitivity analyses on data and decisions, and even optimise the impact of decisions. Simulation-based Analytics, or just Simulation Analytics, therefore has a major role to play in Industry 4.0. However, a major issue in Simulation Analytics is speed. The extensive, continuous experimentation demanded by Industry 4.0 can take significant time, especially if many replications are required, and this is compounded by detailed models, which can take a long time to simulate. Distributed Simulation (DS) techniques use multiple computers either to speed up the simulation of a single model by splitting it across the computers or to speed up experimentation by running experiments across multiple computers in parallel. This chapter discusses how DS and Simulation Analytics, along with concepts from contemporary e-Science, can be combined to address the speed problem through a new approach called High Speed Simulation Analytics. We present a vision of High Speed Simulation Analytics and show how it might be integrated with the future of Industry 4.0.
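
    The replication-level parallelism that Distributed Simulation exploits can be sketched on a single machine with Python's standard library; a real DS setup would farm the same independent replications out to many computers. The toy single-server queue below is a stand-in for a detailed DES model, not any particular simulation package.

        import random
        from concurrent.futures import ProcessPoolExecutor

        def run_replication(seed, n_customers=10_000):
            # Toy single-server (M/M/1-style) queue: returns the mean waiting
            # time for one randomly seeded run of the model.
            rng = random.Random(seed)
            t_arrive = t_free = total_wait = 0.0
            for _ in range(n_customers):
                t_arrive += rng.expovariate(1.0)        # arrivals at rate 1.0
                start = max(t_arrive, t_free)           # wait if server busy
                total_wait += start - t_arrive
                t_free = start + rng.expovariate(1.25)  # service at rate 1.25
            return total_wait / n_customers

        if __name__ == "__main__":
            # Replications are independent, so they parallelise trivially
            # across local cores (or, in true DS, across machines).
            with ProcessPoolExecutor() as pool:
                waits = list(pool.map(run_replication, range(20)))
            print(f"mean wait over 20 replications: {sum(waits) / len(waits):.3f}")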

    Systems Biology in ELIXIR: modelling in the spotlight

    In this white paper, we describe the founding of a new ELIXIR Community - the Systems Biology Community - and its proposed future contributions to both ELIXIR and the broader community of systems biologists in Europe and worldwide. The Community believes that the infrastructure aspects of systems biology - databases, (modelling) tools and standards development, as well as training and access to cloud infrastructure - are not only appropriate components of the ELIXIR infrastructure, but will also prove key components of ELIXIR's future support of advanced biological applications and personalised medicine. By way of a series of meetings, the Community identified seven key areas for its future activities, reflecting both future needs and previous and current activities within ELIXIR Platforms and Communities. These are: overcoming barriers to the wider uptake of systems biology; linking new and existing data to systems biology models; interoperability of systems biology resources; further development and embedding of systems medicine; provisioning of modelling as a service; building and coordinating capacity building and training resources; and supporting industrial embedding of systems biology. A set of objectives for the Community has been identified under four main headline areas: Standardisation and Interoperability, Technology, Capacity Building and Training, and Industrial Embedding. These are grouped into short-term (3-year), mid-term (6-year) and long-term (10-year) objectives.

    The FAIR Guiding Principles for scientific data management and stewardship

    There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles and includes the rationale behind them and some exemplar implementations in the community.
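
    To make "machine-actionable" concrete: in practice a FAIR record couples a globally unique, persistent identifier with rich, standards-based metadata that software can resolve and parse. The sketch below builds one such record as JSON-LD with schema.org terms; that vocabulary is one common choice, not something the Principles mandate, and every value is a placeholder.

        import json

        # Minimal machine-actionable dataset description (all values are
        # placeholders; schema.org/JSON-LD is an illustrative choice).
        record = {
            "@context": "https://schema.org/",
            "@type": "Dataset",
            "@id": "https://doi.org/10.5555/example",  # findable: persistent identifier
            "name": "Example assay dataset",           # findable: rich metadata
            "description": "Illustrative record only.",
            "license": "https://creativecommons.org/licenses/by/4.0/",  # reusable: clear licence
            "distribution": {
                "@type": "DataDownload",
                "contentUrl": "https://example.org/data.csv",  # accessible: open protocol
                "encodingFormat": "text/csv",                  # interoperable: standard format
            },
        }
        print(json.dumps(record, indent=2))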

    Toward interoperable bioscience data

    © The Author(s), 2012. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Nature Genetics 44 (2012): 121-126, doi:10.1038/ng.1054.

    To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open 'data commoning' culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared 'Investigation-Study-Assay' framework to support that vision.

    The authors also acknowledge the following funding sources in particular: UK Biotechnology and Biological Sciences Research Council (BBSRC) BB/I000771/1 to S.-A.S. and A.T.; UK BBSRC BB/I025840/1 to S.-A.S.; UK BBSRC BB/I000917/1 to D.F.; EU CarcinoGENOMICS (PL037712) to J.K.; US National Institutes of Health (NIH) 1RC2CA148222-01 to W.H. and the HSCI; US MIRADA LTERS DEB-0717390 and Alfred P. Sloan Foundation (ICoMM) to L.A.-Z.; Swiss Federal Government through the Federal Office of Education and Science (FOES) to L.B. and I.X.; EU Innovative Medicines Initiative (IMI) Open PHACTS 115191 to C.T.E.; US Department of Energy (DOE) DE-AC02-06CH11357 and Alfred P. Sloan Foundation (2011-6-05) to J.G.; UK BBSRC SysMO-DB2 BB/I004637/1 and BBG0102181 to C.G.; UK BBSRC BB/I000933/1 to C.S. and J.L.G.; UK MRC UD99999906 to J.L.G.; US NIH R21 MH087336 (National Institute of Mental Health) and R00 GM079953 (National Institute of General Medical Science) to A.L.; NIH U54 HG006097 to J.C. and C.E.S.; Australian government through the National Collaborative Research Infrastructure Strategy (NCRIS); BIRN U24-RR025736 and BioScholar RO1-GM083871 to G.B.; and the 2009 Super Science initiative to C.A.S.
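
    The shared 'Investigation-Study-Assay' framework the article builds on is a containment hierarchy: an Investigation groups Studies, and each Study groups the Assays performed on its samples. The dataclass sketch below shows only that shape; the field names are simplified assumptions rather than the ISA-Tab/ISA-JSON specifications.

        from dataclasses import dataclass, field

        @dataclass
        class Assay:
            measurement_type: str   # e.g. "transcription profiling"
            technology_type: str    # e.g. "DNA microarray"
            data_files: list = field(default_factory=list)

        @dataclass
        class Study:
            identifier: str
            title: str
            assays: list = field(default_factory=list)

        @dataclass
        class Investigation:
            identifier: str
            title: str
            studies: list = field(default_factory=list)

        inv = Investigation("INV-1", "Example investigation", studies=[
            Study("STU-1", "Example study", assays=[
                Assay("transcription profiling", "DNA microarray"),
            ]),
        ])
        print(inv)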

    Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics

    Background: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Heredity and Health in Africa (H3Africa) program, an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure, such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex, interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging. Results: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language (CWL) framework and two using Nextflow. All the workflows are containerized with Docker for improved portability and reproducibility, and are publicly available for use by members of the H3Africa consortium and the international research community. Conclusion: The H3ABioNet workflows have been implemented to offer ease of use for the end user and high levels of reproducibility and portability, while following state-of-the-art bioinformatics data processing protocols. The workflows will service the H3Africa consortium projects and are currently in use. All four are also publicly available for research scientists worldwide to use and adapt for their respective needs, and will help develop bioinformatics capacity, assist genomics research within Africa, and increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.
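
    The workflows themselves are written in CWL and Nextflow, but the underlying pattern is the same in any driver: each step runs inside a Docker container with a shared working directory, and one step's outputs become the next step's inputs. The Python sketch below shows that pattern only; the image names and tool commands are invented placeholders, not the H3ABioNet workflows.

        import subprocess
        from pathlib import Path

        def run_step(image, command, workdir):
            # Run one workflow step in a container, mounting the working
            # directory so its outputs are visible to later steps.
            subprocess.run(
                ["docker", "run", "--rm",
                 "-v", f"{workdir}:/data", "-w", "/data",
                 image, "sh", "-c", command],
                check=True,
            )

        workdir = Path("run1").resolve()
        workdir.mkdir(exist_ok=True)

        # Two chained steps with placeholder images and commands: the first
        # step's output (aligned.bam) is the second step's input.
        run_step("example/aligner:1.0", "align reads.fq > aligned.bam", workdir)
        run_step("example/caller:1.0", "call-variants aligned.bam > variants.vcf", workdir)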

    Adjunctive rifampicin for Staphylococcus aureus bacteraemia (ARREST): a multicentre, randomised, double-blind, placebo-controlled trial.

    BACKGROUND: Staphylococcus aureus bacteraemia is a common cause of severe community-acquired and hospital-acquired infection worldwide. We tested the hypothesis that adjunctive rifampicin would reduce bacteriologically confirmed treatment failure or disease recurrence, or death, by enhancing early S aureus killing, sterilising infected foci and blood faster, and reducing risks of dissemination and metastatic infection. METHODS: In this multicentre, randomised, double-blind, placebo-controlled trial, adults (≥18 years) with S aureus bacteraemia who had received ≤96 h of active antibiotic therapy were recruited from 29 UK hospitals. Patients were randomly assigned (1:1) via a computer-generated sequential randomisation list to receive 2 weeks of adjunctive rifampicin (600 mg or 900 mg per day according to weight, oral or intravenous) or identical placebo, together with standard antibiotic therapy. Randomisation was stratified by centre. Patients, investigators, and those caring for the patients were masked to group allocation. The primary outcome was time to bacteriologically confirmed treatment failure or disease recurrence, or death (all-cause), from randomisation to 12 weeks, adjudicated by an independent review committee masked to treatment. Analysis was by intention to treat. This trial was registered (ISRCTN37666216) and is closed to new participants. FINDINGS: Between Dec 10, 2012, and Oct 25, 2016, 758 eligible participants were randomly assigned: 370 to rifampicin and 388 to placebo. 485 (64%) participants had community-acquired S aureus infections, and 132 (17%) had nosocomial S aureus infections; 47 (6%) had meticillin-resistant infections. 301 (40%) participants had an initial deep infection focus. Standard antibiotics were given for a median of 29 (IQR 18-45) days; 619 (82%) participants received flucloxacillin. By week 12, 62 (17%) participants who received rifampicin versus 71 (18%) who received placebo had experienced treatment failure or disease recurrence, or died (absolute risk difference -1·4%, 95% CI -7·0 to 4·3; hazard ratio 0·96, 95% CI 0·68-1·35; p=0·81). From randomisation to 12 weeks, no evidence of a difference in serious (p=0·17) or grade 3-4 (p=0·36) adverse events was observed; however, 63 (17%) participants in the rifampicin group versus 39 (10%) in the placebo group had antibiotic or trial drug-modifying adverse events (p=0·004), and 24 (6%) versus six (2%) had drug interactions (p=0·0005). INTERPRETATION: Adjunctive rifampicin provided no overall benefit over standard antibiotic therapy in adults with S aureus bacteraemia. FUNDING: UK National Institute for Health Research Health Technology Assessment.
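
    As a sanity check on the headline result, the raw counts reported above almost reproduce the published absolute risk difference; the small gap from the quoted -1·4% (95% CI -7·0 to 4·3) will reflect the exact analysis population and methods. A minimal unadjusted calculation with a Wald confidence interval:

        from math import sqrt

        # Unadjusted risk difference from the reported counts.
        fail_rif, n_rif = 62, 370   # rifampicin group
        fail_pla, n_pla = 71, 388   # placebo group

        p1, p2 = fail_rif / n_rif, fail_pla / n_pla
        ard = p1 - p2
        se = sqrt(p1 * (1 - p1) / n_rif + p2 * (1 - p2) / n_pla)  # Wald SE
        lo, hi = ard - 1.96 * se, ard + 1.96 * se

        print(f"rifampicin {p1:.1%} vs placebo {p2:.1%}")
        print(f"risk difference {ard:+.1%} (95% CI {lo:+.1%} to {hi:+.1%})")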