Advanced modelling of genomic data in Inflammatory Bowel Disease

Abstract

Advances in next-generation sequencing technologies allow the collection of enormous volumes of genomic data on large patient cohorts. Concurrently, machine learning algorithms are rapidly evolving and, together, these technologies represent the new frontier of research and clinical management on a path leading toward personalised medicine.

This thesis has two aims. The first is to develop a mathematical framework for the analysis and integration of next-generation sequencing data. The second is to model data from patients affected by inflammatory bowel disease (IBD), a common complex autoimmune condition with increasing incidence worldwide, by applying machine learning methodologies to clinical and transformed genomic data.

The analyses presented in this thesis are largely based on a cohort of paediatric IBD patients for whom clinical, immunology and whole exome sequencing data were available.

This research illustrates supervised and unsupervised machine learning approaches that model histology and endoscopy data to assign IBD patients to the correct Crohn's disease (CD) or ulcerative colitis (UC) subtype with superior accuracy.

Stratification and classification of IBD patients can be improved by layering genomic data on top of clinical evidence. This thesis also describes the development of GenePy, a mathematical model for transforming patients' genomic data into a per-individual, per-gene deleteriousness scoring system. GenePy models and incorporates important biological information from whole exome sequencing of patient DNA. It eases the analysis and interpretation of genomic data on an individual basis and, at the same time, allows the comparison of genetic profiles across patients. GenePy gene scores can be further combined according to molecular processes or pathways.

This work describes eight novel immuno-genomic IBD subtypes observed in a small cohort for which immune cytokine signalling and response cascades were specifically profiled and GenePy scores obtained.

In addition, the GenePy algorithm is applied using both supervised and unsupervised approaches to classify IBD subtypes and to explore alternative disease classifications that discriminate molecular subtypes that are clinically relevant for treatment and prognosis. This thesis reports the highest current performance in discriminating IBD subtypes using exome sequencing data, and five novel genomic patient strata defined by differing mutational burden in adaptive immune system genes.

This work demonstrates the power of integrating 21st-century high-throughput digital data in machine learning frameworks and the potential to obtain clinically relevant strata for bench-to-bedside improvements in patient quality of life.
