696 research outputs found

    Multi-class gene expression biomarker panel identification for the diagnosis of paediatric febrile illness

    Febrile illness in children can result from infections by diverse viral or bacterial pathogens as well as inflammatory conditions or cancer. The limitations of the existing diagnostic pipeline, which relies on clinical symptoms and signs, pathogen detection, empirical treatment and diagnoses of exclusion, contribute to missed or delayed diagnosis and unnecessary antibiotic use. The potential of host gene expression biomarkers measured in blood has been demonstrated for simplified binary diagnostic questions; however, the clinical reality is that multiple potential aetiologies must be considered and prioritised on the basis of likelihood and risks of severe disease. In order to identify a biomarker panel which better reflects this clinical reality, we applied a multi-class supervised learning approach to whole blood transcriptomic datasets from children with infectious and inflammatory disease. Three datasets were used for the analyses presented here: a single microarray dataset, a meta-analysis of 12 publicly available microarray datasets and a newly generated RNA-sequencing dataset. These were used for preliminary investigations of the approach, discovery of a multi-class biomarker panel of febrile illness and validation of the biomarker panel, respectively. In the merged microarray discovery dataset, a two-stage approach to feature selection and classification, based on LASSO and Ridge penalised regression, was applied to distinguish 18 disease classes. Cost-sensitivity was incorporated in the approach as aetiologies of febrile illness vary considerably in the risk of severe disease. The resulting 161-transcript biomarker panel could reliably distinguish bacterial, viral, inflammatory, tuberculosis and malarial disease as well as pathogen-specific aetiologies. The panel was then validated in a newly generated RNA-Seq dataset and compared to previously published binary biomarker panels.
The analyses presented here demonstrate that a single test for the diagnosis of acute febrile illness in children is possible using host RNA biomarkers. A test which could distinguish multiple aetiologies soon after presentation could be used to reduce unnecessary antibiotic use, improve targeting of antibiotics to bacterial species and reduce delays in the diagnosis of inflammatory diseases.
Open Access
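
The abstract above says cost-sensitivity was incorporated because aetiologies differ in their risk of severe disease, but it does not say how. One common way to do this, shown here only as a minimal sketch, is to pick the class with the lowest expected misclassification cost instead of the most probable class. The class names, probabilities and cost values below are hypothetical and are not taken from the study.

```python
# Hypothetical illustration: classes, probabilities and costs are made up.
CLASSES = ["bacterial", "viral", "inflammatory"]

# COST[true][predicted]: penalty for predicting `predicted` when the truth is `true`.
# Missing a bacterial infection is given the highest penalty in this toy example.
COST = {
    "bacterial":    {"bacterial": 0.0, "viral": 5.0, "inflammatory": 5.0},
    "viral":        {"bacterial": 1.0, "viral": 0.0, "inflammatory": 1.0},
    "inflammatory": {"bacterial": 2.0, "viral": 2.0, "inflammatory": 0.0},
}


def expected_costs(probs):
    """Expected cost of each possible prediction, given class probabilities."""
    return {
        pred: sum(probs[true] * COST[true][pred] for true in CLASSES)
        for pred in CLASSES
    }


def cost_sensitive_prediction(probs):
    """Return the class whose prediction has the lowest expected cost."""
    costs = expected_costs(probs)
    return min(costs, key=costs.get)


# Class probabilities as they might come from any multi-class classifier.
probs = {"bacterial": 0.30, "viral": 0.60, "inflammatory": 0.10}
most_likely = max(probs, key=probs.get)
decision = cost_sensitive_prediction(probs)
print(most_likely, decision)
```

In this toy case the most probable class is viral, yet the cost-sensitive decision is bacterial, because a missed bacterial infection is penalised more heavily than an unnecessary bacterial call.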

    Computational approaches to find transcriptomic and epigenomic signatures of latent TB in HIV patients

    HIV infection promotes the progression of latent infection of Mtb to the active disease, with the primary challenge of diagnosis being the development of efficient and sensitive methods to detect latent TB in HIV infected individuals. Previous studies have identified transcriptional signatures for active TB along with signatures predicting the risk of active TB disease in latent TB infected individuals or those with other diseases. Existing studies have also identified characteristic genes for active TB in HIV infected patients. However, no studies have identified predictive transcriptional signatures that discriminate latent TB from active TB disease in HIV positive persons, or the epigenetic mechanisms associated with latent TB/HIV coinfection. The aim of this study was to develop a computational pipeline using statistical modelling and machine learning (ML) methods to identify a transcriptomic signature associated with latent TB in HIV positive patients and to identify candidate epigenetic modifications for future studies. A novel pipeline that leverages statistical differential expression analyses (OPLS-DA) and supervised ML and feature selection methods was applied to an existing transcriptomic dataset (NCBI GEO repository accession number GSE37250), and the outcomes of the two methodologies were integrated to define a gene signature characterising the progression of latent to active TB in HIV infected patients. Enrichment analysis was performed on the transcriptomic panel of genes to predict candidate epigenetic marks in latent TB/HIV coinfection. An 11-gene minimal signature was identified whose expression levels discriminate between latent TB and active TB in HIV positive patients. A broader analysis of DEGs identified by the ML and OPLS-DA revealed enrichment of pathways related to T- and B-cell receptor signalling, metabolic processes, insulin signalling, endocrine resistance and ATP-binding.
Candidate epigenetic alterations associated with latent TB in the HIV positive cohort were identified using transcription factor (TF), histone modification (HM) and miRNA enrichment analyses. This novel integrative approach to identify a discriminative latent TB gene signature provided new insights into the response mechanism of HIV co-infection with Mtb, and pathways that merit further investigation were identified. The genes of interest identified may provide novel diagnostic and therapeutic targets for latent TB in patients who are HIV positive.
M.Sc. (Biochemistry

    Identification of host gene expression biomarkers for tuberculosis

    The presence of disease, including infectious disease, has been observed to give rise to specific patterns of gene expression in peripheral whole blood, regardless of disease site. These gene expression signatures allow for distinction between diseases and have the potential to reform diagnostics, particularly in diseases and patient groups for whom current diagnostics are unreliable, such as tuberculosis (TB). Although TB is a treatable infectious disease, it has high morbidity and mortality, especially in low resource countries and HIV infected patients. In this thesis, I propose a bioinformatics toolbox that derives minimal transcriptomic signatures from microarray datasets acquired from heterogeneous groups regardless of underlying co-infections and geographic locations. The transcripts’ expression values are then aggregated into a single-value disease risk score (DRS) for every patient, which allows for classification between the disease groups in a binary manner. The toolbox was employed to analyse an adult and a paediatric TB transcriptomic study, comprising HIV infected and uninfected patients from sub-Saharan Africa. In the adult study, the DRS based on a 27-transcript signature distinguished culture-confirmed TB from latent TB infection (LTBI), while 44 transcripts distinguished TB from other diseases phenotypically similar to TB (OD), with high sensitivity and specificity. Out-of-sample validation was performed using a publicly available dataset. In the paediatric study, a 51-transcript signature distinguished TB from OD and a 42-transcript signature distinguished TB from LTBI. The signatures were validated out-of-sample using an independent cohort and benchmarked against culture-negative TB patients and Xpert® MTB/RIF, currently used for detection of M. tuberculosis. This thesis provides proof of principle that minimal host blood transcriptional signatures are able to distinguish TB from LTBI and OD regardless of HIV infection.
The subsequent transformation of the signatures into a score for every patient may facilitate disease categorisation and potentially development of diagnostic tools.
Open Access
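
The abstract above describes aggregating transcript expression values into a single disease risk score (DRS) per patient and then classifying in a binary manner, without giving the formula. The sketch below assumes one simple aggregation, the summed expression of up-regulated transcripts minus the summed expression of down-regulated transcripts, followed by a fixed threshold. The transcript names, expression values and threshold are hypothetical.

```python
def disease_risk_score(expression, up_transcripts, down_transcripts):
    """Aggregate a patient's transcript values into one score.

    Assumed scheme (not confirmed by the abstract): sum of up-regulated
    transcripts minus sum of down-regulated transcripts.
    """
    up = sum(expression[t] for t in up_transcripts)
    down = sum(expression[t] for t in down_transcripts)
    return up - down


def classify(score, threshold):
    """Binary call: scores above the threshold are labelled as TB."""
    return "TB" if score > threshold else "not TB"


# Hypothetical three-transcript signature and two made-up patients.
UP = ["T1", "T2"]
DOWN = ["T3"]
THRESHOLD = 8.5

patient_a = {"T1": 9.0, "T2": 8.0, "T3": 4.0}
patient_b = {"T1": 6.0, "T2": 5.0, "T3": 7.0}

score_a = disease_risk_score(patient_a, UP, DOWN)
score_b = disease_risk_score(patient_b, UP, DOWN)
print(score_a, classify(score_a, THRESHOLD))
print(score_b, classify(score_b, THRESHOLD))
```

In practice the threshold would be chosen on training data, for example to balance sensitivity and specificity, and then fixed before out-of-sample validation.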

    Data mining of host transcriptome and microbiome in pulmonary disease

    Pulmonary disease is one of the most common and serious medical conditions in the world, and the correct diagnosis and prediction of incipient pulmonary diseases such as tuberculosis (TB) and lung cancer can greatly decrease the number of pulmonary disease-related deaths. In this thesis, I studied the transcriptome and microbiome differences between pulmonary disease patients and healthy controls, and developed and applied several pipelines incorporating bioinformatics methods, statistics and machine learning models to identify patterns in human transcriptome and microbiome data for pulmonary disease prediction. On the host transcriptome side, I first evaluated the performance of existing TB disease and TB progression biomarkers, created a bulk RNA-seq gene-expression based biomarker selection pipeline, and then identified a 29-gene signature that can correctly predict TB progression as far as 6 years before the TB diagnosis. On the microbiome side, I developed Animalcules, an R package for microbiome data analysis such as diversity comparison and differential abundance analysis, which supports both a graphical user interface and command-line functions. I then applied Animalcules to two microbiome case studies: identifying TB-related and asthma-related microbes. After working on the host transcriptome and microbiome separately, I discussed the computational framework for identifying host-microbe interactions, and its significant potential for studying pulmonary disease pathogenesis, diagnosis and treatment.

    Establishment of predictive blood-based signatures in medical large scale genomic data sets : Development of novel diagnostic tests

    Increasing data has led to tremendous success in discovering molecular biomarkers based on high throughput data. However, the translation of these so-called genomic signatures into clinical practice has been limited. The complexity and volume of genomic profiling requires heightened attention to robust design, methodological details, and avoidance of bias. In this thesis, novel strategies aimed at closing the gap from initially promising pilot studies to the clinical application of novel biomarkers are evaluated. First, a conventional process for genomic biomarker development comprising feature selection, algorithm and parameter optimization, and performance assessment was established. Using this approach, an RNA-stabilized whole blood diagnostic classifier for non-small cell lung cancer was built in a training set that can be used as a biomarker to discriminate between patients and control samples. Subsequently, this optimized classifier was successfully applied to two independent and blinded validation sets. Extensive permutation analysis using random feature lists supports the specificity of the established transcriptional classifier. Next, it was demonstrated that a combined approach of clinical trial simulation and adaptive learning strategies can be used to speed up biomarker development. As a model, genome-wide expression data derived from over 4,700 individuals in 37 studies addressing four clinical endpoints were used to assess over 1,800,000 classifiers. In addition to current approaches determining optimal classifiers within a defined study setting, randomized clinical trial simulation unequivocally uncovered the overall variance in the prediction performance of potential disease classifiers to predict the outcome of a large biomarker validation study from a pilot trial. Furthermore, the most informative features were identified by feature ranking according to an individual classification performance score.
Applying an adaptive learning strategy based on data extrapolation led to a data-driven prediction of the study size required for larger validation studies based on small pilot trials and an estimate of the expected statistical performance during validation. With these significant improvements, exceedingly robust and clinically applicable gene signatures for the diagnosis and detection of acute myeloid leukemia, active tuberculosis, HIV infection, and non-small cell lung cancer are established which could demonstrate disease-related enrichment of the obtained signatures and phenotype-related feature ranking. In further research, platform requirements for blood-based biomarker development were examined, using micro RNA expression profiling as an example. The performance as well as the technical sample handling to provide reliable strategies for platform implementation in clinical applications were investigated. Overall, all introduced methods improve and accelerate the development of biomarker signatures for molecular diagnostics and can easily be extended to other high throughput data and other disease settings.
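
The abstract above mentions a permutation analysis using random feature lists to support the specificity of the classifier. A minimal sketch of how such an analysis is often summarised is an empirical p-value: the fraction of classifiers built on random feature lists that perform at least as well as the real signature. The accuracy values below are hypothetical placeholders, and the function name is an assumption made for illustration.

```python
def empirical_p_value(observed_score, null_scores):
    """Empirical p-value with the usual +1 correction.

    observed_score: performance of the classifier built on the real signature.
    null_scores: performances of classifiers built on random feature lists
                 of the same size (computed elsewhere).
    """
    at_least_as_good = sum(1 for s in null_scores if s >= observed_score)
    return (1 + at_least_as_good) / (1 + len(null_scores))


# Hypothetical accuracies: one real signature versus nine random feature lists.
observed = 0.90
random_list_scores = [0.52, 0.61, 0.55, 0.92, 0.70, 0.48, 0.66, 0.58, 0.63]

p_value = empirical_p_value(observed, random_list_scores)
print(p_value)
```

With only nine random lists the smallest attainable p-value is 0.1, which is why such analyses typically use many more random lists.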

    Clinical Utility of microRNAs in Exhaled Breath Condensate as Biomarkers for Lung Cancer.

    This study represents a novel proof of concept of the clinical utility of miRNAs from exhaled breath condensate (EBC) as biomarkers of lung cancer (LC). Genome-wide miRNA profiling and machine learning analysis were performed on EBC from 21 healthy volunteers and 21 LC patients. The levels of 12 miRNAs were significantly altered in EBC from LC patients, and a specific signature of miR-4507, miR-6777-5p and miR-451a distinguished these patients with high accuracy. In addition, a distinctive miRNA profile between LC adenocarcinoma and squamous cell carcinoma was observed, with a combined panel of miR-4529-3p, miR-8075 and miR-7704 enabling discrimination between them. EBC levels of miR-6777-5p, miR-6780a-5p and miR-877-5p predicted clinical outcome at 500 days. Two additional miRNA signatures were also associated with other clinical features such as stage and invasion status. Dysregulated EBC miRNAs showed potential target genes related to LC pathogenesis, including CDKN2B, PTEN, TP53, BCL2, KRAS and EGFR. We conclude that EBC miRNAs might allow the identification, stratification and monitoring of LC, which could lead to the development of precision medicine in this and other respiratory diseases.

    Developing statistical and bioinformatic analysis of genomic data from tumours

    Previous prognostic signatures for melanoma based on tumour transcriptomic data were developed predominantly on cohorts of AJCC (American Joint Committee on Cancer) stages III and IV melanoma. Since 92% of melanoma patients are diagnosed at AJCC stages I and II, there is an urgent need for better prognostic biomarkers to allow patient stratification for receiving early adjuvant therapies. This study uses genome-wide tumour gene expression levels and clinico-histopathological characteristics of patients from the Leeds Melanoma Cohort (LMC). Several unsupervised and supervised classification approaches were applied to the transcriptomic data, to identify biological classes of melanoma, and to develop prognostic classification models respectively. Unsupervised clustering identified six biologically distinct primary melanoma classes (LMC classes). Unlike previous molecular classes of melanoma, the LMC classes were prognostic in both the whole LMC dataset and in stage I tumours. The prognostic value of the LMC classes was replicated in an independent dataset, but insufficient data were available to replicate in an AJCC stage I subset. Supervised classification using the Random Forest (RF) approach provided improved performances when adjustments were made to deal with class imbalance, while this did not improve performance of the Support Vector Machine (SVM). However, RF and SVM had similar results overall, with RF only marginally better. Combining clinical and transcriptomic information in the RF further improved the performance of the prediction model in comparison to using clinical information alone. Finally, the agnostically derived LMC classes and the supervised RF model showed convergence in their association with outcome in some groups of patients, but not in others. 
In conclusion, this study reports six molecular classes of primary melanoma with prognostic value in stage I disease and overall, and a prognostic classification model that predicts outcome in primary melanoma.
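
The abstract above reports that Random Forest performance improved when adjustments were made for class imbalance, without naming the adjustment. One widely used option, sketched here purely as an assumption, is inverse-frequency class weighting, where each class receives weight n_samples / (n_classes * class_count) so that rare classes count as much as common ones during training. The labels below are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical outcome labels: 8 patients with good outcome, 2 with poor outcome.
labels = ["good"] * 8 + ["poor"] * 2

weights = balanced_class_weights(labels)
print(weights)

# After weighting, each class contributes the same total weight.
total_weight = {cls: weights[cls] * labels.count(cls) for cls in weights}
print(total_weight)
```

Weights of this form can be passed to most classifiers that accept per-class or per-sample weights; resampling the minority class is an alternative adjustment with a similar aim.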