2 research outputs found
Integration of Machine Learning Methods to Dissect Genetically Imputed Transcriptomic Profiles in Alzheimer's Disease.
The genetic component of many common traits is associated with gene expression, and several variants act as expression quantitative trait loci (eQTLs), regulating gene expression in a tissue-specific manner. In this work, we applied tissue-specific cis-eQTL gene expression prediction models to the genotypes of 808 samples, including controls, subjects with mild cognitive impairment, and patients with Alzheimer's Disease. We then dissected the imputed transcriptomic profiles with different unsupervised and supervised machine learning approaches to identify potential biological associations. Our analysis suggests that unsupervised and supervised methods can provide complementary information, which can be integrated for a better characterization of the underlying biological system. In particular, a variational autoencoder representation of the transcriptomic profiles, followed by support vector machine classification, was used for tissue-specific gene prioritization. Interestingly, the resulting gene prioritizations can be efficiently integrated as a feature selection step to improve the accuracy of deep learning classifier networks. The identified gene-tissue information suggests a potential role for inflammatory and regulatory processes in gut-brain axis related tissues. In line with the expected low heritability that can be apportioned to eQTL variants, we achieved only relatively low prediction capability with deep learning classification models. However, our analysis revealed that the classification power strongly depends on the network structure, with recurrent neural networks being the best-performing network class. Interestingly, cross-tissue analysis suggests a potentially greater role for models trained on brain tissues, also when dementia-related endophenotypes are considered.
Overall, the present analysis suggests that combining supervised and unsupervised machine learning techniques can be used to evaluate high-dimensional omics data. Includes EPSRC
Computational Methods to Analyze Next-generation Sequencing Data in Genomics and Metagenomics
This thesis focuses on two important computational problems in genomics and metagenomics, using publicly available next-generation sequencing data. The first concerns gene regulation: we explore how distal regulatory elements may interact with proximal regulatory elements. The second concerns metagenomics: we study how to reconstruct bacterial strain genomes from shotgun reads. Studying gene regulation, especially distal gene regulation, is important because regulatory elements, including those in distal regulatory regions, orchestrate when, where, and how much a gene is activated under every experimental condition. Their dysfunction results in various types of diseases. Moreover, research on distal gene regulation is still developing. The study of bacterial strains is also vital, as bacterial strains are the main source of drug resistance, mixed infection, reinfection, etc. The study of novel bacterial strains is still in its infancy, with only one tool that can work with multiple metagenomic samples, and its performance is suboptimal. We identified hundreds of pairs of regulatory elements that are biologically sound and are likely to contribute to the interaction of distal and proximal regulatory regions. We demonstrated for the first time that ribosomal protein genes share common distal regulatory regions under the same experimental conditions and might be differentially regulated across different experimental conditions. In addition, we developed a novel approach called SMS to reconstruct novel bacterial strains from multiple shotgun metagenomic samples. In tests on 702 simulated and 195 experimental datasets, we showed that SMS infers the strains present with high accuracy, including the strain number, strain abundance, strain variations, etc. Compared with the two existing approaches, SMS shows much better performance. Our studies shed new light on genomics and generated novel tools in metagenomics.