67 research outputs found

    Integrative Analysis to Investigate Complex Interaction in Alzheimer’s Disease

    Alzheimer’s disease (AD) is a neurodegenerative disorder featuring progressive cognitive and functional deficits. Pathologically, AD is characterized by tau and amyloid β protein deposition in the brain. AD is the sixth leading cause of death in the U.S., and the disease course usually lasts 7 to 10 years before death. In 2019, an estimated 5.8 million Americans were living with AD, affecting 16 million family members. At a certain stage of the disease course, patients who can no longer maintain their daily functioning depend heavily on caregivers, primarily family caregivers, who provide an estimated 18.4 billion unpaid hours of care, equivalent to 232 billion dollars. This huge economic burden and the inevitable emotional distress on families and society are expected to increase, as the AD-affected population could triple by 2050. Altered cellular composition, such as neuronal loss and astrocytosis, is associated with AD progression and cognitive decline; it is a key feature of neurodegeneration but has often been overlooked in transcriptome research. To explore the cellular composition changes in AD, I developed a deconvolution pipeline for bulk RNA-Seq to account for cell-type-specific effects in brain tissues. I found that neuronal and astrocyte relative proportions differ between healthy and diseased brains and also among AD cases that carry specific genetic risk variants. Brains from carriers of pathogenic mutations in APP, PSEN1, or PSEN2 presented lower neuron and higher astrocyte relative proportions compared to sporadic AD. Similarly, APOE ε4 carriers showed decreased neuronal and increased astrocyte relative proportions compared to AD non-carriers. In contrast, carriers of TREM2 risk variants showed a lower degree of neuronal loss compared to matched AD cases in multiple independent studies.
These findings suggest that genetic risk factors associated with AD etiology have a specific effect on the cellular composition of AD brains. The digital deconvolution approach provides an enhanced understanding of the fundamental molecular mechanisms underlying neurodegeneration, enabling the analysis of large bulk RNA-sequencing studies for cell composition. It also suggests that correcting for cellular composition when performing transcriptomic analysis will lead to novel insights into AD. Deconvolution methods that delineate cell population changes under disease conditions help interpret transcriptomic results and reveal transcriptional changes in a cell-type-specific manner. One application demonstrated in this dissertation work is the use of cell type proportion as a quantitative trait to identify genetic factors associated with cellular composition changes. I performed cell type QTL analysis and identified a common pathway associated with neuronal protection underlying aging brains in the presence or absence of neurodegenerative disease symptoms. A variant of TMEM106B, previously reported to have a protective effect in FTD, was found to be associated with neuronal proportion in aging brains, suggesting a common pathway underlying neuronal protection and cognitive reserve in the elderly. This extended analysis, derived from the deconvolution results, demonstrates one promising direction: using deconvolution followed by cell type QTL analysis to identify new genes or pathways underlying neurodegenerative or aging brains. To understand the complexity of the brain under disease conditions, network analysis, as a large-scale system-level approach, provides an unbiased and data-driven view for identifying gene-gene interactions altered by disease status.
Using network analysis, I replicated the co-expression pattern between the MS4A gene cluster and TREM2 in sporadic AD. Bayesian network analysis provided further evidence that MS4A4A might be a regulator of TREM2, a finding validated by in vitro experiments. In an autosomal dominant AD (ADAD) cohort, disrupted and acquired genes were identified from PSEN1 mutation carriers. Among these genes, previously identified AD risk genes and pathways were revealed along with novel findings. These results demonstrate the great potential of network approaches for identifying disease-associated genes and the interactions among them. To conclude the dissertation work at the methodological, empirical, and theoretical levels, a deconvolution pipeline for bulk RNA-Seq, cell type QTL analysis, and network analysis approaches were applied to understand transcriptome changes underlying disease etiology. Previous AD-related findings were replicated, which validated the methods, and novel genes and pathways were identified as potential new therapeutic targets. Based on prior knowledge and empirical evidence observed in this dissertation work, a model is proposed to explain how genetic factors are assembled as a highly interconnected interactome network that affects the proteinopathy observed in neurodegenerative disorders, which causes cellular composition changes in the brain and ultimately leads to the cognitive and functional deficits observed in AD patients.
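
    The abstract above does not specify how its deconvolution pipeline works. As a rough illustration of the general idea only, the sketch below estimates relative cell type proportions by non-negative least squares against a reference signature matrix. The function names, the toy signature values, and the projected-gradient solver are all assumptions made for this example, not the dissertation's method; in practice an established solver would normally be used instead.

```python
def nnls_projected_gradient(signature, bulk, iterations=500):
    """Minimise ||signature @ w - bulk||^2 subject to w >= 0.

    signature: list of gene rows, each listing expression per cell type.
    bulk: list of bulk expression values, one per gene.
    (Hypothetical helper written for this illustration.)
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # The squared Frobenius norm bounds the largest eigenvalue of S^T S,
    # so its inverse is a safe (if conservative) gradient step size.
    step = 1.0 / sum(v * v for row in signature for v in row)
    weights = [0.0] * n_types
    for _ in range(iterations):
        residual = [
            sum(signature[g][k] * weights[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        gradient = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - step * d) for w, d in zip(weights, gradient)]
    return weights


def relative_proportions(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero; cannot normalise")
    return [w / total for w in weights]


# Toy data (made up): four marker genes, two cell types (neuron, astrocyte).
signature = [[10.0, 1.0], [8.0, 0.5], [1.0, 9.0], [0.5, 7.0]]
true_mix = [0.7, 0.3]
bulk = [sum(s * p for s, p in zip(row, true_mix)) for row in signature]

proportions = relative_proportions(nnls_projected_gradient(signature, bulk))
print([round(p, 3) for p in proportions])
```

    On this noise-free toy mixture the estimate recovers the proportions used to build it; real bulk data are noisy, and the quality of the reference signatures largely determines the result.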

    Pathway and Network Approaches for Identification of Cancer Signature Markers from Omics Data

    The advancement of high-throughput omic technologies during the past few years has made it possible to perform many complex assays in a much shorter time than with traditional approaches. The rapid accumulation and wide availability of omic data generated by these technologies offer great opportunities to unravel disease mechanisms, but also present significant challenges in extracting knowledge from such massive data and in evaluating the findings. To address these challenges, a number of pathway- and network-based approaches have been introduced. This review article evaluates these methods and discusses their application in cancer biomarker discovery using hepatocellular carcinoma (HCC) as an example.

    Deep learning models for modeling cellular transcription systems

    The cellular signal transduction system (CSTS) plays a fundamental role in maintaining the homeostasis of a cell by detecting changes in its environment and orchestrating a response. Perturbations of the CSTS lead to diseases such as cancers. Almost all CSTSs are involved in regulating the expression of certain genes, leading to signature changes in gene expression. Therefore, the gene expression profile of a cell is a readout of the state of its CSTS and could be used to infer the CSTS. However, a gene expression profile is a convoluted mixture of the responses to all active signaling pathways in cells, so it is difficult to find the genes associated with an individual pathway. An efficient way of deconvoluting the signals embedded in the gene expression profile is needed. At the beginning of the thesis, we applied Pearson correlation coefficient analysis to study cellular signals transduced from ceramide species (lipids) to genes. We found significant correlations between specific ceramide species or ceramide groups and gene expression. We showed that various dihydroceramide families regulated distinct subsets of target genes predicted to participate in distinct biologic processes. However, it is well known that signaling pathway structure is hierarchical. Useful information may not be fully detected if only linear models are used to study the CSTS. More complex non-linear models are needed to represent its hierarchical structure. This motivated us to investigate contemporary deep learning models (DLMs). We then applied various deep hierarchical models to learn a distributed representation of the statistical structures embedded in transcriptomic data. The models learn and represent the hierarchical organization of the transcriptomic machinery. In addition, they provide an abstract representation of the statistical structure of transcriptomic data with flexibility and different degrees of granularity.
We showed that deep hierarchical models were capable of learning biologically sensible representations of the data (e.g., the hidden units in the first hidden layer could represent transcription factors) and revealing novel insights regarding the machinery regulating gene expression. We also showed that the model outperformed state-of-the-art methods such as Elastic-Net Linear Regression, Support Vector Machine and Non-Negative Matrix Factorization.
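
    The first analysis mentioned in the abstract above, Pearson correlation between a ceramide species and a gene's expression, can be sketched as follows. The variable names and numeric values are invented for illustration and do not come from the thesis.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical measurements across five samples.
ceramide_level = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_expression = [2.1, 3.9, 6.2, 8.0, 9.8]

r = pearson_r(ceramide_level, gene_expression)
print(f"r = {r:.3f}")
```

    Because this coefficient only measures linear association, it illustrates the limitation the abstract raises: hierarchical, non-linear pathway structure can be missed by such a measure.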

    INTEGRATED GENOMIC MARKERS FOR CHEMOTHERAPEUTICS

    Ph.D. (Doctor of Philosophy)

    A Novel Method for Integrative Biological Studies

    DNA microarray technology has been extensively utilized in the biomedical field, becoming a standard for identifying gene expression signatures for disease diagnosis/prognosis and pharmaceutical practice. Although cancer research has benefited from this technology, challenges such as large-scale data size, few replicates and complex heterogeneous data types remain; as a result, the biomarkers identified by different studies overlap only to a small extent because of molecular heterogeneity. However, cancer research needs robust and consistent biomarkers for drug development as well as diagnosis/prognosis. Although cancer is a highly heterogeneous disease, some mechanism common to developing cancers is believed to exist; integrating datasets from multiple experiments increases the accuracy of predictions because a larger sample size improves biomarker detection. Therefore, an integrative study is required to compile multiple cancer data sets when searching for the common mechanism leading to cancers. Some critical challenges of integration analysis remain despite the many successful methods introduced. Few are able to work on data sets with different dimensionalities. More seriously, when the replicate number is small, most existing algorithms cannot deliver robust predictions through an integrative study. In fact, as modern high-throughput technology matures to provide increasingly precise data, and with well-designed experiments, variance across replicates is believed to be small enough for us to consider a mean pattern model. This model assumes that all the genes (or metabolites, proteins or DNA copies) are random samples of a hidden (mean pattern) model. The study implements this model using a hierarchical modelling structure.
As the primary component of the system, a multi-scale Gaussian (MSG) model was developed to predict robust differentially expressed genes, which are then integrated, from microarray expression data with small replicate numbers. To assure the validity of the mean pattern hypothesis, a bimodality detection method, a revision of the Bimodality index, was proposed.
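
    The abstract above does not describe its revision of the Bimodality index. For orientation only, the sketch below computes the index as it is commonly defined for a two-component normal mixture with a shared standard deviation: the square root of p(1 - p) multiplied by the standardised distance between the two component means. The function name and example parameters are assumptions; fitting the mixture itself is left out.

```python
import math


def bimodality_index(proportion, mean_1, mean_2, shared_sd):
    """Bimodality index from fitted two-component mixture parameters.

    proportion: mixing proportion of the first component, strictly in (0, 1).
    mean_1, mean_2: component means.
    shared_sd: standard deviation shared by both components (> 0).
    """
    if not 0.0 < proportion < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    if shared_sd <= 0:
        raise ValueError("shared_sd must be positive")
    # Standardised distance between the two component means.
    delta = abs(mean_1 - mean_2) / shared_sd
    return math.sqrt(proportion * (1.0 - proportion)) * delta


# Hypothetical gene: two equally sized groups whose means are 4 SD apart.
bi = bimodality_index(0.5, 0.0, 4.0, 1.0)
print(bi)
```

    A balanced split with well-separated means gives a high index; the index shrinks as the groups become unbalanced or the means move closer together.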

    Identifying disease-associated genes based on artificial intelligence

    Identifying disease-gene associations can help improve the understanding of disease mechanisms, which has a variety of applications, such as early diagnosis and drug development. Although experimental techniques, such as linkage analysis and genome-wide association studies (GWAS), have identified a large number of associations, identifying disease genes is still challenging since experimental methods are usually time-consuming and expensive. To address these issues, computational methods have been proposed to predict disease-gene associations. Based on the characteristics of existing computational algorithms in the literature, we can roughly divide them into three categories: network-based methods, machine learning-based methods, and other methods. Whatever model is used to predict disease genes, the proper integration of multi-level biological data is the key to improving prediction accuracy. This thesis addresses some limitations of the existing computational algorithms, and integrates multi-level data via artificial intelligence techniques. The thesis starts with a comprehensive review of computational methods, databases, and evaluation methods used in predicting disease-gene associations, followed by one network-based method and four machine learning-based methods. The first chapter introduces the background information, objectives of the studies and structure of the thesis. After that, a comprehensive review is provided in the second chapter to discuss the existing algorithms as well as the databases and evaluation methods used in existing studies. With the objectives and future directions established, the thesis then presents five computational methods for predicting disease-gene associations. The first method, proposed in Chapter 3, considers the issue of non-disease gene selection. A shortest path-based strategy is used to select reliable non-disease genes from a disease gene network and a differential network.
The selected genes are then used by a network-energy model to improve its performance. The second method, proposed in Chapter 4, constructs sample-based networks for case samples and uses them to predict disease genes. This strategy improves the quality of protein-protein interaction (PPI) networks, which further improves the prediction accuracy. Chapter 5 presents a generic model which applies multimodal deep belief nets (DBN) to fuse different types of data. Network embeddings extracted from PPI networks and gene ontology (GO) data are fused with the multimodal DBN to obtain cross-modality representations. Chapter 6 presents another deep learning model which uses a convolutional neural network (CNN) to integrate gene similarities with other types of data. Finally, the fifth method, proposed in Chapter 7, is a nonnegative matrix factorization (NMF)-based method. This method maps diseases and genes onto a lower-dimensional manifold, and the geodesic distance between diseases and genes is used to predict their associations. The method can predict disease genes even if the disease under consideration has no known associated genes. In summary, this thesis has proposed several artificial intelligence-based computational algorithms to address typical issues in existing computational algorithms. Experimental results have shown that the proposed methods can improve the accuracy of disease-gene prediction.

    Immersive analytics for oncology patient cohorts

    This thesis proposes a novel interactive immersive analytics tool and methods to interrogate cancer patient cohorts in an immersive virtual environment, namely Virtual Reality to Observe Oncology data Models (VROOM). The overall objective is to develop an immersive analytics platform, which includes a data analytics pipeline from raw gene expression data to immersive visualisation on virtual and augmented reality platforms utilising a game engine. Unity3D has been used to implement the visualisation. The work in this thesis could provide oncologists and clinicians with an interactive visualisation and visual analytics platform that helps them drive their analysis of treatment efficacy and achieve the goal of evidence-based personalised medicine. The thesis integrates the latest discoveries and developments in cancer patients’ prognoses, immersive technologies, machine learning, decision support systems and interactive visualisation to form an immersive analytics platform for complex genomic data. The experimental paradigm followed in this thesis is understanding transcriptomics in cancer samples. The thesis specifically investigates gene expression data to determine the biological similarity revealed by the transcriptomic profiles of patients' tumour samples, which show the active genes in different patients.
In summary, the thesis contributes i) a novel immersive analytics platform for patient cohort data interrogation in similarity space, where the similarity space is based on the patient's biological and genomic similarity; ii) an effective immersive environment optimisation design based on a usability study of exocentric and egocentric visualisation, audio and sound design optimisation; iii) an integration of trusted and familiar 2D biomedical visual analytics methods into the immersive environment; iv) novel use of game theory as the decision-making system engine to help the analytics process, and application of optimal transport theory in missing data imputation to ensure the preservation of data distribution; and v) case studies to showcase the real-world application of the visualisation and its effectiveness.

    Whole-transcriptome analysis of [psi+] budding yeast via cDNA microarrays

    Introduction: Prions of yeast present a novel analytical challenge in terms of both initial characterization and in vitro manipulation as models for human disease research. Presently, few robust analysis strategies have been successfully implemented that enable the efficient study of prion behavior in vivo. This study sought to evaluate the use of conventional dual-channel cDNA microarrays for the surveillance of transcriptomic regulation patterns by the [PSI+] yeast prion relative to an identical prion-deficient yeast variant, [psi-]. Methods: A data analysis and normalization workflow strategy was developed and applied to cDNA array images, yielding quality-regulated expression ratios for a subset of genes exhibiting statistical congruence across multiple experimental repetitions and nested hybridization events. The significant gene list was analyzed using classical analytical approaches including several clustering-based methods and singular value decomposition. To add biological meaning to the differential expression data in hand, functional annotation using the Gene Ontology as well as several pathway-mapping approaches was conducted. Finally, the expression patterns observed were queried against all publicly curated microarray data generated using S. cerevisiae in order to discover similar expression behavior across a vast array of experimental conditions. Results: These data collectively implicate a low level of overall genomic regulation as a result of the [PSI+] state, where the maximum statistically significant degree of differential expression was less than ±1 Log2(FC) in all cases. Notwithstanding, the [PSI+] differential expression was localized to several specific classes of structural elements and cellular functions, implying that under homeostatic conditions significant up- or down-regulation is likely unnecessary but possible in those specific systems if environmental conditions warrant it.
As a result of these findings, additional work pertaining to this system should include controlled insult to both yeast variants under differing environmental conditions to promote a potential [PSI+] regulatory response, coupled with co-surveillance of these conditions using transcriptomic and proteomic analysis methodologies.
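
    To make the ±1 Log2(FC) figure in the abstract above concrete: a log2 fold change of 1 corresponds to a two-fold difference between the two channels of a dual-channel array. The sketch below is a minimal illustration with made-up intensities; the function names are hypothetical, and the study's own normalization workflow is not reproduced here.

```python
import math


def log2_fold_change(test_intensity, reference_intensity):
    """Log2 ratio of two positive channel intensities."""
    if test_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(test_intensity / reference_intensity)


def exceeds_threshold(log2_fc, threshold=1.0):
    """True when the change is at least two-fold in either direction."""
    return abs(log2_fc) >= threshold


# Hypothetical normalised intensities for one gene: [PSI+] versus [psi-].
fc = log2_fold_change(150.0, 100.0)
print(round(fc, 3), exceeds_threshold(fc))
```

    A 1.5-fold increase, as in this example, stays below the threshold, which is the kind of modest change the abstract reports for the [PSI+] state.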

    DNA methylation as a biomarker for age-related cognitive impairment

    PhD Thesis. Due to the ageing population, the number of patients diagnosed with age-related diseases such as stroke and Parkinson’s disease is on the rise. In both post-stroke dementia (PSD) and mild cognitive impairment in Parkinson’s disease (PD-MCI), the mechanisms resulting in cognitive decline are unknown. This project aims to identify a biomarker which could predict those patients most at risk of developing cognitive decline, which would subsequently assist healthcare professionals in recommending early treatment and care. Epigenetics is an emerging field in which biomarkers have previously been useful in prognostication of cancers and prediction of cardiovascular disease. In this study, 30 patients from a PSD cohort (COGFAST) and 48 patients from a PD-MCI cohort (ICICLE) were analysed using the Illumina HumanMethylation450 BeadChip to identify differentially methylated positions which could predict patients who would later develop cognitive decline. Top hits were validated using pyrosequencing to confirm DNA methylation differences in a replication cohort. Individual CpG sites within APOB and NGF were identified as potential blood-based biomarkers for PSD, and one CpG site within CHCHD5 was highlighted as a potential blood-based biomarker for PD-MCI. In addition, methylation at one CpG site within NGF and at a CpG site (cg18837178) within a non-coding RNA was found to be associated with Braak staging (degree of brain pathology) using DNA from two brain regions. NGF deregulation has previously been associated with Alzheimer’s disease, and this finding indicates it may also have a role in the development of PSD. These novel findings represent the first steps towards the identification of blood-based biomarkers to assist with diagnosis of PSD and PD-MCI, but require further validation in a larger independent cohort.
The differentially methylated genes identified may also give insight into some of the mechanisms involved in these complex diseases, potentially leading to the future development of targeted preventative treatments. Medical Research Council and Newcastle University