
    Biomarker Prioritisation and Power Estimation Using Ensemble Gene Regulatory Network Inference

    Inferring the topology of a gene regulatory network (GRN) from gene expression data is a challenging but important undertaking for gaining a better understanding of gene regulation. Key challenges include working with noisy data and dealing with a higher number of genes than samples. Although a number of different methods have been proposed to infer the structure of a GRN, there are large discrepancies among the inference algorithms they adopt, which makes meaningful comparison difficult. In this study, we used two methods, namely MIDER (Mutual Information Distance and Entropy Reduction) and PLSNET (partial least squares based feature selection), to infer the structure of a GRN directly from data, and computationally validated our results. Both methods were applied to gene expression datasets from inflammatory bowel disease (IBD), pancreatic ductal adenocarcinoma (PDAC), and acute myeloid leukaemia (AML) studies. For each case, gene regulators were successfully identified. For example, in the IBD dataset, the UGT1A family genes were identified as key regulators, while in the PDAC dataset, the SULF1 and THBS2 genes were highlighted. We further demonstrate that an ensemble-based approach that combines the output of the MIDER and PLSNET algorithms can infer the structure of a GRN from data with higher accuracy. We have also estimated the number of samples required for potential future validation studies. Here, we present an analysis framework that supports not only the prediction of candidate regulator genes for potential validation experiments but also the estimation of the number of samples required for these experiments.

    Identification of gene regulation modules that act in the interaction between cork development and environmental variables

    Master's thesis, Bioinformatics and Computational Biology, 2023, Universidade de Lisboa, Faculdade de Ciências. Cork oak (Quercus suber) is a Mediterranean tree that excels in phellem (cork) production, a valuable raw product with multiple industrial applications. However, the increased frequency and severity of drought events due to climate change lead to reduced cork oak growth and productivity. This work aimed to integrate the different transcriptomics data available for this species to predict a gene co-expression network, identify candidate regulatory modules of phellem development, and assess their regulation by drought. The co-expression network was built using, as guides, a group of genes differentially expressed in phellem from plants exposed to drought conditions. Based on gene-to-gene co-expression links, transcription factor (TF)-target gene interactions were further predicted and reinforced using functional data available in the model plant A. thaliana. The generated network predominantly highlighted genes negatively regulated under drought, particularly gene modules related to cell division and differentiation (e.g. cell wall development). Of the multiple interactions established by co-expression involving 27 TFs, 118 had already been identified in A. thaliana by experimental methods. Additionally, the specific binding sites predicted for 21 TFs were found in the promoters of 144 co-expressed genes. This demonstrated that the predicted co-expression network could, to some extent, predict candidate TF-target interactions. Of the highlighted TFs, MYB93 and NAC43 were each integrated in a separate module showing a concerted downregulation in phellem in response to drought. Additionally, the DREB1B TF and its co-expressed targets are hypothesized to be involved in an adaptive response to drought by maintaining cellular homeostasis.
Overall, the present work unveiled new regulatory gene modules of interest using state-of-the-art machine learning and data mining approaches, with a predicted role in phellem development, and described their expression trend in response to drought. The selected candidate transcription factors will be further experimentally validated to reinforce the obtained in silico predictions. This will be an important contribution to the development of future strategies to screen cork oak plants for improved resilience and/or productivity in response to adverse external conditions.

    Discovering lesser known molecular players and mechanistic patterns in Alzheimer's disease using an integrative disease modelling approach

    The convergence of exponentially advancing technologies is driving medical research towards life-changing discoveries. In contrast, repeated failures of high-profile drugs to battle Alzheimer's disease (AD) have made it one of the least successful therapeutic areas. This failure pattern has provoked researchers to grapple with their beliefs about Alzheimer's aetiology. Thus, the growing realisation that Amyloid-β and tau are not 'the' but rather 'one of the' factors necessitates the reassessment of pre-existing data to add new perspectives. To enable a holistic view of the disease, integrative modelling approaches are emerging as a powerful technique. Combining data at different scales and modes could considerably increase the predictive power of the integrative model by filling biological knowledge gaps. However, the reliability of the derived hypotheses largely depends on the completeness, quality, consistency, and context-specificity of the data. Thus, there is a need for agile methods and approaches that efficiently interrogate and utilise existing public data. This thesis presents the development of novel approaches and methods that address intrinsic issues of data integration and analysis in AD research. It aims to prioritise lesser-known AD candidates using highly curated and precise knowledge derived from integrated data. Here, much of the emphasis is put on quality, reliability, and context-specificity. This thesis work showcases the benefit of integrating well-curated and disease-specific heterogeneous data in a semantic web-based framework for mining actionable knowledge. Furthermore, it introduces the challenges encountered while harvesting information from literature and transcriptomic resources. A state-of-the-art text-mining methodology is developed to extract miRNAs and their regulatory roles in diseases and genes from the biomedical literature.
To enable meta-analysis of biologically related transcriptomic data, a highly curated metadata database has been developed, which explicates annotations specific to human and animal models. Finally, to corroborate common mechanistic patterns (embedded with novel candidates) across large-scale AD transcriptomic data, a new approach to generate gene regulatory networks has been developed. The work presented here has demonstrated its capability in identifying testable mechanistic hypotheses containing previously unknown or emerging knowledge from public data in two major publicly funded projects for Alzheimer's disease, Parkinson's disease and epilepsy.

    Integrative approach to predict signalling perturbations for cellular transitions: application to regenerative and disease models


    Single cell based computational approaches to unravel dysregulations in diseases
