
    Consensus and meta-analysis regulatory networks for combining multiple microarray gene expression datasets

    Microarray data is a key source of experimental data for modelling gene regulatory interactions from expression levels. With the rapid increase of publicly available microarray data comes the opportunity to produce regulatory network models based on multiple datasets. Such models are potentially more robust with greater confidence, and place less reliance on a single dataset. However, combining datasets directly can be difficult as experiments are often conducted on different microarray platforms and in different laboratories, leading to inherent biases in the data that are not always removed through pre-processing such as normalisation. In this paper we compare two frameworks for combining microarray datasets to model regulatory networks: pre- and post-learning aggregation. In pre-learning approaches, such as using simple scale-normalisation prior to the concatenation of datasets, a model is learnt from a combined dataset, whilst in post-learning aggregation individual models are learnt from each dataset and the models are combined. We present two novel approaches for post-learning aggregation, each based on aggregating high-level features of Bayesian network models that have been generated from different microarray expression datasets. Meta-analysis Bayesian networks are based on combining statistical confidences attached to network edges, whilst Consensus Bayesian networks identify consistent network features across all datasets. We apply both approaches to multiple datasets from synthetic and real (Escherichia coli and yeast) networks and demonstrate that both methods can improve on networks learnt from a single dataset or an aggregated dataset formed using a standard scale-normalisation.

    New components of the Dictyostelium PKA pathway revealed by Bayesian analysis of expression data

    Background: Identifying candidate genes in genetic networks is important for understanding regulation and biological function. Large gene expression datasets contain relevant information about genetic networks, but mining the data is not a trivial task. Algorithms that infer Bayesian networks from expression data are powerful tools for learning complex genetic networks, since they can incorporate prior knowledge and uncover higher-order dependencies among genes. However, these algorithms are computationally demanding, so novel techniques that allow targeted exploration for discovering new members of known pathways are essential. Results: Here we describe a Bayesian network approach that addresses a specific network within a large dataset to discover new components. Our algorithm draws individual genes from a large gene-expression repository, and ranks them as potential members of a known pathway. We apply this method to discover new components of the cAMP-dependent protein kinase (PKA) pathway, a central regulator of Dictyostelium discoideum development. The PKA network is well studied in D. discoideum but the transcriptional networks that regulate PKA activity and the transcriptional outcomes of PKA function are largely unknown. Most of the genes highly ranked by our method encode either known components of the PKA pathway or are good candidates. We tested 5 uncharacterized highly ranked genes by creating mutant strains and identified a candidate cAMP-response element-binding protein, yet undiscovered in D. discoideum, and a histidine kinase, a candidate upstream regulator of PKA activity. Conclusions: The single-gene expansion method is useful in identifying new components of known pathways. The method takes advantage of the Bayesian framework to incorporate prior biological knowledge and discovers higher-order dependencies among genes while greatly reducing the computational resources required to process high-throughput datasets.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    Metabolite biosensors for cell factory development

    Through synergy with natural sciences and engineering disciplines, biotechnology has become a broad, interdisciplinary, scientific field with many applications. One such application is the sustainable production of industrially relevant products using living systems such as microorganisms. Transforming microorganisms to cell factories is, however, a labour-intensive and cost-ineffective process, requiring many years of extensive research. Several fields together known as systems metabolic engineering, including synthetic biology, have greatly facilitated the process of customizing microorganisms to benefit human interests. Among several emerging tools are metabolite biosensors, which can be employed in high-throughput screening endeavours for identifying productive cells and in dynamic pathway regulation for optimizing metabolic systems. Developing and engineering metabolite biosensors to fit a certain application is, however, challenging. This thesis focuses on different aspects of utilizing and engineering metabolite-responsive transcription factor-based biosensors for facilitating the development of Saccharomyces cerevisiae as a cell factory. To that end, we improved the dynamic range of a malonyl-CoA-responsive biosensor by i) evaluating different binding site locations of the bacterial transcription factor FapR within different yeast promoters and by ii) using a chimeric transcription factor based on a native repressor system from S. cerevisiae. Furthermore, we suggest the possibility of using the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas9 system to facilitate biosensor development by guiding binding site positioning. We also employed an acyl-CoA-responsive biosensor based on the bacterial transcription factor FadR to screen for genes boosting the fatty acyl-CoA levels, which are precursors for industrially relevant compounds such as fatty alcohols. The possibility of developing fatty acid-responsive biosensors based on other transcription factors, including the endogenous transcription factor Mga2, has also been addressed. Finally, we looked into the potential of developing an alkane-responsive biosensor based on a system from Yarrowia lipolytica. Overall, this thesis provides answers, discussions and potential future directions on using and engineering metabolite biosensors for cell factory development.