302 research outputs found
Bayesian Inference for Genomic Data Analysis
High-throughput genomic data contain a vast amount of information shaped by the complex biological processes in the cell. As such, appropriate mathematical modeling frameworks are required to understand the data and the data-generating processes. This dissertation focuses on the formulation of mathematical models and the description of appropriate computational algorithms to obtain insights from genomic data.
Specifically, characterization of intra-tumor heterogeneity is studied. Based on the total number of allele copies at the genomic locations in the tumor subclones, the problem is viewed from two perspectives: the presence or absence of a copy-neutrality assumption. Under copy-neutrality, it is assumed that the genome contains mutational variability and that any of the three possible genotypes may be present at each genomic location. As such, the genotypes of all the genomic locations in the tumor subclones are modeled by a ternary matrix. In the second case, in addition to mutational variability, it is assumed that the genomic locations may be affected by structural variations such as copy number variation (CNV). Thus, the genotypes are modeled with a pair of (Q + 1)-ary matrices. Using the categorical Indian buffet process (cIBP), a state-space modeling framework is employed to describe the two processes, and sequential Monte Carlo (SMC) methods for dynamic models are applied to perform inference on important model parameters.
Moreover, the problem of estimating a gene regulatory network (GRN) from measurements with missing values is presented. Specifically, gene expression time series data may lack all expression values at a single time point or at a set of consecutive time points. However, complete data are often needed to make inference on the underlying GRN. To accommodate the missing measurements, a dynamic stochastic model is used to describe the evolution of gene expression, and point-based Gaussian approximation (PBGA) filters with one-step or two-step missing measurements are applied for the inference. Finally, the problem of deconvolving gene expression data from complex heterogeneous biological samples is examined, where the observed data are a mixture of different cell types. A statistical description of the problem is used and the SMC method for static models is applied to estimate the cell-type-specific expressions and the cell type proportions in the heterogeneous samples.
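To make the mixture idea concrete, here is a minimal sketch of deconvolution in the simplest possible setting. It is not the dissertation's SMC method: it assumes only two cell types, assumes their expression signatures are already known, and estimates the single mixing proportion by ordinary least squares. The function name `estimate_proportion` and all numbers are hypothetical.

```python
def estimate_proportion(mixed, sig_a, sig_b):
    """Least-squares estimate of the fraction of cell type A in a two-type mixture.

    Assumed model for this sketch:
        mixed[g] = p * sig_a[g] + (1 - p) * sig_b[g] + noise
    """
    diffs = [a - b for a, b in zip(sig_a, sig_b)]
    denom = sum(d * d for d in diffs)
    if denom == 0:
        raise ValueError("signatures are identical; proportion is not identifiable")
    numer = sum((y - b) * d for y, b, d in zip(mixed, sig_b, diffs))
    # Clip to [0, 1] because a proportion outside that range is not meaningful.
    return min(1.0, max(0.0, numer / denom))


# Hypothetical signatures for three genes and a noiseless 30/70 mixture.
sig_a = [10.0, 2.0, 5.0]
sig_b = [1.0, 8.0, 5.0]
mixed = [0.3 * a + 0.7 * b for a, b in zip(sig_a, sig_b)]
p_hat = estimate_proportion(mixed, sig_a, sig_b)
```

The harder problem described in the abstract estimates the cell-type-specific expressions and the proportions jointly, which is why a sampling-based method is used there rather than a closed-form solution.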
Systems biology approaches to precision medicine
This dissertation reviews the development and implementation of two systems biology methods: ADVOCATE and hpARACNE. ADVOCATE was designed to deconvolve epithelium and stroma compartment fractions and virtual expression profiles from bulk gene expression profiles of human patients. We used laser capture microdissection and RNA sequencing to disentangle the transcriptional programs active in the malignant epithelium and stroma of pancreatic ductal adenocarcinoma (PDA), an aggressive malignancy with a prominent stromal component. We learned that distinct molecular subtypes are present in both the epithelium and the stroma of pancreatic cancer, and that the subtype identities of these two compartments are independent of one another. Critically, we discovered that specific combinations of epithelial and stromal subtypes are strongly associated with patient survival across multiple external datasets, exhibiting both an effect size and a level of reproducibility that were absent from previous efforts. These analyses were made possible by a new probabilistic algorithm (Adaptive DeconVolution Of CAncer Tissue Expression - ADVOCATE) that can extract compartment-specific gene expression profiles from bulk gene expression data. ADVOCATE accurately predicted the compartment fractions of bulk tumor samples and improved the performance of molecular classifiers by controlling for the diverse cellular compositions of independent datasets. This approach provides a much-needed framework to handle solid tumor tissue heterogeneity, allowing integrated analysis of both epithelial and stromal transcriptional programs from individual bulk samples.
Reverse engineering approaches have been used to systematically dissect regulatory interactions based on gene expression profiles in different contexts and data types, thus improving our mechanistic understanding of molecular programs under perturbations. Proteomics data, on the other hand, provide direct evidence of cell functions. In particular, signaling molecules are the best candidates for drug targets. Previous efforts have shown that targeting signaling proteins could potentially lead to cancer remission. In this work, I introduce the hybrid proteomics Algorithm for the Reconstruction of Accurate Cellular Network (hpARACNE), a redesign of the gene-expression-based ARACNE algorithm. Using Clinical Proteomic Tumor Analysis Consortium (CPTAC) breast cancer proteomics data, hpARACNE reconstructs a network that significantly outperforms ARACNE when compared with curated kinase/phosphatase-substrate interactions from public databases. Compared with substrates for EGFR identified experimentally by Stable Isotope Labeling with Amino acids in Cell culture (SILAC), hpARACNE predicts substrates with high accuracy. Integrative network analysis of the breast cancer transcriptome and phosphoproteome reveals potential drug targets for Triple Negative Breast Cancer (TNBC) treatment. hpARACNE has three innovations that adapt it to proteomics data and signaling processes: 1) refinement of the kinase/phosphatase peptides by integrating matched whole proteomic and whole phosphoproteomic profiles; 2) establishment of association based on a newly designed Mutual Information (MI) estimator for missing data; 3) network pruning using a directional Data Processing Inequality (dDPI) for signaling processes.
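As background for the pruning step, the following is a minimal sketch of the standard, undirected Data Processing Inequality filter used in ARACNE-style network reconstruction. It is not hpARACNE's directional dDPI, whose details are not given here. In every fully connected triplet, the edge with the lowest mutual information is treated as an indirect interaction and removed. The function name `dpi_prune`, the `tolerance` parameter, the node labels and the MI values are all illustrative assumptions.

```python
from itertools import combinations


def dpi_prune(mi, tolerance=0.0):
    """Apply the undirected DPI to a network given as {frozenset({a, b}): MI}.

    For each triplet in which all three edges exist, the weakest edge is
    marked if it is strictly below the other two (scaled by the tolerance).
    Marked edges are removed only after all triplets have been examined.
    """
    nodes = sorted({node for edge in mi for node in edge})
    to_remove = set()
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if not all(edge in mi for edge in edges):
            continue  # the DPI is only applied to fully connected triplets
        weakest = min(edges, key=lambda edge: mi[edge])
        others = [mi[edge] for edge in edges if edge != weakest]
        if mi[weakest] < min(others) * (1.0 - tolerance):
            to_remove.add(weakest)
    return {edge: value for edge, value in mi.items() if edge not in to_remove}


# Hypothetical triplet: kinase K, direct substrate S1, downstream protein S2.
mi = {
    frozenset(("K", "S1")): 0.9,
    frozenset(("S1", "S2")): 0.8,
    frozenset(("K", "S2")): 0.3,
}
pruned = dpi_prune(mi)
```

In this toy network the K-S2 edge is dropped because its mutual information is lower than both other edges in the triplet, which is consistent with K influencing S2 only through S1.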
Traveling Salesman Problem
This book is a collection of current research in the application of evolutionary algorithms and other optimal algorithms to solving the TSP. It brings together researchers with applications in Artificial Immune Systems, Genetic Algorithms, Neural Networks and the Differential Evolution Algorithm. Hybrid systems, such as Fuzzy Maps, Chaotic Maps and Parallelized TSP, are also presented. Most importantly, this book presents both theoretical and practical applications of the TSP, which will be a vital tool for researchers and graduate entry students in the fields of Applied Mathematics, Computing Science and Engineering.
SCALABLE MODELING APPROACHES IN SYSTEMS IMMUNOLOGY
Systems biology seeks to build quantitative predictive models of biological system behavior. Biological systems, such as the mammalian immune system, operate across multiple spatiotemporal scales with a myriad of molecular and cellular players. Thus, mechanistic, predictive models describing such systems need to address this multiscale nature. A general outstanding problem is to cope with the high-dimensional parameter space arising when building reasonably detailed models. Another challenge is to devise integrated frameworks incorporating behavioral characteristics manifested at various organizational levels seamlessly. In this dissertation, I present two research projects addressing problems in immunological, or biological systems in general, using quantitative mechanistic models and machine learning, touching on the aforementioned challenges in scalable modeling.
First, I aimed to understand how cell-to-cell heterogeneities are regulated through gene expression variations and their propagation at the single-cell level. To better understand detailed gene regulatory circuit models with many parameters without analytical solutions, I developed a framework called MAchine learning of Parameter-Phenotype Analysis (MAPPA). MAPPA combines machine learning approaches and stochastic simulation methods to dissect the mapping between high-dimensional parameters and phenotypes. MAPPA elucidated regulatory features of stochastic gene-gene correlation phenotypes.
Next, I sought to quantitatively dissect immune homeostasis conferring tolerance to self-antigens and responsiveness to foreign antigens. Towards this goal, I built a series of models spanning from intracellular to organismal levels to describe the recurrent reciprocal relationships between self-reactive T cells and regulatory T cells in collaboration with an experimentalist. This effort elucidated critical immune parameters regulating the circuitry enabling the robust suppression of self-reactive T cells, followed by experimental validation. Moreover, by bridging these models across organizational scales, I derived a framework describing immune homeostasis as a dynamical equilibrium between self-activated T cells and regulatory T cells, typically operating well below thresholds that could result in clonal expansion and subsequent autoimmune diseases.
I begin with an introduction offering a perspective that links seemingly contradictory behaviors of the immune system at different scales: microscopic “noise” and macroscopic deterministic outcomes. By connecting these aspects of the adaptive immune system through an analogy with an ansatz from statistical physics, I introduce a view on how robust immune homeostasis ensues.
Living with noise: The evolution of gene expression noise in gene regulatory networks
One of the keystones of evolutionary biology is the study of how organismal traits change in time. Technological advancements in the past twenty years have enabled us to study the variation of an important trait, gene expression level, at single-cell resolution. One of the sources of gene expression level variation is gene expression noise, a result of the innate stochasticity of the gene expression process. Gene expression noise is gene-specific and can be tuned by selection, but what drives the evolution of gene-specific expression noise remains an open question. In this thesis, I explore the selective pressure and evolvability of gene-specific expression noise in gene regulatory networks. I use evolutionary simulations, applying rounds of mutation, recombination and reproduction to populations of model gene regulatory networks in different selection scenarios. In the first chapter, I investigate the response of gene-specific expression noise in gene regulatory networks in constant environments, which impose stabilizing selection on gene expression level. The probability of responding to selection and the strength of the selective response were affected by local network centrality metrics. Furthermore, global network features, such as network diameter, centralization and average degree, affected the average expression variance and average selective pressure acting on constituent genes. In the second chapter, I investigate the response of mean gene expression level and gene-specific expression noise in isolated genes and genes in gene regulatory networks in changing environments. Gene-specific expression noise increased under fluctuating selection, indicating the evolution of a bet-hedging strategy. Under directional selection, gene-specific expression noise transiently increased, showing that expression noise plays a role in the adaptation process towards a new mean expression optimum.
Using MapReduce Streaming for Distributed Life Simulation on the Cloud
Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s Life according to a general MR streaming pattern. We chose Life because it is simple enough as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms’ performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
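As an illustration of how one generation of discrete Conway's Life can be phrased as a map step followed by a reduce step, here is a minimal single-process sketch. It is a naive per-cell formulation, not the authors' optimized streaming algorithms and not strip partitioning. The function names and the in-memory grouping that stands in for the shuffle phase are assumptions made for this example.

```python
from collections import defaultdict


def mapper(cell):
    """Emit an 'alive' marker for a live cell and a count of 1 to each neighbour."""
    x, y = cell
    yield cell, "alive"
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if (dx, dy) != (0, 0):
                yield (x + dx, y + dy), 1


def reducer(cell, values):
    """Apply Conway's rules to all values that were emitted for one cell."""
    alive = "alive" in values
    neighbours = sum(v for v in values if v != "alive")
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    if neighbours == 3 or (alive and neighbours == 2):
        yield cell


def step(live_cells):
    """Run one generation: map, group by key (the shuffle), then reduce."""
    grouped = defaultdict(list)
    for cell in live_cells:
        for key, value in mapper(cell):
            grouped[key].append(value)
    return {out for key, values in grouped.items() for out in reducer(key, values)}


# A horizontal blinker flips to vertical and back again.
blinker = {(0, 1), (1, 1), (2, 1)}
next_gen = step(blinker)
```

Because every live cell emits nine key-value pairs, this formulation moves a lot of data through the shuffle, which suggests why partitioning the lattice into larger units can reduce execution time.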
Mass spectral imaging of clinical samples using deep learning
A better interpretation of tumour heterogeneity and variability is vital for the improvement of novel diagnostic techniques and personalized cancer treatments. Tumour tissue heterogeneity is characterized by biochemical heterogeneity, which can be investigated by unsupervised metabolomics.
Mass Spectrometry Imaging (MSI) combined with Machine Learning techniques has generated increasing interest as an analytical and diagnostic tool for the analysis of spatial molecular patterns in tissue samples. Considering the high complexity of data produced by the application of MSI, which can consist of many thousands of spectral peaks, statistical analysis, and in particular machine learning and deep learning, has been investigated as a novel approach to deduce the relationships between the measured molecular patterns and the local structural and biological properties of the tissues.
Machine learning has historically been divided into two main categories: supervised and unsupervised learning. In MSI, supervised learning methods may be used to segment tissues into histologically relevant areas, e.g. the classification of tissue regions in H&E (Haematoxylin and Eosin) stained samples. Initial classification by an expert histopathologist through visual inspection enables the development of univariate or multivariate models based on tissue regions that have significantly up- or down-regulated ions. However, complex data may result in underdetermined models, and alternative methods that can cope with high dimensionality and noisy data are required.
Here, we describe, apply, and test a novel diagnostic procedure built using a combination of MSI and deep learning with the objective of delineating and identifying biochemical differences between cancerous and non-cancerous tissue in metastatic liver cancer and epithelial ovarian cancer. The workflow investigates the robustness of single (1D) to multidimensional (3D) tumour analyses and also highlights possible biomarkers which are not accessible from classical visual analysis of the H&E images. The identification of key molecular markers may provide a deeper understanding of tumour heterogeneity and potential targets for intervention.