Genome-wide Predictive Simulation on the Effect of Perturbation and the Cause of Phenotypic variations with Network Biology Approach
Thanks to modern high-throughput technologies such as microarray-based gene expression profiling, large amounts of molecular profile data have been generated in several disease-related contexts. Although these data likely contain systems-level information about disease regulation, revealing the underlying dynamics between genes and the mechanisms of gene regulation genome-wide remains a major challenge. Understanding these mechanisms and the resulting dynamical behavior at the genome-wide scale is a key goal of the nascent field of systems biology. One approach to dissecting the logic of the cell is to use reverse-engineering algorithms that infer regulatory interactions from molecular profile data. In this context, information-theoretic approaches have been very successful: for instance, the ARACNe algorithm has successfully inferred transcriptional interactions between transcription factors and their target genes, and the MINDy algorithm has identified post-translational modulators of transcription factor activity through multivariate analysis of large gene expression profile datasets. Many methods have been proposed to improve ARACNe, both in computational efficiency and in the accuracy of the predicted interactions. Yet the core of ARACNe, the data processing inequality (DPI), has remained virtually unchanged, even though modern information theory has extended the DPI theorem to higher-order interactions. First, we introduce an improvement of ARACNe, hARACNe, which recursively applies a higher-order DPI analysis. We show that the new algorithm successfully detects false-positive feed-forward loops involving more than three genes. Second, we extend the MINDy algorithm using co-information as a novel metric, replacing the conditional mutual information and significantly improving the algorithm's predictions.
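The classic three-gene DPI step at the core of ARACNe can be sketched in a few lines: for every gene triplet, the edge with the lowest mutual information (MI) is treated as an indirect interaction and removed. This is only a minimal illustration under stated assumptions, not hARACNe's recursive higher-order DPI: MI is estimated by naive counting on already discretized data, the MI significance threshold ARACNe applies before pruning is omitted, and the function names and the `tolerance` parameter are hypothetical. The `co_information` helper shows the quantity that replaces conditional MI in the MINDy extension, using the convention I(X;Y;Z) = I(X;Y) - I(X;Y|Z).

```python
import math
from collections import Counter
from itertools import combinations


def entropy(*columns):
    """Joint Shannon entropy (bits) of one or more discrete variables."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * math.log2(c / n) for c in Counter(joint).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated by counting."""
    return entropy(x) + entropy(y) - entropy(x, y)


def co_information(x, y, z):
    """I(X;Y;Z) = I(X;Y) - I(X;Y|Z), written as a sum of joint entropies."""
    return (entropy(x) + entropy(y) + entropy(z)
            - entropy(x, y) - entropy(x, z) - entropy(y, z)
            + entropy(x, y, z))


def dpi_prune(data, tolerance=0.0):
    """Remove the weakest edge of every gene triplet (hypothetical helper).

    data: dict mapping gene name -> list of discretized expression values.
    Every pair is treated as a candidate edge (no MI threshold here).
    Returns the surviving edges as frozensets of two gene names.
    """
    mi = {frozenset(pair): mutual_information(data[pair[0]], data[pair[1]])
          for pair in combinations(data, 2)}
    removed = set()
    for a, b, c in combinations(data, 3):
        edges = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        weakest = min(edges, key=mi.get)
        others = [e for e in edges if e != weakest]
        # Remove only if strictly weaker than both other edges (ties are kept).
        if all(mi[weakest] < mi[e] * (1 - tolerance) for e in others):
            removed.add(weakest)
    return set(mi) - removed
```

For a noisy chain x → y → z, the direct edges survive and the indirect x–z edge is pruned, because I(x;z) is the smallest of the three MI values.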
Broadly, the two ultimate goals of systems perturbation studies are to reveal how human diseases are connected with genes and to find the regulatory mechanisms that determine disease cell behavior. However, these goals remain daunting: even the most talented researchers still have to rely on laborious genetic screens, and only very simplified hypotheses about the effects of a given perturbation have been experimentally validated and roughly analyzed, within very limited regulatory sub-networks such as individual pathways. To overcome these limitations, this thesis research explores the use of gene regulatory networks. Specifically, we propose a new algorithm that can accurately predict the cell state genome-wide following perturbation of individual genes, such as in silencing or ectopic expression experiments. Experimentally validated methods to predict genome-wide changes in a cellular system following a genetic perturbation (e.g., gene silencing or ectopic expression) are still unavailable. Moreover, even though phenotypic variations are experimentally profiled and gene signatures are selected by statistical testing, finding the exact regulator that systematically causes significant variations in a gene signature remains quite challenging. In this research, I introduce and experimentally validate a probabilistic Bayesian method to simulate the propagation of genetic perturbations on integrated gene regulatory networks inferred by the hARACNe and coMINDy algorithms from human B cell data. With the same predictive framework, we also computationally predict the master driver (regulator) most likely to have produced the observed variations in gene expression levels; these studies serve as a systematized pre-screening process before genetic manipulation. I predict in silico the effect of silencing several genes as well as the cause of phenotypic variations.
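As a rough illustration of what propagating a perturbation through a probabilistic network means, the sketch below clamps one gene of a tiny Bayesian network to "off" (a silencing experiment) and recomputes each gene's marginal probability of being "on" by exact enumeration. The three-gene topology, the probability tables, and the `marginals` function are all hypothetical; this is not the method developed in the thesis, which works on genome-scale hARACNe/coMINDy networks where exhaustive enumeration would be infeasible.

```python
from itertools import product

# Hypothetical toy network: TF regulates A, and both TF and A regulate B.
PARENTS = {"TF": [], "A": ["TF"], "B": ["TF", "A"]}
# Made-up probability that each gene is "on" (1) given its parents' states.
P_ON = {
    "TF": {(): 0.6},
    "A": {(0,): 0.1, (1,): 0.8},
    "B": {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}
ORDER = ["TF", "A", "B"]  # topological order


def marginals(clamp=None):
    """Exact P(gene = 1) for every gene, optionally under an intervention.

    clamp: dict gene -> fixed state, e.g. {"TF": 0} to model silencing.
    A clamped gene ignores its parents (an intervention, not an observation).
    """
    clamp = clamp or {}
    totals = {g: 0.0 for g in ORDER}
    for states in product((0, 1), repeat=len(ORDER)):
        assignment = dict(zip(ORDER, states))
        prob = 1.0
        for g in ORDER:
            if g in clamp:
                prob *= 1.0 if assignment[g] == clamp[g] else 0.0
                continue
            p_on = P_ON[g][tuple(assignment[p] for p in PARENTS[g])]
            prob *= p_on if assignment[g] == 1 else 1.0 - p_on
        # Accumulate the probability of this joint state for every "on" gene.
        for g in ORDER:
            if assignment[g] == 1:
                totals[g] += prob
    return totals
```

Comparing `marginals()` with `marginals({"TF": 0})` shows the predicted downstream effect of silencing TF: in this toy network, P(B = 1) drops from 0.518 to 0.095.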
Performance analysis using Gene Set Enrichment Analysis (GSEA) shows that the new methods are highly predictive, providing an initial step toward building predictive probabilistic regulatory models that may be applicable as pre-screening steps in perturbation studies.
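The core GSEA statistic can be sketched as follows: genes are ranked by a differential-expression score, and a running sum walks down the ranking, increasing at members of the gene set (weighted by |score|^p) and decreasing at non-members; the enrichment score is the running sum's maximum deviation from zero. This minimal sketch assumes the weighted scheme with p = 1 and omits the permutation-based significance testing and normalization of the full method; the function name is hypothetical, and how exactly the thesis applied GSEA to its predictions is not specified here.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """GSEA-style weighted running-sum enrichment score (hypothetical helper).

    ranked_genes: genes sorted from most up- to most down-regulated.
    scores: ranking statistic for each gene, in the same order.
    gene_set: collection of gene names to test.
    Returns the signed maximum deviation of the running sum from zero.
    """
    gene_set = set(gene_set)
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must partially overlap the ranking")
    hit_norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up by the normalized weight at hits, down by 1/n_miss otherwise.
        running += abs(s) ** p / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A set concentrated at the top of the ranking gives a score near +1, and a set concentrated at the bottom gives a score near -1.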
Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling
Breast cancer is the most common malignancy in women and is responsible for hundreds of thousands of deaths annually. As with most cancers, it is a heterogeneous disease, and different breast cancer subtypes are treated differently. Understanding how prognosis differs according to molecular and phenotypic features is one avenue for improving treatment, by matching the appropriate therapy to the molecular subtype of the disease. In this work, we took a competition-based approach to modeling breast cancer prognosis, using large datasets containing genomic and clinical information together with an online real-time leaderboard that sped up feedback to the modeling team and encouraged each modeler to work toward a higher-ranked submission. We find that machine learning methods combined with molecular features selected based on expert prior knowledge can improve survival predictions compared to current best-in-class methodologies, and that ensemble models trained across multiple user submissions systematically outperform the individual models within the ensemble. We also find that model scores are highly consistent across multiple independent evaluations. This study serves as the pilot phase of a much larger competition open to the whole research community, with the goals of understanding general strategies for model optimization using clinical and molecular profiling data and providing an objective, transparent system for assessing prognostic models.
Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen
The effectiveness of most targeted cancer therapies is short-lived. Tumors often develop resistance that might be overcome with drug combinations. However, the number of possible combinations is vast, necessitating data-driven approaches to find optimal patient-specific treatments. Here we report AstraZeneca's large drug combination dataset, consisting of 11,576 experiments from 910 combinations across 85 molecularly characterized cancer cell lines, and the results of a DREAM Challenge to evaluate computational strategies for predicting synergistic drug pairs and biomarkers. A total of 160 teams participated, providing comprehensive methodological development and benchmarking. Winning methods incorporate prior knowledge of drug-target interactions. Synergy is predicted with an accuracy matching biological replicates for >60% of combinations. However, 20% of drug combinations are poorly predicted by all methods. Genomic rationales for synergy predictions are identified, including antagonism of an ADAM17 inhibitor when combined with PIK3CB/D inhibition, in contrast to synergy when it is combined with other PI3K-pathway inhibitors in PIK3CA-mutant cells.
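Synergy and antagonism are defined relative to a reference model of non-interaction. As an illustration only, the sketch below uses Bliss independence, one common reference model, at a single dose pair: the expected combined inhibition of two independently acting drugs is a + b - ab, and the observed inhibition is compared against it. The scoring used in the challenge is not described here and may differ (for example, it may integrate over a full dose-response matrix); the function name is hypothetical.

```python
def bliss_excess(inhibition_a, inhibition_b, inhibition_ab):
    """Observed minus Bliss-expected inhibition for one dose pair.

    Inhibitions are fractions in [0, 1]. Under Bliss independence the
    expected combined inhibition is a + b - a*b; a positive excess
    suggests synergy and a negative excess suggests antagonism.
    """
    for value in (inhibition_a, inhibition_b, inhibition_ab):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inhibition must be a fraction in [0, 1]")
    expected = inhibition_a + inhibition_b - inhibition_a * inhibition_b
    return inhibition_ab - expected
```

For two drugs that each inhibit growth by 50%, Bliss independence expects 75% combined inhibition, so an observed 90% gives an excess of +0.15 (synergy) and an observed 60% gives -0.15 (antagonism).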
Molecular Beam Epitaxy of Highly Crystalline Monolayer Molybdenum Disulfide on Hexagonal Boron Nitride
Atomically thin molybdenum disulfide (MoS2), a direct-band-gap semiconductor, is promising for applications in electronics and optoelectronics, but the scalable synthesis of highly crystalline film remains challenging. Here we report the successful epitaxial growth of a continuous, uniform, highly crystalline monolayer MoS2 film on hexagonal boron nitride (h-BN) by molecular beam epitaxy. Atomic force microscopy and electron microscopy studies reveal that MoS2 grown on h-BN primarily consists of two types of nucleation grains (0° aligned and 60° antialigned domains). By adopting a high growth temperature and ultralow precursor flux, the formation of 60° antialigned grains is largely suppressed. The resulting perfectly aligned grains merge seamlessly into a highly crystalline film. Large-scale monolayer MoS2 film can be grown on a 2 in. h-BN/sapphire wafer, for which surface morphology and Raman mapping confirm good spatial uniformity. Our study represents a significant step in the scalable synthesis of highly crystalline MoS2 films on atomically flat surfaces and paves the way to large-scale applications.
Gene expression subclass analysis.
<p>(A) Comparison of hierarchical clustering of METABRIC data (left panel) and Perou data (right panel). Hierarchical clustering on the expression of the PAM50 genes in both datasets reveals a similar gene expression pattern that separates into several subclasses. Although several classes are apparent, they are consistent with sample assignment into basal-like, HER2-enriched and luminal subclasses in the Perou data. Similarly, in the METABRIC data the subclasses are consistent with the available clinical data for triple-negative, ER and PR status, and HER2-positive status. (B) Kaplan-Meier plot for subclasses. The METABRIC test dataset was separated into 3 major subclasses according to clinical features: triple negative (red); ER- or PR-positive (blue); and HER2-positive with ER- and PR-negative status (green). The survival curve was estimated using a standard Kaplan-Meier estimator and shows the expected differences in overall survival between the subclasses. (C,D) Kaplan-Meier curves by grade and histology. The test dataset was separated by tumor grade (subplot C; grade 1, red; grade 2, green; grade 3, blue) or by histology (subplot D; infiltrating lobular, red; infiltrating ductal, yellow; medullary, green; mixed histology, blue; mucinous, purple). The survival curves were estimated using a standard Kaplan-Meier estimator and show the expected differences in overall survival for the clinical features.</p>
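The survival curves in panels B to D are Kaplan-Meier (product-limit) estimates: at each distinct death time, the survival probability is multiplied by (1 - deaths / patients at risk). A minimal sketch, assuming right-censored data given as follow-up times plus an event indicator (1 = death, 0 = censored) and the usual convention that patients censored at a death time are still at risk at that time; the function name is hypothetical, and nothing here reproduces the METABRIC analysis itself.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function S(t).

    times: follow-up time for each patient.
    events: 1 if death was observed at that time, 0 if censored.
    Returns a list of (time, S(time)) pairs at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all patients sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            censored += 1 - data[i][1]
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censorings leave the risk set after time t.
        at_risk -= deaths + censored
    return curve
```

For five patients with times [1, 2, 2, 3, 4] and events [1, 1, 0, 1, 0], the estimate steps down to 0.8, 0.6 and 0.3 at times 1, 2 and 3.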
Distribution of concordance index scores of models submitted in the pilot competition.
<p>(A) Models are categorized by the type of features they use. Boxes indicate the 25<sup>th</sup> (lower end), 50<sup>th</sup> (middle red line) and 75<sup>th</sup> (upper end) percentiles of the scores in each category, while the whiskers indicate the 10<sup>th</sup> and 90<sup>th</sup> percentiles. The scores for the baseline model and the best performer are highlighted. (B) Model performance by submission date. In the initial phase of the competition, slight improvements over the baseline model were achieved by applying machine learning approaches to the clinical data alone (red circles), whereas initial attempts to incorporate molecular data significantly decreased performance (green, purple, and black circles). In the intermediate phase of the competition, models combining molecular and clinical data (green circles) predominated and achieved slightly better performance than clinical-only models. Towards the end of the competition, models combining clinical information with molecular features selected based on prior information (purple circles) predominated.</p>
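The concordance index used to score submissions measures how well a model's predicted risk orders patients by observed survival: 1.0 is a perfect ordering and 0.5 is no better than random. A minimal O(n²) sketch of Harrell's C under stated assumptions: a higher `predicted_risk` means a worse prognosis, pairs with tied survival times are skipped, and ties in predicted risk count as half-concordant. The exact tie and censoring handling used in the competition is not given here, and the function name is hypothetical.

```python
def concordance_index(times, predicted_risk, events):
    """Harrell's C: fraction of comparable patient pairs ordered correctly.

    A pair (i, j) is comparable when patient i has an observed event and a
    strictly shorter survival time than patient j. It is concordant when
    patient i also has the higher predicted risk; risk ties count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted_risk[i] > predicted_risk[j]:
                    concordant += 1.0
                elif predicted_risk[i] == predicted_risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```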
Model performance by feature set and learning algorithm.
<p>(A) The concordance index is displayed for each model from the controlled experiment (<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003047#pcbi.1003047.s005" target="_blank">Table S4</a>). The methods and feature sets are arranged according to their mean concordance index score. The ensemble method (cyan curve) infers survival predictions from the average rank of samples across the four other learning algorithms, and the ensemble feature set uses the average rank of samples across models trained on all of the other feature sets. <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003047#s2" target="_blank">Results</a> for the METABRIC2 and MicMa datasets are shown in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003047#pcbi.1003047.s001" target="_blank">Figure S1</a>. (B) The concordance index of models from the controlled phase, by type. The ensemble method again uses the average rank for models in each category.</p>
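The rank-averaging ensemble described in panel A can be sketched as follows: each model's risk scores are converted to ranks, and a sample's ensemble score is its mean rank across models. Ranking first puts models with very different output scales on a common footing. This is a minimal illustration assuming that all models score the same samples in the same order and that higher scores mean higher risk; the average-rank tie handling and the function names are choices made for the sketch, not details from the study.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def rank_average_ensemble(model_predictions):
    """Mean rank of each sample across models (hypothetical helper).

    model_predictions: list of per-model risk-score lists for the same samples.
    """
    ranked = [ranks(p) for p in model_predictions]
    n_models = len(ranked)
    return [sum(r[s] for r in ranked) / n_models for s in range(len(ranked[0]))]
```

For example, a model scoring three samples [0.1, 0.5, 0.9] and another scoring them [10, 30, 20] give ranks [1, 2, 3] and [1, 3, 2], so the ensemble scores are [1.0, 2.5, 2.5].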
Consistency of results in 2 additional datasets.
<p>(A,C) Concordance index scores for all models evaluated in the controlled experiment. Scores from the original evaluation are compared against METABRIC2 (A) and MicMa (C). The 4 machine learning algorithms are displayed in different colors. (B,D) Individual plots for each machine learning algorithm.</p>