Integrating multiple types of data to predict novel cell cycle-related genes
Background: Cellular functions depend on genetic, physical and other types of interactions. As such, derived interaction networks can be utilized to discover novel genes involved in specific biological processes. Epistatic Miniarray Profile, or E-MAP, which is an experimental platform that measures genetic interactions on a genome-wide scale, has successfully recovered known pathways and revealed novel protein complexes in Saccharomyces cerevisiae (budding yeast).
Results: By combining E-MAP data with co-expression data, we first predicted a potential cell cycle-related gene set. Using Gene Ontology (GO) function annotation as a benchmark, we demonstrated that the prediction by combining microarray and E-MAP data is generally >50% more accurate in identifying co-functional gene pairs than the prediction using either data source alone. We also used transcription factor (TF)-DNA binding data (ChIP-chip) and protein phosphorylation data to construct a local cell cycle regulation network based on the potential cell cycle-related gene set we predicted. Finally, based on the E-MAP screening of 48 cell cycle genes crossed with 1536 library strains, we predicted four unknown genes (YPL158C, YPR174C, YJR054W, and YPR045C) as potential cell cycle genes, and analyzed them in detail.
Conclusion: By integrating E-MAP and DNA microarray data, potential cell cycle-related genes were detected in budding yeast. This integrative method significantly improves the reliability of identifying co-functional gene pairs. In addition, the reconstructed network sheds light on the function of both known and predicted genes in the cell cycle process. Finally, our strategy can be applied to other biological processes and species, given the availability of relevant data.
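The abstract above does not say how the E-MAP and co-expression evidence are combined. As a minimal sketch only, the snippet below assumes each gene has an expression profile and a genetic-interaction profile, and scores a gene pair by a weighted mean of the two Pearson correlations. The function names, the `weight` parameter, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(vx * vy)


def combined_score(expr_a, expr_b, emap_a, emap_b, weight=0.5):
    """Hypothetical pair score: weighted mean of the two correlations."""
    r_expr = pearson(expr_a, expr_b)  # co-expression evidence
    r_emap = pearson(emap_a, emap_b)  # genetic-interaction profile similarity
    return weight * r_expr + (1 - weight) * r_emap


# Toy profiles (made-up numbers, for illustration only).
expr = {"geneA": [1.0, 2.0, 3.0, 4.0], "geneB": [2.0, 4.0, 6.0, 8.0]}
emap = {"geneA": [-1.0, 0.5, 2.0], "geneB": [-2.0, 1.0, 4.0]}
score = combined_score(expr["geneA"], expr["geneB"], emap["geneA"], emap["geneB"])
```

A pair that scores highly in both data sources ranks above a pair supported by only one, which is the intuition behind integrating the two; the actual scoring rule used by the authors may differ.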
Modeling cancer metabolism on a genome scale
Cancer cells have fundamentally altered cellular metabolism that is associated with their tumorigenicity and malignancy. In addition to the widely studied Warburg effect, several new key metabolic alterations in cancer have been established over the last decade, leading to the recognition that altered tumor metabolism is one of the hallmarks of cancer. Deciphering the full scope and functional implications of the dysregulated metabolism in cancer requires both the advancement of a variety of omics measurements and the advancement of computational approaches for the analysis and contextualization of the accumulated data. Encouragingly, while the metabolic network is highly interconnected and complex, it is at the same time probably the best characterized cellular network. Against this background, this review discusses the challenges that genome-scale modeling of cancer metabolism has been facing. We survey several recent studies demonstrating the first strides that have been made, testifying to the value of this approach in portraying a network-level view of cancer metabolism and in identifying novel drug targets and biomarkers. Finally, we outline a few new steps that may further advance this field.
Predicting drug response of tumors from integrated genomic profiles by deep neural networks
The study of high-throughput genomic profiles from a pharmacogenomics
viewpoint has provided unprecedented insights into the oncogenic features
modulating drug response. A recent screening of ~1,000 cancer cell lines to a
collection of anti-cancer drugs illuminated the link between genotypes and
vulnerability. However, due to essential differences between cell lines and
tumors, the translation into predicting drug response in tumors remains
challenging. Here we proposed a DNN model to predict drug response based on
mutation and expression profiles of a cancer cell or a tumor. The model
contains a mutation encoder and an expression encoder, both pre-trained using a large
pan-cancer dataset to abstract core representations of high-dimension data,
followed by a drug response predictor network. Given a pair of mutation and
expression profiles, the model predicts IC50 values of 265 drugs. We trained
and tested the model on a dataset of 622 cancer cell lines and achieved an
overall prediction performance of mean squared error at 1.96 (log-scale IC50
values). The model outperformed two classical methods and four analogous DNN
variants of our model in prediction error or stability. We then applied the model
to predict drug response of 9,059 tumors of 33 cancer types. The model
predicted both known, including EGFR inhibitors in non-small cell lung cancer
and tamoxifen in ER+ breast cancer, and novel drug targets. The comprehensive
analysis further revealed the molecular mechanisms underlying the resistance to
the chemotherapeutic drug docetaxel in a pan-cancer setting and the anti-cancer
potential of a novel agent, CX-5461, in treating gliomas and hematopoietic
malignancies. Overall, our model and findings improve the prediction of drug
response and the identification of novel therapeutic options.
Comment: Accepted for presentation in the International Conference on
Intelligent Biology and Medicine (ICIBM 2018) at Los Angeles, CA, USA.
Currently under consideration for publication in a Supplement Issue of BMC
Genomic
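The architecture described in the abstract above (two encoders feeding a drug response predictor) can be sketched as a forward pass. This is an illustration under stated assumptions, not the authors' implementation: the weights are random and untrained, each encoder is a single ReLU layer, the two codes are joined by concatenation, and every layer size except the 265 drug outputs is made up. The `mse` helper mirrors the error metric the abstract reports.

```python
import random


def linear(inputs, weights, biases):
    """Affine map: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def dense_relu(inputs, weights, biases):
    """Fully connected layer followed by a ReLU activation."""
    return [max(0.0, v) for v in linear(inputs, weights, biases)]


def init_layer(n_out, n_in, rng):
    """Random small weights and zero biases (untrained placeholder)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
# Only N_DRUGS = 265 comes from the abstract; the other sizes are hypothetical.
N_MUT, N_EXP, N_CODE, N_DRUGS = 20, 30, 8, 265

mutation_encoder = init_layer(N_CODE, N_MUT, rng)
expression_encoder = init_layer(N_CODE, N_EXP, rng)
predictor = init_layer(N_DRUGS, 2 * N_CODE, rng)


def predict_log_ic50(mutations, expression):
    """Encode both profiles, concatenate the codes, predict one value per drug."""
    code = (dense_relu(mutations, *mutation_encoder)
            + dense_relu(expression, *expression_encoder))
    return linear(code, *predictor)


def mse(predicted, observed):
    """Mean squared error between two equal-length sequences."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


# Toy inputs: a sparse binary mutation vector and a standardized expression vector.
mutations = [float(rng.random() < 0.1) for _ in range(N_MUT)]
expression = [rng.gauss(0.0, 1.0) for _ in range(N_EXP)]
predicted = predict_log_ic50(mutations, expression)
```

In a real setting the encoders would be pre-trained first, as the abstract describes, and the predictor fitted afterwards on cell-line data.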
Bayesian correlated clustering to integrate multiple datasets
Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured via parameters that describe the agreement among the datasets.
Results: Using a set of 6 artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real S. cerevisiae datasets. In the 2-dataset case, we show that MDI's performance is comparable to the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, ChIP-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques, as well as to non-integrative approaches, demonstrate that MDI is very competitive, while also providing information that would be difficult or impossible to extract using other methods.
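One way to read "parameters that describe the agreement among the datasets" is as pairwise terms in a joint prior over each gene's cluster labels. The form below is a sketch under that reading. The notation is an assumption and does not come from the abstract: $c_{ik}$ is the cluster label of gene $i$ in dataset $k$, $\pi_{\cdot k}$ are the mixture weights of dataset $k$, and $\phi_{kl} \ge 0$ rewards datasets $k$ and $l$ for assigning gene $i$ the same label.

```latex
p(c_{i1}, \dots, c_{iK} \mid \phi)
  \;\propto\;
  \prod_{k=1}^{K} \pi_{c_{ik} k}
  \prod_{k=1}^{K-1} \prod_{l=k+1}^{K}
  \bigl( 1 + \phi_{kl}\, \mathbb{1}(c_{ik} = c_{il}) \bigr)
```

With every $\phi_{kl} = 0$ the datasets are clustered independently; larger values of $\phi_{kl}$ pull the allocations in datasets $k$ and $l$ toward agreement.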
Network-based approaches to explore complex biological systems towards network medicine
Network medicine relies on different types of networks: from the molecular level of protein-protein interactions to gene regulatory networks and correlation studies of gene expression. Among network approaches based on the analysis of the topological properties of protein-protein interaction (PPI) networks, we discuss the widespread DIAMOnD (disease module detection) algorithm. Starting from the assumption that PPI networks can be viewed as maps where diseases can be identified with localized perturbation within a specific neighborhood (i.e., disease modules), DIAMOnD performs a systematic analysis of the human PPI network to uncover new disease-associated genes by exploiting the connectivity significance instead of connection density. The past few years have witnessed increasing interest in understanding the molecular mechanism of post-transcriptional regulation, with a special emphasis on non-coding RNAs, since they are emerging as key regulators of many cellular processes in both physiological and pathological states. Recent findings show that coding genes are not the only targets that microRNAs interact with. In fact, there is a pool of different RNAs, including long non-coding RNAs (lncRNAs), competing with each other to attract microRNAs for interactions, thus acting as competing endogenous RNAs (ceRNAs). The framework of regulatory networks provides a powerful tool to gather new insights into ceRNA regulatory mechanisms. Here, we describe a data-driven model recently developed to explore the lncRNA-associated ceRNA activity in breast invasive carcinoma. On the other hand, a very promising example of a co-expression network is the one implemented by the software SWIM (switch miner), which combines topological properties of correlation networks with gene expression data in order to identify a small pool of genes, called switch genes, critically associated with drastic changes in cell phenotype.
Here, we describe the SWIM tool along with its applications to cancer research and compare its predictions with DIAMOnD disease genes.
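The abstract above contrasts "connectivity significance" with connection density. One way to make that concrete is a hypergeometric tail probability: given a node's degree, how surprising is its number of links to known disease (seed) genes? The sketch below illustrates that idea only. The function name and parameters are hypothetical, and the published DIAMOnD procedure includes details not reproduced here.

```python
from math import comb


def connectivity_pvalue(n_nodes, n_seeds, degree, links_to_seeds):
    """P(X >= links_to_seeds) for X ~ Hypergeometric(n_nodes, n_seeds, degree).

    n_nodes:        number of nodes in the network
    n_seeds:        number of seed (known disease) genes
    degree:         number of neighbors of the candidate node
    links_to_seeds: how many of those neighbors are seed genes
    """
    total = comb(n_nodes, degree)
    upper = min(degree, n_seeds)
    tail = sum(
        comb(n_seeds, i) * comb(n_nodes - n_seeds, degree - i)
        for i in range(links_to_seeds, upper + 1)
    )
    return tail / total


# Two candidates with the same number of seed links but different degrees.
p_low_degree = connectivity_pvalue(1000, 20, 2, 2)   # both neighbors are seeds
p_hub = connectivity_pvalue(1000, 20, 50, 2)         # 2 of 50 neighbors are seeds
```

Both candidates have two seed neighbors, yet the low-degree node gets the smaller p-value, because a hub is expected to touch a few seeds by chance. Ranking by this p-value rather than by raw link count is what separates significance from density.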
Automated data integration for developmental biological research
In an era exploding with genome-scale data, a major challenge for developmental biologists is how to extract significant clues from these publicly available data to benefit our studies of individual genes, and how to use them to improve our understanding of development at a systems level. Several studies have successfully demonstrated new approaches to classic developmental questions by computationally integrating various genome-wide data sets. Such computational approaches have shown great potential for facilitating research: instead of testing 20,000 genes, researchers might test 200 to the same effect. We discuss the nature and state of this art as it applies to developmental research.
Multiscale metabolic modeling of C4 plants: connecting nonlinear genome-scale models to leaf-scale metabolism in developing maize leaves
C4 plants, such as maize, concentrate carbon dioxide in a specialized
compartment surrounding the veins of their leaves to improve the efficiency of
carbon dioxide assimilation. Nonlinear relationships between carbon dioxide and
oxygen levels and reaction rates are key to their physiology but cannot be
handled with standard techniques of constraint-based metabolic modeling. We
demonstrate that incorporating these relationships as constraints on reaction
rates and solving the resulting nonlinear optimization problem yields realistic
predictions of the response of C4 systems to environmental and biochemical
perturbations. Using a new genome-scale reconstruction of maize metabolism, we
build an 18000-reaction, nonlinearly constrained model describing mesophyll and
bundle sheath cells in 15 segments of the developing maize leaf, interacting
via metabolite exchange, and use RNA-seq and enzyme activity measurements to
predict spatial variation in metabolic state by a novel method that optimizes
correlation between fluxes and expression data. Though such correlations are
known to be weak in general, here the predicted fluxes achieve high correlation
with the data, successfully capture the experimentally observed base-to-tip
transition between carbon-importing tissue and carbon-exporting tissue, and
include a nonzero growth rate, in contrast to prior results from similar
methods in other systems. We suggest that developmental gradients may be
particularly suited to the inference of metabolic fluxes from expression data.
Comment: 57 pages, 14 figures; submitted to PLoS Computational Biology; source
code available at http://github.com/ebogart/fluxtools and
http://github.com/ebogart/multiscale_c4_sourc
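The abstract above does not spell out which nonlinear relationships are imposed. A standard example of the kind it mentions, offered here only as an illustration, is the competition between Rubisco carboxylation and oxygenation. The symbols are assumptions: $v_c$ and $v_o$ are the carboxylation and oxygenation fluxes, and $S_{c/o}$ is the enzyme's CO$_2$/O$_2$ specificity factor.

```latex
\frac{v_o}{v_c} \;=\; \frac{1}{S_{c/o}} \cdot \frac{[\mathrm{O_2}]}{[\mathrm{CO_2}]}
\qquad\Longleftrightarrow\qquad
S_{c/o}\, v_o\, [\mathrm{CO_2}] \;=\; v_c\, [\mathrm{O_2}]
```

When the gas concentrations are themselves model variables, the right-hand form multiplies a flux by a concentration, so the constraint is bilinear rather than linear. A constraint of this shape cannot be expressed in the linear programs used by standard flux balance analysis, which is why a nonlinear solver is needed.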