3,632 research outputs found

    Data Mining for Gene Networks Relevant to Poor Prognosis in Lung Cancer Via Backward-Chaining Rule Induction

    We use Backward Chaining Rule Induction (BCRI), a novel data mining method for hypothesizing causative mechanisms, to mine lung cancer gene expression array data for mechanisms that could impact survival. Initially, a supervised learning system is used to generate a prediction model in the form of “IF <conditions> THEN <outcome>” style rules. Next, each antecedent (i.e., an IF condition) of a previously discovered rule becomes the outcome class for a subsequent application of supervised rule induction. This step is repeated until a termination condition is satisfied. “Chains” of rules are created by working backward from an initial condition (e.g., survival status). Through this iterative process of “backward chaining,” BCRI searches for rules that describe plausible gene interactions for subsequent validation. Thus, BCRI is a semi-supervised approach that constrains the search through the vast space of plausible causal mechanisms by using a top-level outcome to kick-start the process. We demonstrate the general BCRI task sequence, how to implement it, the validation process, and how BCRI rules discovered from lung cancer microarray data can be combined with prior knowledge to generate hypotheses about functional genomics.
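The backward-chaining loop described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-condition learner `best_single_rule` is a toy stand-in for a real supervised rule-induction system, and the feature names, the `min_accuracy` cutoff, and the `max_depth` limit are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Row = Dict[str, int]                     # feature name -> 0/1 (toy binarised data)
Rule = Tuple[str, int, str, int, float]  # (if_feature, if_value, then_feature, then_value, accuracy)


def best_single_rule(data: List[Row], target: Tuple[str, int], used: Set[str]) -> Optional[Rule]:
    """Toy rule learner (assumption): pick the one-condition rule that best predicts the target."""
    then_feature, then_value = target
    best: Optional[Rule] = None
    for feature in data[0]:
        if feature in used:
            continue
        for value in (0, 1):
            # A row counts as correct when the condition fires exactly
            # on the rows that belong to the target class.
            correct = sum(
                (row[feature] == value) == (row[then_feature] == then_value)
                for row in data
            )
            accuracy = correct / len(data)
            if best is None or accuracy > best[4]:
                best = (feature, value, then_feature, then_value, accuracy)
    return best


def backward_chain(data: List[Row], outcome: Tuple[str, int],
                   min_accuracy: float = 0.8, max_depth: int = 3) -> List[Rule]:
    """Chain rules backward from a top-level outcome."""
    chain: List[Rule] = []
    target = outcome
    used = {outcome[0]}
    for _ in range(max_depth):
        rule = best_single_rule(data, target, used)
        if rule is None or rule[4] < min_accuracy:
            break  # illustrative termination condition
        chain.append(rule)
        used.add(rule[0])
        target = (rule[0], rule[1])  # the antecedent becomes the next outcome class
    return chain


# Made-up data: hypothetical features, not taken from any real dataset.
data = [
    {"poor_survival": 1, "geneA": 1, "geneB": 1, "geneC": 0},
    {"poor_survival": 1, "geneA": 1, "geneB": 1, "geneC": 1},
    {"poor_survival": 0, "geneA": 0, "geneB": 0, "geneC": 1},
    {"poor_survival": 0, "geneA": 0, "geneB": 0, "geneC": 0},
    {"poor_survival": 1, "geneA": 1, "geneB": 0, "geneC": 0},
    {"poor_survival": 0, "geneA": 0, "geneB": 0, "geneC": 1},
]
chain = backward_chain(data, ("poor_survival", 1))
for if_f, if_v, then_f, then_v, acc in chain:
    print(f"IF {if_f}={if_v} THEN {then_f}={then_v} (accuracy {acc:.2f})")
```

Each rule in the returned chain is a hypothesis for later validation, not evidence of causation. A real application would replace the toy learner with a proper rule inducer and evaluate rules on held-out data.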

    Personalized medicine support system : resolving conflict in allocation to risk groups and predicting patient molecular response to targeted therapy

    Treatment management in cancer patients is largely based on the use of a standardized set of predictive and prognostic factors. The former are used to evaluate specific clinical interventions, and they can be useful for selecting treatments because they directly predict the response to a treatment. The latter are used to evaluate a patient’s overall outcomes, and can be used to identify the risks or recurrence of a disease. Current intelligent systems can be a solution for transferring advancements in molecular biology into practice, especially for predicting the molecular response to molecular targeted therapy and the prognosis of risk groups in cancer medicine. This framework primarily focuses on the importance of integrating domain knowledge in predictive and prognostic models for personalized treatment. Our personalized medicine support system provides the needed support in complex decisions and can be incorporated into a treatment guide for selecting molecular targeted therapies. Haneen Banjar, David Adelson, Fred Brown, and Tamara Leclerc

    Computational Hybrid Systems for Identifying Prognostic Gene Markers of Lung Cancer

    Lung cancer is the deadliest cancer worldwide. Current lung cancer prognosis and treatment are based on tumor-stage population statistics and cannot reliably assess the risk of recurrence in individual patients. Biomarkers enable treatment options to be tailored to individual patients based on their tumor molecular characteristics. To date, there is no clinically applied molecular prognostic model for lung cancer. Statistical and feature selection methods identify gene candidates by ranking the association between gene expression and disease outcome, but do not account for interactions among genes. Computational network methods could model interactions, but have not been used for gene selection due to computational inefficiency. Moreover, the curse of dimensionality in human genome data poses further computational challenges for these methods. We proposed two hybrid systems for the identification of prognostic gene signatures for lung cancer using gene expression measured with DNA microarrays. The first hybrid system combined t-tests, Significance Analysis of Microarrays (SAM), and Relief feature selection in multiple gene filtering layers. This combinatorial system identified a 12-gene signature with better prognostic performance than published signatures in treatment selection for stage I and II patients (log-rank P<0.04, Kaplan-Meier analyses). The 12-gene signature is a more significant prognostic factor (hazard ratio=4.19, 95% CI: [2.08, 8.46], P<0.00006) than other clinical covariates. The signature genes were found to be involved in tumorigenesis in functional pathway analyses. The second proposed system employed a novel computational network model, i.e., implication networks based on prediction logic. This network-based system utilizes gene coexpression networks and concurrent coregulation with signaling pathways for biomarker identification. The first application of the system modeled disease-mediated genome-wide coexpression networks. The entire genomic space was extensively explored, and 21 gene signatures were discovered with better prognostic performance than all published signatures in stage I patients not receiving chemotherapy (hazard ratio>1, CPE>0.5, P<0.05). These signatures could potentially be used for selecting patients for adjuvant chemotherapy. The second application of the system modeled smoking-mediated coexpression networks and identified a smoking-associated 7-gene signature. The 7-gene signature generated significant prognostication specific to lung cancer patients who smoke (log-rank P<0.05, Kaplan-Meier analyses), with implications for diagnostic screening of lung cancer risk in smokers (overall accuracy=74%, P<0.006). The coexpression patterns derived from the implication networks in both applications were successfully validated against molecular interactions reported in the literature (FDR<0.1). Our studies demonstrated that hybrid systems with multiple gene selection layers outperform traditional methods. Moreover, implication networks could efficiently model genome-scale disease-mediated coexpression networks and crosstalk with signaling pathways, leading to the identification of clinically important gene signatures.

    Genet-CNV: Boolean Implication Networks for Modeling Genome-Wide Co-occurrence of DNA Copy Number Variations

    Lung cancer is the leading cause of cancer-related death in the world. Lung cancer can be categorized as non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). NSCLC makes up about 80% to 85% of lung cancer cases diagnosed, whereas SCLC is responsible for 10% to 15% of the cases. It remains a challenge for physicians to identify patients who would benefit from chemotherapy. In such a scenario, identifying genes that can facilitate therapeutic target discovery, and better understanding disease mechanisms and their regulation in different stages of lung cancer, remain important topics of research. In this thesis, we develop a computational framework for modelling molecular gene interaction networks, called Genet-CNV, to analyse gene interactions based on DNA Copy Number Variations (CNV). DNA copy number variation is a phenomenon in which sections of the genome are repeated and the number of repeats in the genome varies between individuals in the human population. These variations can be used to study the activity of genes in cancerous cells, compared with that of the normal population. Genet-CNV uses Boolean implication networks to investigate genome-wide DNA CNV and to identify relationships, called rules, that could potentially lead to the identification of genes of significant biological interest. Boolean implication networks are probabilistic graphical models that express the relationship between two variables in terms of six implication rules, which describe whether the genes are co-amplified, co-deleted, or differentially amplified and deleted. Genet-CNV is run on three publicly available NSCLC genomic datasets. We further evaluate the results obtained with Genet-CNV by comparing them with a benchmark dataset, the Molecular Signatures Database (MSigDB). We identified several genes of interest that are present in survival, apoptosis, proliferation and immunologic pathways. The relationships obtained from this analysis can be tested for biological validation, or used to confirm experimental results, thus facilitating the identification of genes playing a significant role in the causation and progression of NSCLC.
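The idea of reading implication rules between two binary variables off a 2x2 contingency table can be sketched as follows. This is only an illustration of the general concept under simplifying assumptions, not Genet-CNV itself: the function name `implication_rules`, the fixed `max_error` cutoff, and the toy copy-number calls are hypothetical, and any statistical testing a real method would apply is omitted.

```python
from collections import Counter
from typing import List


def implication_rules(a: List[int], b: List[int], max_error: float = 0.1) -> List[str]:
    """Return which of six implication relationships hold between binary calls a and b.

    An implication such as "A => B" is accepted when the cell that would
    contradict it (A=1, B=0) holds at most `max_error` of the A=1 samples.
    """
    counts = Counter(zip(a, b))  # 2x2 contingency table; missing cells count as 0
    a1 = counts[(1, 0)] + counts[(1, 1)]
    a0 = counts[(0, 0)] + counts[(0, 1)]

    def sparse(cell, row_total):
        return row_total > 0 and counts[cell] / row_total <= max_error

    hi_hi = sparse((1, 0), a1)  # A => B
    hi_lo = sparse((1, 1), a1)  # A => not B
    lo_hi = sparse((0, 0), a0)  # not A => B
    lo_lo = sparse((0, 1), a0)  # not A => not B

    rules = []
    if hi_hi and lo_lo:
        rules.append("A <=> B")          # both directions hold: equivalence
    else:
        if hi_hi:
            rules.append("A => B")
        if lo_lo:
            rules.append("not A => not B")
    if hi_lo and lo_hi:
        rules.append("A <=> not B")      # both directions hold: opposition
    else:
        if hi_lo:
            rules.append("A => not B")
        if lo_hi:
            rules.append("not A => B")
    return rules


# Toy calls (1 = amplified, 0 = not amplified); values are made up.
same = implication_rules([1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0])
one_way = implication_rules([1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0])
print(same, one_way)
```

With a fixed error cutoff, small samples can satisfy a rule by chance, which is why a practical method would add a significance criterion before reporting a rule.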

    Mining Pure, Strict Epistatic Interactions from High-Dimensional Datasets: Ameliorating the Curse of Dimensionality

    Background: The interaction between loci to affect phenotype is called epistasis. It is strict epistasis if no proper subset of the interacting loci exhibits a marginal effect. For many diseases, it is likely that unknown epistatic interactions affect disease susceptibility. A difficulty when mining epistatic interactions from high-dimensional datasets concerns the curse of dimensionality. There are too many combinations of SNPs to perform an exhaustive search. A method that could locate strict epistasis without an exhaustive search can be considered the brass ring of methods for analyzing high-dimensional datasets. Methodology/Findings: A SNP pattern is a Bayesian network representing SNP-disease relationships. The Bayesian score for a SNP pattern is the probability of the data given the pattern, and has been used to learn SNP patterns. We identified a bound for the score of a SNP pattern. The bound provides an upper limit on the Bayesian score of any pattern that could be obtained by expanding a given pattern. We felt that the bound might enable the data to say something about the promise of expanding a 1-SNP pattern even when there are no marginal effects. We tested the bound using simulated datasets and semi-synthetic high-dimensional datasets obtained from GWAS datasets. We found that the bound was able to dramatically reduce the search time for strict epistasis. Using an Alzheimer's dataset, we showed that it is possible to discover an interaction involving the APOE gene based on its score because of its large marginal effect, but that the bound is most effective at discovering interactions without marginal effects. Conclusions/Significance: We conclude that the bound appears to ameliorate the curse of dimensionality in high-dimensional datasets. This is a very consequential result and could be pivotal in our efforts to reveal the dark matter of genetic disease risk from high-dimensional datasets. © 2012 Jiang, Neapolitan

    Evolution-informed modeling improves outcome prediction for cancers

    Despite wide applications of high-throughput biotechnologies in cancer research, many biomarkers discovered by exploring large-scale omics data do not provide satisfactory performance when used to predict cancer treatment outcomes. This problem is partly due to the overlooking of functional implications of molecular markers. Here, we present a novel computational method that uses evolutionary conservation as prior knowledge to discover bona fide biomarkers. Evolutionary selection at the molecular level is nature's test on functional consequences of genetic elements. By prioritizing genes that show significant statistical association and high functional impact, our new method reduces the chances of including spurious markers in the predictive model. When applied to predicting therapeutic responses for patients with acute myeloid leukemia and to predicting metastasis for patients with prostate cancers, the new method gave rise to evolution-informed models that enjoyed low complexity and high accuracy. The identified genetic markers also have significant implications in tumor progression and embrace potential drug targets. Because evolutionary conservation can be estimated as a gene-specific, position-specific, or allele-specific parameter on the nucleotide level and on the protein level, this new method can be extended to apply to miscellaneous “omics” data to accelerate biomarker discoveries. The final version of this article, as published in Evolutionary Applications, can be viewed online at: http://onlinelibrary.wiley.com/doi/10.1111/eva.12417/ful

    Evolution‐informed modeling improves outcome prediction for cancers

    Despite wide applications of high‐throughput biotechnologies in cancer research, many biomarkers discovered by exploring large‐scale omics data do not provide satisfactory performance when used to predict cancer treatment outcomes. This problem is partly due to the overlooking of functional implications of molecular markers. Here, we present a novel computational method that uses evolutionary conservation as prior knowledge to discover bona fide biomarkers. Evolutionary selection at the molecular level is nature’s test on functional consequences of genetic elements. By prioritizing genes that show significant statistical association and high functional impact, our new method reduces the chances of including spurious markers in the predictive model. When applied to predicting therapeutic responses for patients with acute myeloid leukemia and to predicting metastasis for patients with prostate cancers, the new method gave rise to evolution‐informed models that enjoyed low complexity and high accuracy. The identified genetic markers also have significant implications in tumor progression and embrace potential drug targets. Because evolutionary conservation can be estimated as a gene‐specific, position‐specific, or allele‐specific parameter on the nucleotide level and on the protein level, this new method can be extended to apply to miscellaneous “omics” data to accelerate biomarker discoveries. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/135247/1/eva12417_am.pdf http://deepblue.lib.umich.edu/bitstream/2027.42/135247/2/eva12417.pd

    Portrait of Ependymoma Recurrence in Children: Biomarkers of Tumor Progression Identified by Dual-Color Microarray-Based Gene Expression Analysis

    BACKGROUND: Children with ependymoma may experience a relapse in up to 50% of cases depending on the extent of resection. Key biological events associated with recurrence are unknown. METHODOLOGY/PRINCIPAL FINDINGS: To discover the biology behind the recurrence of ependymomas, we performed array CGH and a dual-color gene expression microarray analysis of 17 tumors at diagnosis co-hybridized with the corresponding 27 first or subsequent relapses from the same patient. As treatment and location had only limited influence on specific gene expression changes at relapse, we established a common signature for relapse. Eighty-seven genes showed an absolute fold change ≥2 in at least 50% of relapses and were defined as the gene expression signature of ependymoma recurrence. The most frequently upregulated genes are involved in the kinetochore (ASPM, KIF11) or in neural development (CD133, Wnt and Notch pathways). Metallothionein (MT) genes were downregulated in up to 80% of the recurrences. Quantitative PCR for ASPM, KIF11 and MT3 plus immunohistochemistry for ASPM and MT3 confirmed the microarray results. Immunohistochemistry on an independent series of 24 tumor pairs at diagnosis and at relapse confirmed the decrease of MT3 expression at recurrence in 17/24 tumor pairs (p = 0.002). Conversely, ASPM expression was more frequently positive at relapse (87.5% vs 37.5%, p = 0.03). Loss or deletion of the MT gene cluster was never observed at relapse. Promoter sequencing after bisulfite treatment of DNA from primary tumors and recurrences, as well as treatment of short-term ependymoma cell cultures with a demethylating agent, showed that methylation was not involved in MT3 downregulation. However, in vitro treatment with a histone deacetylase inhibitor or zinc restored MT3 expression. CONCLUSIONS/SIGNIFICANCE: The most frequent molecular events associated with ependymoma recurrence were over-expression of kinetochore proteins and down-regulation of metallothioneins. Metallothionein-3 expression is epigenetically controlled and can be restored in vitro by histone deacetylase inhibitors.
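The selection criterion stated in this abstract (absolute fold change of at least 2 in at least 50% of relapses) can be written as a short filter. This is a sketch under stated assumptions, not the authors' pipeline: it assumes each gene has one relapse/diagnosis expression ratio per relapse, treats a ratio of 2 or more and a ratio of 0.5 or less as equally large changes, and does not require a consistent direction across relapses. The gene labels and ratios are invented.

```python
import math
from typing import Dict, List


def recurrence_signature(ratios: Dict[str, List[float]],
                         min_fold: float = 2.0,
                         min_fraction: float = 0.5) -> List[str]:
    """Select genes whose |log2(relapse/diagnosis)| reaches the cutoff often enough."""
    cutoff = math.log2(min_fold)
    selected = []
    for gene, values in ratios.items():
        # Count relapses where the change is at least `min_fold` in either direction.
        hits = sum(abs(math.log2(v)) >= cutoff for v in values)
        if hits / len(values) >= min_fraction:
            selected.append(gene)
    return selected


# Hypothetical relapse/diagnosis ratios for four relapses per gene.
toy_ratios = {
    "gene_up": [2.5, 3.0, 1.1, 2.0],
    "gene_down": [0.4, 0.5, 0.9, 1.0],
    "gene_flat": [1.2, 0.8, 1.0, 1.9],
}
signature = recurrence_signature(toy_ratios)
print(signature)
```

Working on the log2 scale makes up- and down-regulation symmetric, so a halving counts the same as a doubling.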

    Gene expression meta-analysis supports existence of molecular apocrine breast cancer with a role for androgen receptor and implies interactions with ErbB family

    Background: Pathway discovery from gene expression data can provide important insight into the relationship between signaling networks and cancer biology. Oncogenic signaling pathways are commonly inferred by comparison with signatures derived from cell lines. We use the Molecular Apocrine subtype of breast cancer to demonstrate our ability to infer pathways directly from patients' gene expression data with pattern analysis algorithms. Methods: We combine data from two studies that propose the existence of the Molecular Apocrine phenotype. We use quantile normalization and XPN to minimize institutional bias in the data. We use hierarchical clustering, principal components analysis, and comparison of gene signatures derived from Significance Analysis of Microarrays to establish the existence of the Molecular Apocrine subtype and the equivalence of its molecular phenotype across both institutions. Statistical significance was computed using the Fasano & Franceschini test for separation of principal components and the hypergeometric probability formula for significance of overlap in gene signatures. We perform pathway analysis using LeFEminer and Backward Chaining Rule Induction to identify a signaling network that differentiates the subset. We identify a larger cohort of samples in the public domain, and use Gene Shaving and Robust Bayesian Network Analysis to detect pathways that interact with the defining signal. Results: We demonstrate that the two separately introduced ER-negative breast cancer subsets represent the same tumor type, called Molecular Apocrine breast cancer. LeFEminer and Backward Chaining Rule Induction support a role for AR signaling as a pathway that differentiates this subset from others. Gene Shaving and Robust Bayesian Network Analysis detect interactions between the AR pathway, EGFR trafficking signals, and ErbB2. Conclusion: We propose criteria for meta-analysis that are able to demonstrate statistical significance in establishing molecular equivalence of subsets across institutions. Data mining strategies used here provide an alternative method to comparison with cell lines for discovering seminal pathways and interactions between signaling networks. Analysis of Molecular Apocrine breast cancer implies that therapies targeting AR might be hampered if interactions with ErbB family members are not addressed.
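The hypergeometric overlap test mentioned in the Methods of this abstract can be illustrated with the standard upper-tail formula. This is a generic sketch, not the authors' code: the function name `overlap_p_value` and the example counts are hypothetical, and the abstract does not state which gene universe or tail the study used.

```python
from math import comb


def overlap_p_value(total: int, size_a: int, size_b: int, overlap: int) -> float:
    """P(X >= overlap) when size_b genes are drawn at random from `total`
    genes, of which size_a belong to signature A (hypergeometric upper tail)."""
    upper = min(size_a, size_b)
    tail = sum(
        comb(size_a, i) * comb(total - size_a, size_b - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(total, size_b)


# Made-up example: two 5-gene signatures drawn from a 10-gene universe.
p_full = overlap_p_value(total=10, size_a=5, size_b=5, overlap=5)
p_any = overlap_p_value(total=10, size_a=5, size_b=5, overlap=0)
print(p_full, p_any)
```

The result depends strongly on `total`, the size of the gene universe, so that choice should be reported with any overlap p-value.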