3 research outputs found

    A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data

    DNA methylation is an important epigenetic event that affects gene expression during development and in various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation 450K array, there are tens of probes associated with each gene. Given the methylation intensities of all these probes, it is necessary to determine which probes are most representative of the gene-centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that utilizes different classification methods to compute gene-centric DNA methylation from probe-level DNA methylation data. We compared our algorithm to other feature selection algorithms such as support vector machines with recursive feature elimination, genetic algorithms and ReliefF. We evaluated all methods based on the power of the selected probes to predict the mRNA expression level of the corresponding gene and found that K-Nearest Neighbors classification combined with sequential forward selection outperformed the other algorithms on all metrics. We also observed that the transcriptional activities of certain genes were more sensitive to DNA methylation changes than those of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.
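
The selection idea in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes expression has been discretized into class labels, scores each candidate probe subset by leave-one-out accuracy of a plain k-nearest-neighbors vote, and uses made-up toy data. The function names, the value k=3, and the stopping rule (stop when no probe improves the score) are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of k-NN restricted to the given probe columns."""
    rows = [[row[f] for f in features] for row in X]
    hits = 0
    for i in range(len(rows)):
        train_X = rows[:i] + rows[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, rows[i], k) == y[i]
    return hits / len(rows)


def sequential_forward_selection(X, y, max_features=None, k=3):
    """Greedily add the probe that most improves the score; stop when none does."""
    n_features = len(X[0])
    max_features = max_features or n_features
    selected, best_score = [], 0.0
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(loo_accuracy(X, y, selected + [f], k), f) for f in candidates]
        # Highest score wins; ties go to the lowest probe index.
        score, feature = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break
        selected.append(feature)
        best_score = score
    return selected, best_score


# Hypothetical toy data: 8 samples x 3 probes (beta values in [0, 1]).
# Only probe 1 separates the two (assumed) expression classes.
X = [
    [0.50, 0.10, 0.40],
    [0.20, 0.15, 0.90],
    [0.80, 0.12, 0.10],
    [0.40, 0.08, 0.60],
    [0.60, 0.90, 0.50],
    [0.30, 0.85, 0.80],
    [0.70, 0.92, 0.20],
    [0.10, 0.88, 0.70],
]
y = ["high"] * 4 + ["low"] * 4

selected, score = sequential_forward_selection(X, y)
print(selected, score)
```

On this toy input the search keeps only the informative probe, since adding either noisy probe cannot improve on its leave-one-out accuracy. A real analysis would use proper cross-validation and many more samples per gene.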

    Genome-wide association study identifies variation at 6q25.1 associated with survival in multiple myeloma

    Survival following a diagnosis of multiple myeloma (MM) varies between patients, and some of these differences may be a consequence of inherited genetic variation. In this study, to identify genetic markers associated with MM overall survival (MM-OS), we conduct a meta-analysis of four patient series of European ancestry, totalling 3,256 patients with 1,200 MM-associated deaths. Each series is genotyped for ∼600,000 single nucleotide polymorphisms across the genome; genotypes for six million common variants are imputed using the 1000 Genomes Project and UK10K as the reference. The association between genotype and OS is assessed using a Cox proportional hazards model adjusting for age, sex, International Staging System and treatment. We identify a locus at 6q25.1 marked by rs12374648 associated with MM-OS (hazard ratio=1.34, 95% confidence interval=1.22-1.48, P=4.69 × 10⁻⁹). Our findings have potential clinical implications since they demonstrate that inherited genotypes can provide prognostic information in addition to conventional tumor-acquired prognostic factors.
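
As a rough consistency check on the reported statistics, the sketch below back-calculates a Wald z-score and two-sided P value from the rounded hazard ratio and confidence interval. It assumes the interval is a symmetric Wald interval on the log-hazard scale, which the abstract does not state; the function name is hypothetical. Because the inputs are rounded to two decimals, the result can only be expected to match the reported P value in order of magnitude.

```python
from math import erfc, log, sqrt


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover log-HR, its standard error, z and two-sided P from a 95% CI.

    Assumes a symmetric Wald interval on the log scale (an assumption,
    not something stated in the abstract).
    """
    beta = log(hr)                                  # log hazard ratio
    se = (log(upper) - log(lower)) / (2 * z_crit)   # CI half-width / z_crit
    z = beta / se
    p = erfc(abs(z) / sqrt(2))                      # two-sided normal tail
    return beta, se, z, p


beta, se, z, p = wald_from_ci(1.34, 1.22, 1.48)
print(f"log(HR)={beta:.4f}  SE={se:.4f}  z={z:.2f}  P={p:.2e}")
```

The recovered z-score is roughly 5.9, giving a P value on the order of 10⁻⁹, in line with the reported genome-wide significant association.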

    Identifying Regulators from Multiple Types of Biological Data in Cancer

    Cancer genomes accumulate alterations that promote cancer cell proliferation and survival. Structural, genetic and epigenetic alterations that have a selective advantage for tumorigenesis affect key regulatory genes and microRNAs that in turn regulate the expression of many target genes. The goal of this dissertation is to leverage the alteration-rich landscape of cancer genomes to detect key regulatory genes and microRNAs. To this end, we designed a feature selection algorithm to identify DNA methylation signals around a gene that are highly predictive of its expression. We found that genes whose expression could be accurately predicted from DNA methylation were enriched in Gene Ontology terms related to the regulation of various biological processes. This suggests that genes controlled by DNA methylation are regulatory genes. We also developed two tools that infer relationships between regulatory genes and target genes by leveraging structural and epigenetic data. The first tool, ProcessDriver, integrates copy number alteration and gene expression datasets to identify copy number cancer driver genes, the target genes of these drivers and the disrupted biological processes. Our results showed that driver genes selected by ProcessDriver are enriched in known cancer genes. Using survival analysis, we showed that drivers are linked to new tumor events after initial treatment. The second tool was developed to leverage structural and epigenetic data to infer interactions between regulatory genes and targets at the network level. Our canonical correlation analysis-based approach utilized the DNA methylation or copy number states of potential regulators and the expression states of potential targets to score regulatory interactions. We then incorporated these regulatory interaction scores as prior knowledge in a dynamic Bayesian framework utilizing time series gene expression data. Our results indicated that the canonical correlation analysis-based scores reflect the true interactions between genes with high accuracy, and that the accuracy can be further increased by using the scores as a prior in the dynamic Bayesian framework. Finally, we are developing an algorithm to detect cancer-related microRNAs, associated targets and disrupted biological processes. Our preliminary results suggest that the modules of miRNAs and target genes identified by this approach are enriched in known microRNA-gene interactions.