
    Characteristics of predictor sets found using differential prioritization

    Background: Feature selection plays an important role in classification problems involving high-dimensional datasets such as microarray datasets. For filter-based feature selection, two well-known criteria used in forming predictor sets are relevance and redundancy. However, a third criterion is at least as important as the other two in affecting the efficacy of the resulting predictor sets: the degree of differential prioritization (DDP), whose value sets the relative emphasis placed on relevance and redundancy. Previous empirical work on publicly available microarray datasets has confirmed the effectiveness of the DDP in molecular classification. We now propose to establish the fundamental strengths and merits of the DDP-based feature selection technique through a simulation study involving vigorous analyses of the characteristics of predictor sets found using different values of the DDP from toy datasets designed to mimic real-life microarray datasets. Results: A simulation study employing analytical measures, such as the distance between classes before and after transformation using principal component analysis, is implemented on toy datasets. From these analyses, the necessity of adjusting the differential prioritization based on the dataset of interest is established. This conclusion is supported by comparisons against both simplistic rank-based selection and state-of-the-art equal-priorities scoring methods, which demonstrate the superiority of the DDP-based feature selection technique. Reapplying similar analyses to real-life multiclass microarray datasets provides further confirmation of our findings and of the significance of the DDP for practical applications. Conclusion: The findings are based on analytical evaluations rather than empirical evaluation involving classifiers, thus providing a further basis for the usefulness of the DDP and validating the need for unequal priorities on relevance and redundancy during feature selection for microarray datasets, especially highly multiclass datasets.
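
The trade-off described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring rule `relevance ** alpha * (1 - redundancy) ** (1 - alpha)`, the use of absolute Pearson correlation for both relevance and redundancy, and the names `select_features`, `alpha`, and the toy `gene_*` values are all assumptions made for illustration. Under these assumptions, `alpha = 1.0` reduces to plain rank-based selection and `alpha = 0.5` gives equal priority to relevance and redundancy.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_features(features, labels, k, alpha):
    """Greedy forward selection of k features.

    features: dict mapping feature name -> list of values (one per sample).
    alpha:    hypothetical prioritization weight in [0, 1];
              1.0 scores on relevance only, 0.5 weighs both criteria equally.
    """
    # Relevance (assumed here): |correlation| between a feature and the class label.
    relevance = {name: abs(pearson(vals, labels)) for name, vals in features.items()}
    selected = []
    while len(selected) < k:
        best_name, best_score = None, -1.0
        for name, vals in features.items():
            if name in selected:
                continue
            if selected:
                # Redundancy (assumed here): mean |correlation| with features already chosen.
                redundancy = sum(abs(pearson(vals, features[s])) for s in selected) / len(selected)
            else:
                redundancy = 0.0
            # Clamp so floating-point error can never produce a negative base.
            antiredundancy = max(0.0, 1.0 - redundancy)
            score = relevance[name] ** alpha * antiredundancy ** (1.0 - alpha)
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected

# Toy data (made up): gene_b nearly duplicates gene_a; gene_c is weaker but less redundant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy = {
    "gene_a": [1, 2, 1, 2, 4, 5, 4, 5],
    "gene_b": [1, 2, 1, 2, 4, 5, 4, 4],
    "gene_c": [1, 3, 2, 4, 2, 4, 3, 5],
}

print(select_features(toy, labels, 2, alpha=1.0))  # relevance only
print(select_features(toy, labels, 2, alpha=0.5))  # equal priorities
```

On this toy data the relevance-only setting keeps the near-duplicate pair, while the equal-priorities setting swaps the duplicate for the less redundant feature. Which setting is preferable depends on the dataset, which is the point the abstract makes.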

    DGW: an exploratory data analysis tool for clustering and visualisation of epigenomic marks

    Background: Functional genomic and epigenomic research relies fundamentally on sequencing-based methods like ChIP-seq for the detection of DNA-protein interactions. These techniques return large, high-dimensional data sets with visually complex structures, such as multi-modal peaks extended over large genomic regions. Current tools for visualisation and data exploration represent and leverage these complex features only to a limited extent. Results: We present DGW, an open source software package for simultaneous alignment and clustering of multiple epigenomic marks. DGW uses Dynamic Time Warping to adaptively rescale and align genomic distances, which allows regions of interest with similar shapes to be grouped, thereby capturing the structure of epigenomic marks. We demonstrate the effectiveness of the approach in a simulation study and on a real epigenomic data set from the ENCODE project. Conclusions: Our results show that DGW automatically recognises and aligns important genomic features such as transcription start sites and splicing sites from histone marks. DGW is available as an open source Python package.
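
As a rough illustration of why Dynamic Time Warping suits profiles of similar shape but different width, the following is a minimal sketch of the textbook DTW distance in pure Python. It is not DGW's code or API; the function name `dtw_distance` and the example profiles are hypothetical.

```python
import math

def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two 1-D sequences.

    Uses absolute difference as the local cost and allows the standard
    three moves (match, stretch a, stretch b). Runs in O(len(a) * len(b)).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned to an already-used point of b
                cost[i][j - 1],      # b[j-1] aligned to an already-used point of a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]

# Hypothetical signal profiles: the same peak shape at two widths, and a flat region.
narrow_peak = [0, 1, 3, 1, 0]
wide_peak = [0, 0, 1, 1, 3, 3, 1, 1, 0, 0]
flat = [1, 1, 1, 1, 1]

print(dtw_distance(narrow_peak, wide_peak))
print(dtw_distance(narrow_peak, flat))
```

Because warping can stretch the narrow peak onto the wide one, the two peaks end up at zero distance, while the flat profile stays clearly separated. A clustering step built on such distances would therefore group regions by shape rather than by width.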

    Signatures of Environmental Genetic Adaptation Pinpoint Pathogens as the Main Selective Pressure through Human Evolution

    Previous genome-wide scans of positive natural selection in humans have identified a number of non-neutrally evolving genes that play important roles in skin pigmentation, metabolism, or immune function. Recent studies have also shown that a genome-wide pattern of local adaptation can be detected by identifying correlations between patterns of allele frequencies and environmental variables. Despite these observations, the degree to which natural selection is primarily driven by adaptation to local environments, and the role of pathogens or other ecological factors as selective agents, is still under debate. To address this issue, we correlated the spatial allele frequency distribution of a large sample of SNPs from 55 distinct human populations to a set of environmental factors that describe local geographical features such as climate, diet regimes, and pathogen loads. In concordance with previous studies, we detected a significant enrichment of genic SNPs, and particularly non-synonymous SNPs, associated with local adaptation. Furthermore, we show that the diversity of the local pathogenic environment is the predominant driver of local adaptation, and that climate, at least as measured here, plays only a relatively minor role. While background demography makes by far the strongest contribution in explaining the genetic variance among populations, we detected about 100 genes which show an unexpectedly strong correlation between allele frequencies and pathogenic environment, after correcting for demography. Conversely, for diet regimes and climatic conditions, no genes show a similar correlation between the environmental factor and allele frequencies. This result is validated using low-coverage sequencing data for multiple populations. Among the loci targeted by pathogen-driven selection, we found an enrichment of genes associated with autoimmune diseases, such as celiac disease, type 1 diabetes, and multiple sclerosis, which lends credence to the hypothesis that some susceptibility alleles for autoimmune diseases may be maintained in human populations due to past selective processes.

    Precision medicine driven by cancer systems biology

    Molecular insights from genome and systems biology are influencing how cancer is diagnosed and treated. We critically evaluate big data challenges in precision medicine. The melanoma research community has identified distinct subtypes involving chronic sun-induced damage and the mitogen-activated protein kinase driver pathway. In addition, despite low mutation burden, non-genomic mitogen-activated protein kinase melanoma drivers are found in membrane receptors, metabolism, or epigenetic signaling, with the ability to bypass central mitogen-activated protein kinase molecules and activate a similar program of mitogenic effectors. Mutation hotspots, structural modeling, UV signature, and genomic as well as non-genomic mechanisms of disease initiation and progression are taken into consideration to identify resistance mutations and novel drug targets. A comprehensive precision medicine profile of a malignant melanoma patient illustrates future rational drug targeting strategies. Network analysis emphasizes an important role of epigenetic and metabolic master regulators in oncogenesis. Co-occurrence of driver mutations in signaling, metabolic, and epigenetic factors highlights how cumulative alterations of our genomes and epigenomes progressively lead to uncontrolled cell proliferation. Precision insights have the ability to identify independent molecular pathways suitable for drug targeting. Synergistic treatment combinations of orthogonal modalities including immunotherapy, mitogen-activated protein kinase inhibitors, epigenetic inhibitors, and metabolic inhibitors have the potential to overcome immune evasion, side effects, and drug resistance.

    Comprehensive Pan-Genomic Characterization of Adrenocortical Carcinoma

    Summary: We describe a comprehensive genomic characterization of adrenocortical carcinoma (ACC). Using this dataset, we expand the catalogue of known ACC driver genes to include PRKAR1A, RPL22, TERF2, CCNE1, and NF1. Genome-wide DNA copy-number analysis revealed frequent occurrence of massive DNA loss followed by whole-genome doubling (WGD), which was associated with aggressive clinical course, suggesting WGD is a hallmark of disease progression. Corroborating this hypothesis were increased TERT expression, decreased telomere length, and activation of cell-cycle programs. Integrated subtype analysis identified three ACC subtypes with distinct clinical outcome and molecular alterations which could be captured by a 68-CpG probe DNA-methylation signature, proposing a strategy for clinical stratification of patients based on molecular markers.

    Comprehensive functional annotation of susceptibility variants identifies genetic heterogeneity between lung adenocarcinoma and squamous cell carcinoma

    Although genome-wide association studies have identified more than eighty genetic variants associated with non-small cell lung cancer (NSCLC) risk, the biological mechanisms of these variants remain largely unknown. By integrating large-scale genotype data of 15 581 lung adenocarcinoma (AD) cases, 8350 squamous cell carcinoma (SqCC) cases, and 27 355 controls, as well as multiple transcriptome and epigenomic databases, we conducted histology-specific meta-analyses and functional annotations of both reported and novel susceptibility variants. We identified 3064 credible risk variants for NSCLC, which were overrepresented in enhancer-like and promoter-like histone modification peaks as well as DNase I hypersensitive sites. Transcription factor enrichment analysis revealed that USF1 was AD-specific while CREB1 was SqCC-specific. Functional annotation and gene-based analysis implicated 894 target genes, including 274 specific to AD and 123 specific to SqCC, which were overrepresented in somatic driver genes (ER = 1.95, P = 0.005). Pathway enrichment analysis and Gene-Set Enrichment Analysis revealed that AD genes were primarily involved in immune-related pathways, while SqCC genes were homologous recombination deficiency related. Our results illustrate the molecular basis of both well-studied and new susceptibility loci of NSCLC, providing not only novel insights into the genetic heterogeneity between AD and SqCC but also a set of plausible gene targets for post-GWAS functional experiments.