55 research outputs found

    Perspective on oncogenic processes at the end of the beginning of cancer genomics

    The Cancer Genome Atlas (TCGA) has catalyzed systematic characterization of diverse genomic alterations underlying human cancers. At this historic junction marking the completion of genomic characterization of over 11,000 tumors from 33 cancer types, we present our current understanding of the molecular processes governing oncogenesis. We illustrate our insights into cancer through synthesis of the findings of the TCGA PanCancer Atlas project on three facets of oncogenesis: (1) somatic driver mutations, germline pathogenic variants, and their interactions in the tumor; (2) the influence of the tumor genome and epigenome on transcriptome and proteome; and (3) the relationship between tumor and the microenvironment, including implications for drugs targeting driver events and immunotherapies. These results will anchor future characterization of rare and common tumor types, primary and relapsed tumors, and cancers across ancestry groups and will guide the deployment of clinical genomic sequencing.

    Machine learning and data mining frameworks for predicting drug response in cancer: An overview and a novel in silico screening process based on association rule mining

    Schizophrenia-associated somatic copy-number variants from 12,834 cases reveal recurrent NRXN1 and ABCB11 disruptions

    While germline copy-number variants (CNVs) contribute to schizophrenia (SCZ) risk, the contribution of somatic CNVs (sCNVs), which are present in some but not all cells, remains unknown. We identified sCNVs using blood-derived genotype arrays from 12,834 SCZ cases and 11,648 controls, filtering sCNVs at loci recurrently mutated in clonal blood disorders. Likely early-developmental sCNVs were more common in cases (0.91%) than controls (0.51%, p = 2.68e−4), with recurrent somatic deletions of exons 1–5 of the NRXN1 gene in five SCZ cases. Hi-C maps revealed ectopic, allele-specific loops forming between a potential cryptic promoter and non-coding cis-regulatory elements upon 5′ deletions in NRXN1. We also observed recurrent intragenic deletions of ABCB11, encoding a transporter implicated in anti-psychotic response, in five treatment-resistant SCZ cases and showed that ABCB11 is specifically enriched in neurons forming mesocortical and mesolimbic dopaminergic projections. Our results indicate potential roles of sCNVs in SCZ risk.

    A decision-theoretic approach to the evaluation of machine learning algorithms in computational drug discovery

    Motivation: Artificial intelligence, trained via machine learning (e.g. neural nets, random forests) or computational statistical algorithms (e.g. support vector machines, ridge regression), holds much promise for the improvement of small-molecule drug discovery. However, small-molecule structure-activity data are high dimensional with low signal-to-noise ratios, and proper validation of predictive methods is difficult. It is poorly understood which, if any, of the currently available machine learning algorithms will best predict new candidate drugs. Results: The quantile-activity bootstrap is proposed as a new model validation framework using quantile splits on the activity distribution function to construct training and testing sets. In addition, we propose two novel rank-based loss functions which penalize only the out-of-sample predicted ranks of high-activity molecules. The combination of these methods was used to assess the performance of neural nets, random forests, support vector machines (regression) and ridge regression applied to 25 diverse high-quality structure-activity datasets publicly available on ChEMBL. Model validation based on random partitioning of available data favours models that overfit and ‘memorize’ the training set, namely random forests and deep neural nets. Partitioning based on quantiles of the activity distribution correctly penalizes extrapolation of models onto structurally different molecules outside of the training data. Simpler, traditional statistical methods such as ridge regression can outperform state-of-the-art machine learning methods in this setting. In addition, our new rank-based loss functions give considerably different results from mean squared error, highlighting the necessity to define model optimality with respect to the decision task at hand.
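
    The abstract above does not spell out the quantile-activity bootstrap or the two rank-based losses, so the following is only a minimal sketch of the general idea under stated assumptions. The function names, the 80/20 cut, and the particular loss formula are hypothetical illustrations, not the authors' definitions, and the bootstrap resampling step is left out entirely.

```python
def quantile_split(activities, train_quantile=0.8):
    """Split indices by activity rank instead of at random.

    Assumption: molecules below the chosen activity quantile form the
    training set and the high-activity tail forms the test set, so the
    model is scored on extrapolation toward more active molecules.
    """
    order = sorted(range(len(activities)), key=lambda i: activities[i])
    cut = int(len(order) * train_quantile)
    return order[:cut], order[cut:]


def top_rank_loss(y_true, y_pred, top_fraction=0.1):
    """Hypothetical rank-based loss that looks only at the truly most
    active molecules and asks how highly the model ranked them.

    Returns 0.0 when the top-k true actives occupy the top-k predicted
    ranks, and grows as they are pushed down the predicted ordering.
    """
    n = len(y_true)
    k = max(1, int(n * top_fraction))
    # Predicted rank of each molecule: 0 is the highest predicted activity.
    pred_order = sorted(range(n), key=lambda i: -y_pred[i])
    pred_rank = {idx: rank for rank, idx in enumerate(pred_order)}
    # Indices of the k molecules with the highest measured activity.
    top_true = sorted(range(n), key=lambda i: -y_true[i])[:k]
    mean_rank = sum(pred_rank[i] for i in top_true) / k
    ideal_mean_rank = (k - 1) / 2
    return (mean_rank - ideal_mean_rank) / n


if __name__ == "__main__":
    activities = [0.3, 0.9, 0.1, 0.7]
    train_idx, test_idx = quantile_split(activities, train_quantile=0.5)
    print("train:", train_idx, "test:", test_idx)
    print("loss:", top_rank_loss([1, 2, 3, 4], [4, 3, 2, 1], top_fraction=0.5))
```

    Unlike mean squared error, a loss of this shape ignores how well low-activity molecules are predicted, which is one way to read the abstract's point that model optimality depends on the decision task.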

    Temperature Accelerated Molecular Dynamics with Soft-Ratcheting Criterion Orients Enhanced Sampling by Low-Resolution Information

    Many proteins exhibit an equilibrium between multiple conformations, some of which are characterized only by low-resolution information. Visiting all conformations is a demanding task for computational techniques that perform enhanced but unfocused exploration of collective variable (CV) space. On the other hand, pulling a structure toward a target condition biases the exploration in a way that is difficult to assess. To address this problem, we introduce here the soft-ratcheting temperature-accelerated molecular dynamics (sr-TAMD), where the exploration of CV space by TAMD is coupled to a soft-ratcheting algorithm that filters the evolving CV values according to a predefined criterion. Any low-resolution or even qualitative information can be used to orient the exploration. We validate this technique by exploring the conformational space of the inactive state of the catalytic domain of the adenyl cyclase AC from Bordetella pertussis. The AC domain is activated by association with calmodulin (CaM), and the available crystal structure shows that in the complex the protein has an elongated shape. High-resolution data are not available for the inactive, CaM-free protein state, but hydrodynamic measurements have shown that the inactive AC displays a more globular conformation. Here, using as CVs several geometric centers, we use sr-TAMD to enhance CV space sampling while filtering for CV values that correspond to centers moving close to each other, and we thus rapidly visit regions of conformational space that correspond to globular structures. The set of conformations sampled using sr-TAMD provides the most extensive description of the inactive state of AC to date, consistent with available experimental information.
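
    The filtering idea in this abstract can be illustrated with a toy model. The sketch below is not sr-TAMD: it replaces the TAMD dynamics with a one-dimensional random walk on a single distance between two centers, and the Gaussian acceptance rule, the function names, and all parameter values are assumptions chosen for illustration. It only shows how a soft ratchet keeps moves that satisfy a criterion (here, a shrinking distance) while occasionally tolerating small setbacks.

```python
import math
import random


def accept_probability(delta, softness):
    """Probability of keeping a move that changes the distance by `delta`.

    Moves that shrink the distance (delta <= 0) are always kept. Moves
    that grow it are kept with a probability that decays with the size
    of the setback; `softness` sets how tolerant the ratchet is.
    """
    if delta <= 0:
        return 1.0
    return math.exp(-((delta / softness) ** 2))


def soft_ratchet_walk(d0, n_steps, step=0.1, softness=0.05, seed=0):
    """Random walk on a distance-like CV, filtered by the soft ratchet.

    Assumption: Gaussian random proposals stand in for the CV evolution
    that TAMD would produce in the real method.
    """
    rng = random.Random(seed)
    d = d0
    trajectory = [d]
    for _ in range(n_steps):
        # A distance cannot be negative, so reflect proposals at zero.
        proposal = abs(d + rng.gauss(0.0, step))
        if rng.random() < accept_probability(proposal - d, softness):
            d = proposal
        trajectory.append(d)
    return trajectory


if __name__ == "__main__":
    trajectory = soft_ratchet_walk(d0=5.0, n_steps=500)
    print("start:", trajectory[0], "end:", trajectory[-1])
```

    A very small `softness` turns this into a hard ratchet that never lets the distance grow, while a large value recovers the unfiltered walk.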