13 research outputs found

    Ask the people: developing guidelines for genomic research with Aboriginal and Torres Strait Islander peoples

    Full text link
    In health and medical research, guidelines are sets of statements and recommendations in which experts or stakeholders assess the published literature to generate practical advice for a specific audience. This reliance on expert consultation and published literature is neither practical nor inclusive when working in disciplines with minimal data and addressing issues that concern under-represented communities. Here we describe the process used to develop guidelines for conducting genomic research projects in partnership with Aboriginal and Torres Strait Islander peoples, a setting that brings together a new technology with individual- and community-level ethical and social implications and First Nations peoples with cultural and community expectations for research. We developed the guidelines through a consultation process that used participatory action research to engage with various stakeholders during multiple rounds of tailored activities. The end product, ‘Genomic Partnerships: Guidelines for Genomics Research with Aboriginal and Torres Strait Islander peoples of Queensland’, reflects the needs of the end-users and the perspectives of the Aboriginal and Torres Strait Islander peoples, communities and organisations that participated. Through this process, we have identified recommendations for developing guidelines with other under-represented communities.

    Abstract A051: Multi-omic explainable machine learning improves cancer treatment outcome prediction

    Full text link
    Background: Advancements in multi-omics data integration and explainable Machine Learning (ML) have shown promise in precision oncology. Multi-omic data used to train ML models may include genomics, transcriptomics and histopathology to characterize cancer cells and the tumor microenvironment (TME). Explainability methods such as SHAP have enabled researchers and clinicians to unravel the decision-making rationale of ML models that predict cancer progression and treatment response. We developed an explainable ML framework that incorporates multi-omic features of cancer and the TME, and applied it to predict patient response to neoadjuvant chemotherapy (NAC) in breast cancer and to immune checkpoint inhibitors (ICI) in melanoma.
    Methods: For breast cancer, we used the cohort from Sammut et al. [1] (n=157 training, n=75 test). For melanoma, we assembled a cohort of 229 patients (n=138 training, n=53 test cutaneous, n=38 test non-cutaneous) from five independent studies. We improved the performance of the ensemble ML models in Sammut et al. [1] by implementing a shared-learning architecture that lets component models influence each other as training progresses. We applied this ensemble (Ens:LR+RF+SVM) to predict NAC response in breast cancer and ICI response in melanoma by integrating clinical, DNA sequencing, RNA sequencing and histopathology (breast cancer only) data. For melanoma, we also trained three single ML models (LR, RF and SVM) and another ensemble (Ens:LR+RF), and introduced a novel dual use of SHAP: feature selection during training and biomarker threshold identification during validation.
    Results: The ensemble model trained on the multi-omic breast cancer features achieved a ROC-AUC of 0.88 and showed a potential 25% reduction in false positives (i.e., incorrect predictions of good response) compared with its predecessor from Sammut et al. [1]. In the melanoma cohort, the Ens:LR+RF model achieved a ROC-AUC of 0.77 but was outperformed by the RF model (ROC-AUC 0.78). SHAP revealed unique interactions between each ML model and the feature space, resulting in a distinct training feature set per model. During validation, the intersection between feature values and SHAP scores revealed numerical thresholds separating good from poor responses for clinically meaningful biomarkers such as neoantigen load (>2.25 good, <2.25 poor; log10 scale). Across these two studies, we developed and open-sourced a scalable and versatile ML workflow (xML-workFLow) for rapid experimentation in biomedical research.
    Conclusions: This work showcases the potential of multi-omics explainable ML in advancing precision oncology to improve treatment outcome prediction. With further experimental validation, the use of explainable ML to determine numerical thresholds could guide the development of companion diagnostics and inform combination therapeutic strategies.
    References: 1. Sammut, S.J., et al., Multi-omic machine learning predictor of breast cancer therapy response. Nature, 2022. 601(7894): p. 623-629.
    Citation Format: Khoa A. Tran, Venkateswar Addala, Lambros T. Koufariotis, Jia Zhang, Scott Wood, Conrad Leonard, Lotte L. Hoeijmakers, Christian U. Blank, Mireia Crispin-Ortuzar, Amy McCart Reed, Po-ling Inglis, Sunil R. Lakhani, Elizabeth D. Williams, John V. Pearson, Olga Kondrashova, Nicola Waddell. Multi-omic explainable machine learning improves cancer treatment outcome prediction [abstract]. In: Proceedings of the AACR Special Conference in Cancer Research: Artificial Intelligence and Machine Learning; 2025 Jul 10-12; Montreal, QC, Canada. Philadelphia (PA): AACR; Clin Cancer Res 2025;31(13_Suppl):Abstract nr A051
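    The abstract reports that intersecting feature values with SHAP scores produced cut-points such as the neoantigen-load threshold, but it does not spell out the procedure. The sketch below shows one way to do this. It assumes per-sample (feature value, SHAP value) pairs have already been computed, and it picks the midpoint between adjacent feature values that best agrees with the sign of the SHAP score. The function name `shap_threshold` and the example numbers are hypothetical and are not taken from xML-workFLow.

```python
from typing import Sequence


def shap_threshold(values: Sequence[float], shap: Sequence[float]) -> float:
    """Return the cut-point t that best separates positive from negative SHAP scores.

    A sample agrees with t when (value > t) matches (SHAP > 0). Candidates are
    midpoints between consecutive distinct feature values; the first candidate
    with the most agreements wins.
    """
    if len(values) != len(shap) or not values:
        raise ValueError("values and shap must be non-empty and of equal length")
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct feature values")
    best_t, best_score = distinct[0], -1
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        # Count samples whose side of t matches the sign of their SHAP value.
        score = sum((v > t) == (s > 0) for v, s in zip(values, shap))
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Hypothetical biomarker values (e.g. log10 scale) and their SHAP contributions.
print(shap_threshold([0.5, 1.0, 1.5, 2.5, 3.0, 3.5], [-0.4, -0.2, -0.1, 0.1, 0.3, 0.2]))
```

    With these made-up inputs, the cut-point falls at 2.0. Every sample above it has a positive SHAP score and every sample below it has a negative one. Real data rarely separate this cleanly, which is why the sketch maximises agreement rather than requiring a perfect split.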

    The repertoire of mutational signatures in human cancer

    No full text
    Somatic mutations in cancer genomes are caused by multiple mutational processes, each of which generates a characteristic mutational signature [1]. Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium [2] of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), we characterized mutational signatures using 84,729,690 somatic mutations from 4,645 whole-genome and 19,184 exome sequences that encompass most types of cancer. We identified 49 single-base-substitution, 11 doublet-base-substitution, 4 clustered-base-substitution and 17 small insertion-and-deletion signatures. The substantial size of our dataset, compared with previous analyses [3–15], enabled the discovery of new signatures, the separation of overlapping signatures and the decomposition of signatures into components that may represent associated but distinct DNA damage, repair and/or replication mechanisms. By estimating the contribution of each signature to the mutational catalogues of individual cancer genomes, we revealed associations of signatures with exogenous or endogenous exposures, as well as with defective DNA-maintenance processes. However, many signatures are of unknown cause. This analysis provides a systematic perspective on the repertoire of mutational processes that contribute to the development of human cancer.
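    Signatures like these are extracted from per-tumour mutational catalogues. Single-base substitutions are conventionally counted in 96 channels: six substitution classes, reported from the pyrimidine (C or T) reference base, multiplied by the 16 combinations of 5' and 3' neighbouring bases. The sketch below builds such a catalogue. It assumes each mutation arrives as its reference trinucleotide plus the alternate base; that input format and the function names are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(context: str, alt: str) -> str:
    """Map one substitution to its 96-channel label, e.g. 'A[C>T]G'.

    `context` is the reference trinucleotide with the mutated base in the middle.
    Purine (A/G) references are reverse-complemented together with the
    alternate base so that every channel is reported from a pyrimidine.
    """
    context, alt = context.upper(), alt.upper()
    if (len(context) != 3 or len(alt) != 1
            or set(context + alt) - set("ACGT") or alt == context[1]):
        raise ValueError("need an ACGT trinucleotide and a different ACGT alternate base")
    if context[1] in "AG":
        context, alt = revcomp(context), alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def sbs96_catalogue(mutations: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count (trinucleotide, alt) pairs into all 96 channels, including empty ones."""
    channels = [
        f"{five}[{ref}>{alt}]{three}"
        for ref, alts in (("C", "AGT"), ("T", "ACG"))
        for alt in alts
        for five in "ACGT"
        for three in "ACGT"
    ]
    counts = Counter(sbs96_channel(ctx, alt) for ctx, alt in mutations)
    return {ch: counts.get(ch, 0) for ch in channels}
```

    Both `("ACG", "T")` and its reverse complement `("CGT", "A")` fall into `A[C>T]G`. This strand collapsing is why such catalogues have 96 rather than 192 channels. Signature extraction then factorises a matrix of these catalogues across many tumours.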

    Integrative pathway enrichment analysis of multivariate omics data

    No full text
    Multi-omics datasets represent distinct aspects of the central dogma of molecular biology. Such high-dimensional molecular profiles pose challenges to data interpretation and hypothesis generation. ActivePathways is an integrative method that discovers significantly enriched pathways across multiple datasets using statistical data fusion, rationalizes contributing evidence and highlights associated genes. As part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we integrated genes with coding and non-coding mutations and revealed frequently mutated pathways and additional cancer genes with infrequent mutations. We also analyzed prognostic molecular pathways by integrating genomic and transcriptomic features of 1780 breast cancers and highlighted associations with immune response and anti-apoptotic signaling. Integration of ChIP-seq and RNA-seq data for master regulators of the Hippo pathway across normal human tissues identified processes of tissue regeneration and stem cell regulation. ActivePathways is a versatile method that improves systems-level understanding of cellular organization in health and disease through integration of multiple molecular datasets and pathway annotations.
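    The abstract describes statistical data fusion followed by pathway enrichment but gives no formulas. The sketch below illustrates the general idea under simplifying assumptions. Each gene's p-values from several datasets are merged with Fisher's combined probability test, which assumes the datasets are independent; the published method may correct for correlated datasets instead, for example with Brown's extension of Fisher's method. A plain one-sided hypergeometric test then asks whether the merged hits overlap a pathway. All function names, gene labels and the `cutoff` default are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2k df under H0,
    assuming the k p-values are independent."""
    if not pvalues or any(not 0 < p <= 1 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # For even degrees of freedom (2k) the chi-square survival function is
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, len(pvalues)):
        term *= half_x / j
        total += term
    return min(1.0, math.exp(-half_x) * total)


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    hits = sum(math.comb(successes, i) * math.comb(population - successes, draws - i)
               for i in range(k, top + 1))
    return hits / math.comb(population, draws)


def pathway_enrichment(gene_pvalues: Dict[str, List[float]], pathway: Iterable[str],
                       cutoff: float = 0.1) -> Tuple[Dict[str, float], float]:
    """Merge each gene's p-values across datasets, then test whether genes passing
    `cutoff` overlap the pathway more than chance. All genes form the background."""
    merged = {g: fisher_combine(ps) for g, ps in gene_pvalues.items()}
    hits = {g for g, p in merged.items() if p <= cutoff}
    members = set(pathway) & set(merged)
    overlap = len(hits & members)
    return merged, hypergeom_sf(overlap, len(merged), len(members), len(hits))
```

    Merging lets a gene that is only modestly significant in each dataset pass the cutoff once its evidence is pooled. For example, two p-values of 0.05 combine to roughly 0.0175. Pooling evidence in this way is the kind of integration the abstract credits with revealing cancer genes that have infrequent mutations.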

    Disruption of chromatin folding domains by somatic genomic rearrangements in human cancer

    No full text
    Chromatin is folded into successive layers to organize linear DNA. Genes within the same topologically associating domain (TAD) show similar expression and histone-modification profiles, and the boundaries separating domains play important roles in reinforcing the stability of these features. Indeed, domain disruptions in human cancers can lead to misregulation of gene expression. However, the frequency of domain disruptions in human cancers remains unclear. Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumor types, we analyzed 288,457 somatic structural variations (SVs) to understand the distributions and effects of SVs across TADs. Notably, SVs can lead to the fusion of discrete TADs, and complex rearrangements markedly change chromatin folding maps in the cancer genomes. Yet only 14% of boundary deletions changed the expression of nearby genes by more than twofold.
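    A deletion that spans a TAD boundary removes that boundary and can fuse the neighbouring domains. The sketch below rests on two simplifications: boundaries are sorted coordinates on a single chromosome, and a domain is simply the interval between consecutive boundaries. It lists the domains a deletion would merge. It also computes the share of events whose nearby-gene expression changed more than twofold in either direction (|log2 fold change| > 1). Function names and inputs are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple


def fused_domains(deletion: Tuple[int, int],
                  boundaries: Sequence[int]) -> List[Tuple[int, int]]:
    """Return the domains merged into one when `deletion` removes TAD boundaries.

    Domain k spans (boundaries[k], boundaries[k + 1]); removing boundaries i..j
    merges domains i - 1 .. j. Regions outside the first/last boundary are not
    modelled. An empty list means the deletion removed no boundary.
    """
    start, end = deletion
    i = bisect_right(boundaries, start)    # first boundary strictly after start
    j = bisect_left(boundaries, end) - 1   # last boundary strictly before end
    if i > j:
        return []
    return [(boundaries[k], boundaries[k + 1])
            for k in range(i - 1, j + 1)
            if 0 <= k < len(boundaries) - 1]


def fraction_over_twofold(fold_changes: Sequence[float]) -> float:
    """Share of events whose nearby-gene expression changed more than twofold
    up or down, i.e. |log2(fold change)| > 1."""
    if not fold_changes:
        raise ValueError("no fold changes given")
    changed = sum(abs(math.log2(fc)) > 1 for fc in fold_changes)
    return changed / len(fold_changes)


print(fused_domains((150, 250), [100, 200, 300, 400]))  # boundary 200 removed
```

    In the usage line, the deletion removes only the boundary at 200, so the domains (100, 200) and (200, 300) become one. A deletion lying entirely inside a domain returns an empty list.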

    Genomic footprints of activated telomere maintenance mechanisms in cancer

    No full text
    Cancers require telomere maintenance mechanisms for unlimited replicative potential. They achieve this through TERT activation or through alternative telomere lengthening associated with ATRX or DAXX loss. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we dissect whole-genome sequencing data from over 2500 matched tumor-control samples across 36 tumor types to characterize the genomic footprints of these mechanisms. Whereas tumors with ATRX or DAXX mutations (ATRX/DAXXtrunc) show increased telomere content, tumors with TERT modifications show a moderate decrease. One quarter of all tumor samples contain somatic integrations of telomeric sequences into non-telomeric DNA. This fraction rises to 80% in ATRX/DAXXtrunc tumors, which carry an aberrant telomere variant repeat (TVR) distribution as another genomic marker. This aberrant distribution includes enrichment of the previously undescribed singleton TVR TTCGGG and depletion of the singleton TVR TTTGGG. Our systematic analysis provides new insight into the recurrent genomic alterations associated with telomere maintenance mechanisms in cancer.
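    TVR analyses start from telomeric reads split into hexamers. The sketch below counts variant repeats and "singleton" variants in a single read. It rests on several assumptions. Only G-rich-strand reads (canonical TTAGGG) are handled. Only variants of the form NNNGGG are considered, which covers the two repeats named in the abstract. A singleton is taken to be a variant whose `flank` neighbouring hexamers on each side are all canonical, with a default flank of 1. The study's actual definitions and pipeline may differ, and all names here are hypothetical.

```python
from collections import Counter
from typing import List

CANONICAL = "TTAGGG"  # G-rich strand telomeric repeat


def hexamers_in_frame(read: str) -> List[str]:
    """Split a read into consecutive hexamers phased on its first canonical repeat."""
    read = read.upper()
    start = read.find(CANONICAL)
    if start < 0:
        return []
    return [read[i:i + 6] for i in range(start % 6, len(read) - 5, 6)]


def is_tvr(hexamer: str) -> bool:
    """Variant repeat of the assumed form NNNGGG, excluding canonical TTAGGG."""
    return hexamer != CANONICAL and hexamer.endswith("GGG")


def count_tvrs(read: str) -> Counter:
    """Count every in-frame variant repeat in the read."""
    return Counter(h for h in hexamers_in_frame(read) if is_tvr(h))


def singleton_tvrs(read: str, flank: int = 1) -> Counter:
    """Count TVRs whose `flank` neighbouring hexamers on each side are all canonical."""
    hexes = hexamers_in_frame(read)
    found: Counter = Counter()
    for i in range(flank, len(hexes) - flank):
        neighbours = hexes[i - flank:i] + hexes[i + 1:i + 1 + flank]
        if is_tvr(hexes[i]) and all(n == CANONICAL for n in neighbours):
            found[hexes[i]] += 1
    return found
```

    In a read such as `TTAGGG TTAGGG TTCGGG TTAGGG TTTGGG TTAGGG TCAGGG TCAGGG TTAGGG` (spaces added for clarity), all four variants are counted as TVRs. Only TTCGGG and TTTGGG qualify as singletons, because the two adjacent TCAGGG copies flank each other.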

    A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns

    No full text
    In cancer, the primary tumour’s organ of origin and histopathology are the strongest determinants of its clinical behaviour, but in 3% of cases a patient presents with a metastatic tumour and no obvious primary. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we train a deep learning classifier to predict cancer type from patterns of somatic passenger mutations detected by whole-genome sequencing (WGS) of 2606 tumours representing 24 common cancer types. Our classifier achieves an accuracy of 91% on held-out tumour samples and 88% and 83%, respectively, on independent primary and metastatic samples, roughly double the accuracy of trained pathologists when presented with a metastatic tumour without knowledge of the primary. Surprisingly, adding information on driver mutations reduced accuracy. Our results have clinical applicability, underscore how patterns of somatic passenger mutations encode the state of the cell of origin, and can inform future strategies to detect the source of circulating tumour DNA.
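    The abstract does not list the classifier's input features. One plausible family of passenger-mutation features is regional mutation density: the share of a tumour's mutations falling in each genomic window. The sketch below builds such vectors. Purely to show the data flow, it classifies with a nearest-centroid rule using cosine similarity; this is a deliberately simple stand-in and not the study's deep network. The 1 Mb window size, cancer-type labels and function names are illustrative assumptions.

```python
import math
from typing import Dict, Iterable, List, Tuple

BIN_SIZE = 1_000_000  # illustrative window size (1 Mb)


def density_vector(mutations: Iterable[Tuple[str, int]],
                   bins: List[Tuple[str, int]]) -> List[float]:
    """Fraction of a tumour's passenger mutations in each (chromosome, bin index) window."""
    index = {b: i for i, b in enumerate(bins)}
    vec = [0.0] * len(bins)
    for chrom, pos in mutations:
        i = index.get((chrom, pos // BIN_SIZE))
        if i is not None:  # mutations outside the listed windows are ignored
            vec[i] += 1
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(training: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Average the feature vectors of each cancer type into one centroid."""
    return {t: [sum(col) / len(vecs) for col in zip(*vecs)] for t, vecs in training.items()}


def predict(vec: List[float], centroids: Dict[str, List[float]]) -> str:
    """Assign the cancer type whose centroid is most similar (cosine) to `vec`."""
    return max(centroids, key=lambda t: cosine(vec, centroids[t]))
```

    A deep model replaces the centroid rule with learned non-linear combinations of such features. The feature-extraction step shows why passenger mutations carry tissue-of-origin information: their genomic distribution differs systematically between cell types.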

    Divergent mutational processes distinguish hypoxic and normoxic tumours

    No full text
    Many primary tumours have low levels of molecular oxygen (hypoxia), and hypoxic tumours respond poorly to therapy. Pan-cancer molecular hallmarks of tumour hypoxia remain poorly understood, and little is known about its associations with specific mutational processes, non-coding driver genes and evolutionary features. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumour types, we quantify hypoxia in 1188 tumours spanning 27 cancer types. Elevated hypoxia is associated with increased mutational load across cancer types, irrespective of the underlying mutational class. The proportion of mutations attributed to several mutational signatures of unknown aetiology is directly associated with the level of hypoxia, suggesting underlying mutational processes for these signatures. At the gene level, driver mutations in TP53, MYC and PTEN are enriched in hypoxic tumours, and mutations in PTEN interact with hypoxia to direct tumour evolutionary trajectories. Overall, hypoxia plays a critical role in shaping the genomic and evolutionary landscapes of cancer.
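    The abstract does not say how hypoxia was quantified. Some mRNA-based hypoxia signatures score a tumour by summing +1 for each signature gene expressed above its cohort median and -1 for each gene below it. The sketch below implements that scheme. It then computes a Spearman rank correlation between the score and mutational load, as one way to test an association like the one reported. The gene names, numbers and function names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence


def median_split_score(expression: Dict[str, List[float]]) -> List[int]:
    """Per tumour: +1 for each signature gene above that gene's cohort median,
    -1 for each gene below it, 0 when equal to the median."""
    genes = list(expression)
    n = len(expression[genes[0]])
    scores = [0] * n
    for g in genes:
        m = median(expression[g])
        for i, x in enumerate(expression[g]):
            scores[i] += (x > m) - (x < m)
    return scores


def ranks(values: Sequence[float]) -> List[float]:
    """1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


# Hypothetical two-gene signature across five tumours, plus mutation counts.
score = median_split_score({"GENE_1": [1, 2, 3, 4, 5], "GENE_2": [2, 1, 4, 3, 6]})
print(score, spearman(score, [100, 120, 300, 250, 500]))
```

    A rank correlation is used here because mutational load is typically heavily skewed. In a real analysis, the association would also need adjusting for cancer type.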

    Patterns of somatic structural variation in human cancer genomes

    No full text
    A key mutational process in cancer is structural variation, in which rearrangements delete, amplify or reorder genomic segments that range in size from kilobases to whole chromosomes [1–7]. Here we develop methods to group, classify and describe somatic structural variants, using data from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumour types [8]. Sixteen signatures of structural variation emerged. Deletions have a multimodal size distribution, assort unevenly across tumour types and patients, are enriched in late-replicating regions and correlate with inversions. Tandem duplications also have a multimodal size distribution but, like unbalanced translocations, are enriched in early-replicating regions. Replication-based mechanisms of rearrangement generate varied chromosomal structures with low-level copy-number gains and frequent inverted rearrangements. One prominent structure consists of 2–7 templates copied from distinct regions of the genome and strung together within one locus. Such cycles of templated insertions correlate with tandem duplications and, in liver cancer, frequently activate the telomerase gene TERT. A wide variety of rearrangement processes are active in cancer, generating complex configurations of the genome upon which selection can act.
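    Grouping structural variants starts with classifying each rearrangement junction by the orientation of its two breakpoints. The sketch below assumes one common convention: '+' means the sequence to the left of a breakpoint is joined at the junction, and '-' means the sequence to the right. SV callers and file formats differ in notation, so inputs would need mapping onto this convention first. The names `classify_sv` and `JUNCTION_TYPES` are hypothetical.

```python
from typing import Optional, Tuple

# Assumed orientation convention (callers differ; adapt before use):
# '+' = sequence to the LEFT of the breakpoint is joined at the junction,
# '-' = sequence to the RIGHT of the breakpoint is joined.
JUNCTION_TYPES = {
    ("+", "-"): "deletion",            # left of pos1 joined to right of pos2
    ("-", "+"): "tandem duplication",  # end of segment joined back to its start
    ("+", "+"): "inversion",           # left sides of both breakpoints joined
    ("-", "-"): "inversion",           # right sides of both breakpoints joined
}


def classify_sv(chrom1: str, pos1: int, strand1: str,
                chrom2: str, pos2: int, strand2: str) -> Tuple[str, Optional[int]]:
    """Return (class, size in bp) for one rearrangement junction; size is None
    for inter-chromosomal events."""
    if not {strand1, strand2} <= {"+", "-"}:
        raise ValueError("strands must be '+' or '-'")
    if chrom1 != chrom2:
        return "translocation", None
    if pos1 > pos2:  # order the ends so that pos1 <= pos2
        pos1, pos2, strand1, strand2 = pos2, pos1, strand2, strand1
    return JUNCTION_TYPES[(strand1, strand2)], pos2 - pos1


print(classify_sv("1", 100, "+", "1", 5000, "-"))
```

    Collecting the returned sizes per class is enough to inspect multimodal size distributions like those described for deletions and tandem duplications. Complex events such as templated-insertion cycles need several junctions considered together, which this single-junction sketch does not attempt.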

    Pathway and network analysis of more than 2500 whole cancer genomes

    No full text
    The catalog of cancer driver mutations in protein-coding genes has greatly expanded in the past decade. However, non-coding cancer driver mutations are less well characterized, and only a handful of recurrent non-coding mutations, most notably TERT promoter mutations, have been reported. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, and motivated by the success of pathway and network analyses in prioritizing rare mutations in protein-coding genes, we perform multi-faceted pathway and network analyses of non-coding mutations across 2583 whole cancer genomes from 27 tumor types. While few non-coding genomic elements are recurrently mutated in this cohort, we identify 93 genes harboring non-coding mutations that cluster into several modules of interacting proteins. Among these are promoter mutations associated with reduced mRNA expression in TP53, TLE4 and TCF4. We find that biological processes have variable proportions of coding and non-coding mutations: chromatin remodeling and proliferation pathways are altered primarily by coding mutations, while developmental pathways, including Wnt and Notch, are altered by both coding and non-coding mutations. RNA splicing is primarily altered by non-coding mutations in this cohort, and samples containing non-coding mutations in well-known RNA splicing factors exhibit gene expression signatures similar to those of samples with coding mutations in these genes. These analyses contribute a new repertoire of possible cancer genes and mechanisms that are altered by non-coding mutations and offer insights into additional cancer vulnerabilities that can be investigated for potential therapeutic treatments.
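    The abstract reports that genes with non-coding mutations "cluster into several modules of interacting proteins" but does not name the network method. The sketch below uses the simplest possible notion of a module, as a stand-in for the study's network analyses rather than a reproduction of them. A module here is a connected component of a protein-interaction network restricted to mutated genes, with isolated genes dropped. Gene labels, edges and the function name are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def modules(mutated: Iterable[str], edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Connected components (size > 1) of the interaction subnetwork induced by
    mutated genes, found by breadth-first search."""
    genes = set(mutated)
    adj: Dict[str, Set[str]] = {g: set() for g in genes}
    for a, b in edges:
        if a in genes and b in genes and a != b:  # keep edges between mutated genes
            adj[a].add(b)
            adj[b].add(a)
    seen: Set[str] = set()
    result: List[Set[str]] = []
    for g in sorted(genes):  # sorted for a deterministic output order
        if g in seen:
            continue
        comp: Set[str] = set()
        queue = deque([g])
        seen.add(g)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if len(comp) > 1:
            result.append(comp)
    return result
```

    Inducing the subnetwork on mutated genes alone disconnects genes linked only through unmutated partners, as `GENE_D` and `GENE_X` would be in a toy network where `GENE_X` is unmutated. Diffusion-style network methods exist partly to recover such indirect links.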