New Recurrent Structural Aberrations in the Genome of Chronic Lymphocytic Leukemia Based on Exome-Sequencing Data
Chronic lymphocytic leukemia (CLL) is the most frequent lymphoproliferative syndrome in Western countries, and it is characterized by recurrent large genomic rearrangements. Over recent decades, array techniques have expanded our knowledge of CLL's karyotypic aberrations. The advent of large sequencing databases has extended our knowledge of cancer genomics to an unprecedented resolution and enabled the detection of small-scale structural aberrations in the cancer genome. In this study, we performed exome-sequencing-based copy number aberration (CNA) and loss of heterozygosity (LOH) analysis in order to detect new recurrent structural aberrations. We describe 54 recurrent focal CNAs enriched in cancer-related pathways, and their association with gene expression and clinical evolution. Furthermore, we discovered recurrent large copy-number-neutral LOH events affecting key driver genes, and we recapitulate most of the large CNAs that characterize the CLL genome. These results provide "proof-of-concept" evidence supporting the existence of new genes involved in the pathogenesis of CLL.
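As an illustration of what "recurrent" means operationally, the following is a minimal sketch that counts in how many samples each aberration was called and keeps those seen in at least a chosen number of samples. It is not the study's pipeline: the function name `recurrent_regions`, the region labels, and the `min_samples` threshold are hypothetical, and the sketch assumes that calls have already been mapped to shared region labels (real CNA segments would first need overlap matching).

```python
from collections import Counter

def recurrent_regions(calls_by_sample, min_samples=3):
    """Return {region: number of samples} for regions called in >= min_samples samples.

    calls_by_sample maps a sample ID to an iterable of region labels.
    Labels and threshold are illustrative assumptions, not values from the study.
    """
    counts = Counter()
    for regions in calls_by_sample.values():
        # A set ensures each region is counted at most once per sample.
        counts.update(set(regions))
    return {region: n for region, n in counts.items() if n >= min_samples}

# Toy input with made-up labels.
calls = {
    "s1": ["region_A", "region_B"],
    "s2": ["region_A"],
    "s3": ["region_A", "region_A"],  # duplicate call within one sample
    "s4": ["region_B"],
}
print(recurrent_regions(calls, min_samples=3))
```

Counting per sample rather than per call keeps a fragmented segment in a single tumor from inflating the recurrence of a region.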
Novel Mutation Hotspots within Non-Coding Regulatory Regions of the Chronic Lymphocytic Leukemia Genome
Mutations in non-coding DNA regions are increasingly recognized as cancer drivers. These mutations can modify gene expression in cis or by inducing high-order chromatin structure modifications with long-range effects. Previous analyses reported the detection of recurrent and functional non-coding DNA mutations in the chronic lymphocytic leukemia (CLL) genome, such as those in the 3' untranslated region of NOTCH1 and in the PAX5 super-enhancer. In this report, we used whole genome sequencing data produced by the International Cancer Genome Consortium in order to analyze regions with previously reported regulatory activity. This approach enabled the identification of numerous recurrently mutated regions that were frequently positioned in the proximity of genes involved in immune and oncogenic pathways. By correlating these mutations with expression of their nearest genes, we detected significant transcriptional changes in genes such as PHF2 and S1PR2. More research is needed to clarify the function of these mutations in CLL, particularly those found in intergenic regions.
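One simple way to relate mutation status to expression of a nearby gene is a two-sided permutation test on the difference in mean expression between mutated and unmutated samples. The sketch below is an assumption about how such a comparison could be set up, not the statistical method used in the report; the function name, the toy expression values, and the number of permutations are all hypothetical.

```python
import random
from statistics import mean

def permutation_pvalue(expr_mut, expr_wt, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in mean expression.

    expr_mut / expr_wt: expression values in mutated / unmutated samples.
    Group labels are shuffled n_perm times; the p-value is the fraction of
    shuffles whose absolute mean difference is at least the observed one
    (with the usual +1 correction so the result is never exactly zero).
    """
    rng = random.Random(seed)
    observed = abs(mean(expr_mut) - mean(expr_wt))
    pooled = list(expr_mut) + list(expr_wt)
    k = len(expr_mut)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy values (made up): clearly higher expression in the mutated group.
mutated = [8.1, 8.4, 7.9, 8.6, 8.2]
unmutated = [5.0, 5.3, 4.8, 5.1, 5.2, 4.9]
p_value = permutation_pvalue(mutated, unmutated)
```

When many regions are tested, the resulting p-values would also need a multiple-testing correction before any region is called significant.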
Improved personalized survival prediction of patients with diffuse large B-cell Lymphoma using gene expression profiling
BACKGROUND: Thirty to forty percent of patients with Diffuse Large B-cell Lymphoma (DLBCL) have an adverse clinical evolution. The increased understanding of DLBCL biology has shed light on the clinical evolution of this pathology, leading to the discovery of prognostic factors based on gene expression data, genomic rearrangements and mutational subgroups. Nevertheless, additional efforts are needed in order to enable survival predictions at the patient level. In this study we investigated new machine learning-based models of survival using transcriptomic and clinical data. METHODS: Gene expression profiling (GEP) data from 2 different publicly available retrospective DLBCL cohorts were analyzed. Cox regression and unsupervised clustering were performed in order to identify probes associated with overall survival in the largest cohort. Random forests were created to model survival using combinations of GEP data, COO classification and clinical information. Cross-validation was used to compare model results in the training set, and Harrell's concordance index (c-index) was used to assess each model's predictive performance. Results were validated in an independent test set. RESULTS: Two hundred thirty-three and sixty-four patients were included in the training and test sets, respectively. Initially we derived and validated a 4-gene expression clustering that was independently associated with lower survival in 20% of patients. This pattern included the following genes: TNFRSF9, BIRC3, BCL2L1 and G3BP2. Thereafter, we applied machine-learning models to predict survival. A set of 102 genes was highly predictive of disease outcome, outperforming available clinical information and COO classification. The final best model integrated clinical information, COO classification, 4-gene-based clustering and the expression levels of 50 individual genes (training set c-index, 0.8404; test set c-index, 0.7942).
CONCLUSION: Our results indicate that DLBCL survival models based on the application of machine learning algorithms to gene expression and clinical data can largely outperform other important prognostic variables such as disease stage and COO. Head-to-head comparisons with other risk stratification models are needed to assess their relative usefulness.
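For readers unfamiliar with the metric reported above, Harrell's c-index is the fraction of comparable patient pairs in which the patient with the higher predicted risk has the shorter observed survival. The following is a minimal pure-Python sketch, not the authors' implementation: the function name and toy data are hypothetical, and pairs with tied survival times are simply skipped, which is a simplification.

```python
from itertools import combinations

def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: observed follow-up times; events: 1 if death observed, 0 if censored;
    risk_scores: model output where a higher value means higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        # a is the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (made up): the third patient is censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
perfect = harrell_c_index(times, events, [0.9, 0.7, 0.4, 0.1])
reversed_ = harrell_c_index(times, events, [0.1, 0.4, 0.7, 0.9])
constant = harrell_c_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 0.5 corresponds to uninformative predictions and 1.0 to perfect ranking, which puts the reported training and test values in context.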
Machine Learning Improves Risk Stratification in Myelodysplastic Neoplasms: An Analysis of the Spanish Group of Myelodysplastic Syndromes
Myelodysplastic neoplasms (MDS) are a heterogeneous group of hematological stem cell disorders characterized by dysplasia, cytopenias, and increased risk of acute leukemia. As prognosis differs widely between patients, and treatment options vary from observation to allogeneic stem cell transplantation, accurate and precise disease risk prognostication is critical for decision making. With this aim, we retrieved registry data from MDS patients from 90 Spanish institutions. A total of 7202 patients were included, who were divided into a training (80%) and a test (20%) set. A machine learning technique (random survival forests) was used to model overall survival (OS) and leukemia-free survival (LFS). The optimal model was based on 8 variables (age, gender, hemoglobin, leukocyte count, platelet count, neutrophil percentage, bone marrow blasts, and cytogenetic risk group). This model achieved high accuracy in predicting OS (c-indexes: 0.759 and 0.776) and LFS (c-indexes: 0.812 and 0.845). Importantly, the model was superior to the revised International Prognostic Scoring System (IPSS-R) and the age-adjusted IPSS-R. This difference persisted in different age ranges and in all evaluated disease subgroups. Finally, we validated our results in an external cohort, confirming the superiority of the Artificial Intelligence Prognostic Scoring System for MDS (AIPSS-MDS) over the IPSS-R, and achieving a performance similar to that of the molecular IPSS. In conclusion, the AIPSS-MDS score is a new prognostic model based exclusively on traditional clinical, hematological, and cytogenetic variables. AIPSS-MDS has a high prognostic accuracy in predicting survival in MDS patients, outperforming other well-established risk-scoring systems.
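The 80%/20% partition described above can be sketched as a seeded random split of patient indices. This is an illustration only: the abstract does not say how the split was drawn (for example, whether it was stratified), and the function name, seed, and rounding rule below are assumptions.

```python
import random

def train_test_split_indices(n_patients, train_fraction=0.8, seed=42):
    """Return (train_idx, test_idx) as a reproducible random partition.

    The seed and the use of truncation for the training size are
    illustrative choices, not details taken from the study.
    """
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    n_train = int(n_patients * train_fraction)
    return indices[:n_train], indices[n_train:]

# Cohort size from the abstract; the resulting set sizes depend on the rounding rule.
train_idx, test_idx = train_test_split_indices(7202)
```

Fixing the seed makes the partition reproducible, so that competing models are compared on exactly the same held-out patients.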