13 research outputs found

    MetaFetcheR: An R Package for Complete Mapping of Small-Compound Data

    Small-compound databases contain a large amount of information for metabolites and metabolic pathways. However, the plethora of such databases and the redundancy of their information lead to major issues with analysis and standardization. A lack of preventive establishment of means of data access at the infant stages of a project might lead to mislabelled compounds, reduced statistical power, and large delays in delivery of results. We developed MetaFetcheR, an open-source R package that links metabolite data from several small-compound databases, resolves inconsistencies, and covers a variety of use-cases of data fetching. We showed that the performance of MetaFetcheR was superior to existing approaches and databases by benchmarking the performance of the algorithm in three independent case studies based on two published datasets.

    MetaFetcheR: An R Package for Complete Mapping of Small-Compound Data

    Small-compound databases contain a large amount of information for metabolites and metabolic pathways. However, the plethora of such databases and the redundancy of their information lead to major issues with analysis and standardization. A lack of preventive establishment of means of data access at the infant stages of a project might lead to mislabelled compounds, reduced statistical power, and large delays in delivery of results. We developed MetaFetcheR, an open-source R package that links metabolite data from several small-compound databases, resolves inconsistencies, and covers a variety of use-cases of data fetching. We showed that the performance of MetaFetcheR was superior to existing approaches and databases by benchmarking the performance of the algorithm in three independent case studies based on two published datasets. Title in thesis list of papers: MetaFetcheR: An R package for complete mapping of small compound data

    Supplementary tables: MetaFetcheR: An R package for complete mapping of small compound data

    Small-compound databases contain a large amount of information for metabolites and metabolic pathways. However, the plethora of such databases and the redundancy of their information lead to major issues with analysis and standardization. Lack of preventive establishment of means of data access at the infant stages of a project might lead to mislabelled compounds, reduced statistical power and large delays in delivery of results. We developed MetaFetcheR, an open-source R package that links metabolite data from several small-compound databases, resolves inconsistencies and covers a variety of use-cases of data fetching. We showed that the performance of MetaFetcheR was superior to existing approaches and databases by benchmarking the performance of the algorithm in three independent case studies based on two published datasets.

    What Is Abnormal in Normal Karyotype Acute Myeloid Leukemia in Children? Analysis of the Mutational Landscape and Prognosis of the TARGET-AML Cohort

    Normal karyotype acute myeloid leukemia (NK-AML) constitutes 20-25% of pediatric AML and detailed molecular analysis is essential to unravel the genetic background of this group. Using publicly available sequencing data from the TARGET-AML initiative, we investigated the mutational landscape of NK-AML in comparison with abnormal karyotype AML (AK-AML). In 164 (97.6%) of 168 independent NK-AML samples, at least one somatic protein-coding mutation was identified using whole-genome or targeted capture sequencing. We identified a unique mutational landscape of NK-AML characterized by a higher prevalence of mutated CEBPA, FLT3, GATA2, NPM1, PTPN11, TET2, and WT1 and a lower prevalence of mutated KIT, KRAS, and NRAS compared with AK-AML. Mutated CEBPA often co-occurred with mutated GATA2, whereas mutated FLT3 co-occurred with mutated WT1 and NPM1. In multivariate regression analysis, we identified younger age, WBC count ≥ 50 x 10^9/L, FLT3-internal tandem duplications, and mutated WT1 as independent predictors of adverse prognosis and mutated NPM1 and GATA2 as independent predictors of favorable prognosis in NK-AML. In conclusion, NK-AML in children is characterized by a unique mutational landscape which impacts the disease outcome.

    Supplementary material: Interpretable machine learning identifies paediatric Systemic Lupus Erythematosus subtypes based on gene expression data

    Transcriptomic analyses are commonly used to identify differentially expressed genes between patients and controls, or within individuals across disease courses. These methods, whilst effective, cannot encompass the combinatorial effects of genes driving disease. We applied rule-based machine learning (RBML) models and rule networks (RN) to an existing paediatric Systemic Lupus Erythematosus (SLE) blood expression dataset, with the goal of developing gene networks to separate low and high disease activity (DA1 and DA3). The resultant model had an 81% accuracy to distinguish between DA1 and DA3, with unsupervised hierarchical clustering revealing additional subgroups indicative of the immune axis involved or state of disease flare. These subgroups correlated with clinical variables, suggesting that the gene sets identified may further the understanding of gene networks that act in concert to drive disease progression. This included roles for genes (i) induced by interferons (IFI35 and OTOF), (ii) key to SLE cell types (KLRB1 encoding CD161), or (iii) with roles in autophagy and NF-κB pathway responses (CKAP4). As demonstrated here, RBML approaches have the potential to reveal novel gene patterns from within a heterogeneous disease, facilitating patient clinical and therapeutic stratification.

    Interpretable machine learning identifies paediatric Systemic Lupus Erythematosus subtypes based on gene expression data

    Transcriptomic analyses are commonly used to identify differentially expressed genes between patients and controls, or within individuals across disease courses. These methods, whilst effective, cannot encompass the combinatorial effects of genes driving disease. We applied rule-based machine learning (RBML) models and rule networks (RN) to an existing paediatric Systemic Lupus Erythematosus (SLE) blood expression dataset, with the goal of developing gene networks to separate low and high disease activity (DA1 and DA3). The resultant model had an 81% accuracy to distinguish between DA1 and DA3, with unsupervised hierarchical clustering revealing additional subgroups indicative of the immune axis involved or state of disease flare. These subgroups correlated with clinical variables, suggesting that the gene sets identified may further the understanding of gene networks that act in concert to drive disease progression. This included roles for genes (i) induced by interferons (IFI35 and OTOF), (ii) key to SLE cell types (KLRB1 encoding CD161), or (iii) with roles in autophagy and NF-κB pathway responses (CKAP4). As demonstrated here, RBML approaches have the potential to reveal novel gene patterns from within a heterogeneous disease, facilitating patient clinical and therapeutic stratification. ISSN: 2045-232

    Machine learning-based analysis of glioma grades reveals co-enrichment

    Gliomas develop and grow in the brain and central nervous system. Examining glioma grading processes is valuable for improving therapeutic challenges. One of the most extensive repositories storing transcriptomics data for gliomas is The Cancer Genome Atlas (TCGA). However, such big cohorts should be processed with caution and evaluated thoroughly as they can contain batch and other effects. Furthermore, biological mechanisms of cancer contain interactions among biomarkers. Thus, we applied an interpretable machine learning approach to discover such relationships. This type of transparent learning provides not only good predictability, but also reveals co-predictive mechanisms among features. In this study, we corrected the strong and confounded batch effect in the TCGA glioma data. We further used the corrected datasets to perform comprehensive machine learning analysis applied on single-sample gene set enrichment scores using collections from the Molecular Signature Database. Furthermore, using rule-based classifiers, we displayed networks of co-enrichment related to glioma grades. Moreover, we validated our results using the external glioma cohorts. We believe that utilizing corrected glioma cohorts from TCGA may improve the application and validation of any future studies. Finally, the co-enrichment and survival analysis provided detailed explanations for glioma progression and consequently, it should support the targeted treatment.

    Transcriptomic analysis reveals proinflammatory signatures associated with acute myeloid leukemia progression

    Numerous studies have been performed over the last decade to exploit the complexity of genomic and transcriptomic lesions driving the initiation of acute myeloid leukemia (AML). These studies have helped improve risk classification and treatment options. Detailed molecular characterization of longitudinal AML samples is sparse, however; meanwhile, relapse and therapy resistance represent the main challenges in AML care. To this end, we performed transcriptome-wide RNA sequencing of longitudinal diagnosis, relapse, and/or primary resistant samples from 47 adult and 23 pediatric AML patients with known mutational background. Gene expression analysis revealed the association of short event-free survival with overexpression of GLI2 and IL1R1, as well as downregulation of ST18. Moreover, CR1 downregulation and DPEP1 upregulation were associated with AML relapse both in adults and children. Finally, machine learning–based and network-based analysis identified overexpressed CD6 and downregulated INSR as highly copredictive genes depicting important relapse-associated characteristics among adult patients with AML. Our findings highlight the importance of a tumor-promoting inflammatory environment in leukemia progression, as indicated by several of the herein identified differentially expressed genes. Together, this knowledge provides the foundation for novel personalized drug targets and has the potential to maximize the benefit of current treatments to improve cure rates in AML.