
    Assigning protein function from domain-function associations using DomFun

    Background: Protein function prediction remains a key challenge. Domain composition affects protein function. Here we present DomFun, a Ruby gem that uses associations between protein domains and functions, calculated using multiple indices based on tripartite network analysis. These domain-function associations are combined at the protein level to generate protein-function predictions. Results: We analysed 16 tripartite networks connecting homologous superfamily and FunFam domains from CATH-Gene3D with functional annotations from the three Gene Ontology (GO) sub-ontologies, KEGG, and Reactome. We validated the results using the CAFA 3 benchmark platform for GO annotation, finding that, out of the multiple association metrics and domain datasets tested, the Simpson index for FunFam domain-function associations combined with Stouffer’s method leads to the best performance in almost all scenarios. We also found that using FunFams led to better performance than superfamilies, and that results were better for GO molecular function than for GO biological process terms. DomFun performed as well as the highest-performing method in certain CAFA 3 evaluation procedures in terms of Fmax and Smin. We also implemented our own benchmark procedure, Pathway Prediction Performance (PPP), which can be used to validate function prediction for additional annotation sources, such as KEGG and Reactome. Using PPP, we found results similar to those found with CAFA 3 for GO; moreover, we found good performance for the other annotation sources. As with CAFA 3, the Simpson index with Stouffer’s method led to the top performance in almost all scenarios. Conclusions: DomFun shows competitive performance with other methods evaluated in CAFA 3 when predicting protein function with GO, although results vary depending on the evaluation procedure. Through our own benchmark procedure, PPP, we have shown it can also make accurate predictions for KEGG and Reactome.
It performs best when using FunFams, combining Simpson index derived domain-function associations using Stouffer’s method. The tool has been implemented so that it can be easily adapted to incorporate other protein features, such as domain data from other sources, amino acid k-mers and motifs. The DomFun Ruby gem is available from https://rubygems.org/gems/DomFun. Code maintained at https://github.com/ElenaRojano/DomFun. Validation procedure scripts can be found at https://github.com/ElenaRojano/DomFun_project
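
The two named techniques can be illustrated with a minimal Python sketch. This is not DomFun's code (DomFun is a Ruby gem), and the abstract does not say how association values are converted into the scores that are combined. The sketch assumes that "Simpson index" means the overlap coefficient between two sets, and that Stouffer's method is applied in its unweighted form to one-sided p-values. The function names `simpson_index` and `stouffer_combine` are hypothetical.

```python
import math
from statistics import NormalDist


def simpson_index(a: set, b: set) -> float:
    """Overlap (Szymkiewicz-Simpson) coefficient: |A & B| / min(|A|, |B|).

    Assumption: a is the set of proteins carrying a domain and b is the set
    of proteins annotated with a function. Returns 0.0 if either set is empty.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def stouffer_combine(p_values: list[float]) -> float:
    """Combine one-sided p-values with unweighted Stouffer's method.

    Each p-value is mapped to a z-score, the z-scores are summed and divided
    by sqrt(k), and the result is mapped back to a p-value.
    Every p-value must lie strictly between 0 and 1.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    nd = NormalDist()
    z_scores = [nd.inv_cdf(1.0 - p) for p in p_values]
    z_combined = sum(z_scores) / math.sqrt(len(z_scores))
    return 1.0 - nd.cdf(z_combined)


if __name__ == "__main__":
    # Hypothetical protein identifiers, for illustration only.
    proteins_with_domain = {"P1", "P2", "P3"}
    proteins_with_function = {"P2", "P3", "P4", "P5"}
    print(simpson_index(proteins_with_domain, proteins_with_function))

    # Two domains in one protein, each weakly associated with a function.
    print(stouffer_combine([0.05, 0.05]))
```

Under these assumptions, two moderately significant domain-level associations combine into a stronger protein-level one, which is the intuition behind merging domain evidence per protein.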

    Expression of MALT1 oncogene in hematopoietic stem/progenitor cells recapitulates the pathogenesis of human lymphoma in mice

    Chromosomal translocations involving the MALT1 gene are hallmarks of mucosa-associated lymphoid tissue (MALT) lymphoma. To date, targeting these translocations to mouse B cells has failed to reproduce human disease. Here, we induced MALT1 expression in mouse Sca1(+)Lin(-) hematopoietic stem/progenitor cells, which showed NF-κB activation and early lymphoid priming, being selectively skewed toward B-cell differentiation. These cells accumulated in extranodal tissues and gave rise to clonal tumors recapitulating the principal clinical, biological, and molecular genetic features of MALT lymphoma. Deletion of p53 gene accelerated tumor onset and induced transformation of MALT lymphoma to activated B-cell diffuse large-cell lymphoma (ABC-DLBCL). Treatment of MALT1-induced lymphomas with a specific inhibitor of MALT1 proteolytic activity decreased cell viability, indicating that endogenous Malt1 signaling was required for tumor cell survival. Our study shows that human-like lymphomas can be modeled in mice by targeting MALT1 expression to hematopoietic stem/progenitor cells, demonstrating the oncogenic role of MALT1 in lymphomagenesis. Furthermore, this work establishes a molecular link between MALT lymphoma and ABC-DLBCL, and provides mouse models to test MALT1 inhibitors. Finally, our results suggest that hematopoietic stem/progenitor cells may be involved in the pathogenesis of human mature B-cell lymphomas

    Datasets related to a study aimed to identify genetic markers of CDA by subphenotypes associated with cardiotoxicity

    Who produced the data? The data has been created by the authors listed above.
Is the title specific enough? "Datasets related to a study aimed to identify genetic markers of CDA by subphenotypes associated with cardiotoxicity."
Why has the data been created? These datasets are supplementary material with which the principal and supplementary figures and tables of our indicated work were generated.
What limitations do the data have (for example, sensitive data has been deleted)? No confidential patient information is present. We have not had access to that information, following current legal regulations.
How should the data be interpreted? These datasets should not be separated from the main article in which they were used. To better understand their context, researchers should view them within the overall scope of our work.
Are there gaps in the data, or do they give a complete picture of the topic studied? As indicated above, data should be considered and interpreted in the global context of our study.
What processes have generated the data? The processes that generated the data are indicated in the summary of the data above and individually for each of them. Thus, each dataset is accompanied by a legend within the document.
What does the data measure in the columns of the files? As indicated, the legend of each dataset describes the information it contains.
What software is required to be able to read the data? The datasets are in Excel format.
How should the data be quoted? Researchers should cite the data in the context of the work they belong to once it is published and free of the embargo.
Can the data be reused? What use licenses are assigned to you? In principle, yes. If additional clinical information is required, these data were previously published by some of us, and the references are included in our manuscript. These data are available from the principal investigators of the references listed in our work upon reasonable request.
Are there more versions of the data? Where? I do not think so beyond our files and copies.
Have the technical terms and acronyms referenced by the data been defined? A legend with the appropriate descriptions accompanies each dataset.
Have the geographic and chronological parameters of the data been qualified? The authors of the work have generated the data. Elsewhere, we indicate the authors of the work, their contributions, and affiliations.
Are keywords sufficiently data-specific? Are they based on any thesaurus? Keywords are based on our study. We include cardiotoxicity due to anthracyclines, missing heritability, subphenotype, pathophenotype, complex trait.
What is the name of the research project in which the data are framed? The main research project in which the data is prepared is: Title: "Chemotherapy cardiotoxicity in the elderly: a translational and personnel approach." Ref.: PIE14/00066
Who has financed data production and management? Each of the authors of the study has their own funding. The grants are included in the acknowledgments section of our manuscript.

Here we present a series of supplemental datasets that complement our study entitled "A Systems Genetics approach to identify genetic markers of cardiotoxicity due to anthracyclines in cancer patients." The datasets presented here were used to generate the main and supplementary figures and tables of the indicated study. The study consists of the identification of genetic markers of cardiotoxicity due to anthracyclines (CDA). CDA is a disease of complex genesis, or complex trait, and because of this there is a component of missing heritability. As a result, genetic markers associated with CDA risk are difficult to identify. Here, we propose that molecular subphenotypes associated with CDA may be a strategy for identifying some of this missing heritability and the risk markers associated with it. A similar strategy could be applied to identify markers of other diseases of complex genesis. This study was done using a genetically heterogeneous cohort of mice that developed breast cancer and were treated with doxorubicin or a combined treatment of doxorubicin and docetaxel. The mouse cohort was generated by backcrossing, so each mouse is genetically unique. Post-chemotherapy heart damage was assessed by quantifying the area of cardiac fibrosis and the thickness of myocardial fibers. The genetic regions associated with CDA were assessed by massive genotyping and genetic linkage analysis. Several molecular subphenotypes were quantified in the myocardium, and their association with CDA was evaluated. Subsequently, we identified which of them were most statistically associated with CDA in multivariate models. We also identified which quantitative trait loci (QTLs) associated with molecular subphenotypes best explained CDA. This strategy served to identify, in the cohort of mice, genes whose allelic forms could be candidates for the risk of CDA. Allelic variants of these genes were evaluated in four cohorts of cancer patients treated with anthracyclines and whose CDA was evaluated by echocardiography or cardiac magnetic resonance imaging (CMR).

JPL laboratory was partially supported by the European Regional Development Fund (ERDF) and the Ministry of Science, Innovation, and Universities (SAF2014-56989-R, SAF2017-88854R), the Carlos III Health Institute (PIE14/00066), "Proyectos Integrados IBSAL 2015" (IBY15/00003), the Regional Government of Castile and Leon (CSI234P18), and "We can be heroes" Foundation. AGN laboratory and human patients' study are supported by funds from the ISCIII project grant (PI18/01242).
The Human Genotyping unit is a member of CeGen, PRB3, and is supported by grant PT17/0019, of the PE I+D+i 2013-2016, funded by ISCIII and ERDF. SCLL was the recipient of a Ramón y Cajal research contract from the Spanish Ministry of Economy and Competitiveness, and the work was supported by MINECO/FEDER research grants (RTI2018-094130-B-100). The Proteomics Unit belongs to ProteoRed, PRB3-ISCIII, supported by grant PT17/0019/0023, of the PE I+D+i 2017-2020, funded by ISCIII and FEDER. RCC is funded by fellowships from the Spanish Regional Government of Castile and León. NGS is a recipient of an FPU fellowship (MINECO/FEDER). hiPSC-CM studies were funded in part by the "la Caixa" Banking Foundation under the project code HR18-00304 and the Severo Ochoa CNIC Intramural Project (Expediente 12-2016 IGP) to JJ.

Supplemental Dataset 1: CDA pathophenotypes after doxorubicin treatment. We treated 71 mice carrying breast cancer with doxorubicin.
Supplemental Dataset 2: CDA pathophenotypes after the combined therapy. We treated 61 mice carrying breast cancer with the combined therapy with doxorubicin and docetaxel.
Methods common to Datasets 1 and 2: Each mouse was generated by backcrossing; thus, each one is genetically unique. Cardiotoxicity due to anthracyclines (CDA) was evaluated by automatically quantifying the heart fibrosis area and the average area of myocardial fibers as pathophenotypes of cardiotoxicity using the Ariol slide scanner. The histopathological damage was evaluated in the subendocardium and subepicardium from five randomly chosen regions of each sample (averages in μm² are shown).

Supplemental Dataset 3: CDA subphenotypes after doxorubicin therapy. Myocardium molecular subphenotypes after doxorubicin therapy.
Supplemental Dataset 4: CDA subphenotypes after the combined therapy. Myocardium molecular subphenotypes after the combined therapy with doxorubicin and docetaxel.
Methods common to Datasets 3 and 4: Proteins were quantified by a multiplex bead array (Luminex). TGFβ units are shown in pg/mL. The rest of the protein levels are shown in molecular fluorescence intensity (MFI) units. The telomeric length was quantified by QPCR (RQ units). miRNAs were quantified by QPCR (RQ units). QPCR analyses were assessed by the ΔΔCT method; we show the averages of triplicates.

Supplemental Dataset 5: Correlations identified between molecular subphenotype levels in the myocardium and pathophenotypes of cardiotoxicity due to anthracyclines (CDA) after doxorubicin therapy in all mice.
Supplemental Dataset 6: The same correlations after doxorubicin therapy in young mice. Spearman correlation.
Supplemental Dataset 7: The same correlations after doxorubicin therapy in old mice. Spearman correlation.
Supplemental Dataset 8: The same correlations after the combined therapy in all mice. Spearman correlation.
Supplemental Dataset 9: The same correlations after the combined therapy in young mice. Spearman correlation.
Supplemental Dataset 10: The same correlations after the combined therapy in old mice. Spearman correlation.

Supplemental Datasets 11 to 16: Linkage analysis of molecular subphenotype levels quantified in the myocardium. LOD scores after doxorubicin therapy in all mice (Dataset 11), young mice (Dataset 12), and old mice (Dataset 13), and after the combined therapy in all mice (Dataset 14), young mice (Dataset 15), and old mice (Dataset 16).
Supplemental Dataset 17: Massive genotyping of mouse cohort treated with doxorubicin.
Supplemental Dataset 18: Massive genotyping of mouse cohort treated with the combined therapy.
Supplemental Datasets 19 to 24: Linkage analysis of CDA pathophenotypes quantified in the myocardium. LOD scores after doxorubicin therapy in all mice (Dataset 19), young mice (Dataset 20), and old mice (Dataset 21), and after the combined therapy in all mice (Dataset 22), young mice (Dataset 23), and old mice (Dataset 24).
Genotyping common to Datasets 11 to 24: The Illumina Mouse Medium Density Linkage Panel Assay was used to genotype 130 F1BX mice at 1449 single nucleotide polymorphisms (SNPs). Genotypes were classified as FVB/FVB (F/F) or FVB/C57BL/6 (F/B). Ultimately, 806 SNPs were informative between the FVB and C57BL/6 mice; the average genomic distance between these SNPs was 9.9 Mb. The genotype proportion among the F1BX mice showed a normal distribution.
Genome-wide scan for Datasets 17 and 18: The scan was carried out at the Spanish National Centre of Genotyping (CeGEN) at the Spanish National Cancer Research Centre (CNIO, Madrid, Spain).
Linkage analysis common to Datasets 11 to 16 and 19 to 24: Linkage analysis was carried out using interval mapping with the expectation-maximization (EM) algorithm and R/qtl software. The criteria for significant and suggestive linkages for single markers were chosen based on Lander and Kruglyak (see methods section of our manuscript).

Supplemental Dataset 25: Human breast cancer cohort-1 genotyping. The association of genetic variants with CDA was evaluated in four patient cohorts p