6 research outputs found

    Integrative analysis and visualization of multi-omics data of mitochondria-associated diseases

    Integrative analysis and machine learning on cancer genomics data using the Cancer Systems Biology Database (CancerSysDB)

    Background: Recent cancer genome studies on many human cancer types have relied on multiple molecular high-throughput technologies. Given the vast amount of data that has been generated, there are surprisingly few databases that facilitate access to these data and make them available to the broad research community for flexible analysis queries. If used in their entirety and provided at a high structural level, these data can be fed into continuously growing databases, which have enormous potential to serve as a basis for machine learning approaches that support research and healthcare by predicting clinically relevant traits. Results: We have developed the Cancer Systems Biology Database (CancerSysDB), a resource for highly flexible queries and analysis of cancer-related data across multiple data types and multiple studies. The CancerSysDB can be adopted by any center to organize its locally acquired data and integrate them with publicly available data from multiple studies. A publicly available main instance of the CancerSysDB can be used to run highly flexible queries across multiple data types, as shown by several use cases. In addition, we demonstrate how the CancerSysDB can be used for predictive cancer classification based on whole-exome data from 9091 patients in The Cancer Genome Atlas (TCGA) research network. Conclusions: Our database has the potential to be used for large-scale integrative queries and predictive analytics of clinically relevant traits.

    Additional file 2: of Integrative analysis and machine learning on cancer genomics data using the Cancer Systems Biology Database (CancerSysDB)

    Figure S1 Overall success rate of the prediction of tumor types by random forests depending on (a) the number of samples per stratum in the random forest, (b) the number of variables picked randomly for each tree in the forest and (c) the number of trees learned in the forest. Importantly, the accuracy increases monotonically with the number of samples, indicating that the overall strategy is particularly suitable for a database with continuously growing amounts of data. In contrast, the success rate depends only weakly on the parameters chosen for the training phase of the random forest. (PNG 34 kb)

    Additional file 3: of Integrative analysis and machine learning on cancer genomics data using the Cancer Systems Biology Database (CancerSysDB)

    Figure S2 Interactive workflow of mitochondrial pathways. Shown is the tricarboxylic acid (TCA) cycle pathway for KIRP cancer patients. The central view of this workflow is a bee-swarm scatter plot, which contains the averaged log2-fold changes of patient groups according to either tumor stage, gender or vital status. Each dot represents the averaged log2-fold change of one gene that has been assigned to the chosen function. Functions can be selected on the right-hand side of the scatter plot. The dashboard below the scatter plot can be used (a) to change the averaging according to a different feature (shown here: averaging according to stage); (b) to display information on the composition of the selected feature (here, hovering over stage II informs the user that all individuals of this stage are male and that one individual is dead, while three of the patients are alive); or (c) to further select individual patients and thus modify the averaging shown in the scatter plot (here, only female patients were chosen for stage-dependent averaging; as female patient data are available for only two stages, I and III, the scatter plot changes accordingly). (PNG 679 kb)
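The group-wise averaging described in the Figure S2 caption can be sketched in a few lines of Python. This is only an illustration, not the workflow's actual code: the record layout, the field names (`stage`, `gender`, `gene`, `log2fc`), the gene labels and all values are hypothetical, and the log2-fold changes are assumed to be precomputed per patient and gene.

```python
from collections import defaultdict

def average_by_group(records, feature, keep=None):
    """Average per-gene log2-fold changes within each value of `feature`.

    `keep` is an optional predicate that selects a subset of patient
    records first, similar in spirit to panel (c) of the caption.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        if keep is not None and not keep(rec):
            continue
        key = (rec[feature], rec["gene"])
        sums[key] += rec["log2fc"]
        counts[key] += 1
    # One averaged value per (group, gene): one dot in the scatter plot.
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical records, one per patient and gene.
records = [
    {"stage": "I", "gender": "female", "gene": "GENE_A", "log2fc": 1.0},
    {"stage": "I", "gender": "male", "gene": "GENE_A", "log2fc": 3.0},
    {"stage": "II", "gender": "male", "gene": "GENE_A", "log2fc": -1.0},
]

by_stage = average_by_group(records, "stage")
female_by_stage = average_by_group(
    records, "stage", keep=lambda rec: rec["gender"] == "female"
)
```

With the female-only selection, stage II drops out of the result because no female record exists for it, which mirrors the behaviour the caption describes for panel (c).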

    Additional file 1: of Integrative analysis and machine learning on cancer genomics data using the Cancer Systems Biology Database (CancerSysDB)

    The source code of the database queries and workflow scripts for the three use cases reported in the paper. The results can be reproduced using the query results and analysis scripts provided. File query1.csv contains the barcodes of all samples for which mutation data exist. File query2.csv contains the barcodes of all samples that carry a mutation in the gene of interest. Finally, query3.csv contains the survival data (according to Fig. 1a), a list of all mutations of patients in the cohort of interest (according to Fig. 1b), or a list of all genomic segments with aberrant copy number in the cohort of interest (according to Fig. 1c). There are small discrepancies between the number of patients with mutation data and the number of patients with survival data (Fig. 1a) and copy number data (Fig. 1c). (ZIP 4981 kb)
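One plausible way to combine the first two query results is to split the samples into a mutated and a non-mutated group using set operations on the barcodes. The sketch below is not taken from the supplied scripts: it assumes each file holds one barcode per row in the first column with no header line, and it uses in-memory stand-ins with made-up barcodes instead of the real query1.csv and query2.csv.

```python
import csv
import io

def read_barcodes(stream):
    """Collect the first field of every non-empty row as a sample barcode."""
    return {row[0].strip() for row in csv.reader(stream) if row and row[0].strip()}

def split_cohort(all_samples, mutated_samples):
    """Return (mutated, non_mutated) barcode sets.

    Barcodes that appear only in `mutated_samples` are ignored, so both
    groups are restricted to samples that actually have mutation data.
    """
    mutated = all_samples & mutated_samples
    non_mutated = all_samples - mutated_samples
    return mutated, non_mutated

# Hypothetical stand-ins for query1.csv and query2.csv.
query1 = io.StringIO("SAMPLE-01\nSAMPLE-02\nSAMPLE-03\n")
query2 = io.StringIO("SAMPLE-02\n")

mutated, non_mutated = split_cohort(read_barcodes(query1), read_barcodes(query2))
```

For real files, `open("query1.csv", newline="")` would replace the `io.StringIO` objects; if the files turn out to have a header row or several columns, `read_barcodes` would need adjusting.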