6 research outputs found
Additional file 1: of BigQ: a NoSQL based framework to handle genomic variants in i2b2
This file contains supplementary tables, figures and BigCouch tuning parameters. (DOCX 43 kb)
Classification performance for each data set.
<p>Classification performance for each data set.</p>
Examples of variants influencing more than one gene.
<p>The protein encoded by gene <i>A</i> interacts with the proteins encoded by genes <i>B</i> and <i>C</i>. (a) A variant on gene <i>A</i>, <i>V</i><sub><i>A</i></sub>, contributes to the scores of both <i>A</i> and <i>B</i> because of their interaction. In the same way, variants on gene <i>B</i> (<i>V</i><sub><i>B</i></sub>) and on gene <i>C</i> (<i>V</i><sub><i>C</i></sub>) both contribute to the scores of genes <i>A</i>, <i>B</i> and <i>C</i>. (b) Resulting variant contributions to the final PPI scores.</p>
Interconnected, statistically significant ROIs.
<p>Graphical representation of statistically significant ROIs and their overlaps. The direction of each arrow indicates that one element is included in another. Gene ROIs (light blue) can be part of pathway (green) or PPI (grey) ROIs, while domain ROIs (purple) can be part of gene ROIs.</p>
Data fusion framework.
<p>Sequencing data are collapsed to calculate their mutational loads over four types of ROI, namely genes, pathways, domains and PPIs. This allows ROI-phenotype associations to be studied along the four corresponding axes. Each element tested for association then becomes a feature for a prediction model. Single ROI types are combined to create data sets. Each data set is split into a training set and a test set. The training set is used to tune the learning parameters of a random forest (RF) model and then to select the best set of features, while the test set is used to measure prediction performance.</p>
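The collapsing step described in this caption can be illustrated with a minimal sketch. It assumes, purely for illustration, that a mutational load is a plain count of variants per sample and ROI, and that each variant is already annotated with the ROIs it falls in. The sample ids, ROI ids, and the helper names `mutational_loads` and `feature_matrix` are hypothetical and do not come from the original work.

```python
from collections import Counter

# Hypothetical variant records: (sample id, ROI ids the variant falls in).
# All identifiers below are invented for illustration only.
variants = [
    ("s1", {"gene:A", "pathway:P1", "ppi:A"}),
    ("s1", {"gene:A", "domain:D1", "pathway:P1", "ppi:A"}),
    ("s1", {"gene:B", "pathway:P1", "ppi:A"}),
    ("s2", {"gene:B", "pathway:P1", "ppi:A"}),
]


def mutational_loads(variants):
    """Collapse variants into per-sample counts for every ROI they hit.

    Assumption: the load is a simple variant count; the original
    framework may weight variants differently.
    """
    loads = {}
    for sample, rois in variants:
        loads.setdefault(sample, Counter()).update(rois)
    return loads


def feature_matrix(loads, roi_types):
    """Build one data set by combining the chosen ROI types."""
    columns = sorted(
        {roi for counts in loads.values() for roi in counts
         if roi.split(":")[0] in roi_types}
    )
    samples = sorted(loads)
    # A Counter returns 0 for ROIs a sample has no variants in.
    rows = [[loads[s][roi] for roi in columns] for s in samples]
    return samples, columns, rows


loads = mutational_loads(variants)
samples, columns, rows = feature_matrix(loads, {"gene", "ppi"})
```

Each column of the resulting matrix is one ROI feature. The training/test split, RF tuning and feature selection mentioned in the caption would then operate on such a matrix with whichever machine-learning library is in use.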