Performance of genotype data with varying sample sizes.
<p>Percentage of correct predictions with standard error for the genotype data using single-task, multi-task, and concatenated models with varying sample sizes.</p>
Correct predictions using matched data models for breast cancer data.
<p>Percentage of correct predictions for the breast cancer data using summed prediction, summed enrichment score, and merged models.</p>
Overview of the single-task pipeline.
<p>For genotype data, we first associate each gene with a single SNP (a). Next, we calculate correlation statistics from the gene-based data for each data type (b). We then calculate enrichment scores from those correlation statistics (c). Finally, we independently build a predictive model for each data type from its enrichment scores using a standard SVM (d). In this overview, ASSESS corresponds to steps b and c.</p>
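<p>The steps above can be illustrated with a minimal sketch of steps a to c. Everything here is an assumption made for illustration: the function names, the toy data, the use of Pearson correlation as the correlation statistic, and the enrichment score, which is a simplified correlation-weighted average and not the actual ASSESS statistic. Step d would pass the resulting per-sample scores to a standard SVM from any library.</p>

```python
from statistics import mean, pstdev


def gene_level_genotypes(snp_data, gene_to_snp):
    """Step (a): represent each gene by the single SNP mapped to it.

    The gene-to-SNP mapping is assumed to be given; how it is chosen
    is not specified here.
    """
    return {gene: snp_data[snp] for gene, snp in gene_to_snp.items()}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def correlation_statistics(gene_data, labels):
    """Step (b): one correlation statistic per gene against the class label."""
    return {gene: pearson(values, labels) for gene, values in gene_data.items()}


def zscores(values):
    """Standardize one gene's values across samples."""
    m, s = mean(values), pstdev(values)
    return [0.0 if s == 0 else (v - m) / s for v in values]


def enrichment_scores(gene_data, corr, gene_sets):
    """Step (c), simplified stand-in: for each sample and gene set, average
    the correlation-weighted standardized values of the set's genes."""
    z = {gene: zscores(values) for gene, values in gene_data.items()}
    n_samples = len(next(iter(gene_data.values())))
    scores = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in z]
        scores[name] = [
            mean([corr[g] * z[g][i] for g in present]) if present else 0.0
            for i in range(n_samples)
        ]
    return scores


# Toy data (hypothetical): four samples, two per class.
labels = [1, 1, -1, -1]
snp_data = {"rs1": [2, 2, 0, 0], "rs2": [0, 1, 1, 0]}
gene_to_snp = {"GENE_A": "rs1", "GENE_B": "rs2"}
gene_sets = {"SET_1": ["GENE_A"], "SET_2": ["GENE_B"]}

genes = gene_level_genotypes(snp_data, gene_to_snp)
corr = correlation_statistics(genes, labels)
scores = enrichment_scores(genes, corr, gene_sets)
# scores[name][i] is the feature for gene set `name` in sample i;
# step (d) would train an SVM on these features, one model per data type.
```

<p>In this sketch the same functions would be applied to expression data by skipping step a, since expression values are already gene-based.</p>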
Performance of matched data models with varying levels of similarity.
<p>Percentage of correct predictions with standard error using summed prediction, summed enrichment score, and merged models with varying levels of similarity in the data.</p>
A Predictive Framework for Integrating Disparate Genomic Data Types Using Sample-Specific Gene Set Enrichment Analysis and Multi-Task Learning
<div><p>Understanding the root molecular and genetic causes driving complex traits is a fundamental challenge in genomics and genetics. Numerous studies have used variation in gene expression to understand complex traits, but the underlying genomic variation that contributes to these expression changes is not well understood. In this study, we developed a framework to integrate gene expression and genotype data to identify biological differences between samples from opposing complex trait classes that are driven by expression changes and genotypic variation. This framework utilizes pathway analysis and multi-task learning to build a predictive model and discover pathways relevant to the complex trait of interest. We simulated expression and genotype data to test the predictive ability of our framework and to measure how well it uncovered pathways with genes both differentially expressed and genetically associated with a complex trait. We found that the predictive performance of the multi-task model was comparable to other similar methods. Also, methods like multi-task learning that considered enrichment analysis scores from both data sets found pathways with both genetic and expression differences related to the phenotype. We used our framework to analyze differences between estrogen receptor (ER) positive and negative breast cancer samples. An analysis of the top 15 gene sets from the multi-task model showed they were all related to estrogen, steroids, cell signaling, or the cell cycle. Although our study suggests that multi-task learning does not enhance predictive accuracy, the models generated by our framework do provide valuable biological pathway knowledge for complex traits.</p> </div>
Performance of expression data with varying sample sizes.
<p>Percentage of correct predictions with standard error for the expression data using single-task, multi-task, and concatenated models with varying sample sizes.</p>
Average rank of target gene set.
<p>Average rank with standard error in single-task, multi-task, concatenated, summed enrichment score, and merged models for a gene set containing genes that are both differentially expressed and genetically associated with the phenotype.</p>
Top gene sets in breast cancer analysis.
<p>Gene sets with the largest multi-task common weights in the breast cancer analysis, along with the ranks of the expression and genotype single-task weights.</p>
Correct predictions for breast cancer expression data.
<p>Percentage of correct predictions for the breast cancer expression data using single-task, multi-task, and concatenated models.</p>
Performance of genotype data with varying levels of similarity.
<p>Percentage of correct predictions with standard error for the genotype data using single-task, multi-task, and concatenated models with varying levels of similarity in the data.</p>