24 research outputs found

    Savant Genome Browser 2: visualization and analysis for population-scale genomics

    High-throughput sequencing (HTS) technologies are providing an unprecedented capacity for data generation, and there is a corresponding need for efficient data exploration and analysis capabilities. Although most existing tools for HTS data analysis are developed for either automated (e.g. genotyping) or visualization (e.g. genome browsing) purposes, such tools are most powerful when combined. For example, integration of visualization and computation allows users to iteratively refine their analyses by updating computational parameters within the visual framework in real-time. Here we introduce the second version of the Savant Genome Browser, a standalone program for visual and computational analysis of HTS data. Savant substantially improves upon its predecessor and existing tools by introducing innovative visualization modes and navigation interfaces for several genomic datatypes, and synergizing visual and automated analyses in a way that is powerful yet easy even for non-expert users. We also present a number of plugins that were developed by the Savant Community, which demonstrate the power of integrating visual and automated analyses using Savant. The Savant Genome Browser is freely available (open source) at www.savantbrowser.co

    Integrated genomic characterization of pancreatic ductal adenocarcinoma

    We performed integrated genomic, transcriptomic, and proteomic profiling of 150 pancreatic ductal adenocarcinoma (PDAC) specimens, including samples with characteristic low neoplastic cellularity. Deep whole-exome sequencing revealed recurrent somatic mutations in KRAS, TP53, CDKN2A, SMAD4, RNF43, ARID1A, TGFβR2, GNAS, RREB1, and PBRM1. KRAS wild-type tumors harbored alterations in other oncogenic drivers, including GNAS, BRAF, CTNNB1, and additional RAS pathway genes. A subset of tumors harbored multiple KRAS mutations, with some showing evidence of biallelic mutations. Protein profiling identified a favorable prognosis subset with low epithelial-mesenchymal transition and high MTOR pathway scores. Associations of non-coding RNAs with tumor-specific mRNA subtypes were also identified. Our integrated multi-platform analysis reveals a complex molecular landscape of PDAC and provides a roadmap for precision medicine.

    Crowdsourced assessment of common genetic contribution to predicting anti-TNF treatment response in rheumatoid arthritis

    Correction: vol 7, 13205, 2016, doi:10.1038/ncomms13205. Rheumatoid arthritis (RA) affects millions worldwide. While anti-TNF treatment is widely used to reduce disease progression, treatment fails in about one-third of patients. No biomarker currently exists that identifies non-responders before treatment. A rigorous community-based assessment of the utility of SNP data for predicting anti-TNF treatment efficacy in RA patients was performed in the context of a DREAM Challenge (http://www.synapse.org/RA_Challenge). An open challenge framework enabled the comparative evaluation of predictions developed by 73 research groups using the most comprehensive available data and covering a wide range of state-of-the-art modelling methodologies. Despite a significant genetic heritability estimate for the treatment non-response trait (h² = 0.18, P value = 0.02), no significant genetic contribution to prediction accuracy is observed. Results formally confirm the expectations of the rheumatology community that SNP information does not significantly improve predictive performance relative to standard clinical traits, thereby justifying a refocusing of future efforts on collection of other data.

    Incorporating networks in a probabilistic graphical model to find drivers for complex human diseases

    Discovering genetic mechanisms driving complex diseases is a hard problem. Existing methods often lack power to identify the set of responsible genes. Protein-protein interaction networks have been shown to boost power when detecting gene-disease associations. We introduce a Bayesian framework, Conflux, to find disease-associated genes from exome sequencing data using networks as a prior. There are two main advantages to using networks within a probabilistic graphical model. First, networks are noisy and incomplete, a substantial impediment to gene discovery. Incorporating networks into the structure of a probabilistic model for gene inference has less impact on the solution than relying on the noisy network structure directly. Second, using a Bayesian framework we can keep track of the uncertainty of each gene being associated with the phenotype rather than returning a fixed list of genes. We first show that using networks clearly improves gene detection compared to individual gene testing. We then show consistently improved performance of Conflux compared to the state-of-the-art diffusion network-based method Hotnet2 and a variety of other network and variant aggregation methods, using randomly generated and literature-reported gene sets. We test Hotnet2 and Conflux on several network configurations to reveal biases and patterns of false positives and false negatives in each case. Our experiments show that our novel Bayesian framework Conflux incorporates many of the advantages of the current state-of-the-art methods, while offering more flexibility and improved power in many gene-disease association scenarios.
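The idea of using a network as a prior over per-gene association variables can be illustrated with a toy example. The sketch below is not the Conflux model; it is a minimal pairwise model over binary gene states, solved by brute-force enumeration, in which linked genes are encouraged to be "on" together. The function name, the `coupling` and `log_prior_odds` parameters, and the per-gene log evidence values are all assumptions made for illustration.

```python
from itertools import product
import math


def gene_marginals(log_evidence, edges, coupling=1.0, log_prior_odds=-2.0):
    """Posterior marginal P(gene is associated) for each gene in a toy model.

    log_evidence[g] is a hypothetical per-gene log likelihood ratio
    (data given "associated" versus "not associated").
    edges is a list of (i, j) index pairs from an interaction network.
    Each edge adds `coupling` to the score when both endpoints are on.
    Enumeration is exponential in the number of genes: toy sizes only.
    """
    n = len(log_evidence)
    total = 0.0
    marginals = [0.0] * n
    for state in product((0, 1), repeat=n):
        # Unnormalised log probability of this joint assignment.
        score = sum(s * (log_prior_odds + e) for s, e in zip(state, log_evidence))
        score += coupling * sum(1 for i, j in edges if state[i] and state[j])
        weight = math.exp(score)
        total += weight
        for g, s in enumerate(state):
            if s:
                marginals[g] += weight
    return [m / total for m in marginals]


# Gene 0 has strong evidence; genes 1 and 2 have identical weak evidence,
# but only gene 1 is linked to gene 0 in the (hypothetical) network.
with_network = gene_marginals([4.0, 0.5, 0.5], edges=[(0, 1)], coupling=1.5)
without_network = gene_marginals([4.0, 0.5, 0.5], edges=[])
```

In this toy setting the network raises the marginal of the weakly supported gene that neighbours a strongly supported one, while the output stays a probability per gene rather than a fixed gene list.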

    Finding associations in a heterogeneous setting: statistical test for aberration enrichment

    Most two-group statistical tests find broad patterns such as overall shifts in mean, median, or variance. These tests may not have enough power to detect effects in a small subset of samples, e.g., a drug that works well only on a few patients. We developed a novel statistical test targeting such effects relevant for clinical trials, biomarker discovery, feature selection, etc. We focused on finding meaningful associations in complex genetic diseases in gene expression, miRNA expression, and DNA methylation. Our test outperforms traditional statistical tests in simulated and experimental data and detects potentially disease-relevant genes with heterogeneous effects.
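The abstract does not specify the test statistic, so the following is only a generic stand-in for the kind of subset effect described: it counts how many case samples fall in the upper tail of the pooled distribution and compares that count against label permutations. The function names, the quantile `q`, and the permutation count are assumptions for illustration, not the authors' method.

```python
import random


def tail_count(values, cutoff):
    """Number of values at or above the cutoff."""
    return sum(1 for v in values if v >= cutoff)


def tail_enrichment_pvalue(cases, controls, q=0.95, n_perm=2000, seed=0):
    """One-sided permutation p-value for upper-tail enrichment in cases.

    The cutoff is taken from the pooled sample, so it does not change
    when group labels are shuffled.
    """
    pooled = sorted(list(cases) + list(controls))
    cutoff = pooled[min(len(pooled) - 1, int(q * len(pooled)))]
    observed = tail_count(cases, cutoff)
    rng = random.Random(seed)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of cases and controls
        if tail_count(pooled[:n_cases], cutoff) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 10 of 100 cases are strongly aberrant, the rest look
# like controls, so the bulk of the two groups is similar.
controls = [i / 100 for i in range(100)]
cases = [i / 100 for i in range(90)] + [5.0 + i / 10 for i in range(10)]
p_value = tail_enrichment_pvalue(cases, controls)
```

A tail-count statistic of this kind responds to a handful of extreme samples even when most case values are indistinguishable from controls.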

    Results of Hotnet2 and Conflux on two star-shaped disease subnetworks where the center is not a causal gene.

    (A) KRAS-centered star. (B) GATA3-centered star. The nodes in purple are genes found by both Hotnet2 and Conflux. The nodes in cyan were only found by Hotnet2. The nodes in red and pink are respectively nodes detected (marginal ≥ 0.2) or having suggestive evidence (marginal ≥ 0.05) by Conflux. Nodes colored in plum were found by Hotnet2 but only have suggestive evidence in Conflux. Yellow nodes are true causal genes that were neither found nor suggested by any method. The diamond shaped nodes are the true causal genes. The sample size used is n = 800.
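The color legend in this caption amounts to a small decision rule over two inputs: a gene's Conflux marginal and whether Hotnet2 reported it. The sketch below encodes that rule using the thresholds stated in the caption (0.2 and 0.05). The function name, argument names, and the example marginals are hypothetical.

```python
def node_color(marginal, in_hotnet2, is_causal=False, detect=0.2, suggest=0.05):
    """Map a gene to the caption's color legend.

    marginal   : Conflux posterior marginal for the gene
    in_hotnet2 : True if Hotnet2 reported the gene
    is_causal  : True if the gene is a true causal gene
    Returns None for genes that fall into no legend category.
    """
    detected = marginal >= detect
    suggestive = not detected and marginal >= suggest
    if in_hotnet2 and detected:
        return "purple"  # found by both methods
    if in_hotnet2 and suggestive:
        return "plum"    # Hotnet2 hit, only suggestive in Conflux
    if in_hotnet2:
        return "cyan"    # found only by Hotnet2
    if detected:
        return "red"     # detected by Conflux only
    if suggestive:
        return "pink"    # suggestive evidence in Conflux only
    if is_causal:
        return "yellow"  # true causal gene missed by both methods
    return None


# Hypothetical marginals, for illustration only.
print(node_color(0.35, in_hotnet2=True))                   # purple
print(node_color(0.08, in_hotnet2=False))                  # pink
print(node_color(0.01, in_hotnet2=False, is_causal=True))  # yellow
```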

    Conflux’s hierarchical graphical model.

    Graphical model representing the relation between phenotypes, coding variants and gene latent variables with the PPI network used as prior. All the variables, factors and inputs inside the plate are per individual. The variables, factors and inputs outside the plate, such as protein-protein interactions, are not individual-specific. This model simultaneously uses all genes genome-wide and is shown here for 3 genes for clarity. The graph on the right is a zoom-in on the gene-specific portion of the graphical model.

    Results of Hotnet2 and Conflux on two randomly generated chain-shaped disease subnetworks.

    (A) Chain of 20 causal genes. (B) Chain of 10 causal genes. The nodes in purple are genes found by both Hotnet2 and Conflux. The nodes in cyan were only found by Hotnet2. The nodes in red and pink are respectively nodes detected (marginal ≥ 0.2) or having suggestive evidence (marginal ≥ 0.05) by Conflux. Nodes colored in plum were found by Hotnet2 but only have suggestive evidence in Conflux. The diamond shaped nodes are the true causal genes. The sample size used is n = 800.

    Description of the variables in the model.

    pheno and D are fixed inputs. All other variables are latent random variables to be inferred.