
    Community Development of Validated Variant Calling Pipelines

    <p>Presentation at Genome Informatics 2013:</p> <p>Translational research relies on accurate identification of genomic variants. However, rapidly changing best-practice approaches to alignment and variant calling, coupled with large data sizes, make it challenging to produce reliable and reproducible variant calls. Coordinated community development can help overcome these challenges by sharing testing and updates across multiple groups. We describe bcbio-nextgen, a distributed, multi-architecture pipeline that automates variant calling, validation and organization of results for query and visualization. It builds an easily installable, reliable infrastructure from best-practice open source tools, with the following goals:</p>
    <p>Quantifiable: Validates variant calls against known reference materials developed by the Genome in a Bottle consortium. The bcbio.variation toolkit automates scoring and assessment of calls to identify regressions in variant identification as calling pipelines evolve. Incorporating multiple variant calling approaches, from Broad’s GATK best practices and the Marth lab’s gkno software, enables informed comparisons between current and future algorithms.</p>
    <p>Scalable: bcbio-nextgen handles large population studies with hundreds of whole-genome samples by parallelizing on a wide variety of schedulers and multicore machines, setting up a different ad hoc cluster configuration for each workflow step. Work in progress includes integration with virtual environments, including Amazon Web Services and OpenStack.</p>
    <p>Accessible: Results automatically feed into tools for querying and investigating variants. The GEMINI framework provides a queryable database associating variants with a wide variety of genome annotations. The o8 web-based tool provides visualization for variant prioritization and assessment.</p>
    <p>Community developed: bcbio-nextgen is widely used in multiple sequencing centers and research laboratories. We actively encourage contributions to the code base and make it easy to get started with a fully automated installer and updater that prepares all third-party software and reference genomes.</p>
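
The "Quantifiable" goal rests on scoring a call set against a reference truth set and watching those scores across pipeline versions. The sketch below is a minimal illustration under stated assumptions: variants match only when chromosome, position, REF and ALT are identical, and the `Variant` type and the `score_calls` and `is_regression` names are hypothetical. It is not how the bcbio.variation toolkit is implemented.

```python
# Sketch only: exact-match concordance scoring of a variant call set
# against a truth set. Names and the matching rule are assumptions.
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def score_calls(calls: Iterable[Variant], truth: Iterable[Variant]) -> dict:
    """Count true/false positives and false negatives, then derive rates."""
    call_set, truth_set = set(calls), set(truth)
    tp = len(call_set & truth_set)  # called and present in the truth set
    fp = len(call_set - truth_set)  # called but absent from the truth set
    fn = len(truth_set - call_set)  # in the truth set but not called
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


def is_regression(old: dict, new: dict, tolerance: float = 0.0) -> bool:
    """Flag a new pipeline version whose sensitivity or precision dropped."""
    return (new["sensitivity"] < old["sensitivity"] - tolerance
            or new["precision"] < old["precision"] - tolerance)


if __name__ == "__main__":
    # Hypothetical truth set and two hypothetical pipeline versions.
    truth = [Variant("20", 100, "A", "G"), Variant("20", 250, "C", "T"),
             Variant("20", 400, "G", "GA")]
    v1 = score_calls([Variant("20", 100, "A", "G"),
                      Variant("20", 250, "C", "T")], truth)
    v2 = score_calls([Variant("20", 100, "A", "G"),
                      Variant("20", 300, "T", "C")], truth)
    print(v1)
    print(v2)
    print("regression:", is_regression(v1, v2))
```

Exact matching undercounts concordance when the same indel is written differently in two files, so practical comparisons normalize variant representation (for example left-alignment) before scoring.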

    The method of integrated literature- and data mining to identify an initial list of putative candidate genes

    <p><b>Copyright information:</b></p><p>Taken from "Computational selection and prioritization of candidate genes for Fetal Alcohol Syndrome"</p><p>http://www.biomedcentral.com/1471-2164/8/389</p><p>BMC Genomics 2007;8:389.</p><p>Published online 25 Oct 2007.</p><p>PMCID: PMC2194724.</p>

    Dot plot of significant TReQTLs.

    <p>(A) CEU; (B) YRI. Each circle represents a TReQTL SNP with a <i>p</i>-value < 1×10<sup>−6</sup>. The x-axis is the relative position of the TReQTL SNPs across the genome in Mb, with chromosomes ordered from 1 to 22 from left to right. Alternating shaded and unshaded sections of the plot delineate the chromosomes. The y-axis is the relative position of the TR across the genome in Mb, with chromosomes ordered from 1 to 22 from bottom to top. Points were jittered so that TReQTLs in close proximity remain distinguishable. TReQTLs near the diagonal line may be <i>cis</i>-regulated.</p>
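
Plotting SNP position against TR position on a single genome-wide axis requires concatenating the chromosomes end to end. A minimal sketch, assuming hypothetical chromosome lengths, a hypothetical jitter scale and a hypothetical 5 Mb cis window, shows the cumulative offsets, a uniform jitter, and a same-chromosome check for pairs that fall near the diagonal. The function names and parameter values are illustrative, not taken from the study.

```python
# Sketch: genome-wide coordinates for a SNP-vs-TR dot plot.
# Chromosome lengths, the jitter scale and the 5 Mb cis window are
# hypothetical values chosen for illustration (Python 3.9+).
import random


def cumulative_offsets(chrom_lengths: dict[int, int]) -> dict[int, int]:
    """Start of each chromosome (in bp) when chromosomes are concatenated in order."""
    offsets, running = {}, 0
    for chrom in sorted(chrom_lengths):
        offsets[chrom] = running
        running += chrom_lengths[chrom]
    return offsets


def genome_position_mb(chrom: int, pos: int, offsets: dict[int, int]) -> float:
    """Convert a (chromosome, bp position) pair to a genome-wide position in Mb."""
    return (offsets[chrom] + pos) / 1e6


def jitter(value: float, scale: float, rng: random.Random) -> float:
    """Add small uniform noise so overlapping points remain visible."""
    return value + rng.uniform(-scale, scale)


def cis_candidate(snp: tuple[int, int], tr: tuple[int, int],
                  window_bp: int = 5_000_000) -> bool:
    """Same chromosome and within window_bp: such pairs sit near the diagonal.

    Comparing genome-wide coordinates alone could wrongly pair loci that
    straddle a chromosome boundary, so the chromosome is checked explicitly.
    """
    return snp[0] == tr[0] and abs(snp[1] - tr[1]) <= window_bp


if __name__ == "__main__":
    lengths = {1: 195_000_000, 2: 182_000_000, 3: 160_000_000}  # hypothetical
    offsets = cumulative_offsets(lengths)
    rng = random.Random(0)
    x = jitter(genome_position_mb(2, 10_000_000, offsets), 0.5, rng)
    print(round(x, 2), cis_candidate((2, 10_000_000), (2, 12_500_000)))
```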

    Distribution of the number of downstream target genes (DSTs) per transcript-regulator.

    <p>The x-axis is the number of DSTs per TR and the y-axis is the number of TRs with that many DSTs. The inset table summarizes the frequency distribution of DST counts per TR for TRs with two or more DSTs.</p>
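
The histogram and its inset table can be derived from a mapping of each TR to its set of DSTs. In this sketch the mapping, the TR and gene names, and the function name are hypothetical; the `min_dsts=2` default mirrors the "two or more" restriction of the inset.

```python
# Sketch: histogram of DST counts per TR (hypothetical data, Python 3.9+).
from collections import Counter


def dst_count_distribution(tr_to_dsts: dict[str, set[str]],
                           min_dsts: int = 2) -> dict[int, int]:
    """Map 'number of DSTs' -> 'number of TRs with that many DSTs'.

    TRs with fewer than min_dsts targets are left out, matching the
    "two or more" restriction of the inset table.
    """
    counts = Counter(len(dsts) for dsts in tr_to_dsts.values()
                     if len(dsts) >= min_dsts)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    tr_to_dsts = {  # hypothetical TR -> DST assignments
        "TR_A": {"g1", "g2"},
        "TR_B": {"g3", "g4", "g5"},
        "TR_C": {"g1", "g6"},
        "TR_D": {"g7"},
    }
    print(dst_count_distribution(tr_to_dsts))  # {2: 2, 3: 1}
```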

    Over-representation of TReQTL SNPs in genomic regions.

    <p>Assessed with 10,000 permutations of the 472 SNPs with a <i>p</i>-value < 1×10<sup>−4</sup> in either CEU or YRI.</p>
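
A permutation test of this kind compares the observed number of TReQTL SNPs in a genomic region with the counts obtained when the same number of SNPs (472 here) is drawn at random from a background set. The sketch assumes a background list in which each SNP is flagged as inside or outside the region, and uses the add-one estimate (hits + 1)/(permutations + 1). The background, the observed count and the function name are hypothetical; the caption does not describe the study's exact sampling scheme.

```python
# Sketch: empirical over-representation test by random resampling.
# The background SNP set and observed count are hypothetical (Python 3.9+).
import random


def permutation_p_value(observed_in_region: int, background_in_region: list[bool],
                        n_snps: int, n_perm: int = 10_000, seed: int = 1) -> float:
    """One-sided empirical p-value for observing >= observed_in_region hits.

    Each permutation draws n_snps SNPs without replacement from the
    background and counts how many fall in the region (True flags).
    The add-one correction keeps the estimate away from exactly zero.
    """
    if n_snps > len(background_in_region):
        raise ValueError("background is smaller than the SNP set")
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(background_in_region, n_snps)
        if sum(sample) >= observed_in_region:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical background: 20,000 SNPs, 10% of which lie in the region.
    background = [i % 10 == 0 for i in range(20_000)]
    # Hypothetical observation: 80 of the 472 selected SNPs in the region.
    print(permutation_p_value(80, background, n_snps=472))
```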

    Co-regulation of DSTs of TRs where disease-causing SNPs are located in the TR binding site of at least one of the TR DSTs.

    <p>GCS: group correlation score. The disease-causing SNPs were obtained from the NHGRI GWAS Catalog (available at <a href="http://www.genome.gov/gwastudies" target="_blank">www.genome.gov/gwastudies</a>; accessed 3/3/2010), limited to SNP-trait associations with <i>p</i>-values < 1×10<sup>−5</sup>.</p>
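
The caption does not define how the group correlation score is computed. Purely as a hypothetical stand-in, the sketch below scores co-regulation of a group of DSTs as the mean pairwise Pearson correlation of their expression profiles, and applies the <i>p</i> < 1×10<sup>−5</sup> filter to GWAS associations represented as hypothetical (SNP, trait, p-value) tuples. It is not the GCS used in the study.

```python
# Hypothetical stand-in for a group correlation score: the mean pairwise
# Pearson correlation of DST expression profiles. NOT the study's GCS.
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


def mean_pairwise_correlation(expr: dict[str, list[float]]) -> float:
    """Average Pearson r over all pairs of genes in the group."""
    pairs = list(combinations(expr.values(), 2))
    if not pairs:
        raise ValueError("need at least two genes")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def significant_associations(records: list[tuple[str, str, float]],
                             threshold: float = 1e-5) -> list[tuple[str, str, float]]:
    """Keep (snp_id, trait, p_value) records with p below the threshold."""
    return [r for r in records if r[2] < threshold]


if __name__ == "__main__":
    group = {"dst_1": [5.1, 5.9, 6.8, 7.2],  # hypothetical log2 expression
             "dst_2": [4.0, 4.4, 5.1, 5.0],
             "dst_3": [6.2, 6.0, 6.9, 7.4]}
    print(mean_pairwise_correlation(group))
```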

    Foxp3 TReQTL network.

    <p>The interaction network was generated with Ingenuity Pathway Analysis (IPA) software. Based on the IPA curated knowledge base, dashed lines represent indirect interactions and solid lines represent direct interactions. Arrows indicate action on a target. Vertical rectangles are G-protein-coupled receptors, ovals are transcription regulators, squares are cytokines, double circles are complexes/groups and single circles are other types of biological molecules. Shaded nodes represent genes or molecules from the Foxp3 TReQTL (the genes the SNPs map to, the DSTs and the TR).</p>

    Scatter plot of differential expression of the DSTs of Foxp3.

    <p>The x-axis is the genotype for SNP rs3790904 (Latrophilin homolog 1, <i>Lphh1</i>/<i>Lphn2</i>), also coded as the number of minor alleles. The y-axis is log<sub>2</sub> gene expression. Green dots show the expression of colony stimulating factor 2 (<i>Csf2</i>) and blue dots the expression of interleukin 2 (<i>Il2</i>). The Pearson correlation between <i>Csf2</i> and <i>Il2</i> expression is +0.56.</p>
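
Coding genotypes as the number of minor alleles (0, 1 or 2) turns the x-axis into a numeric dosage, so expression can be summarized per genotype class. The sketch assumes diploid genotype strings such as "AG" and a known minor allele; the alleles, expression values and function names are hypothetical and do not describe rs3790904.

```python
# Sketch: additive genotype coding and mean expression per genotype class.
# Alleles and expression values are hypothetical (Python 3.9+).
from collections import defaultdict


def minor_allele_count(genotype: str, minor: str) -> int:
    """Number of minor-allele copies (0, 1 or 2) in a diploid call like 'AG'."""
    if len(genotype) != 2:
        raise ValueError(f"expected a diploid genotype, got {genotype!r}")
    return genotype.count(minor)


def mean_expression_by_genotype(genotypes: list[str], log2_expr: list[float],
                                minor: str) -> dict[int, float]:
    """Average log2 expression within each minor-allele-count class."""
    groups: dict[int, list[float]] = defaultdict(list)
    for g, e in zip(genotypes, log2_expr):
        groups[minor_allele_count(g, minor)].append(e)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


if __name__ == "__main__":
    genotypes = ["AA", "AG", "GG", "AG", "AA"]  # hypothetical calls
    expr = [7.1, 7.6, 8.4, 7.8, 6.9]            # hypothetical log2 values
    print(mean_expression_by_genotype(genotypes, expr, minor="G"))
```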

    Strategy to identify transcript-regulator eQTLs (TReQTLs).

    <p>The expression levels of the downstream targets (DSTs) of a transcript-regulator (TR) are used as quantitative traits and tested for association with individual single nucleotide polymorphisms (SNPs). In some cases the SNPs map to the same gene, to different genes or to the TR; in others they are intergenic.</p>
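
The strategy amounts to testing every SNP against the expression of every DST of a TR. As one common choice (an assumption; the caption does not name the test), the sketch fits expression against minor-allele dosage by simple linear regression and reports the slope t-statistic, which has n − 2 degrees of freedom and could be converted to a p-value with, for example, `scipy.stats.t.sf`. The SNP and gene names, the data and the t cutoff are hypothetical.

```python
# Sketch: scan each SNP against each DST's expression with simple linear
# regression (expression ~ minor-allele dosage). The test choice, names
# and cutoff are assumptions for illustration (Python 3.9+).
import math


def association_t(dosage: list[int], expression: list[float]) -> float:
    """t-statistic of the regression slope; it has n - 2 degrees of freedom."""
    n = len(dosage)
    if n < 3:
        raise ValueError("need at least three samples")
    mx, my = sum(dosage) / n, sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:  # perfect fit: report an infinite statistic with the slope's sign
        return math.copysign(math.inf, slope) if slope else 0.0
    return slope / se


def scan(snp_dosages: dict[str, list[int]], dst_expression: dict[str, list[float]],
         t_cutoff: float) -> list[tuple[str, str, float]]:
    """Return (snp, dst, t) for every pair whose |t| reaches the cutoff."""
    hits = []
    for snp, dosage in snp_dosages.items():
        for dst, expr in dst_expression.items():
            t = association_t(dosage, expr)
            if abs(t) >= t_cutoff:
                hits.append((snp, dst, t))
    return hits


if __name__ == "__main__":
    dosages = {"snp_1": [0, 0, 1, 1, 2, 2]}  # hypothetical
    expression = {"dst_a": [1.0, 1.2, 2.0, 2.1, 3.0, 2.9],
                  "dst_b": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0]}
    print(scan(dosages, expression, t_cutoff=5.0))
```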