8 research outputs found

    Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics

    Background: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Heredity and Health in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging.
Results: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language framework (CWL) and two using Nextflow. All the workflows are containerized for improved portability and reproducibility using Docker, and are publicly available for use by members of the H3Africa consortium and the international research community.
Conclusion: The H3ABioNet workflows have been implemented to offer ease of use for the end user and high levels of reproducibility and portability, while following modern state-of-the-art bioinformatics data processing protocols. The H3ABioNet workflows will service the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity, assist genomics research within Africa, and increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.

    Forest plot of M statistics summarizing systematic patterns of heterogeneity among studies in the CARDIoGRAMplusC4D GWAS meta-analysis.

    Sorted M statistics are presented for individual studies, represented by filled squares, with their 95% confidence intervals shown by horizontal lines; the sizes of the squares are proportional to each study’s inverse-variance weight. Studies showing weaker (M < 0) than average genetic effects can be distinguished from those showing stronger (M > 0) than average effects.

    Heterogeneity in the CARDIoGRAMplusC4D meta-analysis can be explained by differences in age of CAD onset, family history and ancestry.

    M statistics for each study in the CARDIoGRAMplusC4D meta-analysis (Y-axis) are plotted against the average variant effect size (expressed as odds ratios) (X-axis) in each study. Panel A shows the ancestry of each study, panel B distinguishes early-onset from late-onset studies and panel C identifies studies ascertained with a positive family history of coronary artery disease. Panel D is a composite plot showing the degree of genetic enrichment among the studies in the meta-analysis, which ranged from non-enriched (late-onset studies without a positive family history of coronary artery disease) to doubly enriched (early-onset studies with a positive family history of coronary artery disease). The dashed lines indicate the Bonferroni-corrected 5% significance threshold (M = ±0.483) to allow for multiple testing of 48 studies.
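
The Bonferroni step in this caption can be sketched as follows. This is a minimal illustration, assuming a two-sided test and a normal reference distribution (neither is stated in the caption); the names `per_test_alpha` and `z_crit` are hypothetical. Converting the critical value to the M scale (the reported ±0.483) also needs the spread of the M statistics, which the caption does not give, so the sketch stops at the z scale.

```python
from statistics import NormalDist

FAMILY_ALPHA = 0.05   # overall 5% significance level (from the caption)
N_STUDIES = 48        # number of studies tested (from the caption)

# Bonferroni correction: divide the family-wise alpha by the number of tests.
per_test_alpha = FAMILY_ALPHA / N_STUDIES

# Assumption: two-sided test against a standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - per_test_alpha / 2)

print(f"per-test alpha = {per_test_alpha:.6f}")
print(f"two-sided critical z = {z_crit:.3f}")
```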

    Empirical type 1 error rates and power to detect an outlier study for M at threshold α = 0.05.


    A comparative power analysis of M and Cochran’s Q to detect systematic heterogeneity.

    The nine panels show (from left to right) simulations for 10, 15 and 30 studies, examined at 50, 25 and 10 variants. Data points for the M statistic are represented by filled circles, whilst those for Cochran’s Q are denoted by filled triangles. Each data point represents a meta-analysis scenario where effect sizes for the non-outlier studies were held constant (log_e(odds ratio) = 0.182, i.e. odds ratio = 1.2) to model homogeneous effects. The effect sizes of variants in the outlier study were the product of the non-outlier effect size (i.e. log_e(odds ratio) = 0.182) and a parameter (fold-change) to model a continuous series of systematic heterogeneity patterns. All studies were equally weighted (standard error of log_e(odds ratio) = 0.1).
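
The scenario described in this caption can be sketched for a single variant using the standard fixed-effect definition of Cochran's Q. This is an illustration under stated assumptions, not the authors' simulation code: the function names `build_scenario` and `cochran_q` are hypothetical, only one variant is modelled, and no sampling noise is added.

```python
import math

def build_scenario(n_studies, fold_change, typical_log_or=0.182, se=0.1):
    """Effect sizes for n_studies - 1 homogeneous studies plus one outlier study."""
    betas = [typical_log_or] * (n_studies - 1) + [typical_log_or * fold_change]
    ses = [se] * n_studies  # equal standard errors give equal weights
    return betas, ses

def cochran_q(betas, ses):
    """Cochran's Q: inverse-variance weighted squared deviations from the pooled effect."""
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))

# log_e(odds ratio) = 0.182 corresponds to an odds ratio of about 1.2.
print(round(math.exp(0.182), 2))

# Q grows as the outlier's fold-change moves away from 1 (10 studies here).
for fold in (1, 2, 3):
    betas, ses = build_scenario(n_studies=10, fold_change=fold)
    print(fold, round(cochran_q(betas, ses), 3))
```

With a fold-change of 1 every study has the same effect, so Q is zero up to rounding error; larger fold-changes concentrate the heterogeneity in the single outlier study.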

    The power of the M statistic to detect systematic outlier studies.

    A power analysis of the M statistic for meta-analysis scenarios with varying numbers of studies and variants. The three panels show (from left to right) simulations for 10, 15 and 30 studies; 50, 25 and 10 variant simulations are shown by filled diamonds, filled circles, or open squares respectively. Each data point represents a meta-analysis simulation with 1,000 replicates, where an outlier study was assigned genetic effects that are x-fold stronger than the effects assigned to the remaining studies showing typical effects. Effect sizes for variants in the studies showing typical effects were allocated from an L-shaped distribution (S2 Table, http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1006755#pgen.1006755.s011), whilst effect sizes for variants in the outlier study were calculated as a multiple of the typical effect size. For example, effect sizes for variants in an outlier study 2-fold stronger than studies showing typical effects would be computed as 2 x {0.04, 0.12, 0.2, 0.28, 0.4}, with σ = 0.10.
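
The worked example at the end of this caption amounts to scaling each typical effect size by the fold-change. A minimal sketch, assuming the five listed values are the typical effect sizes; the name `outlier_effects` is hypothetical, and σ = 0.10 is left out because the caption does not say how it enters the calculation.

```python
# Typical effect sizes listed in the caption.
TYPICAL_EFFECTS = [0.04, 0.12, 0.2, 0.28, 0.4]

def outlier_effects(fold, typical=TYPICAL_EFFECTS):
    """Scale each typical effect size by the outlier study's fold-change."""
    return [fold * beta for beta in typical]

# Effects for an outlier study that is 2-fold stronger than typical studies.
print(outlier_effects(2))
```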

    Assessing computational genomics skills: Our experience in the H3ABioNet African bioinformatics network

    The H3ABioNet pan-African bioinformatics network, which is funded to support the Human Heredity and Health in Africa (H3Africa) program, has developed node-assessment exercises to gauge the ability of its participating research and service groups to analyze typical genome-wide datasets being generated by H3Africa research groups. We describe a framework for the assessment of computational genomics analysis skills, which includes standard operating procedures, training and test datasets, and a process for administering the exercise. We present the experiences of 3 research groups that have taken the exercise and the impact on their ability to manage complex projects. Finally, we discuss the reasons why many H3ABioNet nodes have declined so far to participate and potential strategies to encourage them to do so.