    Bringing Hadoop into Bioinformatics with Cloudgene and CloudMan

    Despite the evident potential of the MapReduce model and the existence of MapReduce-based bioinformatics algorithms and applications, these have yet to be widely adopted in bioinformatics data analysis. The Hadoop MapReduce model offers a simple framework for data parallelism by providing automated runtime recovery (for both task runtime and hardware failures), implicit scalability (tasks automatically run in parallel batch mode), and data replication and locality (which reduce data movement and hence increase processing capacity). We identify two prerequisites for wider adoption and higher utilization of MapReduce tools: (1) abstracting the technical details of how multiple existing MapReduce tools are composed, and (2) providing easy access to the necessary compute infrastructure and an appropriate environment. Satisfying these requirements would allow bioinformatics domain experts to focus on the analysis while the required technical details are hidden. At BOSC 2012, two platforms were presented: Cloudgene, a MapReduce tool execution platform leveraging Hadoop, and CloudMan, a cloud resource manager. Since then, we have combined and extended these two platforms to provide a readily available and accessible Hadoop-based bioinformatics environment for the cloud. In addition to allowing arbitrary MapReduce tools to be integrated and used to craft an analysis, Cloudgene has been extended into a job execution engine that currently serves two dedicated services: an imputation service developed in cooperation with the Center for Statistical Genetics, University of Michigan (available at imputationserver.sph.umich.edu), and an mtDNA analysis service (available at mtdnaserver.uibk.ac.at). Thus far, the “Michigan Imputation Server” has shown remarkable popularity and scalability, with over 690,000 human genomes imputed within one year.
These services have been deployed on dedicated hardware and offer a simple interface for specific tasks while the jobs are executed in MapReduce fashion. This demonstrates a positive disposition towards wider adoption of the MapReduce paradigm in bioinformatics data analysis, given accessible and effective solutions. To facilitate easy access to such MapReduce solutions for bioinformatics and to broaden the availability of these services, we have extended CloudMan to provide a Hadoop-based environment with a preconfigured Cloudgene. CloudMan handles procuring the required cloud resources and configuring the appropriate environment, thus insulating the user from the low-level technical details otherwise required. Because CloudMan is compatible with multiple cloud technologies, this environment can now be deployed on a range of private and public clouds. This makes it possible for anyone to obtain a scalable Hadoop-based cluster with Cloudgene preinstalled and to readily execute MapReduce tools. This talk will present the motivation for supporting greater adoption of MapReduce-based applications in bioinformatics data analysis, followed by the details of the described services and their functionality.
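
The map, shuffle and reduce phases that Hadoop automates can be illustrated with a minimal in-memory sketch. It assumes hypothetical VCF-style data lines (first column = chromosome) and counts records per chromosome. A thread pool stands in for Hadoop's parallel mappers. None of Hadoop's fault tolerance, replication or data locality is modelled, and the function names are illustrative only.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_split(lines):
    """Mapper: emit (chromosome, 1) for every data record in one input split."""
    pairs = []
    for line in lines:
        if line.startswith("#"):  # skip VCF meta/header lines
            continue
        chrom = line.split("\t", 1)[0]
        pairs.append((chrom, 1))
    return pairs


def shuffle(mapped_splits):
    """Shuffle: group the values emitted by all mappers by key."""
    groups = defaultdict(list)
    for pairs in mapped_splits:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reducer: aggregate all values that share a key."""
    return key, sum(values)


def run_job(splits):
    # Splits are mapped independently, so the mappers can run in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_split, splits))
    grouped = shuffle(mapped)
    return dict(reduce_group(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    splits = [
        ["##fileformat=VCFv4.2", "chr1\t100\t.\tA\tG", "chr2\t500\t.\tC\tT"],
        ["chr1\t2300\t.\tG\tA"],
    ]
    print(run_job(splits))  # {'chr1': 2, 'chr2': 1}
```

The user only writes the mapper and reducer. Splitting the input, scheduling, grouping by key and collecting results are the parts a framework such as Hadoop takes over.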

    Cloudflow – A Framework for MapReduce Pipeline Development in Biomedical Research

    The data-driven parallelization framework Hadoop MapReduce allows analysing large data sets in a scalable way. Since the development of MapReduce programs can be a time-intensive and challenging task, the application and usage of Hadoop in biomedical research is still limited. Here we present Cloudflow, a high-level framework that hides the implementation details of Hadoop and provides a set of building blocks to create biomedical pipelines in a more intuitive way. We demonstrate the benefit of Cloudflow on three different genetic use cases. It will be shown how the framework can be combined with the Hadoop workflow system Cloudgene and the cloud orchestration platform CloudMan to provide Hadoop pipelines as a service to everyone. The framework is open source and freely available at https://github.com/genepi/cloudflow. Document type: Conference object
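
The idea of composing a pipeline from reusable building blocks, instead of hand-writing mapper and reducer classes, can be sketched as follows. The `Pipeline` class, its `map`, `filter` and `count_by` methods, and the tab-separated record layout are hypothetical. They do not reproduce Cloudflow's actual API. The sketch only shows how such blocks can hide map/reduce plumbing from the analyst.

```python
from collections import Counter
from typing import Callable, Iterable, List


class Pipeline:
    """Hypothetical chain of reusable building blocks (not Cloudflow's real API)."""

    def __init__(self) -> None:
        self._steps: List[Callable[[Iterable], Iterable]] = []

    def map(self, fn: Callable) -> "Pipeline":
        # Map-style block: transform each record independently.
        self._steps.append(lambda records: (fn(r) for r in records))
        return self

    def filter(self, pred: Callable) -> "Pipeline":
        # Map-side filter block: keep only records that pass the predicate.
        self._steps.append(lambda records: (r for r in records if pred(r)))
        return self

    def count_by(self, key: Callable) -> "Pipeline":
        # Reduce-style block: group records by key and count each group.
        self._steps.append(lambda records: Counter(key(r) for r in records))
        return self

    def run(self, records: Iterable):
        # Feed the output of each block into the next one.
        for step in self._steps:
            records = step(records)
        return records


def parse(line: str) -> dict:
    """Parse a hypothetical 'chrom<TAB>pos<TAB>quality' record."""
    chrom, pos, qual = line.split("\t")
    return {"chrom": chrom, "pos": int(pos), "qual": float(qual)}


if __name__ == "__main__":
    records = ["chr1\t100\t35", "chr1\t200\t12", "chr2\t50\t40"]
    counts = (
        Pipeline()
        .map(parse)
        .filter(lambda r: r["qual"] >= 20)
        .count_by(lambda r: r["chrom"])
        .run(records)
    )
    print(dict(counts))  # {'chr1': 1, 'chr2': 1}
```

In a real MapReduce framework, the `map` and `filter` blocks would run inside mappers, and `count_by` would become a reducer keyed on chromosome. Here everything runs locally in one process.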

    CONAN: copy number variation analysis software for genome-wide association studies

    Background: Genome-wide association studies (GWAS) based on single nucleotide polymorphisms (SNPs) have revolutionized our perception of the genetic regulation of complex traits and diseases. Copy number variations (CNVs) promise to shed additional light on the genetic basis of monogenic as well as complex diseases and phenotypes. Indeed, the number of detected associations between CNVs and certain phenotypes is constantly increasing. However, while several software packages support the determination of CNVs from SNP chip data, the downstream statistical inference of CNV-phenotype associations still relies on complicated and inefficient in-house solutions, which strongly limits the performance of GWAS based on CNVs. Results: CONAN is a freely available client-server software solution that provides an intuitive graphical user interface for categorizing, analyzing and associating CNVs with phenotypes. Moreover, CONAN assists the evaluation process by visualizing detected associations in Manhattan plots, enabling rapid identification of genome-wide significant CNV regions. Various input file formats containing information on CNVs in population samples are supported. Conclusions: CONAN facilitates GWAS based on CNVs and the visual analysis of the calculated results. CONAN provides a rapid, valid and straightforward software solution for identifying genetic variation underlying the 'missing' heritability of complex traits that remains unexplained by recent GWAS. The freely available software can be downloaded at http://genepi-conan.i-med.ac.at.
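
The sketch below illustrates the kind of per-region association test whose results a Manhattan plot displays. It applies a 1-degree-of-freedom Pearson chi-square test to a 2x2 table of CNV carriers versus non-carriers in cases and controls. It then converts the P value to the -log10(P) scale used on a Manhattan plot's y-axis. The abstract does not state which tests CONAN implements, so the choice of test, the function names and the counts are assumptions. The 5e-8 threshold is the common GWAS convention, not a value taken from CONAN.

```python
import math


def chi2_2x2(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    a, b, c, d = case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("table has an empty row or column")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


def manhattan_y(p):
    """Y-coordinate of a point on a Manhattan plot."""
    return -math.log10(p)


if __name__ == "__main__":
    # Hypothetical counts for one CNV region: 30 of 40 cases vs 10 of 40 controls carry it.
    stat, p = chi2_2x2(30, 10, 10, 30)
    print(f"chi2={stat:.1f}, P={p:.2e}, -log10(P)={manhattan_y(p):.2f}")
    print("genome-wide significant:", p < 5e-8)
```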

    Persistence of immunity to SARS-CoV-2 over time in the ski resort Ischgl

    Background In early March 2020, a SARS-CoV-2 outbreak in the ski resort Ischgl in Austria triggered the spread of SARS-CoV-2 throughout Austria and Northern Europe. In a previous study, we found that the seroprevalence in the adult population of Ischgl had reached 45% by the end of April, representing an exceptionally high level of local seropositivity in Europe. We performed a follow-up study in Ischgl, which is the first to show persistence of immunity and protection against SARS-CoV-2 and some of its variants at a community level. Methods Of the 1259 adults who participated in the baseline study, 801 were included in the follow-up in November 2020. The study involved the analysis of binding and neutralizing antibodies and T cell responses. In addition, the incidence of SARS-CoV-2 and its variants in Ischgl was compared with the incidence in similar municipalities in Tyrol until April 2021. Findings For the 801 individuals who participated in both studies, the seroprevalence declined from 51.4% (95% confidence interval (CI) 47.9-54.9) to 45.4% (95% CI 42.0-49.0). Median antibody concentrations dropped considerably (from 5.345, 95% CI 4.833-6.123, to 2.298, 95% CI 2.141-2.527), but antibody avidity increased (from 17.02, 95% CI 16.49-17.94, to 42.46, 95% CI 41.06-46.26). Only one person had lost detectable antibodies and T cell responses. In parallel with this persistent immunity, we observed that Ischgl was relatively spared, compared with similar municipalities, from the prominent second COVID-19 wave that hit Austria in November 2020. In addition, we used sequencing data to show that the local immunity acquired from wild-type infections also helped to curb infections with SARS-CoV-2 variants that have spread in Austria since January 2021.
Interpretation The relatively high level of seroprevalence (40-45%) in Ischgl persisted and might have been associated with the observed protection of Ischgl residents against virus infection during the second COVID-19 wave, as well as against variant spread in 2021. Funding Funding was provided by the government of Tyrol and the FWF Austrian Science Fund.
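
The seroprevalence figures above are reported with binomial confidence intervals. The abstract does not say which interval method was used. As a sketch only, the simple Wald (normal-approximation) interval p ± 1.96·sqrt(p(1-p)/n) is evaluated below at the baseline seroprevalence of 51.4% among the 801 follow-up participants. It gives 47.9% to 54.9%, which matches the reported baseline interval to one decimal place. The function name is illustrative.

```python
import math


def wald_ci(p_hat, n, z=1.96):
    """Wald (normal-approximation) confidence interval for a binomial proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


if __name__ == "__main__":
    lo, hi = wald_ci(0.514, 801)  # baseline seroprevalence in the follow-up cohort
    print(f"{lo:.1%} - {hi:.1%}")  # 47.9% - 54.9%
```

The Wald interval is adequate here because n is large and p is far from 0 or 1. For small samples or extreme proportions, a Wilson or Clopper-Pearson interval is usually preferred.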

    A novel but frequent variant in LPA KIV-2 is associated with a pronounced Lp(a) and cardiovascular risk reduction

    Aims Lp(a) concentrations represent a major cardiovascular risk factor and are almost entirely controlled by one single locus (LPA). However, many genetic factors in LPA governing the enormous variance of Lp(a) levels are still unknown. Since up to 70% of the LPA coding sequence is located in a difficult-to-access hypervariable copy number variation region named KIV-2, we hypothesized that it may contain novel functional variants with pronounced effects on Lp(a) concentrations. We performed a large-scale mutation analysis of the KIV-2 using an extreme phenotype approach. Methods and results We compiled a discovery set of 123 samples, comprising samples showing discordance between LPA isoform phenotype and Lp(a) concentrations as well as controls. Using ultra-deep sequencing, we identified a splice site variant (G4925A) preferentially associated with the smaller LPA isoforms. Follow-up in a European general population (n = 2892) revealed an exceptionally high carrier frequency of 22.1%. The variant explains 20.6% of the Lp(a) variance in carriers of low molecular weight (LMW) apo(a) isoforms (P = 5.75e-38) and reduces Lp(a) concentrations by 31.3 mg/dL. Accordingly, the odds ratio for cardiovascular disease was reduced from 1.39 [95% confidence interval (CI): 1.17-1.66, P = 1.89e-04] for wildtype LMW individuals to 1.19 [95% CI: 0.92-1.56, P = 0.19] in LMW individuals who were additionally positive for G4925A. Functional studies point towards a reduction of splicing efficiency by this novel variant. Conclusion A highly frequent but until now undetected variant in the LPA KIV-2 region is strongly associated with reduced Lp(a) concentrations and reduced cardiovascular risk in LMW individuals.
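
The reported odds ratios, confidence intervals and P values can be cross-checked with a standard back-calculation. This assumes the intervals are Wald intervals on the log-odds scale, so that SE = (ln U - ln L)/(2·1.96) and z = ln(OR)/SE. The published bounds are rounded to two decimals, so the recovered P values are only approximate: about 2.2e-04 and 0.20 here, versus the reported 1.89e-04 and 0.19. The sketch is a plausibility check under that assumption, not a reanalysis, and the function names are illustrative.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR) recovered from a symmetric Wald CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def p_from_or(odds_ratio, lower, upper):
    """Approximate two-sided Wald P value for an odds ratio given its 95% CI."""
    z = math.log(odds_ratio) / se_from_ci(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Odds ratios for cardiovascular disease as reported in the abstract.
    print(f"{p_from_or(1.39, 1.17, 1.66):.2e}")  # wildtype LMW individuals
    print(f"{p_from_or(1.19, 0.92, 1.56):.2f}")  # LMW carriers of G4925A
```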

    Implicating genes, pleiotropy, and sexual dimorphism at blood lipid loci through multi-ancestry meta-analysis

    Publisher Copyright: © 2022, The Author(s). Background: Genetic variants within nearly 1000 loci are known to contribute to modulation of blood lipid levels. However, the biological pathways underlying these associations are frequently unknown, limiting understanding of these findings and hindering downstream translational efforts such as drug target discovery. Results: To expand our understanding of the underlying biological pathways and mechanisms controlling blood lipid levels, we leverage a large multi-ancestry meta-analysis (N = 1,654,960) of blood lipids to prioritize putative causal genes for 2286 lipid associations using six gene prediction approaches. Using phenome-wide association (PheWAS) scans, we identify relationships of genetically predicted lipid levels to other diseases and conditions. We confirm known pleiotropic associations with cardiovascular phenotypes and determine novel associations, notably with cholelithiasis risk. We perform sex-stratified GWAS meta-analysis of lipid levels and show that 3–5% of autosomal lipid-associated loci demonstrate sex-biased effects. Finally, we report 21 novel lipid loci identified on the X chromosome. Many of the sex-biased autosomal and X chromosome lipid loci show pleiotropic associations with sex hormones, emphasizing the role of hormone regulation in lipid metabolism. Conclusions: Taken together, our findings provide insights into the biological mechanisms through which associated variants lead to altered lipid levels and potentially cardiovascular disease risk. Peer reviewed
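
Sex-stratified meta-analysis and the search for sex-biased effects can be sketched with two textbook tools. The first is a fixed-effect inverse-variance-weighted (IVW) meta-analysis of per-study effect estimates within each sex. The second is a z-test for the difference between the male and female estimates, z = (β_m - β_f)/sqrt(SE_m² + SE_f²), which assumes independent samples. The abstract does not specify the authors' exact methods, so these tests, the function names and the per-study estimates below are assumptions for illustration only.

```python
import math
from typing import Sequence, Tuple


def ivw_fixed_effect(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance-weighted meta-analysis of one variant."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def sex_difference(beta_m: float, se_m: float, beta_f: float, se_f: float) -> Tuple[float, float]:
    """z-test for a difference in effect size between men and women (independent samples)."""
    z = (beta_m - beta_f) / math.sqrt(se_m ** 2 + se_f ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P value
    return z, p


if __name__ == "__main__":
    # Hypothetical per-study effect estimates and standard errors for one variant.
    men = ivw_fixed_effect([0.10, 0.10], [0.02, 0.02])
    women = ivw_fixed_effect([0.02, 0.02], [0.02, 0.02])
    z, p = sex_difference(*men, *women)
    print(f"men {men}, women {women}, z = {z:.2f}, P = {p:.2e}")
```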