80 research outputs found

    Bringing Hadoop into Bioinformatics with Cloudgene and CloudMan

    Despite the evident potential of the MapReduce model and the existence of bioinformatics algorithms and applications built on it, these have yet to be widely adopted in bioinformatics data analysis. The Hadoop MapReduce model offers a simple framework for data parallelism by providing automated runtime recovery (for both task runtime and hardware failures), implicit scalability (tasks automatically run in parallel batch mode), as well as data replication and locality (reducing data movement and hence increasing processing capacity). We identify two prerequisites for wider adoption and higher utilization of MapReduce tools: (1) abstract the technical details of how multiple existing MapReduce tools are composed, and (2) provide easy access to the necessary compute infrastructure and the appropriate environment. Satisfying these requirements would allow bioinformatics domain experts to focus on the analysis while the required technical details are hidden. At BOSC 2012, two platforms were presented: Cloudgene, a MapReduce tool execution platform leveraging Hadoop, and CloudMan, a cloud resource manager. Since then, we have combined and extended these two platforms to provide a readily available and accessible Hadoop-based bioinformatics environment for the cloud. Besides allowing arbitrary MapReduce tools to be integrated and used to craft an analysis, Cloudgene has been extended as a job execution engine for, currently, two dedicated services: an imputation service developed in cooperation with the Center for Statistical Genetics, University of Michigan (available at imputationserver.sph.umich.edu) and an mtDNA analysis service (available at mtdnaserver.uibk.ac.at). Thus far, the “Michigan Imputation Server” has shown remarkable popularity and scalability, with over 690,000 human genomes imputed within one year.
    These services have been deployed on dedicated hardware and offer a simple interface for the specific tasks, while the jobs are executed in MapReduce fashion. This demonstrates a positive disposition towards wider adoption of the MapReduce paradigm in the bioinformatics data analysis space, given accessible and effective solutions. To facilitate easy access to such MapReduce solutions for bioinformatics and broaden the availability of these services, we have extended CloudMan to provide a Hadoop-based environment with preconfigured Cloudgene. CloudMan handles the tasks of procuring the required cloud resources and configuring the appropriate environment, thus insulating the user from the low-level technical details otherwise required. Because CloudMan is compatible with multiple cloud technologies, it is now feasible to deploy this environment on a range of private and public clouds. This makes it possible for anyone to obtain a scalable Hadoop-based cluster with Cloudgene preinstalled and readily execute MapReduce tools. This talk will present the motivation for supporting greater adoption of MapReduce-based applications in the bioinformatics data analysis space, followed by the details of the described services and their functionality.

    eCOMPAGT – efficient Combination and Management of Phenotypes and Genotypes for Genetic Epidemiology

    Background: High-throughput genotyping and phenotyping projects of large epidemiological study populations require sophisticated laboratory information management systems. Most epidemiological studies include subject-related personal information, which needs to be handled with care by following data privacy protection guidelines. In addition, genotyping core facilities handling cooperative projects require a straightforward solution to monitor the status and financial resources of the different projects. Description: We developed a database system for an efficient combination and management of phenotypes and genotypes (eCOMPAGT) deriving from genetic epidemiological studies. eCOMPAGT securely stores and manages genotype and phenotype data and enables different user modes with different rights. Special attention was paid to the import of data deriving from TaqMan and SNPlex genotyping assays. However, the database solution can be adapted to other genotyping systems by programming additional interfaces. Further important features are the scalability of the database and an export interface to statistical software. Conclusion: eCOMPAGT can store, administer and connect phenotype data with all kinds of genotype data and is available as a downloadable version at http://dbis-informatik.uibk.ac.at/ecompagt.

    Cloudflow – A Framework for MapReduce Pipeline Development in Biomedical Research

    The data-driven parallelization framework Hadoop MapReduce allows large data sets to be analysed in a scalable way. Since the development of MapReduce programs can be a time-intensive and challenging task, the use of Hadoop in biomedical research is still limited. Here we present Cloudflow, a high-level framework that hides the implementation details of Hadoop and provides a set of building blocks for creating biomedical pipelines in a more intuitive way. We demonstrate the benefit of Cloudflow on three different genetic use cases. We show how the framework can be combined with the Hadoop workflow system Cloudgene and the cloud orchestration platform CloudMan to provide Hadoop pipelines as a service to everyone. The framework is open source and freely available at https://github.com/genepi/cloudflow. Document type: Conference object

    CONAN: copy number variation analysis software for genome-wide association studies

    Background: Genome-wide association studies (GWAS) based on single nucleotide polymorphisms (SNPs) revolutionized our perception of the genetic regulation of complex traits and diseases. Copy number variations (CNVs) promise to shed additional light on the genetic basis of monogenic as well as complex diseases and phenotypes. Indeed, the number of detected associations between CNVs and certain phenotypes is constantly increasing. However, while several software packages support the determination of CNVs from SNP chip data, the downstream statistical inference of CNV-phenotype associations is still subject to complicated and inefficient in-house solutions, thus strongly limiting the performance of GWAS based on CNVs. Results: CONAN is a freely available client-server software solution which provides an intuitive graphical user interface for categorizing, analyzing and associating CNVs with phenotypes. Moreover, CONAN assists the evaluation process by visualizing detected associations via Manhattan plots in order to enable a rapid identification of genome-wide significant CNV regions. Various file formats containing information on CNVs in population samples are supported as input data. Conclusions: CONAN facilitates the performance of GWAS based on CNVs and the visual analysis of calculated results. CONAN provides a rapid, valid and straightforward software solution to identify genetic variation underlying the 'missing' heritability for complex traits that remains unexplained by recent GWAS. The freely available software can be downloaded at http://genepi-conan.i-med.ac.at.

    Be X-ray Binary Outburst Zoo II

    We have continued our recently started systematic study of Be X-ray binary (BeXRB) outbursts. Specifically, we are developing a catalogue of outbursts, including their basic properties, based on nearly all available X-ray all-sky monitors. These properties are derived by fitting asymmetric Gaussians to the outburst lightcurves. This model describes most of the outbursts covered by our preliminary catalogue well; only 13% of all datasets show more complex outburst shapes. Analyzing the basic properties, we reveal a strong correlation between the outburst length and the peak flux reached. As an example, we discuss possible models describing the observed correlation in EXO 2030+375.
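    The asymmetric Gaussian mentioned in the abstract above can be sketched as a Gaussian with separate widths on the rising and decaying side of the peak. The abstract does not give the authors' parameterization, so the function name and the parameters `peak_flux`, `t_peak`, `sigma_rise` and `sigma_decay` below are illustrative assumptions, not the catalogue's actual model definition.

```python
import math


def asymmetric_gaussian(t, peak_flux, t_peak, sigma_rise, sigma_decay):
    """Evaluate a two-sided (asymmetric) Gaussian outburst profile at time t.

    Hypothetical parameterization: the profile uses sigma_rise before the
    peak time and sigma_decay at or after it, so both halves meet at
    peak_flux when t == t_peak.
    """
    sigma = sigma_rise if t < t_peak else sigma_decay
    return peak_flux * math.exp(-0.5 * ((t - t_peak) / sigma) ** 2)


# Illustrative values only (not taken from any real lightcurve):
# a fast rise (3 days) followed by a slower decay (6 days).
times = [0.0, 4.0, 7.0, 10.0, 13.0, 16.0, 22.0]
profile = [asymmetric_gaussian(t, 2.0, 10.0, 3.0, 6.0) for t in times]
```

    In a real fit, the four parameters would be estimated from the monitor lightcurve with a least-squares routine; that step is omitted here because the abstract does not describe the fitting procedure.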

    Experiences with workflows for automating data-intensive bioinformatics

    High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks on a large scale. Workflow systems can simplify the construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault tolerance. However, workflow systems can incur significant development and administration overhead, so bioinformatics pipelines are often still built without them. We present the experiences with workflows and workflow systems within the bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead. The organizations are working on similar problems, but we have addressed them with different strategies and solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our experiences, we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution.

    Persistence of immunity to SARS-CoV-2 over time in the ski resort Ischgl

    Background: In early March 2020, a SARS-CoV-2 outbreak in the ski resort Ischgl in Austria triggered the spread of SARS-CoV-2 throughout Austria and Northern Europe. In a previous study, we found that the seroprevalence in the adult population of Ischgl had reached 45% by the end of April, representing an exceptionally high level of local seropositivity in Europe. We performed a follow-up study in Ischgl, which is the first to show persistence of immunity and protection against SARS-CoV-2 and some of its variants at a community level.
    Methods: Of the 1259 adults who participated in the baseline study, 801 were included in the follow-up in November 2020. The study involved the analysis of binding and neutralizing antibodies and T cell responses. In addition, the incidence of SARS-CoV-2 and its variants in Ischgl was compared to the incidence in similar municipalities in Tyrol until April 2021.
    Findings: For the 801 individuals who participated in both studies, the seroprevalence declined from 51.4% (95% confidence interval (CI) 47.9-54.9) to 45.4% (95% CI 42.0-49.0). Median antibody concentrations dropped considerably (from 5.345, 95% CI 4.833-6.123, to 2.298, 95% CI 2.141-2.527), but antibody avidity increased (from 17.02, 95% CI 16.49-17.94, to 42.46, 95% CI 41.06-46.26). Only one person had lost detectable antibodies and T cell responses. In parallel to this persistent immunity, we observed that Ischgl was relatively spared, compared to similar municipalities, from the prominent second COVID-19 wave that hit Austria in November 2020. In addition, we used sequencing data to show that the local immunity acquired from wild-type infections also helped to curb infections from variants of SARS-CoV-2 that have spread in Austria since January 2021.
    Interpretation: The relatively high level of seroprevalence (40-45%) in Ischgl persisted and might have been associated with the observed protection of Ischgl residents against virus infection during the second COVID-19 wave as well as against variant spread in 2021.
    Funding: Funding was provided by the government of Tyrol and the FWF Austrian Science Fund.

    Infectious disease and surgical challenges posed by carbapenem-resistant bacterial pathogens in the care of war-wounded patients from Ukraine

    Owing to hygiene deficits and the very broad, empirical use of antibiotics alongside open wound treatment in Ukrainian military hospitals, the risk of severe wound infections with multidrug-resistant organisms (MDROs) is high when civilian war victims are admitted. Surveillance with risk-adapted screening for MDROs, which has been carried out at Leipzig University Hospital since 2012, is therefore of great importance. The complexity of caring for war-wounded patients from Ukraine and the associated infection and resistance problems are presented, and the need for interdisciplinary and interprofessional management is emphasized. Peer reviewed.