80 research outputs found

Bringing Hadoop into Bioinformatics with Cloudgene and CloudMan

Author: Afgan Enis
Davidović Davor
Forer Lukas
Kronenberg Florian
Schönherr Sebastian
Weissensteiner Hansi
Publication date: 10/07/2015

Despite the evident potential of the MapReduce model and the existence of bioinformatic algorithms and applications, these have yet to be widely adopted in bioinformatics data analysis. The Hadoop MapReduce model offers a simple framework for data parallelism by providing automated runtime recovery (for both task runtime and hardware failures), implicit scalability (tasks automatically run in parallel batch mode), as well as data replication and locality (reducing data movement and hence increasing processing capacity). We identify two prerequisites for wider adoption and higher utilization of MapReduce tools: (1) abstract the technical details of how multiple existing MapReduce tools are composed, and (2) provide easy access to the necessary compute infrastructure and the appropriate environment. Satisfying these requirements would allow bioinformatics domain experts to focus on the analysis while the required technical details are hidden. At BOSC 2012, two platforms were presented: Cloudgene, a MapReduce tool execution platform leveraging Hadoop, and CloudMan, a cloud resource manager. Since then, we have combined and extended these two platforms to provide a readily available and accessible Hadoop-based bioinformatics environment for the Cloud. Cloudgene, besides allowing arbitrary MapReduce tools to be integrated and used to craft an analysis, has been extended as a job execution engine for currently two dedicated services: an imputation service developed in cooperation with the Center for Statistical Genetics, University of Michigan (available at imputationserver.sph.umich.edu) and an mtDNA analysis service (available at mtdnaserver.uibk.ac.at). Thus far, the “Michigan Imputation Server” has shown remarkable popularity and scalability with over 690,000 human genomes being imputed within one year.
These services have been deployed on dedicated hardware and offer a simple interface for the specific tasks while the jobs are executed in the MapReduce fashion. This demonstrates a positive disposition towards wider adoption of the MapReduce paradigm in the bioinformatics data analysis space given accessible and effective solutions. To facilitate easy access to such MapReduce solutions for bioinformatics and broaden the availability of these services, we have extended CloudMan to provide a Hadoop-based environment with preconfigured Cloudgene. CloudMan handles the tasks of procuring required cloud resources and configuring the appropriate environment, thus insulating the user from the low-level technical details otherwise required. Because CloudMan is compatible with multiple cloud technologies, it is now feasible to deploy this environment on a range of private and public clouds. This makes it possible for anyone to obtain a scalable Hadoop-based cluster with Cloudgene preinstalled and readily execute MapReduce tools. This talk will present the motivation for supporting greater adoption of MapReduce-based applications in the bioinformatics data analysis space, followed by the details of the described services and their functionality.

Full-text Institutional Repository of the Ruđer Bošković Institute

FigShare

eCOMPAGT – efficient Combination and Management of Phenotypes and Genotypes for Genetic Epidemiology

Author: Brandstätter Anita
Coassin Stefan
Kronenberg Florian
Schönherr Sebastian
Specht Günther
Weißensteiner Hansi
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: High-throughput genotyping and phenotyping projects of large epidemiological study populations require sophisticated laboratory information management systems. Most epidemiological studies include subject-related personal information, which needs to be handled with care by following data privacy protection guidelines. In addition, genotyping core facilities handling cooperative projects require a straightforward solution to monitor the status and financial resources of the different projects. Description: We developed a database system for an efficient combination and management of phenotypes and genotypes (eCOMPAGT) deriving from genetic epidemiological studies. eCOMPAGT securely stores and manages genotype and phenotype data and enables different user modes with different rights. Special attention was paid to the import of data deriving from TaqMan and SNPlex genotyping assays. However, the database solution is adjustable to other genotyping systems by programming additional interfaces. Further important features are the scalability of the database and an export interface to statistical software. Conclusion: eCOMPAGT can store, administer and connect phenotype data with all kinds of genotype data and is available as a downloadable version at http://dbis-informatik.uibk.ac.at/ecompagt.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Cloudflow – A Framework for MapReduce Pipeline Development in Biomedical Research

Author: Afgan Enis
Davidović Davor
Forer Lukas
Kronenberg Florian
Schönherr Sebastian
Specht Günther
Weißensteiner Hansi
Publication venue: Croatian Society for Information and Communication Technology, Electronics and Microelectronics - MIPRO
Publication date: 01/01/2015

The data-driven parallelization framework Hadoop MapReduce allows analysing large data sets in a scalable way. Since the development of MapReduce programs can be a time-intensive and challenging task, the application and usage of Hadoop in Biomedical Research is still limited. Here we present Cloudflow, a high-level framework to hide the implementation details of Hadoop and to provide a set of building blocks to create biomedical pipelines in a more intuitive way. We demonstrate the benefit of Cloudflow on three different genetic use cases. It will be shown how the framework can be combined with the Hadoop workflow system Cloudgene and the cloud orchestration platform CloudMan to provide Hadoop pipelines as a service to everyone.

Crossref

Full-text Institutional Repository of the Ruđer Bošković Institute

FigShare

Cloudflow – A Framework for MapReduce Pipeline Development in Biomedical Research

Author: Afgan Enis
Davidović Davor
Forer Lukas
Kronenberg Florian
Schönherr Sebastian
Specht Günther
Weissensteiner Hansi
Publication date: 24/05/2015

The data-driven parallelization framework Hadoop MapReduce allows analysing large data sets in a scalable way. Since the development of MapReduce programs can be a time-intensive and challenging task, the application and usage of Hadoop in Biomedical Research is still limited. Here we present Cloudflow, a high-level framework to hide the implementation details of Hadoop and to provide a set of building blocks to create biomedical pipelines in a more intuitive way. We demonstrate the benefit of Cloudflow on three different genetic use cases. It will be shown how the framework can be combined with the Hadoop workflow system Cloudgene and the cloud orchestration platform CloudMan to provide Hadoop pipelines as a service to everyone. The framework is open source and freely available at https://github.com/genepi/cloudflow. Document type: Conference object

Scipedia

CONAN: copy number variation analysis software for genome-wide association studies

Author: Forer Lukas
Gieger Christian
Haider Florian
Kloss-Brandstätter Anita
Kluckner Thomas
Kronenberg Florian
Schönherr Sebastian
Specht Günther
Weissensteiner Hansi
Wichmann Heinz-Erich
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Genome-wide association studies (GWAS) based on single nucleotide polymorphisms (SNPs) revolutionized our perception of the genetic regulation of complex traits and diseases. Copy number variations (CNVs) promise to shed additional light on the genetic basis of monogenic as well as complex diseases and phenotypes. Indeed, the number of detected associations between CNVs and certain phenotypes is constantly increasing. However, while several software packages support the determination of CNVs from SNP chip data, the downstream statistical inference of CNV-phenotype associations is still subject to complicated and inefficient in-house solutions, thus strongly limiting the performance of GWAS based on CNVs. Results: CONAN is a freely available client-server software solution which provides an intuitive graphical user interface for categorizing, analyzing and associating CNVs with phenotypes. Moreover, CONAN assists the evaluation process by visualizing detected associations via Manhattan plots in order to enable a rapid identification of genome-wide significant CNV regions. Various file formats including the information on CNVs in population samples are supported as input data. Conclusions: CONAN facilitates the performance of GWAS based on CNVs and the visual analysis of calculated results. CONAN provides a rapid, valid and straightforward software solution to identify genetic variation underlying the 'missing' heritability for complex traits that remains unexplained by recent GWAS. The freely available software can be downloaded at http://genepi-conan.i-med.ac.at.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Open Access LMU

PuSH

Be X-ray Binary Outburst Zoo II

Author: Anders Friedrich
Brand Thorsten
Falkner Sebastian
Fürst Felix
Grinberg Victoria
Kretschmar Peter
Kreykenbohm Ingo
Kühnel Matthias
Müller Sebastian
Nespoli Elisa
Okazaki Atsuo T.
Pottschmidt Katja
Schwarm Fritz-Walter
Schönherr Gabriele
Wilms Jörn
Wilson-Hodge Colleen A.
Publication venue: Sissa Medialab
Publication date: 01/09/2014

We have continued our recently started systematic study of Be X-ray binary (BeXRB) outbursts. Specifically, we are developing a catalogue of outbursts including their basic properties based on nearly all available X-ray all-sky monitors. These properties are derived by fitting asymmetric Gaussians to the outburst lightcurves. This model describes most of the outbursts covered by our preliminary catalogue well; only 13% of all datasets show more complex outburst shapes. Analyzing the basic properties, we reveal a strong correlation between the outburst length and the peak flux reached. As an example, we discuss possible models describing the observed correlation in EXO 2030+375.

Caltech Authors

Experiences with workflows for automating data-intensive bioinformatics

Author: Bongcam-Rudlof Erik
Carrasco Hernández Guillermo
Forer Lucas
Giovacchini Mario
Kallio Aleksi
Kanduła Maciej M
Korpelainen Eija
Krachunov Milko
Kreil David P.
Kulev Ognyan
Lampa Samuel
Pireddu Luca
Schönherr Sebastian
Siretskiy Alexey
Spjuth Ola
Valls Guimera Roman
Vassilev Dimitar
Łabaj Pavel P.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks at large scale. Workflow systems can be useful to simplify construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault-tolerance. However, workflow systems can incur significant development and administration overhead, so bioinformatics pipelines are often still built without them. We present the experiences with workflows and workflow systems within the bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead. The organizations are working on similar problems, but we have addressed them with different strategies and solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our experiences we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution.

Springer - Publisher Connector

P-arch

PubMed Central

Publikationsserver der Universitätsbibliothek Bodenkultur Wien

Persistence of immunity to SARS-CoV-2 over time in the ski resort Ischgl

Author: Bates Katie
Baumgartner Matthias
Borena Wegene
Bánki Zoltán
Falkensammer Barbara
Forer Lukas
Kimpel Janine
Knabl Ludwig
Paetzold Jörg
Pichler Daniel
Pipperger Lisa
Riepler Lydia
Rössler Annika
Schönherr Sebastian
Theurl Igor
Ulmer Hanno
von Laer Dorothee
Walser Andreas
Winner Hannes
Würzner Reinhard
Publication venue: Elsevier BV
Publication date: 01/08/2021

Background: In early March 2020, a SARS-CoV-2 outbreak in the ski resort Ischgl in Austria triggered the spread of SARS-CoV-2 throughout Austria and Northern Europe. In a previous study, we found that the seroprevalence in the adult population of Ischgl had reached 45% by the end of April, representing an exceptionally high level of local seropositivity in Europe. We performed a follow-up study in Ischgl, which is the first to show persistence of immunity and protection against SARS-CoV-2 and some of its variants at a community level. Methods: Of the 1259 adults that participated in the baseline study, 801 were included in the follow-up in November 2020. The study involved the analysis of binding and neutralizing antibodies and T cell responses. In addition, the incidence of SARS-CoV-2 and its variants in Ischgl was compared to the incidence in similar municipalities in Tyrol until April 2021. Findings: For the 801 individuals that participated in both studies, the seroprevalence declined from 51.4% (95% confidence interval (CI) 47.9-54.9) to 45.4% (95% CI 42.0-49.0). Median antibody concentrations dropped considerably (5.345, 95% CI 4.833-6.123 to 2.298, 95% CI 2.141-2.527) but antibody avidity increased (17.02, 95% CI 16.49-17.94 to 42.46, 95% CI 41.06-46.26). Only one person had lost detectable antibodies and T cell responses. In parallel to this persistent immunity, we observed that Ischgl was relatively spared, compared to similar municipalities, from the prominent second COVID-19 wave that hit Austria in November 2020. In addition, we used sequencing data to show that the local immunity acquired from wild-type infections also helped to curb infections from variants of SARS-CoV-2 which have spread in Austria since January 2021.
Interpretation: The relatively high level of seroprevalence (40-45%) in Ischgl persisted and might have been associated with the observed protection of Ischgl residents against virus infection during the second COVID-19 wave as well as against variant spread in 2021. Funding: Funding was provided by the government of Tyrol and the FWF Austrian Science Fund.

Directory of Open Access Journals

ZORA

Infectious-disease and surgical challenges posed by carbapenem-resistant bacterial pathogens in the care of war casualties from Ukraine

Author: Dietze Nadine
Fichtner Falk
Höch Andreas
Kleber Christian
Laudi Sven
Lippmann Norman
Lübbert Christoph
Notov Dmitry
Ranft Donald
Schönherr Sebastian G.
Trawinski Henning
Publication venue: Robert Koch-Institut
Publication date: 08/09/2022

Owing to hygiene deficits and the very broad, empirical use of antibiotics alongside open wound treatment in Ukrainian military hospitals, the risk of serious wound infections with multidrug-resistant organisms (MDROs) is high when civilian war victims are admitted. Surveillance with risk-adapted screening for MDROs, which has been carried out at Leipzig University Hospital (Universitätsklinikum Leipzig) since 2012, is therefore of great importance. The article presents the complexity of caring for war casualties from Ukraine and the associated infection and resistance problems, and points to the need for interdisciplinary and interprofessional management. Peer reviewed.

Publikationsserver des Robert Koch-Instituts