A Multi-Science Data Analysis Platform and the GeneROOT Use Case

Abstract

This talk will cover two areas of current research in the context of knowledge sharing between CERN openlab and the life science communities. The first area covers the development and prototyping of a multi-science data analysis platform built around CERN-developed technologies such as Zenodo, REANA and CVMFS. When finished, this platform will support the complete data analysis life cycle, from data discovery and data access through data processing to end-user data analysis. The second area covers a specific use case in which HEP-specific software such as ROOT is used to store and process genomics sequence data. A number of handcrafted genomics data formats are in use, such as FASTQ, SAM, BAM and CRAM, ranging from pure ASCII to compressed binary formats. We will compare the features of these formats with the generic capabilities of ROOT’s TTree containers and show performance numbers for typical analysis scenarios.
