
    A Hybrid MPI-OpenMP Strategy to Speedup the Compression of Big Next-Generation Sequencing Datasets

    DNA sequencing has moved into the realm of Big Data due to the rapid development of high-throughput, low-cost Next-Generation Sequencing (NGS) technologies. Sequential data compression solutions that were once sufficient to efficiently store and distribute this information are now falling behind. In this paper we introduce phyNGSC, a hybrid MPI-OpenMP strategy to speed up the compression of big NGS data by combining the features of both distributed and shared memory architectures. Our algorithm balances the workload among processes and threads, alleviates memory latency by exploiting locality, and accelerates I/O by reducing excessive read/write operations and inter-node message exchange. To make the algorithm scalable, we introduce a novel timestamp-based file structure that allows us to write the compressed data in a distributed and non-deterministic fashion while retaining the ability to reconstruct the dataset in its original order. Our experimental results show that phyNGSC achieved compression times for big NGS datasets that were 45% to 98% faster than NGS-specific sequential compressors, with throughputs of up to 3 GB/s. Our theoretical analysis and experimental results suggest strong scalability, with some datasets yielding super-linear speedups and constant efficiency. We were able to compress 1 terabyte of data in under 8 minutes, compared to more than 5 hours taken by NGS-specific compression algorithms running sequentially. Compared to other parallel solutions, phyNGSC achieved up to 6x speedups while maintaining a higher compression ratio. The code for this implementation is available at https://github.com/pcdslab/PHYNGS
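    The abstract does not specify the on-disk format, so the following is only a minimal sketch of the general idea behind a timestamp-based file structure, not phyNGSC's actual layout. It assumes the "timestamp" is simply each chunk's sequence number in the input, uses Python's `zlib` as a stand-in for an NGS-specific codec, and uses a thread pool in place of MPI processes; the record header and the function names `write_out_of_order` and `read_in_original_order` are hypothetical. Chunks are appended in whatever order they finish compressing, and the reader restores the original order by sorting on the stored tag.

```python
import io
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical record header: 8-byte "timestamp" (here, the chunk's position
# in the original input) followed by the 8-byte compressed payload length.
HEADER = struct.Struct("<QQ")


def compress_chunk(timestamp: int, chunk: bytes) -> tuple[int, bytes]:
    """Compress one chunk independently; zlib stands in for an NGS codec."""
    return timestamp, zlib.compress(chunk)


def write_out_of_order(chunks: list[bytes], out: io.BytesIO, workers: int = 4) -> None:
    """Append each compressed chunk as soon as it is ready (completion order)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(compress_chunk, ts, c) for ts, c in enumerate(chunks)]
        for fut in as_completed(futures):
            ts, payload = fut.result()
            out.write(HEADER.pack(ts, len(payload)))
            out.write(payload)


def read_in_original_order(data: bytes) -> bytes:
    """Parse every record, then sort by timestamp to restore the input order."""
    records = []
    offset = 0
    while offset < len(data):
        ts, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        records.append((ts, data[offset:offset + length]))
        offset += length
    records.sort(key=lambda rec: rec[0])
    return b"".join(zlib.decompress(payload) for _, payload in records)


if __name__ == "__main__":
    reads = [f"@read{i}\nACGT\n+\nIIII\n".encode() for i in range(8)]
    buf = io.BytesIO()
    write_out_of_order(reads, buf)
    # The write order may vary between runs, but the restored data never does.
    assert read_in_original_order(buf.getvalue()) == b"".join(reads)
```

    The design choice illustrated here is that writers never coordinate on output position; ordering information travels with each record, so the cost of restoring order is moved to the (less frequent) reconstruction step.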

    Scalable Data Structure to Compress Next-Generation Sequencing Files and its Application to Compressive Genomics

    It is now possible to compress and decompress large-scale Next-Generation Sequencing files by taking advantage of high-performance computing techniques. To this end, we have recently introduced a scalable hybrid parallel algorithm, called phyNGSC, which allows fast compression as well as decompression of big FASTQ datasets using distributed and shared memory programming models via MPI and OpenMP. In this paper we present the design and implementation of a novel parallel data structure that lessens the dependency on decompression and facilitates the handling of DNA sequences in their compressed state through fine-grained decompression, a technique identified as in compresso data processing. Using our data structure, compression and decompression throughputs of up to 8.71 GB/s and 10.12 GB/s, respectively, were observed. Our proposed structure and methodology bring us one step closer to compressive genomics and sublinear analysis of big NGS datasets. The code for this implementation is available at https://github.com/pcdslab/PHYNGS
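    The abstract does not describe the data structure's internal layout. As a loose illustration of the general principle behind fine-grained decompression, the Python sketch below groups reads into independently compressed blocks so that a single read can be retrieved by decompressing only the block that holds it. The class `BlockIndexedStore`, the fixed block size, the NUL record separator, and the use of `zlib` are all assumptions for illustration, not the phyNGSC data structure.

```python
import zlib

SEP = b"\x00"  # FASTQ text contains no NUL bytes, so NUL can delimit records


class BlockIndexedStore:
    """Hypothetical layout: FASTQ records packed into independently compressed blocks.

    Because each block is compressed on its own, one record can be read by
    inflating a single block instead of the whole file.
    """

    def __init__(self, records: list[str], block_size: int = 4) -> None:
        if block_size < 1:
            raise ValueError("block_size must be >= 1")
        self.block_size = block_size
        self.n_records = len(records)
        self.blocks = [
            zlib.compress(SEP.join(r.encode() for r in records[i:i + block_size]))
            for i in range(0, len(records), block_size)
        ]
        self.blocks_inflated = 0  # number of block decompressions performed so far

    def get(self, record_id: int) -> str:
        """Fine-grained access: decompress only the block holding record_id."""
        if not 0 <= record_id < self.n_records:
            raise IndexError(record_id)
        block_no, pos = divmod(record_id, self.block_size)
        self.blocks_inflated += 1
        records = zlib.decompress(self.blocks[block_no]).split(SEP)
        return records[pos].decode()


if __name__ == "__main__":
    reads = [f"@read{i}\nACGT\n+\nIIII" for i in range(10)]
    store = BlockIndexedStore(reads, block_size=4)
    print(store.get(5))           # the sixth read, taken from the second block
    print(store.blocks_inflated)  # 1: only one of the 3 blocks was decompressed
```

    Smaller blocks make random access cheaper but usually lower the compression ratio, since each block is compressed without context from its neighbors; that trade-off is what a real structure of this kind has to tune.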

    A Parallel Algorithm for Compression of Big Next-Generation Sequencing Datasets

    With the advent of high-throughput next-generation sequencing (NGS) techniques, the amount of data being generated poses challenges, including the storage, analysis, and transport of huge datasets. One solution for the storage and transmission of data is compression using specialized compression algorithms. However, these specialized algorithms scale poorly with increasing dataset size, and the best available solutions can take hours to compress gigabytes of data. In this paper we introduce paraDSRC, a parallel implementation of the DSRC algorithm using a message passing model that reduces the compression time complexity by a factor of O(1/p), where p is the number of processes. Our experimental results show that paraDSRC achieves compression times that are 43% to 99% faster than DSRC and compression throughputs of up to 8.4 GB/s on a moderate-size cluster. For many of the datasets used in our experiments, super-linear speedups were registered, making the implementation strongly scalable. We also show that paraDSRC is more than 25.6x faster than comparable parallel compression algorithms. The code will be available on the author's website if the paper is accepted.
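    A reduction by a factor of O(1/p) is what one expects when the input is split into p nearly equal shares, so that each of the p processes handles at most ⌈n/p⌉ of the n records. The Python sketch below shows such a block partition, assuming records are distributed by count; the function `block_range` is hypothetical and is not taken from paraDSRC, which may split its input by byte offsets or by DSRC blocks instead. In an MPI program, `p` and `rank` would come from the communicator, for example mpi4py's `MPI.COMM_WORLD.Get_size()` and `Get_rank()`.

```python
def block_range(n_records: int, p: int, rank: int) -> range:
    """Return the contiguous record indices owned by `rank` out of `p` processes.

    The first n_records % p ranks receive one extra record, so share sizes
    differ by at most one and no rank handles more than ceil(n_records / p).
    """
    if p < 1 or not 0 <= rank < p:
        raise ValueError("need p >= 1 and 0 <= rank < p")
    base, extra = divmod(n_records, p)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return range(start, start + size)


if __name__ == "__main__":
    p = 3
    for rank in range(p):
        # e.g. rank 0 gets range(0, 4) when 10 records are split across 3 ranks
        print(rank, block_range(10, p, rank))
```

    Contiguous shares keep each process reading one sequential region of the input, which favors I/O locality; the actual speedup in practice also depends on communication and I/O costs that this sketch ignores.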

    Description of Symptoms Caused by the Infection of the SARS-CoV-2 B.1.621 (Mu) Variant in Patients With Complete CoronaVac Vaccination Scheme : First Case Report From Santiago of Chile

    Vaccine administration is one of the most efficient ways to control the current coronavirus disease 2019 (COVID-19) pandemic. However, emerging severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants can evade the immunity generated by vaccines. Thus, in patients with a complete vaccination schedule, SARS-CoV-2 infection may cause severe, mild, or asymptomatic manifestations of the disease. In this case report, we describe for the first time the clinical symptoms of four patients (three symptomatic, one asymptomatic) from Santiago, Chile, who had completed a two-dose CoronaVac (Sinovac Life Science) vaccination schedule and were infected with the variant of interest (VOI) B.1.621 (Mu). They were compared with four unvaccinated patients, who had a higher prevalence of symptoms after infection than the vaccinated patients. In the CoronaVac-vaccinated group, an 80-year-old patient with various comorbidities required invasive mechanical ventilation for 28 days and has since been discharged to recover at home. By contrast, in the unvaccinated group, a 71-year-old patient presented more symptoms, has required more than 45 days of invasive mechanical ventilation, which continues to date, and shows greater lung damage than the vaccinated hospitalized patient. This first report shows differences in the clinical symptomatology of vaccinated and unvaccinated patients infected with the VOI B.1.621 (Mu) and suggests a protective effect of CoronaVac against this variant.