vii, 22 p.

FASTQ is a common file format for storing DNA sequence reads together with the quality scores associated with each read. Because these files are stored as plain text, compressing them is advantageous for minimizing storage use. In previous research, Dr. Sandino Vargas-Pérez developed a hybrid parallel algorithm called phyNGSC, which allows fast compression and decompression of FASTQ files by combining the distributed and shared memory programming models through the MPI and OpenMP libraries. This research expands on that algorithm by introducing in compresso processing, an approach that uses hybrid parallelism with MPI and OpenMP to process genetic data without fully decompressing it. This technique greatly reduces the storage needed to work with these genetic datasets and offers significant performance increases relative to comparable algorithms. This SIP showcases in compresso data processing for two tasks: DNA sequence pattern finding and FASTQ to FASTA format conversion. The code for this implementation is available at [1].
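
For readers unfamiliar with the two formats, the following is a minimal Python sketch of the two tasks named above, performed on plain, uncompressed text. It is not the in compresso implementation and uses no MPI or OpenMP; it only illustrates what a FASTQ record looks like (four lines: an `@` identifier, the sequence, a `+` separator, and the quality string) and what conversion to FASTA discards. The function names `parse_fastq`, `fastq_to_fasta`, and `reads_with_pattern` are hypothetical and chosen for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        # Read the remaining three lines of the record.
        rest = [next(it, None) for _ in range(3)]
        if any(line is None for line in rest):
            raise ValueError("truncated FASTQ record: " + header)
        seq, plus, qual = (line.rstrip("\r\n") for line in rest)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch: " + header)
        yield header[1:], seq, qual


def fastq_to_fasta(lines: Iterable[str]) -> str:
    """Convert FASTQ text to FASTA text by dropping the quality scores."""
    out: List[str] = []
    for ident, seq, _qual in parse_fastq(lines):
        out.append(">" + ident)
        out.append(seq)
    return "\n".join(out) + ("\n" if out else "")


def reads_with_pattern(lines: Iterable[str], pattern: str) -> List[str]:
    """Return identifiers of reads whose sequence contains the pattern."""
    return [ident for ident, seq, _qual in parse_fastq(lines) if pattern in seq]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        "@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
        "@read2\n", "GGCATTAA\n", "+\n", "!!!!!!!!\n",
    ]
    print(fastq_to_fasta(sample), end="")
    print(reads_with_pattern(sample, "CATT"))
```

Both tasks read only the identifier and sequence lines of each record, which suggests why they are natural candidates for processing that avoids total decompression; how the actual implementation achieves this is described in the full text, not in this sketch.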