De novo characterization of skeletal muscle transcriptome of Atlantic herring (Clupea harengus) using Next Generation Sequencing (NGS): effect of quality and length trimming on transcriptome assembly

Abstract

Atlantic herring (Clupea harengus), one of the most abundant fish species on earth, is an economically important marine species found in the Baltic Sea and on both sides of the Atlantic Ocean. Although it has long been a popular species for marine fish population studies, genomic information for Atlantic herring remains scarce. Recent developments in ultra-high-throughput RNA sequencing methods have allowed rapid and cost-effective generation of large amounts of sequence information, which can be used to characterize the transcriptome of any non-model species even when no reference sequence is available. Transcriptome sequencing of skeletal muscle from a single specimen of Atlantic herring was performed on the Illumina HiSeq 2000 platform, generating approximately 116 million reads of 101 bp each. These short reads were trimmed for quality and assembled into 115,046 contigs with an average length of 291 bp and an N50 of 375 bp, producing a draft transcriptome assembly with a total size of 33.51 Mb. With the e-value threshold set to 10^-4, 46,979 contigs (40.84%) had matches against the GenBank non-redundant (NR) protein database and the zebrafish unigene database. Using the annotated transcriptome resource, 25,431 putative allelic variants (24,351 SNPs and 1,080 indels) were identified. The present study provides a comprehensive muscle transcriptome resource that will be particularly useful for validating the draft genome assembly of Atlantic herring currently being established within our group.
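For readers unfamiliar with the assembly metrics quoted above (contig count, average length, N50, total size), the following is a minimal Python sketch of how such summary statistics are commonly computed from a list of contig lengths. It is not the pipeline used in the study; the function name `assembly_stats` and the toy lengths are hypothetical and chosen only for illustration. N50 is taken here in its usual sense: the contig length at which contigs of that length or longer contain at least half of all assembled bases.

```python
def assembly_stats(lengths):
    """Summarize an assembly from its contig lengths (in bp).

    Hypothetical helper for illustration; not the authors' code.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")

    total = sum(lengths)
    running = 0
    n50 = 0
    # Walk contigs from longest to shortest until the running sum
    # covers at least half of the total assembled bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break

    return {
        "contigs": len(lengths),
        "total_bp": total,
        "mean_bp": total / len(lengths),
        "n50_bp": n50,
    }


# Toy input (made-up lengths, not data from the study).
toy_lengths = [100, 200, 300, 400, 500]
stats = assembly_stats(toy_lengths)
print(stats)
```

In practice the lengths would be read from the assembled FASTA file. An N50 only modestly above the mean contig length, as reported here, generally indicates an assembly made up mostly of short contigs.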
