7 research outputs found
SNP.vcf
Clean reads were first mapped to the transcripts using Bowtie2, and SNPs were then called using SAMtools. Raw SNPs were filtered with VCFtools (Danecek et al., 2011) using a minimum depth of 4 and a minimum quality of 20, and SNPs clustered within 50 bp were also removed. SNPs were annotated using SnpEff (http://snpeff.sourceforge.net/).
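The filtering criteria above can be illustrated with a minimal Python sketch. It is not the VCFtools command that was actually run. It assumes that depth is read from the DP entry of the INFO column, that QUAL is numeric, and that "clustered within 50 bp" means any retained SNP with a neighbour on the same transcript at most 50 bp away. The function names are hypothetical.

```python
def parse_vcf_line(line):
    """Return (chrom, pos, qual, depth) from one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, qual, info = fields[0], int(fields[1]), float(fields[5]), fields[7]
    depth = 0  # assumption: a missing DP entry counts as depth 0
    for entry in info.split(";"):
        if entry.startswith("DP="):
            depth = int(entry[3:])
    return chrom, pos, qual, depth


def filter_snps(lines, min_depth=4, min_qual=20.0, cluster_window=50):
    """Keep SNPs that pass the depth/quality thresholds and are not clustered."""
    # Step 1: apply the minimum depth and minimum quality thresholds.
    kept = []
    for line in lines:
        if line.startswith("#"):  # skip VCF header lines
            continue
        chrom, pos, qual, depth = parse_vcf_line(line)
        if depth >= min_depth and qual >= min_qual:
            kept.append((chrom, pos, line))

    # Step 2: drop every SNP that has a neighbour within cluster_window bp
    # on the same transcript (assumed meaning of "clustered").
    kept.sort(key=lambda record: (record[0], record[1]))
    result = []
    for i, (chrom, pos, line) in enumerate(kept):
        near_prev = (i > 0 and kept[i - 1][0] == chrom
                     and pos - kept[i - 1][1] <= cluster_window)
        near_next = (i + 1 < len(kept) and kept[i + 1][0] == chrom
                     and kept[i + 1][1] - pos <= cluster_window)
        if not (near_prev or near_next):
            result.append(line)
    return result
```

Whether the cluster filter runs before or after the depth and quality filter is not stated in the description. The sketch applies it afterwards.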
Trinotate annotation
The taimen transcriptome was annotated using Trinotate (https://trinotate.github.io/) following its documentation. The NR, UniProt Swiss-Prot and Pfam databases were used.
eggNOG annotation
COG functional categories were annotated using eggNOG-mapper (Huerta-Cepas et al., 2017); 72,605 putative proteins were annotated.
KEGG annotation
A KEGG pathway analysis was performed using GhostKOALA. A total of 51,698 transcripts were assigned to 8,052 KEGG ortholog groups.
Gene Ontology annotation
Sequences with significant hits in the UniProt or Pfam databases were assigned GO terms using the Trinotate package, and GO terms were additionally assigned using InterProScan. In total, 72,728 transcripts were assigned to 15,107 GO terms, including 10,185 biological process terms, 1,429 cellular component terms and 3,493 molecular function terms.
Assembly transcriptome
The transcriptome sequences were assembled using the Trinity package. Before assembly, low-quality reads were filtered from the raw reads using Trimmomatic with the parameters LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:50. The clean reads from the two pooled libraries were merged and normalized in silico using the Trinity package with default parameters to reduce running time and memory consumption. A k-mer size of 25 and a minimum k-mer depth of two were used for assembly with Trinity. The contigs produced by Trinity were then passed to the TGI Clustering Tool (version 2.1) to handle alternative splicing and redundant sequences. The raw RNA-Seq reads and assembled transcripts were deposited in the European Nucleotide Archive under the project ID PRJEB19675, with accession numbers HAGJ01000001 to HAGJ01190473 for the assembled transcripts.
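As a rough illustration of what the four Trimmomatic steps listed above do to a single read, the following Python sketch applies them in the stated order to a sequence and its per-base Phred scores. It is a simplified model for explanation only and does not reproduce Trimmomatic's implementation in every detail. The function name trim_read and its argument names are hypothetical.

```python
def trim_read(seq, quals, leading=20, trailing=20,
              window=4, window_qual=20, minlen=50):
    """Apply simplified LEADING, TRAILING, SLIDINGWINDOW and MINLEN steps.

    seq   -- read sequence as a string
    quals -- list of integer Phred scores, one per base
    Returns (seq, quals) after trimming, or None if the read is discarded.
    """
    # LEADING: remove bases from the 5' end while their quality is below the threshold.
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    seq, quals = seq[start:], quals[start:]

    # TRAILING: remove bases from the 3' end while their quality is below the threshold.
    end = len(quals)
    while end > 0 and quals[end - 1] < trailing:
        end -= 1
    seq, quals = seq[:end], quals[:end]

    # SLIDINGWINDOW: scan from the 5' end and cut the read at the start of the
    # first window whose mean quality falls below window_qual.
    cut = len(quals)
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < window_qual:
            cut = i
            break
    seq, quals = seq[:cut], quals[:cut]

    # MINLEN: discard reads that end up shorter than the minimum length.
    if len(seq) < minlen:
        return None
    return seq, quals
```

With the defaults shown, a read survives only if at least 50 bases remain after all three trimming steps.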
Sequences of index and primers
This pack contains 4 files: "forward_index.txt" and "reverse_index" hold the index sequences used to demultiplex reads to samples, "primers.txt" holds the primer sequences used to classify reads to loci, and "sample_config.txt" is the index configuration for the samples. These files were used to genotype 32 taimen samples collected from the Hutou section of the Wusuli River (E133°40′17″, N45°58′50″). The raw reads, sequenced on the Illumina HiSeq 2500 platform in 250 bp paired-end mode, were deposited in the European Nucleotide Archive under the project ID PRJEB19675 with accession number ERR2029723.
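The roles of the index and primer files can be sketched as follows. This is a hypothetical illustration, not the genotyping pipeline that was used. It assumes that each read starts with a fixed-length index, that the forward read continues with the locus primer, and that matching is exact. The layout of the four files is not described here, so the lookup tables are passed in as plain dictionaries, and all names and sequences in the example are invented.

```python
def assign_read(fwd_read, rev_read, sample_by_index_pair, locus_by_primer, index_len):
    """Return (sample, locus) for a read pair, or None if either lookup fails.

    sample_by_index_pair -- dict mapping (forward index, reverse index) to a sample name
    locus_by_primer      -- dict mapping a forward primer sequence to a locus name
    index_len            -- assumed fixed index length at the start of each read
    """
    # Demultiplex: the pair of leading index sequences identifies the sample.
    pair = (fwd_read[:index_len], rev_read[:index_len])
    sample = sample_by_index_pair.get(pair)
    if sample is None:
        return None

    # Classify: the primer that follows the forward index identifies the locus.
    insert = fwd_read[index_len:]
    for primer, locus in locus_by_primer.items():
        if insert.startswith(primer):
            return sample, locus
    return None


# Usage with invented index and primer sequences.
samples = {("ACGT", "TGCA"): "sample_01"}
loci = {"GGATCC": "locus_A"}
print(assign_read("ACGTGGATCCTTTT", "TGCAAAAA", samples, loci, index_len=4))
```

A production tool would normally tolerate sequencing errors in the index and primer. Exact matching is used here only to keep the sketch short.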