Categorizing Host-Dependent RNA Viruses by Principal Component Analysis of Their Codon Usage Preferences

Abstract

Viruses must exploit host transcription and translation machinery to replicate in a hostile cellular environment, so the infected host is likely to exert pressure on viral evolution. In this study, we investigated differences in codon usage preferences among highly mutable single-stranded RNA viruses that infect either vertebrate or invertebrate hosts. We combined principal component analysis (PCA) with k-means clustering to group the viruses according to the type of host they infect. The relative synonymous codon usage (RSCU) indices of all genes in 32 RNA viruses were calculated, and the correlation of RSCU indices among the viruses was analyzed by PCA. Our results show a positive correlation in codon usage preferences among viruses that target the same host category. The k-means clustering results further confirmed the statistical significance of this grouping, demonstrating that viruses infecting vertebrate hosts have codon usage preferences different from those of viruses infecting invertebrate hosts. Based on an analysis of the effective number of codons (ENC) in relation to the GC content at synonymous third codon positions (GC3s), we further found that mutational pressure was the dominant evolutionary force shaping these differences in codon usage. This study suggests a new and effective way to characterize host-dependent RNA viruses based on their codon usage patterns.
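The two per-gene indices the abstract relies on, RSCU and GC3s, together with the expected ENC curve used to judge mutational pressure, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the standard genetic code (NCBI table 1), computes RSCU only for amino acids encoded by more than one codon (stop codons, Met and Trp excluded), computes GC3s over those same synonymous codons, and uses Wright's expected-ENC formula ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3s. The names `rscu`, `gc3s`, `expected_enc` and `FAMILIES` are hypothetical. In a workflow like the one described, the RSCU values for each virus would form one row of the matrix passed to PCA and k-means.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in
# TCAG order with the third base varying fastest. '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codon families for amino acids encoded by more than one codon.
# Assumption: stops, Met (ATG) and Trp (TGG) are excluded, as is common for
# RSCU and GC3s calculations.
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)
FAMILIES = {aa: fam for aa, fam in FAMILIES.items() if len(fam) > 1}


def codons(seq):
    """Yield in-frame codons, treating U as T and skipping ambiguous codons."""
    seq = seq.upper().replace("U", "T")
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in CODON_TABLE:
            yield codon


def rscu(seq):
    """RSCU = observed codon count / mean count over its synonymous family."""
    counts = Counter(codons(seq))
    values = {}
    for family in FAMILIES.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this gene: RSCU undefined
        mean = total / len(family)
        for c in family:
            values[c] = counts[c] / mean
    return values


def gc3s(seq):
    """Fraction of G or C at the third position of synonymous codons."""
    third = [c[2] for c in codons(seq) if CODON_TABLE[c] in FAMILIES]
    if not third:
        raise ValueError("sequence has no synonymous codons")
    return sum(base in "GC" for base in third) / len(third)


def expected_enc(s):
    """Wright's expected ENC if codon bias reflected GC3s composition alone."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


# Usage on a toy RNA fragment (AUG, GCU, GCU, GCC, GCA, UAA):
gene = "AUGGCUGCUGCCGCAUAA"
print(rscu(gene))  # {'GCT': 2.0, 'GCC': 1.0, 'GCA': 1.0, 'GCG': 0.0}
print(gc3s(gene))  # 0.25
print(expected_enc(gc3s(gene)))
```

Under the usual reading of an ENC vs GC3s plot, genes whose observed ENC lies on or near the expected curve have codon usage explained mainly by mutational (compositional) bias, while genes falling well below the curve point to additional selective constraints.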
