3 research outputs found

    Model-based clustering with certainty estimation: implication for clade assignment of influenza viruses

    Background: Clustering is a common technique used by molecular biologists to group homologous sequences and study evolution. Open issues remain, such as how to cluster molecular sequences accurately and, in particular, how to evaluate the certainty of clustering results. Results: We presented a model-based clustering method to analyze molecular sequences, described a subset bootstrap scheme to evaluate the certainty of the clusters, and showed an intuitive way to examine clusters using 3D visualization. We applied this approach to influenza viral hemagglutinin (HA) sequences. Nine clusters were estimated for highly pathogenic H5N1 avian influenza, which agrees with previous findings. The certainty that a given sequence was correctly assigned to a cluster was 1.0 in every case, and the certainty for a given cluster was also very high (0.92–1.0), with an overall clustering certainty of 0.95. For influenza A H7 viruses, ten HA clusters were estimated, and the vast majority of sequences could be assigned to a cluster with a certainty of more than 0.99. The certainties for clusters, however, varied from 0.40 to 0.98; this variation is likely attributable to the heterogeneity of sequence data in different clusters. In both cases, the certainty values estimated using the subset bootstrap method were all higher than those calculated with the standard bootstrap method, suggesting our bootstrap scheme is applicable to the estimation of clustering certainty. Conclusions: We formulated a clustering analysis approach with estimation of certainties and 3D visualization of sequence data. We analysed two sets of influenza A HA sequences, and the results indicate that our approach is applicable to clustering analysis of influenza viral sequences.

    Transcriptomic comparison of invasive bigheaded carps (Hypophthalmichthys nobilis and Hypophthalmichthys molitrix) and their hybrids

    Bighead carp (Hypophthalmichthys nobilis) and silver carp (Hypophthalmichthys molitrix), collectively called bigheaded carps, are invasive species in the Mississippi River Basin (MRB). Interspecific hybridization between bigheaded carps has been considered rare within their native rivers in China; however, it is prevalent in the MRB. We conducted de novo transcriptome analysis of pure and hybrid bigheaded carps and obtained 40,759 to 51,706 transcripts for pure, F1 hybrid, and backcross bigheaded carps. The search against protein databases resulted in 20,336–28,133 annotated transcripts (over 50% of the transcriptome), with over 13,000 transcripts mapped to 23 Gene Ontology biological processes and 127 KEGG metabolic pathways. More transcripts were detected in silver carp than in bighead carp; however, comparable numbers of transcripts were annotated. Transcriptomic variation detected between two F1 hybrids may indicate a potential loss of fitness in hybrids. The neighbor-joining distance tree constructed using over 2,500 one-to-one orthologous sequences suggests transcriptomes could be used to infer the history of introgression and hybridization. Moreover, we detected 24,792 candidate SNPs that can be used to identify different species. The transcriptomes, orthologous sequences, and candidate SNPs obtained in this study should provide further knowledge of interspecific hybridization and introgression.