25 research outputs found

    Phylogenetic analysis of HCV quasispecies derived from HCV-infected subjects.

    No full text
    A total of 391 independently obtained HCV E2 nucleic acid sequences derived from HCV-infected subjects were analyzed using the neighbor-joining method, as described under Materials and Methods. 500 bootstrap re-samplings were performed to ascertain tree topology. The scale bar represents 0.1 nucleotide substitutions per site.

    HCV E2 amino-acid sequence variability in HCV quasispecies derived from HCV-infected subjects.

    No full text
    A. Consensus E2 amino-acid sequences were determined in 17 HCV-infected subjects and in the HCV-1a infected serum donor from ref. 14 based on the identity of the most frequent amino-acid residue at each position. 1: R or H; 2: V or I; 3: A or T; 4: R or Q. B. Variability at each amino acid position was computed using the Entropy-ONE Web tool [58]. C. Amino-acid segments that were shown to be important for binding of the AR3B antibody [14].
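The consensus and variability steps described in this caption can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length and that variability is measured as Shannon entropy in bits; the function name `consensus_and_entropy` and the toy sequences are hypothetical, and this is not the Entropy-ONE implementation.

```python
import math
from collections import Counter


def consensus_and_entropy(aligned_seqs):
    """Return (consensus string, list of per-position Shannon entropies in bits).

    Assumption: all sequences are pre-aligned and have the same length.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")

    consensus = []
    entropies = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        total = len(column)
        # Consensus residue: the most frequent one at this position
        # (ties are resolved by first occurrence in the column).
        consensus.append(counts.most_common(1)[0][0])
        # Shannon entropy: sum over residues of p * log2(1 / p).
        entropies.append(
            sum((c / total) * math.log2(total / c) for c in counts.values())
        )
    return "".join(consensus), entropies


if __name__ == "__main__":
    toy = ["ARV", "ARI", "TRV", "ARV"]  # hypothetical aligned fragments
    cons, ent = consensus_and_entropy(toy)
    print(cons, [round(h, 3) for h in ent])
```

A fully conserved position yields an entropy of zero, and the value grows as residues at a position become more evenly mixed.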

    Dendrogram of E2 structure clustering in HCV quasispecies derived from HCV-infected subjects.

    No full text
    A. Analysis of the structural distance (RMSD) matrix of modelled E2 structures from HCV-infected subjects (n = 391) using the neighbour-joining algorithm showed a clustering of genotype 1 variants (boxed), with the exception of singular outlier structures (circled) and genotype 3a variants. B. Subtype 1a and 1b clustered separately but were structurally similar. E2 structures derived from patients infected with the same HCV subtype formed distinct clusters.

    Putative binding site of the AR3B monoclonal antibody on E2.

    No full text
    A. Structural analysis of the proposed AR3B binding site on E2 (orange, red, and green) revealed that it would bury between 1092 and 1794 Å² at the surface of E2. Regions 396–424 (red) and 523–540 (orange) are closely associated compared with region 436–447 (green). B. Analysis of the 396–424 and 523–540 regions (magnified) showed that critical residues involved in AR3B binding (Ser 424, Gly 530, Asp 535 and Val 538) were largely surface-exposed and lay in close proximity to one another.

    Genomic sequence dataset of SARS-CoV-2 variants.

    No full text
    Machine learning was shown to be effective at identifying distinctive genomic signatures among viral sequences. These signatures are defined as pervasive motifs in the viral genome that allow discrimination between species or variants. In the context of SARS-CoV-2, the identification of these signatures can assist in taxonomic and phylogenetic studies, improve the recognition and definition of emerging variants, and aid in the characterization of functional properties of polymorphic gene products. In this paper, we assess KEVOLVE, an approach based on a genetic algorithm with a machine-learning kernel, to identify multiple genomic signatures based on minimal sets of k-mers. In a comparative study, in which we analyzed a large SARS-CoV-2 genome dataset, KEVOLVE was more effective at identifying variant-discriminative signatures than several gold-standard statistical tools. Subsequently, these signatures were characterized using a new extension of KEVOLVE (KANALYZER) to highlight variations of the discriminative signatures among different classes of variants, their genomic location, and the mutations involved. The majority of identified signatures were associated with known mutations among the different variants, in terms of functional and pathological impact based on available literature. Here we showed that KEVOLVE is a robust machine learning approach to identify discriminative signatures among SARS-CoV-2 variants, which are frequently also biologically relevant, while bypassing multiple sequence alignments. The source code of the method and additional resources are available at: https://github.com/bioinfoUQAM/KEVOLVE.
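To make the alignment-free, k-mer-based representation mentioned in this abstract concrete, here is a minimal sketch of how genomes might be turned into k-mer count vectors. It is not KEVOLVE's code: the helper names `kmer_counts` and `kmer_matrix`, the choice to skip k-mers containing ambiguous bases, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_counts(sequence, k):
    """Count overlapping k-mers in one sequence, without any alignment.

    Assumption: k-mers containing characters other than A, C, G, T
    (for example N) are skipped.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts


def kmer_matrix(sequences, kmers, k):
    """Build one count vector per sequence over a chosen list of k-mers."""
    rows = []
    for seq in sequences:
        counts = kmer_counts(seq, k)
        rows.append([counts[kmer] for kmer in kmers])
    return rows


if __name__ == "__main__":
    genomes = ["ATGATGC", "ATGNATG"]  # hypothetical toy sequences
    signature = ["ATG", "TGC"]        # hypothetical candidate k-mer set
    print(kmer_matrix(genomes, signature, k=3))
```

A feature-selection method would then search for a small subset of k-mers whose count vectors separate the variant classes; that search step is outside this sketch.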

    SARS-CoV-2 genome organization.

    No full text
    Four structural proteins (red), 16 non-structural proteins (NSPs; blue), and 9 accessory factors (green) are shown. ORFs (open reading frames; yellow) 1a and 1b encode polyproteins. The protein sequence similarity with SARS-CoV homologues (when homologues exist) is depicted by the color intensity.

    Results of the comparative study.

    No full text
    A-C) The violin plots illustrate the distributions of the performance metrics (Precision, Recall, and F1-score) obtained for the test set predictions over 100 cross-validation iterations. D) The bar plot depicts the average number of motifs identified by each approach to build its prediction model. The black vertical bar indicates the standard deviation.
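For readers unfamiliar with the metrics named in this caption, the following sketch computes per-class precision, recall, and F1-score from true and predicted labels using their standard definitions. The function name `per_class_scores` and the example labels are hypothetical, and the paper's own evaluation pipeline may differ (for instance in how scores are averaged across classes).

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen.

    Precision = TP / (TP + FP), recall = TP / (TP + FN),
    F1 = harmonic mean of precision and recall.
    Undefined ratios (zero denominator) are reported as 0.0 by convention.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


if __name__ == "__main__":
    truth = ["Alpha", "Alpha", "Delta", "Delta"]      # hypothetical labels
    predicted = ["Alpha", "Delta", "Delta", "Delta"]  # hypothetical predictions
    for label, (p, r, f) in per_class_scores(truth, predicted).items():
        print(label, round(p, 3), round(r, 3), round(f, 3))
```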

    Results of the comparative study.

    No full text
    A-E) The confusion matrices represent the average prediction performance of each tool for the different variants over the 100 iterations. Each cell shows the average percentage of assigned instances (top value) and the standard deviation (bottom value).