40 research outputs found

    Top 10 ranking non-pathogenic protein families and annotated functions of their proteins for TM-Gammaproteobacteria model.

    <p>The P and N columns give the number of pathogenic and non-pathogenic organisms in the protein family, respectively.</p>

    Top 10 ranking non-pathogenic protein families and annotated functions of their proteins for the WDM model.

    <p>The P and N columns give the number of pathogenic and non-pathogenic organisms in the protein family, respectively.</p>

    P<i><sub>ratio</sub></i> and Z-score histograms for TM-Betaproteobacteria model.

    <p>The model was built with MinOrg = 2, HT = 0.9 and LT = 0.3. (A) and (B) show, respectively, the P<i><sub>ratio</sub></i> and Z-score histograms for the clusters i such that ORG<i><sub>i</sub></i> ≥ MinOrg. This step reduces the original 69,744 clusters to 26,706. In (A), the bars at the extremes count the clusters containing only genes from pathogenic organisms (right bar) or only genes from non-pathogenic organisms (left bar), while the small peak in the middle corresponds to clusters containing the same number of pathogenic and non-pathogenic organisms; these clusters are not used, since they provide no discriminative information about pathogenicity. (C) and (D) show the same histograms for the PFs obtained by removing all the significant clusters with a P<i><sub>ratio</sub></i> value between LT and HT. Panel (C) shows that non-pathogenic PFs outnumber pathogenic ones. HT and LT can be used to adjust the numbers of pathogenic and non-pathogenic PFs, which can be useful for models whose training set has an unbalanced number of pathogenic and non-pathogenic organisms. In (D), negative Z-scores are associated with non-pathogenic families, while the others correspond to pathogenic PFs.</p>
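The filtering described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that P<sub>ratio</sub> is the fraction of pathogenic organisms in a cluster (consistent with the histogram extremes described above, but not defined in the caption), that ORG<sub>i</sub> is the total number of organisms in cluster i, and that the LT and HT boundaries are inclusive. The Z-score significance step is omitted, and the names `Cluster` and `select_families` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    pathogenic: int       # P: pathogenic organisms in the cluster
    non_pathogenic: int   # N: non-pathogenic organisms in the cluster


def select_families(clusters, min_org=2, ht=0.9, lt=0.3):
    """Split clusters into pathogenic and non-pathogenic families.

    Assumption: P_ratio = P / (P + N). Clusters with fewer than
    min_org organisms, or with LT < P_ratio < HT, are discarded.
    """
    pathogenic, non_pathogenic = [], []
    for c in clusters:
        total = c.pathogenic + c.non_pathogenic
        if total == 0 or total < min_org:
            continue  # too few organisms to be informative
        ratio = c.pathogenic / total
        if ratio >= ht:
            pathogenic.append(c.name)
        elif ratio <= lt:
            non_pathogenic.append(c.name)
        # ratios strictly between LT and HT are dropped
    return pathogenic, non_pathogenic


example = [
    Cluster("a", 5, 0),  # only pathogenic organisms
    Cluster("b", 0, 4),  # only non-pathogenic organisms
    Cluster("c", 3, 3),  # balanced, no discriminative information
    Cluster("d", 1, 0),  # below MinOrg
    Cluster("e", 1, 4),  # mostly non-pathogenic
]
path_pfs, nonpath_pfs = select_families(example)
```

Under these assumptions, raising LT or lowering HT keeps more clusters on the corresponding side, which is how the two thresholds could rebalance the families.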

    PFDB, training and test-set for each model.

    <p>Each bar-plot shows the percentage of pathogenic (orange) and non-pathogenic (light-blue) organisms in the training and test-set, and the percentage of pathogenic and non-pathogenic protein families in the PFDB of the model named in the bar-plot title (e.g. WDM). Below each horizontal bar-plot are shown the number of protein families in that model's PFDB, its size in megabytes and its number of sequences.</p>

    Top 10 ranking pathogenic protein families and annotated functions of their proteins for TM-Gammaproteobacteria model.

    <p>The P and N columns give the number of pathogenic and non-pathogenic organisms in the protein family, respectively.</p>

    Training, test data and model parameters.

    <p>The last 3 columns show the MinORG, LT and HT parameters used to create the pathogenicity families and build each of the 10 models. <i>Zthr</i> is a threshold, calculated for each model during the cross-validation phase, that is applied to the final prediction score to decide whether the input organism is predicted as pathogenic or non-pathogenic. The parameters for each model were chosen after 5-fold cross-validation tests.</p>

    Top 10 ranking pathogenic protein families and annotated functions of their proteins for the WDM model.

    <p>The P and N columns give the number of pathogenic and non-pathogenic organisms in the protein family, respectively.</p>

    MCC on cross validation and independent test-set.

    <p>Column 2: the MCC obtained in the 5-fold cross validation (CV) by each of the 10 models. Column 3: the MCC of the individual TM models and the COMPL model (last line) when tested on independent test data from the corresponding phyla/classes. Column 4: the MCC of the WDM model when tested on independent test data from specific phyla/classes.</p><p><sup>1</sup> Organisms of a phylum/class for which no TM model is available were tested using the COMPL model. COMPL was trained on all organisms from classes or phyla for which only pathogenic or only non-pathogenic strains were available.</p><p><sup>2</sup> MCC for WDM on the same test-set used for COMPL.</p><p><sup>3</sup> Overall MCC for all the TM models and the COMPL model.</p>
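For readers unfamiliar with the metric, the Matthews correlation coefficient (MCC) reported in this table can be computed from a binary confusion matrix as in the sketch below. The formula is the standard one. The function name `mcc`, the choice of pathogenic as the positive class, and the convention of returning 0.0 when the denominator is zero are assumptions made for illustration, not details taken from the source.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    tp/tn: correctly predicted pathogenic / non-pathogenic organisms
    fp/fn: wrongly predicted pathogenic / non-pathogenic organisms
    (pathogenic as the positive class is an illustrative choice).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column is empty
    return numerator / denominator


perfect = mcc(tp=50, tn=50, fp=0, fn=0)
inverted = mcc(tp=0, tn=0, fp=50, fn=50)
chance = mcc(tp=25, tn=25, fp=25, fn=25)
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction correct), which makes it suitable for the unbalanced training sets mentioned in the other captions.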

    Helical wheel projection of the original Cap18 peptide and Cap18 derivatives.

    <p>Hydrophobic amino acids are shown in yellow, negatively charged amino acids in red and positively charged amino acids in dark blue. Polar residues are shown in violet (threonine, serine) or pink (asparagine, glutamine). Glycine and alanine are grey and proline residues are green. The helices were created using <a href="http://heliquest.ipmc.cnrs.fr/" target="_blank">http://heliquest.ipmc.cnrs.fr/</a> [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0197742#pone.0197742.ref032" target="_blank">32</a>]. A: helical wheel projection of the original Cap18 peptide. Residues important for the hemolytic and antimicrobial activity of Cap18 are highlighted. B-E: helical wheel projections of Cap18 derivatives that lost antimicrobial activity against all the tested organisms: Peptide 3 harboring the I13D amino acid substitution (B), Peptide 16 harboring the L17D substitution (C), Peptide 18 harboring the L17P substitution (D) and Peptide 26 harboring the I24D substitution (E). The corresponding substitutions are highlighted.</p>