
    Overview of the Horizontal Protein Comparison Tool (<i>HePCaT</i>) algorithm.

    <p>The hydropathy profiles of two hypothetical proteins, each of length <i>M</i> = <i>N</i> = 20 residues, are shown (Step 1). Intraprotein signed distances are computed within each protein according to <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e001" target="_blank">Equation 1</a> in the main text (Step 2). Positive distances, <i>e.g.</i> measured from a residue with a local minimum value to a residue with a local maximum value, are indicated in red, negative distances in blue. The signed distance matrices are therefore square and symmetrically reflected across the diagonal. Distances for protein 1 and protein 2 correspond to matrices <b><i>D<sub>1</sub></i></b> and <b><i>D<sub>2</sub></i></b>, respectively. The similarity matrix <b><i>S</i></b> that ultimately compares the two proteins is constructed from the average absolute distance differences of <i>W</i> = 5 residue blocks between <b><i>D<sub>1</sub></i></b> and <b><i>D<sub>2</sub></i></b>, according to <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e002" target="_blank">Equation 2</a> (Step 3). In <b><i>S</i></b>, light colored squares indicate blocks of <i>W</i> = 5 residues starting at residue <i>i</i> in protein 1 and residue <i>j</i> in protein 2 with similarly shaped hydropathy; dark squares indicate dissimilar shapes. (<b><i>S</i></b><i><sub>i=1,j=1</sub></i> is the lower left corner in the figure.) As described in the text, <b><i>S</i></b> is exhaustively searched and all longest alignments with up to <i>GapMax</i> gaps, whose squares (average path distance, <i>APD</i>) pass a user-defined average similarity cutoff <i>C</i>, are kept in a list (set of colored arrows). The alignment in this list with the closest absolute shape (lowest <i>RMSD</i>) is defined as the optimal match (Step 5).
An Optimal Path Score (<i>OPS</i>), defined by <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e004" target="_blank">Equation 4</a>, is assigned to the alignment and its significance is computed with respect to the score distribution of random alignments of identical length (Step 6). Note that the example alignment, while a reasonable visual match, is only marginally significant with respect to random alignments of identical length, due to its short length of 10 residues.</p>
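<p>Steps 2 and 3 of the caption above can be illustrated with a minimal Python sketch. Equations 1 and 2 are not reproduced in this caption, so the formulas here are assumptions: the signed distance is taken as the plain hydropathy difference <i>h<sub>j</sub></i> − <i>h<sub>i</sub></i>, and each entry of the similarity matrix is the mean absolute difference between the corresponding <i>W</i>×<i>W</i> blocks of the two distance matrices. The function names are hypothetical, and the alignment search, gap handling, and scoring (Steps 4 to 6) are not covered.</p>

```python
from typing import List, Sequence

Matrix = List[List[float]]


def signed_distance_matrix(profile: Sequence[float]) -> Matrix:
    """Signed distances within one protein.

    ASSUMPTION: the distance from residue i to residue j is h[j] - h[i];
    the article's Equation 1 may define it differently.
    """
    n = len(profile)
    return [[profile[j] - profile[i] for j in range(n)] for i in range(n)]


def block_similarity_matrix(d1: Matrix, d2: Matrix, w: int = 5) -> Matrix:
    """Compare w-residue blocks of two signed distance matrices.

    ASSUMPTION: S[i][j] is the mean of |D1[i+a][i+b] - D2[j+a][j+b]| over
    all offsets a, b in 0..w-1, so smaller values mean more similar shapes.
    """
    rows = len(d1) - w + 1
    cols = len(d2) - w + 1
    s: Matrix = [[0.0] * max(cols, 0) for _ in range(max(rows, 0))]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for a in range(w):
                for b in range(w):
                    total += abs(d1[i + a][i + b] - d2[j + a][j + b])
            s[i][j] = total / (w * w)
    return s
```

<p>Because only differences within each protein enter the comparison, two profiles that differ by a constant offset score as identical in this sketch, which matches the caption's emphasis on hydropathy shape.</p>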

    Parameters used in Equations 6 and 7 to estimate length-dependent random protein data probability distributions based on the Inverse Chi-Squared Distribution.

    <p>Parameters used in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e006" target="_blank">Equations 6</a> and <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e007" target="_blank">7</a> to estimate length-dependent random protein data probability distributions based on the Inverse Chi-Squared Distribution.</p>

    Observed hydropathy and predicted structure similarity between ORFan <i>C. muridarum TC0624</i> and bacterial colicin pore-forming domain.

    <p><b>A. </b><b>Significant similarity between hydropathy of <i>TC0624</i> and <i>E. coli</i> colicin A (SCOP domain d1cola_).</b> The likelihood of obtaining this match by chance is <i>p</i> = 1.5×10<sup>−5</sup>. The blue cylinders indicate the helical secondary structure of TC0624 confidently predicted by PSIPRED; the red cylinders indicate the actual helical secondary structure of the d1cola_ domain as assessed by DSSP <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247-Kabsch1" target="_blank">[69]</a>. Numbers indicate the functionally important helical elements, as annotated by Cramer et al. <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247-Cramer1" target="_blank">[65]</a>. Reasonable correspondence between the type and locations of secondary structure elements is observed. Gapped regions of colicin helices are connected with dotted lines to guide the eye. <b>B. </b><b>Tertiary structure location of the hydrophobic similarity (left) and the sequence similarity (right) matches between </b><b><i>TC0624</i></b><b> and colicin.</b> In both molecular cartoons, helices are colored red, strands yellow, and loops green. Locations of a match between <i>TC0624</i> and colicin are colored blue. The left figure is based on d1cola_, colored according to the <i>HePCaT</i> alignment in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi-1003247-g006" target="_blank">Figure 6A</a>, and the right figure is based on the homolog d1rh1a2 SCOP domain observed in the marginally significant <i>HHPred</i> <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247-Soeding2" target="_blank">[50]</a> hidden Markov model sequence match. Both matches independently link the sequence and hydrophobicity of the ORFan to the functionally important structural core region of colicin.
The extensive structure, sequence, and chemical similarities suggest the medically important hypothesis that <i>TC0624</i> could also be a pore-forming protein facilitating chlamydia survival.</p>

    Empirically determined probability model for protein hydropathy.

    <p><b>A. </b><b>Inverse Chi-Squared model for the distribution of observed scores.</b> Distributions of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e004" target="_blank">Equation 4</a> scores for <i>HePCaT</i> alignments of length <i>L</i> = 100 obtained from parameters <i>W</i> = 5 residues, <i>GapMax</i> = 4 residues, <i>C</i> = 0.4. Pairs of random sequences were generated, their Kyte-Doolittle amino acid hydropathies averaged over a 15-residue window, and subjected to optimal alignment using <i>HePCaT</i>, as described in the text. Binned data in each case were reasonably well fit by the Inverse Chi-Squared probability distribution function (PDF, <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e005" target="_blank">Equation 5</a>), as described in Methods and tabulated in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi-1003247-t001" target="_blank">Table 1</a>. <b>B. </b><b>Analytical parameters to estimate statistical significance.</b> Parameters <i>ν</i> and <i>σ<sup>2</sup></i> for the PDF were observed to vary smoothly as a function of <i>HePCaT</i> alignment length, allowing the parameters, and thus alignment significance, to be analytically estimated for arbitrary alignment length using <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e006" target="_blank">Equations 6</a> and <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e007" target="_blank">7</a> and parameters in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi-1003247-t002" target="_blank">Table 2</a>. Discrete best-fit parameters for <i>ν</i> and <i>σ<sup>2</sup></i> are given in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi-1003247-t001" target="_blank">Table 1</a>.
Equations for displayed best-fit curves are as follows: y = 0.497609x (Hydropathy, <i>ν</i>), y = 0.160379 − 1.04167 ln(x + 38.9045) (Hydropathy, <i>σ<sup>2</sup></i>).</p>
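<p>As a rough illustration of how such a model could be evaluated, the sketch below implements the standard scaled inverse chi-squared density and a numerically integrated lower-tail probability. It assumes that Equation 5 has this textbook form, which the caption does not spell out. The linear relation for <i>ν</i> is the best-fit curve quoted above; <i>σ<sup>2</sup></i> is left as a caller-supplied value rather than derived from a fitted formula. Function names are hypothetical, and whether significance corresponds to the lower or upper tail depends on the orientation of the score defined in Equation 4.</p>

```python
import math


def nu_from_length(length: float) -> float:
    """Degrees of freedom from the best-fit line quoted in the caption."""
    return 0.497609 * length


def scaled_inv_chi2_logpdf(x: float, nu: float, sigma2: float) -> float:
    """Log density of the scaled inverse chi-squared distribution.

    ASSUMPTION: textbook parameterisation
    f(x) = (sigma2*nu/2)^(nu/2) / Gamma(nu/2) * exp(-nu*sigma2/(2x)) / x^(1+nu/2).
    """
    if x <= 0.0:
        return float("-inf")
    half = nu / 2.0
    return (half * math.log(sigma2 * half) - math.lgamma(half)
            - nu * sigma2 / (2.0 * x) - (1.0 + half) * math.log(x))


def scaled_inv_chi2_pdf(x: float, nu: float, sigma2: float) -> float:
    return math.exp(scaled_inv_chi2_logpdf(x, nu, sigma2))


def lower_tail(x: float, nu: float, sigma2: float, steps: int = 20000) -> float:
    """P(X <= x) by trapezoidal integration of the density on (0, x]."""
    if x <= 0.0:
        return 0.0
    h = x / steps
    # The density tends to 0 at the origin, so the left endpoint adds nothing.
    total = 0.5 * scaled_inv_chi2_pdf(x, nu, sigma2)
    for k in range(1, steps):
        total += scaled_inv_chi2_pdf(k * h, nu, sigma2)
    return total * h
```

<p>The density is evaluated in log space so that the large powers and gamma values arising at <i>L</i> = 100 do not overflow. A closed-form tail via a regularized incomplete gamma function would be faster, but the numerical integral keeps the sketch free of external dependencies.</p>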

    Goodness of fit statistics between Scaled Inverse Chi Squared probability distribution function (Equation 5) and <i>OPS</i> score distributions of various length optimal <i>HePCaT</i> alignments of random amino acid sequences.

    <p><sup>a</sup>Blank rows for certain alignment lengths indicate that the null hypothesis (<i>i.e.</i> that the distribution of <i>OPS</i> scores for randomly generated sequences was drawn from an underlying inverse chi-squared distribution) was rejected at the <i>p</i> &lt; 0.05 level.</p>

    Pairwise sequence alignment does not detect significant similarity between human A2a and Taste Receptor Type 2, Member 19, yet a similar structure can be modeled based on the <i>HePCaT</i> match.

    <p><b>A. </b><b><i>FASTA</i> pairwise sequence alignment between human adenosine receptor A2a and its known homolog human adenosine receptor A2b.</b> Alignment was extracted from a sequence search of the human proteome. Sequence similarity is 59% over 330 amino acids, with a highly significant E-value of 6.6e-53. Note that the hydropathy similarity between these two proteins is also significant, as given in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi-1003247-g004" target="_blank">Figure 4</a>. <b>B. </b><b><i>FASTA</i></b><b> pairwise sequence alignment between human A2a and human taste receptor type 2, member 19.</b> Sequence similarity is 21% over 305 amino acids. Although extensive, the similarity is not significant, with an E-value of 5.1e+3, in contrast to the significant hydropathy similarity displayed in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi-1003247-g004" target="_blank">Figure 4</a>. This result suggests that hydropathy similarity, as assessed by <i>HePCaT</i>, may be able to detect remote relationships in the absence of sequence similarity. <b>C. </b><b>Model of Taste Receptor Type 2, Member 19 is similar to the experimental structure of A2a.</b> Experimental structure of A2a (left panel) is based on PDB identifier 3rey. I-TASSER <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247-Roy1" target="_blank">[45]</a> model of Taste Receptor Type 2, Member 19 (right panel) achieved an I-TASSER C-score of 0.67 and a DALI Z-Score <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247-Holm3" target="_blank">[46]</a> of 24.9 against the 3rey structure, indicating a confident model that is significantly similar to A2a.
Rainbow colored helices follow the colors of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi-1003247-g004" target="_blank">Figure 4</a>, indicating the seven structurally aligned transmembrane spanning helices. The RMSD of the 269 DALI-aligned residues is 3.1 Å between modeled and experimental structures.</p>

    Most significant similarities in the human proteome to the Kyte-Doolittle hydropathy profile of adenosine receptor A2a.

    <p>Pairwise <i>HePCaT</i> alignments are shown for A2a (black, gi|5921992) and the top nine most significant nonredundant hits in the human proteome. Blue color indicates known seven transmembrane spanning region proteins as annotated by the <i>GPCRDB</i> database; red mostly indicates hits to the tail region of A2a. The hits are shown from top to bottom in order of most to least significant: hematological and neurological expressed protein-like 1 (gi|21700763, <i>p</i> = 4.0×10<sup>−6</sup>), ephrin-A4 isoform a precursor (gi|4885197, <i>p</i> = 7.6×10<sup>−5</sup>), NSFL1 cofactor p47 isoform a (gi|20149635, <i>p</i> = 9.1×10<sup>−5</sup>), metallothionein-1E (gi|83367075, <i>p</i> = 9.7×10<sup>−5</sup>), taste receptor type 2 member 19 (gi|28882035, <i>p</i> = 4.1×10<sup>−4</sup>), B- and T-lymphocyte attenuator isoform 1 precursor (gi|145580621, <i>p</i> = 5.4×10<sup>−4</sup>), WD-repeat domain-containing protein 83 (gi|153791298, <i>p</i> = 6.5×10<sup>−4</sup>), dual specificity protein phosphatase 26 (gi|13128968, <i>p</i> = 7.7×10<sup>−4</sup>), adenosine receptor A2b (gi|4501951, <i>p</i> = 8.3×10<sup>−4</sup>). Thick lines indicate residue positions included in the optimal <i>HePCaT</i> alignment to A2a, and thin lines indicate unaligned positions. Rainbow colored cylinders from N- to C-terminus indicate the approximate sequence locations of the seven experimentally determined transmembrane spanning helices of A2a.</p>
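<p>The Kyte-Doolittle hydropathy profiles compared in these alignments can be computed with a simple sliding-window average. The sketch below uses the published Kyte-Doolittle scale and the 15-residue window mentioned in the captions; the function name and the handling of sequence ends (only full windows are reported, with no padding) are assumptions made for illustration, not details taken from the article.</p>

```python
from typing import List

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 15) -> List[float]:
    """Mean Kyte-Doolittle hydropathy over each full window of residues.

    ASSUMPTION: windows that would run past either end of the sequence are
    dropped, so the profile has len(sequence) - window + 1 points.
    Raises KeyError for non-standard residue codes.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window <= 0 or len(values) < window:
        return []
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]
```

<p>For example, <code>hydropathy_profile("ILVILVILVILVILVILV")</code> yields four points, one for each full 15-residue window of the 18-residue input.</p>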