7 research outputs found

Overview of the Horizontal Protein Comparison Tool (HePCaT) algorithm.

Author: James O. Wrabl (251131)
Omar Hadzipasic (469490)
Vincent J. Hilser (251133)
Publication venue
Publication date
Field of study

The hydropathy profiles of two hypothetical proteins, each of length M = N = 20 residues, are shown (Step 1). Intraprotein signed distances are computed within each protein according to <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e001" target="_blank">Equation 1</a> in the main text (Step 2). Positive distances, e.g. measured from a residue with a local minimum value to a residue with a local maximum value, are indicated in red, negative distances in blue. The signed distance matrices are therefore square and symmetrically reflected across the diagonal. Distances for protein 1 and protein 2 correspond to matrices D1 and D2, respectively. The similarity matrix S that ultimately compares the two proteins is constructed from the average absolute distance differences of W = 5 residue blocks between D1 and D2, according to <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e002" target="_blank">Equation 2</a> (Step 3). In S, light colored squares indicate blocks of W = 5 residues starting at residue i in protein 1 and residue j in protein 2 with similarly shaped hydropathy; dark squares indicate dissimilar shapes. (The element of S at i = 1, j = 1 is the lower left corner in the figure.) As described in the text, S is exhaustively searched, and all longest alignments with at most GapMax gaps whose squares (average path distance, APD) pass a user-defined average similarity cutoff C are kept in a list (set of colored arrows). The alignment in this list with the closest absolute shape (lowest RMSD) is defined as the optimal match (Step 5). An Optimal Path Score (OPS), defined by <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e004" target="_blank">Equation 4</a>, is assigned to the alignment and its significance is computed with respect to the score distribution of random alignments of identical length (Step 6).
Note that the example alignment, while a reasonable visual match, is only marginally significant with respect to random alignments of identical length, due to its short length of 10 residues.
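
Steps 2 and 3 of the caption can be sketched in Python as follows. Equations 1 and 2 are not reproduced in this caption, so the sketch uses assumed stand-ins rather than the authors' definitions: the signed distance is taken to be the plain hydropathy difference `h[j] - h[i]`, and `S[i][j]` is the mean absolute difference between the W × W blocks of D1 and D2 that start at residues i and j. The function names and the example profiles are hypothetical, and in this sketch a lower value of S means a more similar shape.

```python
from typing import List

Matrix = List[List[float]]


def signed_distances(h: List[float]) -> Matrix:
    """Assumed stand-in for Equation 1: D[i][j] = h[j] - h[i]."""
    n = len(h)
    return [[h[j] - h[i] for j in range(n)] for i in range(n)]


def similarity_matrix(d1: Matrix, d2: Matrix, w: int = 5) -> Matrix:
    """Assumed stand-in for Equation 2.

    S[i][j] is the mean absolute difference between the w x w block of d1
    starting at residue i and the w x w block of d2 starting at residue j.
    """
    m, n = len(d1), len(d2)
    s: Matrix = []
    for i in range(m - w + 1):
        row = []
        for j in range(n - w + 1):
            total = 0.0
            for a in range(w):
                for b in range(w):
                    total += abs(d1[i + a][i + b] - d2[j + a][j + b])
            row.append(total / (w * w))
        s.append(row)
    return s


# Hypothetical 20-residue profiles: the second is the first shifted upward
# by a constant, so the two have identical shape.
profile1 = [0.5 * ((k * 7) % 11) - 2.0 for k in range(20)]
profile2 = [value + 1.5 for value in profile1]

d1 = signed_distances(profile1)
d2 = signed_distances(profile2)
s = similarity_matrix(d1, d2, w=5)
```

Because only differences within each protein enter the comparison, a constant offset between the two profiles does not change S, which is the sense in which the comparison is about profile shape rather than absolute hydropathy.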

FigShare

Parameters used in Equations 6 and 7 to estimate length-dependent random protein data probability distributions based on the Inverse Chi-Squared Distribution.

Author: James O. Wrabl (251131)
Omar Hadzipasic (469490)
Vincent J. Hilser (251133)

Parameters used in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e006" target="_blank">Equations 6</a> and <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e007" target="_blank">7</a> to estimate length-dependent random protein data probability distributions based on the Inverse Chi-Squared Distribution.

FigShare

Observed hydropathy and predicted structure similarity between ORFan C. muridarum TC0624 and bacterial colicin pore-forming domain.

Author: James O. Wrabl (251131)
Omar Hadzipasic (469490)
Vincent J. Hilser (251133)

A. Significant similarity between hydropathy of TC0624 and E. coli colicin A (SCOP domain d1cola_). The likelihood of obtaining this match by chance is p = 1.5×10⁻⁵. The blue cylinders indicate the helical secondary structure of TC0624 confidently predicted by PSIPRED; the red cylinders indicate the actual helical secondary structure of the d1cola_ domain as assessed by DSSP <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247-Kabsch1" target="_blank">[69]</a>. Numbers indicate the functionally important helical elements, as annotated by Cramer et al. <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247-Cramer1" target="_blank">[65]</a>. Reasonable correspondence between the type and locations of secondary structure elements is observed. Gapped regions of colicin helices are connected with dotted lines to guide the eye. B. Tertiary structure location of the hydrophobic similarity (left) and the sequence similarity (right) matches between TC0624 and colicin. In both molecular cartoons, helices are colored red, strands yellow, and loops green. Locations of a match between TC0624 and colicin are colored blue. The left figure is based on d1cola_, colored according to the HePCaT alignment in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi-1003247-g006" target="_blank">Figure 6A</a>, and the right figure is based on the homolog d1rh1a2 SCOP domain observed in the marginally significant HHPred <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247-Soeding2" target="_blank">[50]</a> hidden Markov model sequence match. Both matches independently link the sequence and hydrophobicity of the ORFan to the functionally important structural core region of colicin.
The extensive structure, sequence, and chemical similarities suggest the medically important hypothesis that TC0624 could also be a pore-forming protein facilitating chlamydia survival.

FigShare

Empirically determined probability model for protein hydropathy.

Author: James O. Wrabl (251131)
Omar Hadzipasic (469490)
Vincent J. Hilser (251133)

A. Inverse Chi-Squared model for the distribution of observed scores. Distributions of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e004" target="_blank">Equation 4</a> scores for HePCaT alignments of length L = 100 obtained from parameters W = 5 residues, GapMax = 4 residues, C = 0.4. Pairs of random sequences were generated, their Kyte-Doolittle amino acid hydropathies averaged over a 15-residue window, and subjected to optimal alignment using HePCaT, as described in the text. Binned data in each case was reasonably fit to the Inverse Chi-Squared probability distribution function (PDF, <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e005" target="_blank">Equation 5</a>), as described in Methods and tabulated in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi-1003247-t001" target="_blank">Table 1</a>. B. Analytical parameters to estimate statistical significance. Parameters ν and σ² for the PDF were observed to vary smoothly as a function of HePCaT alignment length, allowing the parameters, and thus alignment significance, to be analytically estimated for arbitrary alignment length using <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e006" target="_blank">Equations 6</a> and <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247.e007" target="_blank">7</a> and parameters in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi-1003247-t002" target="_blank">Table 2</a>. Discrete best-fit parameters for ν and σ² are given in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi-1003247-t001" target="_blank">Table 1</a>. Equations for displayed best-fit curves are as follows: y = 0.497609x (Hydropathy, ν), y = 0.160379–1.04167 ln(x+38.9045) (Hydropathy, σ2).
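
As an illustration of the probability model named in this caption, the sketch below evaluates the standard scaled inverse chi-squared density with degrees of freedom ν and scale σ². It assumes that Equation 5 has this textbook form, which the caption does not confirm. The ν value is taken from the best-fit line quoted above (y = 0.497609x, with x the alignment length); the σ² value in the example is a hypothetical placeholder rather than a fitted parameter, and the function names are illustrative.

```python
import math


def scaled_inv_chi2_pdf(x: float, nu: float, sigma2: float) -> float:
    """Textbook scaled inverse chi-squared density, computed in log space.

    f(x) = (sigma2*nu/2)^(nu/2) / Gamma(nu/2)
           * exp(-nu*sigma2 / (2x)) / x^(1 + nu/2),  for x > 0
    """
    if x <= 0.0:
        return 0.0
    half = nu / 2.0
    log_pdf = (
        half * math.log(sigma2 * half)
        - math.lgamma(half)
        - sigma2 * half / x
        - (1.0 + half) * math.log(x)
    )
    return math.exp(log_pdf)


def nu_from_length(length: float) -> float:
    """Best-fit line for nu quoted in the caption: y = 0.497609x."""
    return 0.497609 * length


def integrate_pdf(lo: float, hi: float, nu: float, sigma2: float,
                  steps: int = 20000) -> float:
    """Midpoint-rule integral of the density over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(
        scaled_inv_chi2_pdf(lo + (k + 0.5) * width, nu, sigma2)
        for k in range(steps)
    )


nu = nu_from_length(100)        # alignment length L = 100, as in panel A
sigma2 = 1.0                    # hypothetical placeholder, not a fitted value
mode = nu * sigma2 / (nu + 2)   # location of the density maximum
total = integrate_pdf(0.0, 20.0, nu, sigma2)
```

Integrating the density over a score interval with `integrate_pdf` gives the probability mass in that interval under the fitted model; which tail corresponds to a significant alignment depends on the definition of the score in Equation 4, which is not given here.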

FigShare

Goodness of fit statistics between Scaled Inverse Chi Squared probability distribution function (Equation 5) and OPS score distributions of various length optimal HePCaT alignments of random amino acid sequences.

Author: James O. Wrabl (251131)
Omar Hadzipasic (469490)
Vincent J. Hilser (251133)

(a) Blank rows for certain alignment lengths indicate that the null hypothesis (i.e. that the distribution of OPS scores for randomly generated sequences was drawn from an underlying Inverse Chi-Squared distribution) was rejected at the p < 0.05 level.

FigShare

Pairwise sequence alignment does not detect significant similarity between human A2a and Taste Receptor Type 2, Member 19, yet a similar structure can be modeled based on the HePCaT match.

Author: James O. Wrabl (251131)
Omar Hadzipasic (469490)
Vincent J. Hilser (251133)

A. FASTA pairwise sequence alignment between human adenosine receptor A2a and its known homolog human adenosine receptor A2b. Alignment was extracted from a sequence search of the human proteome. Sequence similarity is 59% over 330 amino acids, with a highly significant E-value of 6.6e-53. Note that the hydropathy similarity between these two proteins is also significant, as given in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi-1003247-g004" target="_blank">Figure 4</a>. B. FASTA pairwise sequence alignment between human A2a and human taste receptor type 2, member 19. Sequence similarity is 21% over 305 amino acids. Although extensive, the similarity is not significant, with an E-value of 5.1e+3, in contrast to the significant hydropathy similarity displayed in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi-1003247-g004" target="_blank">Figure 4</a>. This result suggests that hydropathy similarity, as assessed by HePCaT, may be able to detect remote relationships in the absence of sequence similarity. C. Model of Taste Receptor Type 2, Member 19 is similar to the experimental structure of A2a. Experimental structure of A2a (left panel) is based on PDB identifier 3rey. I-TASSER <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247-Roy1" target="_blank">[45]</a> model of Taste Receptor Type 2, Member 19 (right panel) achieved an I-TASSER C-score of 0.67 and a DALI Z-Score <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi.1003247-Holm3" target="_blank">[46]</a> of 24.9 against the 3rey structure, indicating a confident model that is significantly similar to A2a.
Rainbow colored helices follow the colors of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003247#pcbi-1003247-g004" target="_blank">Figure 4</a>, indicating the seven structurally aligned transmembrane spanning helices. The RMSD of the 269 DALI-aligned residues is 3.1 Å between modeled and experimental structures.

FigShare

Most significant similarities in the human proteome to the Kyte-Doolittle hydropathy profile of adenosine receptor A2a.

Author: James O. Wrabl (251131)
Omar Hadzipasic (469490)
Vincent J. Hilser (251133)

Pairwise HePCaT alignments are shown for A2a (black, gi|5921992) and the top nine most significant nonredundant hits in the human proteome. Blue color indicates known seven transmembrane spanning region proteins as annotated by the GPCRDB database; red mostly indicates hits to the tail region of A2a. The hits are shown from top to bottom in order of most to least significant: hematological and neurological expressed protein-like 1 (gi|21700763, p = 4.0×10⁻⁶), ephrin-A4 isoform a precursor (gi|4885197, p = 7.6×10⁻⁵), NSFL1 cofactor p47 isoform a (gi|20149635, p = 9.1×10⁻⁵), metallothionein-1E (gi|83367075, p = 9.7×10⁻⁵), taste receptor type 2 member 19 (gi|28882035, p = 4.1×10⁻⁴), B- and T-lymphocyte attenuator isoform 1 precursor (gi|145580621, p = 5.4×10⁻⁴), WD-repeat domain-containing protein 83 (gi|153791298, p = 6.5×10⁻⁴), dual specificity protein phosphatase 26 (gi|13128968, p = 7.7×10⁻⁴), adenosine receptor A2b (gi|4501951, p = 8.3×10⁻⁴). Thick lines indicate residue positions included in the optimal HePCaT alignment to A2a, and thin lines indicate unaligned positions. Rainbow colored cylinders from N- to C-terminus indicate the approximate sequence locations of the seven experimentally determined transmembrane spanning helices of A2a.

FigShare
