Isoelectric point prediction from the amino acid sequence of a protein

Abstract

Proteins often do not migrate in two-dimensional electrophoresis (2DE) as expected from their primary sequence: the isoelectric point (pI) predicted from the sequence frequently does not coincide with the experimental pI measured in the laboratory. This study investigates the reasons for these differences. Initially, 2DE data from the E. coli proteome were collected and formatted. This dataset was split into three subsets, each representing a different level of pI discrepancy (ΔpI). The protein sequence data for each ΔpI subset were run through a pipeline, and at each stage the three ΔpI subsets were compared to one another. The pipeline began with a naïve approach (considering individual amino acid frequencies), followed by the application of four different alphabets that represent sequences more simply by grouping similar amino acids according to their charge, functional, chemical, and hydrophobic properties. The final step in the pipeline investigated the dipeptides of all of these sequences using both the 20 amino acid alphabet and the simplified groupings. An evaluation of the alphabet dipeptide analysis demonstrated the existence of certain dipeptide sequences that correlate well with differences between predicted pI and experimental pI.
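
The three pipeline stages described above (single-residue frequencies, reduced alphabets, dipeptide analysis) can be illustrated with a minimal Python sketch. This is not the study's implementation: the abstract does not list the actual alphabets, so the charge-based grouping `CHARGE_ALPHABET` below is an assumed example, and the function names are hypothetical.

```python
from collections import Counter

# Assumed charge-based grouping for illustration only; the abstract does not
# specify which residues the study placed in each group.
CHARGE_ALPHABET = {aa: "+" for aa in "KRH"}        # basic side chains
CHARGE_ALPHABET.update({aa: "-" for aa in "DE"})   # acidic side chains


def reduce_sequence(seq, alphabet=CHARGE_ALPHABET, default="0"):
    """Map each residue to its group symbol; unlisted residues get `default`."""
    return "".join(alphabet.get(aa, default) for aa in seq.upper())


def residue_frequencies(seq):
    """Relative frequency of each symbol in the sequence (the naive stage)."""
    total = len(seq)
    if total == 0:
        return {}
    return {sym: n / total for sym, n in Counter(seq).items()}


def dipeptide_frequencies(seq):
    """Relative frequency of each overlapping pair of adjacent symbols."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return {}
    return {pair: n / len(pairs) for pair, n in Counter(pairs).items()}


if __name__ == "__main__":
    example = "MKDEHLLRAD"  # made-up sequence, not from the E. coli dataset
    reduced = reduce_sequence(example)
    print(residue_frequencies(example))
    print(reduced)
    print(dipeptide_frequencies(reduced))
```

Because `dipeptide_frequencies` works on any string, the same function covers both the 20 amino acid alphabet and a reduced one. Comparing the resulting frequency tables across the three ΔpI subsets would be the analysis step, which is omitted here.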
