Abstract. An estimated 20% of genes in the human genome code for integral membrane proteins (IMPs), and some estimates are much higher. IMPs control a broad range of events essential to the proper functioning of cells, tissues and organisms. They include the most common targets of clinically useful drugs, such as the G protein-coupled receptors (GPCRs), the target of more than 50% of prescription drugs. However, there is a dearth of high-resolution 3D structural information on IMPs: IMP depositions in the major structural repository, the Protein Data Bank, account for less than 0.4% of the collection. Good methods for predicting IMP structures are therefore highly valuable. In this paper we apply Conditional Random Fields (CRFs), a probabilistic model for segmenting and labeling sequence data, to the membrane protein helix prediction problem. The advantage of CRFs is that they allow seamless integration of biological domain knowledge into the model. Our results show that the CRF model outperforms other well-known helix prediction approaches on several important measures.