
Better scoring schemes for the recognition of functional proteins by protomata

By Manon Ruffini

Abstract

Proteins perform essential functions within organisms, and predicting these functions is a major problem in biology. To address it, predictive models of functional families have been built from the amino-acid sequences that make up proteins. The Dyliss team developed a machine learning algorithm, Protomata-learner, that learns weighted automata representing these families and the possible disjunctions between their members. New sequences can be compared to these models and assigned a score that predicts whether they belong to the family.

Despite good results, the sequence weighting strategy and the null-models in Protomata are rather basic. During my internship, I investigated alternative sequence weighting strategies and null-models. Moreover, the expressivity of Protomata leads to a great variability of scores, and the choice of the classification threshold was left to the user. I therefore proposed a normalization of the score, together with a method to assess the significance of scores, to simplify prediction. I implemented these new strategies and compared them on several data sets. Preliminary results show a notable improvement in the predictive power of the computed models.
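To make the link between null-models, score normalization and significance concrete, here is a minimal sketch in Python. It makes several assumptions of its own: the family model is a hypothetical position-specific profile built from a made-up consensus ("MKVLA") rather than a Protomata weighted automaton, the null-model is an i.i.d. uniform amino-acid background, normalization is a plain z-score, and significance is an empirical p-value estimated by scoring sequences sampled from the null-model. The helper names (`make_profile`, `log_odds`, `null_scores`, `empirical_p_value`, `z_score`) are illustrative only. This is not the method implemented in this work.

```python
import math
import random
import statistics

# Hypothetical toy setup: 20 amino acids with a uniform background (null-model).
# A realistic null-model would use observed amino-acid frequencies instead.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def make_profile(consensus, peak=0.6):
    """Hypothetical family model: probability `peak` on the consensus residue
    at each position, the remaining mass spread uniformly over the others."""
    rest = (1 - peak) / (len(AMINO_ACIDS) - 1)
    return [{aa: (peak if aa == c else rest) for aa in AMINO_ACIDS} for c in consensus]


def log_odds(seq, profile, background):
    """Log-likelihood ratio of the family model against the i.i.d. null-model.
    Assumes len(seq) == len(profile): no insertions or deletions are modelled."""
    return sum(math.log(col[r] / background[r]) for col, r in zip(profile, seq))


def null_scores(profile, background, n_samples=1000, seed=0):
    """Scores of random sequences drawn from the null-model itself."""
    rng = random.Random(seed)
    letters = list(background)
    weights = [background[a] for a in letters]
    scores = []
    for _ in range(n_samples):
        seq = "".join(rng.choices(letters, weights=weights, k=len(profile)))
        scores.append(log_odds(seq, profile, background))
    return scores


def empirical_p_value(score, scores):
    """Fraction of null scores at least as high as `score`, with a +1
    correction so that the estimate is never exactly zero."""
    exceed = sum(1 for s in scores if s >= score)
    return (exceed + 1) / (len(scores) + 1)


def z_score(score, scores):
    """Score normalized by the mean and standard deviation of the null scores."""
    return (score - statistics.mean(scores)) / statistics.stdev(scores)


profile = make_profile("MKVLA")
nulls = null_scores(profile, BACKGROUND)
for query in ("MKVLA", "WWWWW"):
    s = log_odds(query, profile, BACKGROUND)
    print(query, round(s, 2), empirical_p_value(s, nulls), round(z_score(s, nulls), 2))
```

Because the p-value is computed against the null distribution of the same model, a fixed cut-off such as 0.01 means the same thing for every family, whereas a threshold on the raw log-odds score would have to be chosen per model.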

Topics: null-model, significance, proteins, statistical modelling, automata, Dirichlet mixture, sequence weighting, [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]
Publisher: HAL CCSD
Year: 2017
OAI identifier: oai:HAL:hal-01557941v1
