An EM algorithm for mapping short reads in multiple RNA structure probing experiments

Abstract

An accurate mapping of reads against a reference sequence is the first step toward a sound NGS data analysis. However, when mapping means assigning reads to a set of RNA variants that were sequenced simultaneously, the task becomes hard to handle. Many algorithms have been developed to address the problem of mapping reads against a set of homologous sequences at once, but the problem is not fully resolved, particularly when dealing with short reads. The issue addressed in our study is more challenging still: in addition to the parallel assignment problem in the presence of short reads, the RNA variant molecules used in the sequencing library preparation step undergo a specific experimental treatment, SHAPE, which induces mutations at structurally unpaired nucleotides. Mutations due to SHAPE might lead to mismapping, i.e., a read derived from a given RNA variant i may, because of SHAPE mutations, appear to be better assigned to the variant j to which it has the shortest base distance. In ongoing work, we are trying to resolve this unprecedented mapping question through an Expectation Maximization (EM) algorithm in which each RNA variant in the reference set is characterized by a SHAPE mutational profile instead of merely by a sequence of nucleotides. The EM algorithm aims to maximize the likelihood that a read is derived from a specific RNA variant and to assess the read's contribution to building the mutational profile associated with that RNA.
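
To make the idea concrete, the following is a minimal sketch of how such an EM scheme could look. It is not the authors' implementation. It assumes that every read is already placed at a known offset without gaps, that all variants have the same length, that positions mutate independently, and that a mutation is equally likely to produce any of the three other bases. The names `read_log_likelihood`, `em_assign`, `init_rate`, and `pseudo` are hypothetical and chosen only for illustration.

```python
import math


def read_log_likelihood(read, offset, ref, mut_rates):
    """Log-probability of a read given one variant and its mutational profile."""
    logp = 0.0
    for i, base in enumerate(read):
        mu = mut_rates[offset + i]
        if base == ref[offset + i]:
            logp += math.log(1.0 - mu)   # position left unmutated
        else:
            logp += math.log(mu / 3.0)   # assumed uniform over the 3 other bases
    return logp


def em_assign(reads, refs, n_iter=50, init_rate=0.05, pseudo=1.0):
    """Soft-assign reads to variants while estimating per-position mutation rates.

    reads: list of (offset, sequence) pairs (offsets assumed known).
    refs:  list of equal-length variant sequences.
    Returns (responsibilities, variant weights, mutational profiles).
    """
    n_var = len(refs)
    weights = [1.0 / n_var] * n_var
    profiles = [[init_rate] * len(ref) for ref in refs]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each read derives from each variant.
        resp = []
        for offset, seq in reads:
            logs = [
                math.log(weights[k])
                + read_log_likelihood(seq, offset, refs[k], profiles[k])
                for k in range(n_var)
            ]
            top = max(logs)  # subtract the maximum for numerical stability
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])

        # M-step: re-estimate weights and profiles from the soft assignments.
        for k in range(n_var):
            mismatches = [0.0] * len(refs[k])
            coverage = [0.0] * len(refs[k])
            for (offset, seq), r in zip(reads, resp):
                for i, base in enumerate(seq):
                    coverage[offset + i] += r[k]
                    if base != refs[k][offset + i]:
                        mismatches[offset + i] += r[k]
            # Pseudocounts keep every rate strictly between 0 and 1.
            profiles[k] = [
                (m + pseudo * init_rate) / (c + pseudo)
                for m, c in zip(mismatches, coverage)
            ]
            weights[k] = (sum(r[k] for r in resp) + pseudo) / (
                len(reads) + n_var * pseudo
            )
    return resp, weights, profiles
```

In this sketch, a read that matches variant j slightly better by raw base distance can still be attributed to variant i when the mismatching positions coincide with positions where the profile of i already carries a high estimated mutation rate. The pseudocount smoothing is a design choice of the sketch, made to avoid taking the logarithm of zero.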
