Clustering strings with mutations using an expectation-maximization algorithm in the context of RNA structure prediction

Abstract

In comparative analysis, an RNA structure (a set of base pairs and unpaired nucleotides) is predicted from a set of RNA variants (similar sequences) under the assumption that the structure is conserved during evolution. Combining RNA variants with experimental data on the local (nucleotide-level) structure may lead to more accurate structure prediction. The experimental protocol consists of mutating nucleotides likely to be 'unpaired'. Simultaneously reading the sequences of RNA variants that underwent the experimental mutation protocol raises the following issue: how to cluster 'mutated' substrings of similar parent strings such that each substring is correctly assigned to its parent string? We developed an expectation-maximization algorithm that uses mutational profiles (mutation distributions) to assign the substrings to their strings of origin.
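
The assignment problem described above can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it (`em_assign`, `pseudo`, `init_rate`) is hypothetical. It assumes that all parent strings have the same length, that each substring comes with a known start offset, that mutations are substitutions only, and that a mutated position is equally likely to show any of the three other bases. Under these assumptions, each parent is given a per-position mutation rate (its mutational profile), and a standard mixture-model EM alternates between soft assignment of substrings to parents and re-estimation of the profiles.

```python
import math

def em_assign(parents, reads, n_iter=50, pseudo=1.0, init_rate=0.05):
    """Assign mutated substrings to parent strings with a simple EM loop.

    parents: list of equal-length strings (assumption: already aligned).
    reads:   list of (start, substring) pairs (assumption: start is known).
    Returns (labels, rate, weight); rate[k][i] is the estimated mutation
    rate of position i in parent k.
    """
    K, L = len(parents), len(parents[0])
    rate = [[init_rate] * L for _ in range(K)]   # mutational profiles
    weight = [1.0 / K] * K                       # mixing proportions
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each parent for each substring.
        resp = []
        for start, seq in reads:
            logp = []
            for k in range(K):
                lp = math.log(weight[k])
                for j, base in enumerate(seq):
                    p = rate[k][start + j]
                    if base == parents[k][start + j]:
                        lp += math.log(1.0 - p)
                    else:
                        lp += math.log(p / 3.0)  # uniform over 3 other bases
                logp.append(lp)
            top = max(logp)                      # stabilise the exponentials
            w = [math.exp(x - top) for x in logp]
            total = sum(w)
            resp.append([x / total for x in w])
        # M-step: re-estimate proportions and profiles from soft counts.
        for k in range(K):
            soft_n = sum(r[k] for r in resp)
            weight[k] = (soft_n + pseudo) / (len(reads) + K * pseudo)
            mut = [0.0] * L                      # soft count of mismatches
            cov = [0.0] * L                      # soft count of coverage
            for (start, seq), r in zip(reads, resp):
                for j, base in enumerate(seq):
                    i = start + j
                    cov[i] += r[k]
                    if base != parents[k][i]:
                        mut[i] += r[k]
            for i in range(L):
                # The pseudo-count keeps every rate strictly inside (0, 1).
                rate[k][i] = (mut[i] + pseudo * init_rate) / (cov[i] + pseudo)
    labels = [max(range(K), key=lambda k: r[k]) for r in resp]
    return labels, rate, weight
```

A call such as `em_assign(["ACGTACGTAC", "ACGTTCGTAA"], [(2, "GTACCT"), (4, "TCCTAA")])` returns one parent index per substring together with the estimated profiles. The pseudo-count is a design choice of this sketch: it prevents a rate from reaching 0 or 1, which would make the log-likelihood undefined.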
