Abstract

<p>For each instantiation of the planted-bicluster problem we choose <i>M</i>, <i>m</i>, <i>ε</i> and <i>l</i>. We use these parameters to generate a random <i>M</i> × <i>M</i> matrix <i>D</i> containing an embedded <i>m</i> × <i>m</i> rank-<i>l</i> submatrix <i>B</i> with spectral noise <i>ε</i>. For each instantiation, our algorithm produces a list of the row and column indices of <i>D</i> in the order in which they are eliminated; the rows and columns retained longest are expected to be members of <i>B</i>. To assess the success of our algorithm we calculate the AUC <i>A</i><sub><i>R</i></sub> (i.e., the area under the receiver operating characteristic curve) of the row indices of <i>B</i> with respect to the output list of our algorithm. The value <i>A</i><sub><i>R</i></sub> equals the probability that, given a randomly chosen row of <i>B</i> and a randomly chosen row outside of <i>B</i>, our algorithm eliminates the latter before the former (i.e., the latter is lower on our list than the former). We calculate the AUC <i>A</i><sub><i>C</i></sub> for the columns similarly. Finally, we use <i>A</i> = (<i>A</i><sub><i>R</i></sub> + <i>A</i><sub><i>C</i></sub>)/2 as a metric of success: values of <i>A</i> near 1 mean that the rows and columns of <i>B</i> were filtered to the top by our algorithm, whereas values of <i>A</i> near 0.5 mean that our algorithm failed to detect <i>B</i>. In the top of Panel-A we show the trial-averaged AUC <i>A</i> for our loop-counting method as a function of and log<sub><i>M</i></sub>(<i>m</i>). Results for <i>l</i> = 1 are shown on the left; results for <i>l</i> = 2 are shown on the right. Each subplot is a heatmap in which each pixel shows the value of <i>A</i> (averaged over at least 128 trials) for a given value of and log<sub><i>M</i></sub>(<i>m</i>). The different subplots correspond to different values of <i>M</i>. Note that our loop-counting algorithm is generally successful when and . 
In the bottom of Panel-A we show the analogous AUC <i>A</i> for a simple implementation of the spectral method (see section of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.s006" target="_blank">S2 Text</a>). In Panel-B we show the difference in trial-averaged <i>A</i> between these two methods (see colorbar for scale). Note that when <i>l</i> ≥ 2 or the noise is small, our loop score generally has a higher rate of success than the spectral method. On the other hand, there do exist parameters with <i>l</i> = 1 and where the spectral method has a higher rate of success. In each panel the thin grey line shows the detection boundary for our loop-counting method (calculated using ).</p>
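<p>The AUC described above can be computed directly from an elimination list. The Python sketch below assumes that the list contains every row (or column) index of <i>D</i> exactly once, with the first-eliminated index first; the names <code>auc_from_elimination_order</code> and <code>combined_auc</code> are our own illustrative choices, not code from the paper. A single pass counts, for each member of <i>B</i>, how many non-members were eliminated before it. That count is the number of (member, non-member) pairs the member "wins", so dividing the total by the number of pairs gives <i>A</i><sub><i>R</i></sub> (or <i>A</i><sub><i>C</i></sub>).</p>

```python
from typing import Sequence, Set


def auc_from_elimination_order(order: Sequence[int], members: Set[int]) -> float:
    """AUC of the index set `members` with respect to an elimination order.

    `order` lists indices from first-eliminated to last-eliminated, so the
    indices retained longest sit at the end. The result is the probability
    that a random non-member is eliminated before a random member.
    """
    if len(set(order)) != len(order):
        raise ValueError("each index must appear exactly once in `order`")
    if not members <= set(order):
        raise ValueError("every member must appear in `order`")
    n_members = len(members)
    n_others = len(order) - n_members
    if n_members == 0 or n_others == 0:
        raise ValueError("need at least one member and one non-member")

    others_seen = 0  # non-members eliminated so far
    wins = 0         # (member, non-member) pairs where the member outlasts
    for idx in order:
        if idx in members:
            # This member outlasts every non-member already eliminated.
            wins += others_seen
        else:
            others_seen += 1
    return wins / (n_members * n_others)


def combined_auc(row_order: Sequence[int], col_order: Sequence[int],
                 bicluster_rows: Sequence[int],
                 bicluster_cols: Sequence[int]) -> float:
    """Success metric A = (A_R + A_C) / 2."""
    a_r = auc_from_elimination_order(row_order, set(bicluster_rows))
    a_c = auc_from_elimination_order(col_order, set(bicluster_cols))
    return (a_r + a_c) / 2
```

<p>For example, with members {3, 4}, the order [0, 1, 2, 3, 4] gives 1.0 (every non-member is eliminated first), [3, 4, 0, 1, 2] gives 0.0, and [0, 3, 1, 4, 2] gives 0.5. Because each index appears once, no ties arise and no tie-breaking rule is needed.</p>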
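<p>For comparison, a generic spectral ranking can be sketched in a few lines of Python with NumPy. This is an assumption for illustration only: the simple spectral method actually used in the figure is described in S2 Text and may differ. In this sketch each row (column) of <i>D</i> is scored by the norm of its coordinates in the top-<i>l</i> left (right) singular subspace and is eliminated in order of increasing score, so the output has the same form as our algorithm's elimination list. The helper <code>toy_planted_bicluster</code> is hypothetical. It uses i.i.d. Gaussian background noise with an additive rank-<i>l</i> block on the first <i>m</i> rows and columns, not the paper's spectral-noise model with parameter <i>ε</i>.</p>

```python
import numpy as np


def spectral_elimination_order(D: np.ndarray, l: int):
    """Hypothetical spectral baseline (not necessarily the one in S2 Text).

    Scores each row/column by the norm of its coordinates in the top-l
    singular subspace of D and returns (row_order, col_order), each listing
    indices from first-eliminated (lowest score) to last-eliminated.
    """
    # Singular values come back in descending order, so the first l
    # singular vectors span the dominant l-dimensional subspace.
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    row_scores = np.linalg.norm(U[:, :l], axis=1)  # one score per row
    col_scores = np.linalg.norm(Vt[:l, :], axis=0)  # one score per column
    # argsort is ascending: the lowest-scoring index is eliminated first.
    return np.argsort(row_scores), np.argsort(col_scores)


def toy_planted_bicluster(M: int, m: int, l: int,
                          signal: float = 5.0, seed: int = 0) -> np.ndarray:
    """Hypothetical test data: standard-normal M x M background plus an
    additive rank-l block on rows/columns 0..m-1. This does NOT reproduce
    the paper's spectral-noise model."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((M, M))
    X = rng.standard_normal((m, l))
    Y = rng.standard_normal((m, l))
    # Dividing by sqrt(l) keeps the block's entry variance near signal**2.
    D[:m, :m] += signal * (X @ Y.T) / np.sqrt(l)
    return D
```

<p>With <i>M</i> = 200, <i>m</i> = 40, <i>l</i> = 1 and the default signal, the 40 rows and columns retained longest by <code>spectral_elimination_order</code> should mostly fall in 0..39; lowering <code>signal</code> toward the background noise level should degrade the ranking. The two outputs can then be scored with the AUC metric defined above.</p>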
