
    Illustration of the loops within a 3-dimensional array.

    <p>We sketch the structure of a 3-dimensional data-array <i>D</i>, with <i>J</i> rows, <i>K</i> columns and <i>P</i> ‘layers’. Each entry <i>D</i><sub><i>j</i>,<i>k</i>,<i>l</i></sub> lies in the cube shown. The loops within <i>D</i> can be divided into 3 categories: (a) iso-layer loops that stretch across 2 rows and 2 columns, (b) iso-column loops that stretch across 2 rows and 2 layers, and (c) iso-row loops that stretch across 2 columns and 2 layers. The row-score [<i>Z</i><sub>ROW</sub>]<sub><i>j</i></sub> aggregates all the iso-column and iso-layer loops associated with row-<i>j</i>. The column-score [<i>Z</i><sub>COL</sub>]<sub><i>k</i></sub> aggregates all the iso-row and iso-layer loops associated with column-<i>k</i>. The layer-score [<i>Z</i><sub>LYR</sub>]<sub><i>l</i></sub> aggregates all the iso-row and iso-column loops associated with layer-<i>l</i>.</p>
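
The three loop categories above can be enumerated and counted with a short sketch. It assumes a loop is identified by an unordered pair of indices along each of two axes plus a single index along the third; the function names are hypothetical, and because the caption does not say how an individual loop is scored, only the bookkeeping is shown.

```python
from itertools import combinations
from math import comb


def iso_layer_loops(J, K, P):
    """Yield (rows, cols, layer) for loops spanning 2 rows and 2 columns in one layer."""
    for layer in range(P):
        for rows in combinations(range(J), 2):
            for cols in combinations(range(K), 2):
                yield rows, cols, layer


def loop_counts(J, K, P):
    """Count the loops in each category for a J x K x P array."""
    return {
        "iso_layer": comb(J, 2) * comb(K, 2) * P,   # 2 rows, 2 columns, 1 layer
        "iso_column": comb(J, 2) * comb(P, 2) * K,  # 2 rows, 2 layers, 1 column
        "iso_row": comb(K, 2) * comb(P, 2) * J,     # 2 columns, 2 layers, 1 row
    }


def loops_touching_row(J, K, P):
    """Number of iso-layer plus iso-column loops that include one fixed row."""
    iso_layer = (J - 1) * comb(K, 2) * P
    iso_column = (J - 1) * comb(P, 2) * K
    return iso_layer + iso_column
```

Under these assumptions, `loops_touching_row` gives the number of terms that a row-score such as [Z_ROW]_j would aggregate; iso-row loops are excluded because they stay within a single row.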

    Illustration of the GSE48091 gene-expression data-set used in Example-A (see main text).

    <p>Each row corresponds to a patient, and each column to a ‘gene’ (i.e., gene-expression measurement): the color of each pixel codes for the intensity of a particular measurement of a particular patient (see colorbar at the bottom). <i>M</i><sub><i>D</i></sub> = 340 of these patients are cases, the other <i>M</i><sub><i>X</i></sub> = 166 are controls; we group the former into the case-matrix ‘<i>D</i>’, and the latter into the control-matrix ‘<i>X</i>’.</p>

    Contrasting a bicluster with controls.

    <p>This shows the bicluster of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g005" target="_blank">Fig 5B</a> on top, and the rest of the controls on the bottom. The control-patients have been rearranged in order of their correlation with the co-expression pattern of the bicluster. Even though a few of the controls (i.e., ∼3/166) exhibit a co-expression pattern comparable to that expressed by the bicluster, the vast majority do not.</p>

    Illustration of bicluster found within gene-expression data-set.

    <p>Both panels illustrate the same submatrix (i.e., bicluster) drawn from the full case-matrix shown at the top of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g004" target="_blank">Fig 4</a>. This bicluster was found using our control-corrected biclustering algorithm (described in section of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.s005" target="_blank">S1 Text</a>). In Panel-A we represent this bicluster using the row- and column-ordering given by the output of our algorithm. This ordering has certain advantages (see section of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.s006" target="_blank">S2 Text</a>), but does not make the co-expression pattern particularly clear to the eye. Thus, to show this co-expression more clearly, we present the bicluster again in Panel-B, except this time with the rows and columns rearranged so that the coefficients of the first principal-component-vector change monotonically. As can be seen, there is a striking pattern of correlation across the 793 genes for the 45 cases shown.</p>
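
The Panel-B reordering can be approximated as follows. This is a minimal sketch, assuming the first principal-component coefficients are taken from the leading left and right singular vectors of the submatrix as given (any centering or normalization applied in the original analysis is not described in the caption and is omitted here). The function name `reorder_by_first_component` is hypothetical.

```python
import numpy as np


def reorder_by_first_component(B):
    """Sort rows and columns of B by the leading singular vectors.

    Returns the reordered matrix together with the row and column
    permutations that produced it.
    """
    B = np.asarray(B, dtype=float)
    # The leading left/right singular vectors stand in for the first
    # principal-component coefficients of the rows and columns.
    U, _, Vt = np.linalg.svd(B, full_matrices=False)
    row_order = np.argsort(U[:, 0])
    col_order = np.argsort(Vt[0, :])
    return B[np.ix_(row_order, col_order)], row_order, col_order
```

Because singular vectors are defined only up to sign, the sorted coefficients may run in increasing or decreasing order of the underlying pattern; either way they change monotonically along each axis.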

    Illustration of bicluster found within genome-wide-association-study dataset.

    <p>In this figure we illustrate the genome-wide association-study (i.e., GWAS) data-set discussed in Example-B (see main text). This data-set involves 16577 patients, each genotyped across 276768 genetic base-pair-locations (i.e., alleles). Many of these patients have a particular psychological disorder, while the remainder do not. We use this phenotype to separate the patients into <i>M</i><sub><i>D</i></sub> = 9752 cases and <i>M</i><sub><i>X</i></sub> = 6825 controls. The size of this GWAS data-set is indicated in the background of this picture, and dwarfs the size of the gene-expression data-set used in Example-A (inset for comparison). At the top of the foreground we illustrate an <i>m</i> = 115 by <i>n</i> = 706 submatrix found within the case-patients. This submatrix is a low-rank bicluster, and the alleles are strongly correlated across these particular case-patients. The order of the patients and alleles within this submatrix has been chosen to emphasize this correlation. For comparison, we pull out a few other randomly-chosen case-patients and control-patients, and present their associated submatrices (defined using the same 706 alleles) further down.</p>

    Performance of loop-scores vs spectral-biclustering applied to the planted-bicluster problem.

    <p>For each instantiation of the planted-bicluster problem we choose an <i>M</i>, <i>m</i>, <i>ε</i> and <i>l</i>; we use these parameters to generate a random <i>M</i> × <i>M</i> matrix <i>D</i> and embedded <i>m</i> × <i>m</i> rank-<i>l</i> submatrix <i>B</i> with spectral noise <i>ε</i>. For each instantiation, our algorithm produces a list of row- and column-indices of <i>D</i> in the order in which they are eliminated; those rows and columns retained the longest are expected to be members of <i>B</i>. To assess the success of our algorithm we calculate the AUC <i>A</i><sub><i>R</i></sub> (i.e., area under the receiver operating characteristic curve) associated with the row-indices of <i>B</i> with respect to the output list from our algorithm. The value <i>A</i><sub><i>R</i></sub> is equal to the probability that, given a randomly chosen row from <i>B</i> as well as a randomly chosen row from outside of <i>B</i>, our algorithm eliminates the latter before the former (i.e., the latter is lower on our list than the former). We calculate the AUC <i>A</i><sub><i>C</i></sub> for the columns similarly. Finally, we use <i>A</i> = (<i>A</i><sub><i>R</i></sub> + <i>A</i><sub><i>C</i></sub>)/2 as a metric of success; values of <i>A</i> near 1 mean that the rows and columns of <i>B</i> were filtered to the top by our algorithm, whereas values of <i>A</i> near 0.5 mean that our algorithm failed to detect <i>B</i>. In the top of Panel-A we show the trial-averaged AUC <i>A</i> for our loop-counting method as a function of and log<sub><i>M</i></sub> (<i>m</i>). Results for <i>l</i> = 1 are shown on the left; results for <i>l</i> = 2 are shown on the right. Each subplot takes the form of a heatmap, with each pixel showing the value of <i>A</i> for a given value of and log<sub><i>M</i></sub> (<i>m</i>) (averaged over at least 128 trials). The different subplots correspond to different values for <i>M</i>. Note that our loop-counting algorithm is generally successful when and . In the bottom of Panel-A we show the analogous AUC <i>A</i> for a simple implementation of the spectral method (see section of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.s006" target="_blank">S2 Text</a>). In Panel-B we show the difference in trial-averaged <i>A</i> between these two methods (see colorbar for scale). Note that when <i>l</i> ≥ 2 or the noise is small, our loop-score generally has a higher rate of success than the spectral method. On the other hand, there do exist parameters when <i>l</i> = 1 and where the spectral method has a higher rate of success. In each panel the thin grey line shows the detection-boundary for our loop-counting method (calculated using ).</p>
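
The success metric described above can be computed directly from an elimination order. The sketch below is illustrative only: the function names are hypothetical, the input is assumed to be a list of indices in the order they were eliminated (first eliminated first), and the pairwise count is quadratic in the number of indices, which is fine for small examples but not tuned for large matrices.

```python
def elimination_auc(elimination_order, members):
    """Probability that a non-member is eliminated before a member.

    elimination_order: indices listed from first eliminated to last.
    members: indices that belong to the planted submatrix B.
    """
    members = set(members)
    # A larger step means the index was retained longer.
    inside = [step for step, idx in enumerate(elimination_order) if idx in members]
    outside = [step for step, idx in enumerate(elimination_order) if idx not in members]
    if not inside or not outside:
        raise ValueError("need at least one member and one non-member")
    wins = sum(1 for a in inside for b in outside if b < a)
    return wins / (len(inside) * len(outside))


def combined_auc(row_order, row_members, col_order, col_members):
    """A = (A_R + A_C) / 2, following the caption."""
    a_r = elimination_auc(row_order, row_members)
    a_c = elimination_auc(col_order, col_members)
    return (a_r + a_c) / 2
```

A value of 1 means every member of B outlasted every non-member, while an interleaved ordering gives a value near 0.5.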

    A scatterplot of the data shown in Fig 10.

    <p>Each row-trace shown on the left in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g010" target="_blank">Fig 10</a> is plotted as a single point in 2-dimensional space; the horizontal-axis corresponds to the maximum row-trace and the vertical-axis corresponds to the average row-trace (taken across the iterations). The original-data is indicated with a ‘⊗’, and each of the random shuffles with a colored ‘•’. The <i>p</i>-value for any point in this plane is equal to the fraction of label-shuffled-traces that have either an <i>x</i>-position larger than <i>x</i><sub><i>w</i></sub> or a <i>y</i>-position larger than <i>y</i><sub><i>w</i></sub>, where <i>x</i><sub><i>w</i></sub> and <i>y</i><sub><i>w</i></sub> are the <i>x</i>- and <i>y</i>-percentiles associated with the most extreme coordinate of (details given in section ) of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.s006" target="_blank">S2 Text</a>. Each random shuffle is colored by its <i>p</i>-value determined by the label-shuffled-distribution. By comparing the original-trace with the shuffled-distribution we can read off a <i>p</i>-value for the original-data of ≲ 0.008.</p>
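
One possible reading of the p-value rule above is sketched here. It is an interpretation, not the procedure from S2 Text: it assumes that the "most extreme coordinate" is whichever of the point's x- or y-value ranks highest among the shuffles, and that x_w and y_w are the shuffled values at that same empirical rank in each marginal. The function name `joint_empirical_p` is hypothetical.

```python
from bisect import bisect_right


def joint_empirical_p(point, shuffles):
    """Fraction of shuffles beyond the matched x/y thresholds for `point`.

    point: (x, y) pair, e.g. (maximum row-trace, average row-trace).
    shuffles: list of (x, y) pairs from the label-shuffled data.
    """
    xs = sorted(x for x, _ in shuffles)
    ys = sorted(y for _, y in shuffles)
    # Rank of the point in each marginal: how many shuffles it meets or exceeds.
    rank_x = bisect_right(xs, point[0])
    rank_y = bisect_right(ys, point[1])
    rank = max(rank_x, rank_y)  # assumed: the most extreme coordinate sets the level
    if rank == 0:
        return 1.0  # the point lies below every shuffle in both coordinates
    # Thresholds at the same empirical rank in each marginal.
    x_w, y_w = xs[rank - 1], ys[rank - 1]
    beyond = sum(1 for x, y in shuffles if x > x_w or y > y_w)
    return beyond / len(shuffles)
```

With this reading, a point that exceeds every shuffle in at least one coordinate receives a value of 0, which in practice would be reported as an upper bound set by the number of shuffles.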

    Continuous-covariate-distribution for the bicluster shown in Example-B.

    <p>As mentioned in the introduction, our algorithm proceeds iteratively, removing rows and columns from the case-matrix until there are none left. One of our goals is to ensure that, during this process, our algorithm focuses on biclusters which involve case-patients that are relatively well balanced in covariate-space. On the left we show a scatterplot illustrating the 2-dimensional distribution of covariate-components across the remaining <i>m</i> = 115 case-patients within the bicluster shown in Example-B (i.e., <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g007" target="_blank">Fig 7</a>). The horizontal and vertical lines in each subplot indicate the medians of the components of the covariate-distribution. On the right we show the same data again, except in contour form (note colorbar). The continuous-covariates remain relatively well-distributed even though relatively few case-patients are left (compare with <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g009" target="_blank">Fig 9</a>).</p>

    Row-traces for the bicluster shown in Example-A.

    <p>This bicluster was found by running our algorithm on the data shown in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g004" target="_blank">Fig 4</a>. Because we corrected for controls, we compare our original-data to the distribution we obtain under the null-hypothesis H0 (see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#sec011" target="_blank">Methods</a>). On the left we show the row-trace as a function of iteration for the original-data (red) as well as each of the 256 random shuffles (blue). On the right we replot this same trace data, showing the 5th, 50th and 95th percentiles (across iterations) of the H0 distribution. Because we are not correcting for any covariates, the column-traces are identical to the row-traces.</p>
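
The percentile bands in the right-hand panel can be sketched as below, assuming the shuffled row-traces are stored as an array with one row per shuffle and one column per iteration (a layout chosen here for illustration). The function name `null_bands` is hypothetical.

```python
import numpy as np


def null_bands(shuffled_traces, levels=(5, 50, 95)):
    """Percentiles of the shuffled traces at each iteration.

    shuffled_traces: array of shape (n_shuffles, n_iterations).
    Returns an array of shape (len(levels), n_iterations).
    """
    traces = np.asarray(shuffled_traces, dtype=float)
    # axis=0 takes the percentile across shuffles, separately per iteration.
    return np.percentile(traces, levels, axis=0)
```

Plotting the original-data trace against these three curves shows at which iterations it leaves the bulk of the H0 distribution.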