Illustration of the GSE48091 gene-expression data-set used in Example-A (see main text).

Author: Aaditya V. Rangan (305264)
Anders Jureus (1408867)
Arjun Krishnan (5216162)
Caroline C. McGrouther (5216153)
Eli Stahl (33862)
John Kelsoe (3437189)
Mikael Landen (5216159)
Nicholas Schork (332955)
Olga Troyanskaya (252137)
Preeti Raghavan (5216168)
Qian Zhu (191342)
Sarah Bergen (3496505)
Seda Bilaloglu (5216165)
Vicky Yao (5216156)

Each row corresponds to a patient and each column to a ‘gene’ (i.e., a gene-expression measurement); the color of each pixel codes for the intensity of a particular measurement in a particular patient (see the colorbar at the bottom). $M_D = 340$ of these patients are cases and the other $M_X = 166$ are controls; we group the former into the case-matrix ‘D’ and the latter into the control-matrix ‘X’.
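The row/column convention can be made concrete with a small Python sketch. It assumes the expression data are held as a list of per-patient rows and that case/control status is available as a parallel list of booleans; the function name `split_cases_controls` and the toy values are hypothetical and are not the GSE48091 data.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def split_cases_controls(data: Matrix, is_case: Sequence[bool]) -> Tuple[Matrix, Matrix]:
    """Split a patient-by-gene matrix into a case-matrix D and a control-matrix X.

    Row i of `data` holds every gene-expression measurement for patient i,
    and `is_case[i]` is True when that patient is a case.
    """
    if len(data) != len(is_case):
        raise ValueError("need exactly one case/control label per patient (row)")
    D = [row for row, case in zip(data, is_case) if case]
    X = [row for row, case in zip(data, is_case) if not case]
    return D, X


# Toy example: 5 patients x 3 genes (synthetic values, not GSE48091 data).
data = [[0.1, 2.0, 1.5], [0.3, 1.8, 1.1], [2.2, 0.4, 0.9], [1.9, 0.2, 1.0], [0.5, 1.6, 1.2]]
labels = [True, True, False, False, True]
D, X = split_cases_controls(data, labels)
M_D, M_X = len(D), len(X)  # play the roles of M_D = 340 and M_X = 166 in the caption
```

Rows keep their original relative order within D and X, so patient identities can be traced back if needed.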

FigShare

Illustration of the loops within a 3-dimensional array.

We sketch the structure of a 3-dimensional data-array D, with J rows, K columns and P ‘layers’. Each entry $D_{j,k,l}$ lies in the cube shown. The loops within D can be divided into three categories: (a) iso-layer loops, which stretch across 2 rows and 2 columns; (b) iso-column loops, which stretch across 2 rows and 2 layers; and (c) iso-row loops, which stretch across 2 columns and 2 layers. The row-score $[Z_{\mathrm{ROW}}]_j$ aggregates all the iso-column and iso-layer loops associated with row-j. The column-score $[Z_{\mathrm{COL}}]_k$ aggregates all the iso-row and iso-layer loops associated with column-k. The layer-score $[Z_{\mathrm{LYR}}]_l$ aggregates all the iso-row and iso-column loops associated with layer-l.
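The following Python sketch shows one way the three scores could be accumulated. It assumes, by analogy with the 2-D loop-score, that each loop contributes the product of its four corner entries, and it includes degenerate loops with repeated indices for simplicity; the function name `loop_scores_3d` is hypothetical and this is not the authors' implementation.

```python
from itertools import product


def loop_scores_3d(D):
    """Row-, column- and layer-scores for a 3-D array indexed D[j][k][l].

    Assumes each loop contributes the product of its four corner entries, and
    uses sum_{a,a'} x_a y_a x_a' y_a' = (sum_a x_a y_a) ** 2 to avoid an explicit
    loop over the second free index. Degenerate loops (repeated indices) are
    included; for +/-1 data they only add a constant offset.
    """
    J, K, P = len(D), len(D[0]), len(D[0][0])
    z_row, z_col, z_lyr = [0] * J, [0] * K, [0] * P
    for j, j2 in product(range(J), repeat=2):
        for l in range(P):  # iso-layer loops: rows j, j2; two columns; fixed layer l
            z_row[j] += sum(D[j][k][l] * D[j2][k][l] for k in range(K)) ** 2
        for k in range(K):  # iso-column loops: rows j, j2; two layers; fixed column k
            z_row[j] += sum(D[j][k][l] * D[j2][k][l] for l in range(P)) ** 2
    for k, k2 in product(range(K), repeat=2):
        for j in range(J):  # iso-row loops: columns k, k2; two layers; fixed row j
            z_col[k] += sum(D[j][k][l] * D[j][k2][l] for l in range(P)) ** 2
        for l in range(P):  # iso-layer loops: columns k, k2; two rows; fixed layer l
            z_col[k] += sum(D[j][k][l] * D[j][k2][l] for j in range(J)) ** 2
    for l, l2 in product(range(P), repeat=2):
        for j in range(J):  # iso-row loops: layers l, l2; two columns; fixed row j
            z_lyr[l] += sum(D[j][k][l] * D[j][k][l2] for k in range(K)) ** 2
        for k in range(K):  # iso-column loops: layers l, l2; two rows; fixed column k
            z_lyr[l] += sum(D[j][k][l] * D[j][k][l2] for j in range(J)) ** 2
    return z_row, z_col, z_lyr
```

Each score sums exactly the two loop categories listed in the caption for its index, so a row-score never includes iso-row loops (which hold the row fixed and therefore carry no contrast between rows).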

Contrasting a bicluster with controls.

This figure shows the bicluster of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g005" target="_blank">Fig 5B</a> on top and the control-patients on the bottom. The control-patients have been sorted by their correlation with the co-expression pattern of the bicluster. Even though a few of the controls (i.e., ∼3/166) exhibit a co-expression pattern comparable to that of the bicluster, the vast majority do not.
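A minimal Python sketch of such a reordering follows. Because the exact definition of ‘correlation with the co-expression pattern’ is not spelled out in this caption, it assumes (hypothetically) that each control is scored by its mean absolute Pearson correlation with the bicluster's case-rows over the bicluster's genes; absolute values are used because a rank-1 pattern can appear with either sign. The names `pearson` and `order_controls` are illustrative only.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def order_controls(bicluster_rows: List[List[float]],
                   control_rows: List[List[float]],
                   genes: Sequence[int]) -> List[int]:
    """Indices of the controls, most bicluster-like first.

    `bicluster_rows` are the case-rows already restricted to the bicluster's
    genes; `control_rows` are full control-rows, restricted here via `genes`.
    """
    def score(ctrl: List[float]) -> float:
        restricted = [ctrl[g] for g in genes]
        return sum(abs(pearson(restricted, b)) for b in bicluster_rows) / len(bicluster_rows)

    return sorted(range(len(control_rows)), key=lambda i: score(control_rows[i]), reverse=True)
```

Plotting the controls in the returned order puts the few bicluster-like controls at one end, which is the kind of arrangement the caption describes.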

Illustration of bicluster found within gene-expression data-set.

Both panels illustrate the same submatrix (i.e., bicluster) drawn from the full case-matrix shown at the top of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g004" target="_blank">Fig 4</a>. This bicluster was found using our control-corrected biclustering algorithm (described in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.s005" target="_blank">S1 Text</a>). In Panel-A we represent this bicluster using the row- and column-ordering output by our algorithm. This ordering has certain advantages (see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.s006" target="_blank">S2 Text</a>), but it does not make the co-expression pattern particularly clear to the eye. To show this co-expression more clearly, Panel-B presents the bicluster again, this time with the rows and columns rearranged so that the coefficients of the first principal-component-vector change monotonically. There is a striking pattern of correlation across the 793 genes for the 45 cases shown.
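The Panel-B ordering can be sketched in Python as follows. This assumes the principal components are computed from the column-centred bicluster, and it uses simple power iteration in place of a full SVD routine; the function names are hypothetical. Rows are sorted by their scores on the first component and columns by their loadings, so both change monotonically; the overall sign of the component is arbitrary, so the ordering may come out reversed.

```python
import random
from math import sqrt
from typing import List, Sequence, Tuple


def first_pc(B: Sequence[Sequence[float]], iters: int = 200, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Row scores and column loadings of the first principal component of B.

    B is column-centred first; power iteration on C^T C finds the leading
    direction. The overall sign is arbitrary.
    """
    m, n = len(B), len(B[0])
    means = [sum(B[i][k] for i in range(m)) / m for k in range(n)]
    C = [[B[i][k] - means[k] for k in range(n)] for i in range(m)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        u = [sum(C[i][k] * v[k] for k in range(n)) for i in range(m)]  # u = C v
        w = [sum(C[i][k] * u[i] for i in range(m)) for k in range(n)]  # w = C^T u
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # C is zero (or v is orthogonal to it); keep the current v
        v = [x / norm for x in w]
    u = [sum(C[i][k] * v[k] for k in range(n)) for i in range(m)]
    return u, v


def pc_ordering(B: Sequence[Sequence[float]]) -> Tuple[List[int], List[int]]:
    """Row and column permutations along which the first-PC coefficients increase."""
    u, v = first_pc(B)
    rows = sorted(range(len(u)), key=lambda i: u[i])
    cols = sorted(range(len(v)), key=lambda k: v[k])
    return rows, cols
```

Given `rows, cols = pc_ordering(B)`, the reordered display would be `[[B[i][k] for k in cols] for i in rows]`. Because u and v flip sign together, the row and column orders are always reversed jointly, so the visual pattern is unchanged up to a 180-degree rotation.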

Row-traces for the bicluster shown in Example-A.

This bicluster was found by running our algorithm on the data shown in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g004" target="_blank">Fig 4</a>. Because we corrected for controls, we compare our original-data to the distribution obtained under the null-hypothesis $H_0$ (see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#sec011" target="_blank">Methods</a>). On the left we show the row-trace as a function of iteration for the original-data (red) as well as for each of the 256 random shuffles (blue). On the right we replot the same trace data, showing at each iteration the 5th, 50th and 95th percentiles of the $H_0$ distribution. Because we are not correcting for any covariates, the column-traces are identical to the row-traces.
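The percentile bands on the right can be sketched in Python as below. The sketch assumes, as the ‘label-shuffled’ wording used elsewhere suggests, that each random shuffle reassigns the case/control labels while keeping the numbers of cases and controls fixed; computing the trace for each shuffle requires the full algorithm and is not reproduced. The names `shuffle_labels`, `percentile` and `null_bands` are hypothetical, and the linear-interpolation percentile convention is an assumption.

```python
import random
from typing import Dict, List, Sequence, Tuple


def shuffle_labels(n_cases: int, n_controls: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """One label-shuffle: randomly re-pick which pooled patients count as cases."""
    idx = list(range(n_cases + n_controls))
    rng.shuffle(idx)
    return idx[:n_cases], idx[n_cases:]


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile, 0 <= q <= 100."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def null_bands(traces: Sequence[Sequence[float]], qs=(5, 50, 95)) -> Dict[int, List[float]]:
    """traces[s][t] is the trace of shuffle s at iteration t.

    Returns {q: [q-th percentile across shuffles at iteration t for each t]}.
    """
    n_iter = len(traces[0])
    return {q: [percentile([tr[t] for tr in traces], q) for t in range(n_iter)] for q in qs}
```

With 256 shuffles, `traces` would have 256 rows; the red original-data trace can then be compared against the three returned curves iteration by iteration.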

A highly idealized cartoon of different kinds of biclusters.

In each panel we show a heat-map of an M × N matrix ‘D’, which contains a large embedded bicluster (highlighted in pink) with a special structure. In this cartoon, light and dark pixels correspond to high and low values of the corresponding matrix-entry. Many approaches to biclustering search for structures containing mostly ‘large’ or ‘small’ values, as shown in Panels A and B. Such a bicluster can be thought of as delineating a subset of columns that are ‘differentially-expressed’ in the bicluster's rows relative to the remaining rows of D. Our algorithm generalizes this notion, searching for biclusters that are ‘low-rank’. Examples of low-rank biclusters include those shown in Panels A and B, as well as ‘rank-1’ biclusters, which can exhibit co-expression without necessarily exhibiting differential-expression (see Panel-C and <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g005" target="_blank">Fig 5</a> later on). Also encompassed are ‘rank-2’ and higher biclusters, which exhibit higher-order correlations that are not necessarily obvious to the eye (see Panel-D and <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g007" target="_blank">Fig 7</a> later on). Note that, while the biclusters shown in this cartoon are very large and essentially noiseless, our algorithm can readily discover biclusters that are much smaller and noisier (see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.s005" target="_blank">S1 Text</a>).
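To make the notion of a ‘low-rank’ bicluster concrete, the Python sketch below computes the exact rank of small noiseless toy blocks (hypothetical values) resembling the cartoon panels: a uniformly ‘large’ block, a rank-1 co-expression block with mixed signs, and a rank-2 block. Exact rational elimination is used only because the toy blocks are noiseless; for real, noisy data one would instead inspect singular values against a tolerance. The helper names `outer` and `matrix_rank` are illustrative.

```python
from fractions import Fraction
from typing import List, Sequence


def outer(u: Sequence[int], v: Sequence[int]) -> List[List[int]]:
    """Rank-1 matrix with entries u[i] * v[k]."""
    return [[ui * vk for vk in v] for ui in u]


def matrix_rank(A: Sequence[Sequence[int]]) -> int:
    """Exact rank by Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    if not M:
        return 0
    rank = 0
    for c in range(len(M[0])):
        pivot = next((r for r in range(rank, len(M)) if M[r][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(len(M)):
            if r != rank and M[r][c] != 0:
                f = M[r][c] / M[rank][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank


# Panels A/B: a block of uniformly 'large' values (differential-expression).
high_block = [[5] * 4 for _ in range(3)]
# Panel C: rank-1 co-expression with mixed signs and no uniform shift.
coexpr_block = outer([1, -1, 2], [1, -2, 3, 0])
# Panel D: the sum of two rank-1 patterns.
rank2_block = [[a + b for a, b in zip(r1, r2)]
               for r1, r2 in zip(outer([1, 0, 1], [1, 1, 0, 0]),
                                 outer([0, 1, 1], [0, 0, 1, -1]))]
```

Both `high_block` and `coexpr_block` are rank-1, which is why a rank-based criterion covers the differential-expression case while also capturing co-expression that shows no overall shift in level.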

A scatterplot of the data shown in Fig 10.

Each row-trace shown on the left in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g010" target="_blank">Fig 10</a> is plotted as a single point in 2-dimensional space: the horizontal-axis corresponds to the maximum row-trace and the vertical-axis to the average row-trace (taken across the iterations). The original-data is indicated with a ‘⊗’, and each of the random shuffles with a colored ‘•’. The p-value for any point in this plane is equal to the fraction of label-shuffled-traces that have either an x-position larger than $x_w$ or a y-position larger than $y_w$, where $x_w$ and $y_w$ are the x- and y-percentiles associated with the most extreme coordinate of that point (details are given in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.s006" target="_blank">S2 Text</a>). Each random shuffle is colored by its p-value with respect to the label-shuffled-distribution. By comparing the original-trace with the shuffled-distribution we can read off a p-value for the original-data of ≲ 0.008.
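One reading of this p-value can be sketched in Python. It assumes the ‘most extreme coordinate’ is the larger of the point's two empirical percentiles within the shuffled distribution, and that $x_w$ and $y_w$ are the shuffled x- and y-values at that percentile (with linear interpolation); the precise definition is in S2 Text, and the function names are hypothetical.

```python
from typing import Sequence


def ecdf(samples: Sequence[float], value: float) -> float:
    """Fraction of samples less than or equal to value."""
    return sum(1 for s in samples if s <= value) / len(samples)


def quantile(samples: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile, 0 <= q <= 1."""
    s = sorted(samples)
    pos = (len(s) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def joint_p_value(x0: float, y0: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """p-value of the point (x0, y0) against shuffled points (xs[i], ys[i])."""
    # Percentile of whichever coordinate is more extreme within the shuffles.
    q = max(ecdf(xs, x0), ecdf(ys, y0))
    # Thresholds x_w, y_w: the shuffled x- and y-values at that same percentile.
    x_w, y_w = quantile(xs, q), quantile(ys, q)
    # Fraction of shuffles exceeding either threshold.
    return sum(1 for x, y in zip(xs, ys) if x > x_w or y > y_w) / len(xs)
```

In the figure's setting, (x0, y0) would be the maximum and average row-trace of the original-data, and xs, ys the same two summaries for each label-shuffle. Using a single shared percentile for both axes is what makes the test two-dimensional: a point extreme in either coordinate receives a small p-value.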

Continuous-covariate distribution for the bicluster shown in Example-B.

As mentioned in the introduction, our algorithm proceeds iteratively, removing rows and columns from the case-matrix until there are none left. One of our goals is to ensure that, during this process, our algorithm focuses on biclusters involving case-patients that are relatively well balanced in covariate-space. On the left we show a scatterplot of the 2-dimensional distribution of covariate-components across the remaining m = 115 case-patients within the bicluster shown in Example-B (i.e., <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g007" target="_blank">Fig 7</a>). The horizontal and vertical lines in each subplot indicate the medians of the components of the covariate-distribution. On the right we show the same data again in contour form (note colorbar). The continuous-covariates remain well distributed even though relatively few case-patients are left (compare with <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g009" target="_blank">Fig 9</a>).
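As a rough illustration of what the median lines convey, the following Python sketch counts how many patients fall in each quadrant defined by the two covariate medians; roughly equal counts indicate a spread-out distribution, while mass piled into two opposite quadrants indicates correlated, unbalanced covariates. This quadrant count is a hypothetical diagnostic introduced here for illustration, not the balancing criterion used by the algorithm.

```python
from statistics import median
from typing import Dict, List, Tuple


def quadrant_counts(points: List[Tuple[float, float]]) -> Tuple[Tuple[float, float], Dict[str, int]]:
    """Medians of a 2-D covariate cloud and the number of points in each median quadrant.

    Each key is '+' or '-' for (above / not above the x-median) followed by
    (above / not above the y-median).
    """
    mx = median(p[0] for p in points)
    my = median(p[1] for p in points)
    counts = {"++": 0, "+-": 0, "-+": 0, "--": 0}
    for x, y in points:
        counts[("+" if x > mx else "-") + ("+" if y > my else "-")] += 1
    return (mx, my), counts
```

For the m = 115 patients in the figure, each point would hold the two covariate-components of one case-patient.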

Illustration of bicluster found within genome-wide-association-study dataset.

In this figure we illustrate the genome-wide association-study (GWAS) data-set discussed in Example-B (see main text). This data-set involves 16577 patients, each genotyped across 276768 genetic base-pair-locations (i.e., alleles). Many of these patients have a particular psychological disorder, while the remainder do not. We use this phenotype to separate the patients into $M_D = 9752$ cases and $M_X = 6825$ controls. The size of this GWAS data-set is indicated in the background of this picture; it dwarfs the gene-expression data-set used in Example-A (inset for comparison). At the top of the foreground we illustrate an m = 115 by n = 706 submatrix found within the case-patients. This submatrix is a low-rank bicluster: the alleles are strongly correlated across these particular case-patients. The order of the patients and alleles within this submatrix has been chosen to emphasize this correlation. For comparison, further down we pull out a few other randomly chosen case-patients and control-patients and present their associated submatrices (defined using the same 706 alleles).
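The comparison panels can be sketched in Python as follows, assuming the case- and control-genotypes are held as lists of per-patient rows and that the bicluster is described by its case-row indices and allele (column) indices. The function names `submatrix` and `comparison_panels` and the fixed random seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def submatrix(M: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    """Entries M[r][c] for the given rows and columns, in the given order."""
    return [[M[r][c] for c in cols] for r in rows]


def comparison_panels(D: Matrix, X: Matrix, bicluster_rows: Sequence[int],
                      allele_idx: Sequence[int], n_each: int,
                      seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Random non-bicluster cases and random controls, restricted to the bicluster's alleles."""
    rng = random.Random(seed)
    in_bicluster = set(bicluster_rows)
    other_cases = [r for r in range(len(D)) if r not in in_bicluster]
    case_rows = rng.sample(other_cases, n_each)
    control_rows = rng.sample(range(len(X)), n_each)
    return submatrix(D, case_rows, allele_idx), submatrix(X, control_rows, allele_idx)
```

Passing the bicluster's own allele ordering as `allele_idx` keeps the columns aligned with the top panel, so any correlation (or lack of it) in the comparison patients can be read off directly.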

Illustration of the algorithm operating on a case-matrix alone (i.e., D only).

In Panel-A we show a large M × N binarized matrix D (black and white pixels correspond to values of ±1, respectively). In the upper left corner of D we have inserted a large rank-1 bicluster B (shaded in pink). Our algorithm considers all 2 × 2 submatrices (i.e., ‘loops’) within D. Several such loops are highlighted via the blue rectangles (the corners of each rectangle pick out a 2 × 2 submatrix). Generally speaking, loops are equally likely to be rank-1 or rank-2. Some loops, such as the loop shown in red, are entirely contained within B; these loops are more likely to be rank-1 than rank-2. In Panel-B we show some examples of rank-2 and rank-1 loops. Given a loop with row-indices j, j′ and column-indices k, k′, the rank of the loop is determined by the sign of $D_{jk}D_{jk'}D_{j'k}D_{j'k'}$: the loop is rank-1 when this product is positive and rank-2 when it is negative. Our algorithm accumulates a ‘loop-score’ for each row j and each column k. In its simplest form, the loop-score for a particular row j is given by $[Z_{\mathrm{ROW}}]_j = \sum_{j',k,k'} D_{jk}D_{jk'}D_{j'k}D_{j'k'}$. Analogously, the loop-score for a column k is given by $[Z_{\mathrm{COL}}]_k = \sum_{j,j',k'} D_{jk}D_{jk'}D_{j'k}D_{j'k'}$. In Panel-C we show the distribution of loop-scores we might expect from the rows or columns within D. The blue curve corresponds to the distribution of scores expected from the rows/columns of D that are not in B, whereas the red curve corresponds to the distribution of scores expected from the rows/columns of B. In Panel-D we show the distribution of loop-scores we might expect by pooling all rows or columns of D. The rows or columns with the lowest scores are not likely to be part of B.
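The loop-rank rule and the simplest loop-scores can be sketched in Python as follows. The scores use the identity $\sum_{k,k'} D_{jk}D_{jk'}D_{j'k}D_{j'k'} = \left(\sum_k D_{jk}D_{j'k}\right)^2$ to avoid enumerating every loop. The `eliminate` routine is a hypothetical, bare-bones version of the iterative removal: it ignores controls and covariates and drops a fixed number of lowest-scoring rows and columns per iteration, which is an assumption rather than the authors' schedule. All function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[int]]


def loop_is_rank1(D: Matrix, j: int, j2: int, k: int, k2: int) -> bool:
    """For +/-1 entries, a 2 x 2 loop is rank-1 exactly when its corner product is positive."""
    return D[j][k] * D[j][k2] * D[j2][k] * D[j2][k2] > 0


def loop_scores(D: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Simplest row- and column-loop-scores over the surviving rows and columns."""
    z_row = {j: sum(sum(D[j][k] * D[j2][k] for k in cols) ** 2 for j2 in rows) for j in rows}
    z_col = {k: sum(sum(D[j][k] * D[j][k2] for j in rows) ** 2 for k2 in cols) for k in cols}
    return z_row, z_col


def eliminate(D: Matrix, n_drop_rows: int = 1, n_drop_cols: int = 1) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical elimination: repeatedly drop the lowest-scoring rows and columns.

    Returns the (rows, columns) removed at each iteration; those removed last
    scored highest and are the best bicluster candidates.
    """
    rows, cols = list(range(len(D))), list(range(len(D[0])))
    removed = []
    while rows and cols:
        z_row, z_col = loop_scores(D, rows, cols)  # rescore on what survives
        drop_r = sorted(rows, key=z_row.get)[:n_drop_rows]
        drop_c = sorted(cols, key=z_col.get)[:n_drop_cols]
        removed.append((drop_r, drop_c))
        rows = [j for j in rows if j not in drop_r]
        cols = [k for k in cols if k not in drop_c]
    return removed
```

Rescoring after every removal matters: once rows and columns outside B are discarded, the loops that remain are increasingly dominated by B, which sharpens the separation between the red and blue distributions of Panel-C.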
