13 research outputs found

Illustration of the loops within a 3-dimensional array.

Author: Aaditya V. Rangan (305264)
Anders Jureus (1408867)
Arjun Krishnan (5216162)
Caroline C. McGrouther (5216153)
Eli Stahl (33862)
John Kelsoe (3437189)
Mikael Landen (5216159)
Nicholas Schork (332955)
Olga Troyanskaya (252137)
Preeti Raghavan (5216168)
Qian Zhu (191342)
Sarah Bergen (3496505)
Seda Bilaloglu (5216165)
Vicky Yao (5216156)

We sketch the structure of a 3-dimensional data-array D, with J rows, K columns and P ‘layers’. Each entry D_{j,k,l} lies in the cube shown. The loops within D can be divided into 3 categories: (a) iso-layer loops that stretch across 2 rows and 2 columns, (b) iso-column loops that stretch across 2 rows and 2 layers, and (c) iso-row loops that stretch across 2 columns and 2 layers. The row-score [Z_ROW]_j aggregates all the iso-column and iso-layer loops associated with row j. The column-score [Z_COL]_k aggregates all the iso-row and iso-layer loops associated with column k. The layer-score [Z_LYR]_l aggregates all the iso-row and iso-column loops associated with layer l.
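As a rough illustration of how the three scores could be aggregated, the sketch below enumerates every loop of a small array by brute force. It assumes that the entries of D are ±1 and that the value of a loop is the product of its four entries; neither assumption is stated in the caption, and the function name `loop_scores_3d` is hypothetical.

```python
from itertools import combinations


def loop_scores_3d(D):
    """Brute-force row-, column- and layer-scores for a small array D[j][k][l].

    ASSUMPTION (not from the caption): entries are +1/-1 and the value of a
    loop is the product of its four entries.
    """
    J, K, P = len(D), len(D[0]), len(D[0][0])
    z_row, z_col, z_lyr = [0] * J, [0] * K, [0] * P
    row_pairs = list(combinations(range(J), 2))
    col_pairs = list(combinations(range(K), 2))
    lyr_pairs = list(combinations(range(P), 2))

    # (a) iso-layer loops: one layer, 2 rows and 2 columns.
    for l in range(P):
        for j1, j2 in row_pairs:
            for k1, k2 in col_pairs:
                v = D[j1][k1][l] * D[j1][k2][l] * D[j2][k1][l] * D[j2][k2][l]
                for j in (j1, j2):
                    z_row[j] += v
                for k in (k1, k2):
                    z_col[k] += v

    # (b) iso-column loops: one column, 2 rows and 2 layers.
    for k in range(K):
        for j1, j2 in row_pairs:
            for l1, l2 in lyr_pairs:
                v = D[j1][k][l1] * D[j1][k][l2] * D[j2][k][l1] * D[j2][k][l2]
                for j in (j1, j2):
                    z_row[j] += v
                for l in (l1, l2):
                    z_lyr[l] += v

    # (c) iso-row loops: one row, 2 columns and 2 layers.
    for j in range(J):
        for k1, k2 in col_pairs:
            for l1, l2 in lyr_pairs:
                v = D[j][k1][l1] * D[j][k1][l2] * D[j][k2][l1] * D[j][k2][l2]
                for k in (k1, k2):
                    z_col[k] += v
                for l in (l1, l2):
                    z_lyr[l] += v

    return z_row, z_col, z_lyr
```

Each loop contributes to the two scores named in the caption: iso-layer loops feed the row- and column-scores, iso-column loops feed the row- and layer-scores, and iso-row loops feed the column- and layer-scores. The enumeration is only practical for very small arrays.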

FigShare

Illustration of the GSE48091 gene-expression data-set used in Example-A (see main text).

Author: Aaditya V. Rangan (305264)
Anders Jureus (1408867)
Arjun Krishnan (5216162)
Caroline C. McGrouther (5216153)
Eli Stahl (33862)
John Kelsoe (3437189)
Mikael Landen (5216159)
Nicholas Schork (332955)
Olga Troyanskaya (252137)
Preeti Raghavan (5216168)
Qian Zhu (191342)
Sarah Bergen (3496505)
Seda Bilaloglu (5216165)
Vicky Yao (5216156)

Each row corresponds to a patient, and each column to a ‘gene’ (i.e., gene-expression measurement): the color of each pixel codes for the intensity of a particular measurement of a particular patient (see colorbar at the bottom). Of these patients, M_D = 340 are cases and the other M_X = 166 are controls; we group the former into the case-matrix ‘D’, and the latter into the control-matrix ‘X’.

FigShare

Contrasting a bicluster with controls.

Author: Aaditya V. Rangan (305264)
Anders Jureus (1408867)
Arjun Krishnan (5216162)
Caroline C. McGrouther (5216153)
Eli Stahl (33862)
John Kelsoe (3437189)
Mikael Landen (5216159)
Nicholas Schork (332955)
Olga Troyanskaya (252137)
Preeti Raghavan (5216168)
Qian Zhu (191342)
Sarah Bergen (3496505)
Seda Bilaloglu (5216165)
Vicky Yao (5216156)

This shows the bicluster of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g005" target="_blank">Fig 5B</a> on top, and the rest of the controls on the bottom. The control-patients have been rearranged in order of their correlation with the co-expression pattern of the bicluster. Even though a few of the controls (i.e., ∼3/166) exhibit a co-expression pattern comparable to that expressed by the bicluster, the vast majority do not.

FigShare

Illustration of bicluster found within gene-expression data-set.

Author: Aaditya V. Rangan (305264)
Anders Jureus (1408867)
Arjun Krishnan (5216162)
Caroline C. McGrouther (5216153)
Eli Stahl (33862)
John Kelsoe (3437189)
Mikael Landen (5216159)
Nicholas Schork (332955)
Olga Troyanskaya (252137)
Preeti Raghavan (5216168)
Qian Zhu (191342)
Sarah Bergen (3496505)
Seda Bilaloglu (5216165)
Vicky Yao (5216156)

Both panels illustrate the same submatrix (i.e., bicluster) drawn from the full case-matrix shown at the top of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g004" target="_blank">Fig 4</a>. This bicluster was found using our control-corrected biclustering algorithm (described in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.s005" target="_blank">S1 Text</a>). In Panel-A we represent this bicluster using the row- and column-ordering given by the output of our algorithm. This ordering has certain advantages (see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.s006" target="_blank">S2 Text</a>), but does not make the co-expression pattern particularly clear to the eye. Thus, to show this co-expression more clearly, we present the bicluster again in Panel-B, this time with the rows and columns rearranged so that the coefficients of the first principal-component-vector change monotonically. As can be seen, there is a striking pattern of correlation across the 793 genes for the 45 cases shown.
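The Panel-B rearrangement can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes that the first principal component is obtained from a singular value decomposition of the column-centred submatrix, and the function name `pc1_ordering` is hypothetical.

```python
import numpy as np


def pc1_ordering(B):
    """Return (row_order, col_order) sorted by first-component coefficients.

    ASSUMPTION (not from the caption): the first principal component comes
    from an SVD of the column-centred submatrix B.
    """
    B = np.asarray(B, dtype=float)
    centred = B - B.mean(axis=0)            # remove each column's mean
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    row_order = np.argsort(U[:, 0])         # rows sorted by their coefficient
    col_order = np.argsort(Vt[0])           # columns sorted by their coefficient
    return row_order, col_order
```

Indexing with `B[np.ix_(row_order, col_order)]` then gives the rearranged submatrix. The overall sign of a singular vector is arbitrary, so the coefficients may run in increasing or decreasing order; the pattern is monotone in either case.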

FigShare

Illustration of bicluster found within genome-wide-association-study dataset.

Author: Aaditya V. Rangan (305264)
Anders Jureus (1408867)
Arjun Krishnan (5216162)
Caroline C. McGrouther (5216153)
Eli Stahl (33862)
John Kelsoe (3437189)
Mikael Landen (5216159)
Nicholas Schork (332955)
Olga Troyanskaya (252137)
Preeti Raghavan (5216168)
Qian Zhu (191342)
Sarah Bergen (3496505)
Seda Bilaloglu (5216165)
Vicky Yao (5216156)

In this figure we illustrate the genome-wide association-study (i.e., GWAS) data-set discussed in Example-B (see main text). This data-set involves 16577 patients, each genotyped across 276768 genetic base-pair-locations (i.e., alleles). Many of these patients have a particular psychological disorder, while the remainder do not. We use this phenotype to separate the patients into M_D = 9752 cases and M_X = 6825 controls. The size of this GWAS data-set is indicated in the background of this picture, and dwarfs the size of the gene-expression data-set used in Example-A (inset for comparison). At the top of the foreground we illustrate an m = 115 by n = 706 submatrix found within the case-patients. This submatrix is a low-rank bicluster, and the alleles are strongly correlated across these particular case-patients. The order of the patients and alleles within this submatrix has been chosen to emphasize this correlation. For comparison, we pull out a few other randomly-chosen case-patients and control-patients, and present their associated submatrices (defined using the same 706 alleles) further down.

FigShare

Performance of loop-scores vs spectral-biclustering applied to the planted-bicluster problem.

Author: Aaditya V. Rangan (305264)
Anders Jureus (1408867)
Arjun Krishnan (5216162)
Caroline C. McGrouther (5216153)
Eli Stahl (33862)
John Kelsoe (3437189)
Mikael Landen (5216159)
Nicholas Schork (332955)
Olga Troyanskaya (252137)
Preeti Raghavan (5216168)
Qian Zhu (191342)
Sarah Bergen (3496505)
Seda Bilaloglu (5216165)
Vicky Yao (5216156)

For each instantiation of the planted-bicluster problem we choose an M, m, ε and l; we use these parameters to generate a random M × M matrix D and an embedded m × m rank-l submatrix B with spectral noise ε. For each instantiation, our algorithm produces a list of row- and column-indices of D in the order in which they are eliminated; those rows and columns retained the longest are expected to be members of B. To assess the success of our algorithm we calculate the AUC A_R (i.e., area under the receiver operating characteristic curve) associated with the row-indices of B with respect to the output list from our algorithm. The value A_R is equal to the probability that, given a randomly chosen row from B as well as a randomly chosen row from outside of B, our algorithm eliminates the latter before the former (i.e., the latter is lower on our list than the former). We calculate the AUC A_C for the columns similarly. Finally, we use A = (A_R + A_C)/2 as a metric of success; values of A near 1 mean that the rows and columns of B were filtered to the top by our algorithm, whereas values of A near 0.5 mean that our algorithm failed to detect B. In the top of Panel-A we show the trial-averaged AUC A for our loop-counting method as a function of and log_M(m). Results for l = 1 are shown on the left; l = 2 is shown on the right. Each subplot takes the form of a heatmap, with each pixel showing the value of A for a given value of and log_M(m) (averaged over at least 128 trials). The different subplots correspond to different values for M. Note that our loop-counting algorithm is generally successful when and . In the bottom of Panel-A we show the analogous AUC A for a simple implementation of the spectral method (see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.s006" target="_blank">S2 Text</a>). In Panel-B we show the difference in trial-averaged A between these two methods (see colorbar for scale). Note that when l ≥ 2 or the noise is small, our loop-score generally has a higher rate of success than the spectral method. On the other hand, there do exist parameters when l = 1 and where the spectral method has a higher rate of success. In each panel the thin grey line shows the detection-boundary for our loop-counting method (calculated using ).
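The success metric A can be computed directly from its probabilistic definition. The sketch below is a minimal pairwise implementation; the function names `elimination_auc` and `combined_auc` are hypothetical, and the quadratic pair count is chosen for clarity rather than speed.

```python
def elimination_auc(elimination_order, members):
    """Probability that a random non-member is eliminated before a random member.

    elimination_order: indices listed in the order in which they are eliminated.
    members: set of indices that belong to the planted bicluster B.
    """
    position = {idx: t for t, idx in enumerate(elimination_order)}
    inside = [i for i in elimination_order if i in members]
    outside = [i for i in elimination_order if i not in members]
    # Count (member, non-member) pairs in which the non-member goes first.
    wins = sum(position[o] < position[i] for i in inside for o in outside)
    return wins / (len(inside) * len(outside))


def combined_auc(row_order, rows_of_B, col_order, cols_of_B):
    """A = (A_R + A_C) / 2, as defined in the caption."""
    a_r = elimination_auc(row_order, rows_of_B)
    a_c = elimination_auc(col_order, cols_of_B)
    return 0.5 * (a_r + a_c)
```

For example, `elimination_auc([0, 1, 2, 3, 4], {3, 4})` returns 1.0, because every row outside B is eliminated before every row of B.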

FigShare

A scatterplot of the data shown in Fig 10.

Author: Aaditya V. Rangan (305264)
Anders Jureus (1408867)
Arjun Krishnan (5216162)
Caroline C. McGrouther (5216153)
Eli Stahl (33862)
John Kelsoe (3437189)
Mikael Landen (5216159)
Nicholas Schork (332955)
Olga Troyanskaya (252137)
Preeti Raghavan (5216168)
Qian Zhu (191342)
Sarah Bergen (3496505)
Seda Bilaloglu (5216165)
Vicky Yao (5216156)

Each row-trace shown on the left in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g010" target="_blank">Fig 10</a> is plotted as a single point in 2-dimensional space; the horizontal-axis corresponds to the maximum row-trace and the vertical-axis corresponds to the average row-trace (taken across the iterations). The original-data is indicated with a ‘⊗’, and each of the random shuffles with a colored ‘•’. The p-value for any point in this plane is equal to the fraction of label-shuffled-traces that have either an x-position larger than x_w or a y-position larger than y_w, where x_w and y_w are the x- and y-percentiles associated with the most extreme coordinate of that point (details are given in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.s006" target="_blank">S2 Text</a>). Each random shuffle is colored by its p-value determined by the label-shuffled-distribution. By comparing the original-trace with the shuffled-distribution we can read off a p-value for the original-data of ≲ 0.008.
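One possible reading of this p-value rule is sketched below. The interpretation is an assumption, since the full definition is deferred to S2 Text: each coordinate of the point is ranked against the shuffled values, the larger of the two ranks is taken as the "most extreme" percentile, and the thresholds x_w and y_w are the shuffled values at that rank. The function name `shuffle_p_value` is hypothetical.

```python
from bisect import bisect_right


def shuffle_p_value(point, shuffles):
    """Empirical p-value of `point` against (max, mean) pairs from shuffles.

    ASSUMPTION: x_w and y_w are the shuffled x- and y-values at the rank of
    the point's most extreme coordinate; S2 Text may define this differently.
    """
    x, y = point
    xs = sorted(sx for sx, _ in shuffles)
    ys = sorted(sy for _, sy in shuffles)

    # Rank of each coordinate: number of shuffled values not exceeding it.
    rank = max(bisect_right(xs, x), bisect_right(ys, y))
    if rank == 0:
        return 1.0  # the point lies below every shuffle in both coordinates

    x_w, y_w = xs[rank - 1], ys[rank - 1]
    exceed = sum(1 for sx, sy in shuffles if sx > x_w or sy > y_w)
    return exceed / len(shuffles)
```

Under this reading, a point that exceeds every shuffle in at least one coordinate receives a p-value of 0, which in practice would be reported as an upper bound set by the number of shuffles.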

FigShare

Continuous–covariate-distribution for the bicluster shown in Example-B.

Author: Aaditya V. Rangan (305264)
Anders Jureus (1408867)
Arjun Krishnan (5216162)
Caroline C. McGrouther (5216153)
Eli Stahl (33862)
John Kelsoe (3437189)
Mikael Landen (5216159)
Nicholas Schork (332955)
Olga Troyanskaya (252137)
Preeti Raghavan (5216168)
Qian Zhu (191342)
Sarah Bergen (3496505)
Seda Bilaloglu (5216165)
Vicky Yao (5216156)

As mentioned in the introduction, our algorithm proceeds iteratively, removing rows and columns from the case-matrix until there are none left. One of our goals is to ensure that, during this process, our algorithm focuses on biclusters which involve case-patients that are relatively well balanced in covariate-space. On the left we show a scatterplot illustrating the 2-dimensional distribution of covariate-components across the remaining m = 115 case-patients within the bicluster shown in Example-B (i.e., <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g007" target="_blank">Fig 7</a>). The horizontal and vertical lines in each subplot indicate the medians of the components of the covariate-distribution. On the right we show the same data again, except in contour form (note colorbar). The continuous-covariates remain relatively well-distributed even though relatively few case-patients are left (compare with <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g009" target="_blank">Fig 9</a>).

FigShare

Boletín de Segovia: Number 128 - 23 October 1861

Author: Amanda Curtis (3437192)
Amy Webb (791450)
Audrey Papp (252886)
Caryn Lerman (264866)
Daqing Wang (406452)
Deborah Mash (3437198)
Erica Graziosa (3437201)
Grzegorz Rempala (212741)
John Kelsoe (3437189)
Leslie Newman (3437204)
Maciej Pietrzak (212740)
Michal Seweryn (791451)
Rachel Tyndale (3437195)
Samuel Handelman (3437207)
Wolfgang Sadee (179657)
Publication date: 23/10/1861

Digital copy. Madrid: Ministerio de Cultura, Subdirección General de Coordinación Bibliotecaria, 200

Biblioteca Virtual de Prensa Histórica (Virtual Library of Historical Newspapers)

FigShare

Row-traces for the bicluster shown in Example-A.

Author: Aaditya V. Rangan (305264)
Anders Jureus (1408867)
Arjun Krishnan (5216162)
Caroline C. McGrouther (5216153)
Eli Stahl (33862)
John Kelsoe (3437189)
Mikael Landen (5216159)
Nicholas Schork (332955)
Olga Troyanskaya (252137)
Preeti Raghavan (5216168)
Qian Zhu (191342)
Sarah Bergen (3496505)
Seda Bilaloglu (5216165)
Vicky Yao (5216156)

This bicluster was found by running our algorithm on the data shown in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#pcbi.1006105.g004" target="_blank">Fig 4</a>. Because we corrected for controls, we compare our original-data to the distribution we obtain under the null-hypothesis H_0 (see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1006105#sec011" target="_blank">Methods</a>). On the left we show the row-trace as a function of iteration for the original-data (red) as well as each of the 256 random shuffles (blue). On the right we replot this same trace data, showing the 5th, 50th and 95th percentile (across iterations) of the H_0 distribution. Because we are not correcting for any covariates, the column-traces are identical to the row-traces.

FigShare