Optimal Scaling of Digital Transcriptomes

Author: Burak Kutlu (68203)
Gustavo Glusman (5925)
Juan Caballero (480472)
Leroy Hood (4322)
Max Robinson (480473)
Publication date: 06/11/2013

Deep sequencing of transcriptomes has become an indispensable tool for biology, enabling expression levels for thousands of genes to be compared across multiple samples. Since transcript counts scale with sequencing depth, counts from different samples must be normalized to a common scale prior to comparison. We analyzed fifteen existing and novel algorithms for normalizing transcript counts and evaluated the effectiveness of the resulting normalizations. For this purpose we defined two novel and mutually independent metrics: (1) the number of “uniform” genes (genes whose normalized expression levels have a sufficiently low coefficient of variation), and (2) low Spearman correlation between the normalized expression profiles of gene pairs. We also defined four novel algorithms, one of which explicitly maximizes the number of uniform genes, and compared the performance of all fifteen algorithms. The two most commonly used methods (scaling to a fixed total value, or equalizing the expression of certain ‘housekeeping’ genes) yielded particularly poor results and were surpassed even by normalization based on randomly selected gene sets. Conversely, seven of the algorithms approached what appears to be optimal normalization. Three of these algorithms rely on the identification of “ubiquitous” genes: genes expressed in all the samples studied, but never at very high or very low levels. We demonstrate that these include a “core” of genes expressed in many tissues in a mutually consistent pattern, which is suitable for use as an internal normalization guide. The new methods yield robustly normalized expression values, a prerequisite for identifying differentially expressed and tissue-specific genes as potential biomarkers.

Directory of Open Access Journals

PubMed Central

FigShare

Pairwise comparison of expression levels.

Author: Burak Kutlu (68203)
Gustavo Glusman (5925)
Juan Caballero (480472)
Leroy Hood (4322)
Max Robinson (480473)

We compared the expression levels of 15,861 genes with nonzero expression in both liver and testes, expressed as average coverage per base. Each point represents one gene. A) Data prior to normalization. Housekeeping genes are highlighted as green points and labeled. The blue and red diagonals represent the relative correction factors computed from total counts or by the NCS method, respectively, relative to no normalization (black). The magenta and orange curves depict the percentiles when considering all genes or only genes with nonzero values, respectively. B) Values after correction by NCS. Points in black or red denote genes with positive weights, which therefore guided the scaling. Points in red denote the 39 genes with weight >0.5.
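A rough sketch of the two kinds of correction factor drawn as diagonals in panel A follows. The total-count factor is the ratio of sample totals; the second factor is a generic weighted geometric mean of per-gene ratios over genes with positive weight. That weighted factor is an illustrative stand-in, not the NCS method: the weights are a hypothetical input, and the sample dictionaries are made-up toy data.

```python
import math
from typing import Dict, Mapping, Tuple


def nonzero_in_both(a: Mapping[str, float], b: Mapping[str, float]) -> Dict[str, Tuple[float, float]]:
    """Keep only genes with nonzero expression in both samples."""
    return {g: (a[g], b[g]) for g in a.keys() & b.keys() if a[g] > 0 and b[g] > 0}


def total_count_factor(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Factor that puts sample b on sample a's scale by equalizing total counts."""
    return sum(a.values()) / sum(b.values())


def weighted_log_ratio_factor(a: Mapping[str, float], b: Mapping[str, float],
                              weights: Mapping[str, float]) -> float:
    """Weighted geometric mean of a/b over genes with positive weight (hypothetical weights)."""
    pairs = nonzero_in_both(a, b)
    used = {g: w for g, w in weights.items() if w > 0 and g in pairs}
    total_w = sum(used.values())
    if total_w == 0:
        raise ValueError("no gene with positive weight is expressed in both samples")
    mean_log = sum(w * math.log2(pairs[g][0] / pairs[g][1]) for g, w in used.items()) / total_w
    return 2 ** mean_log


liver = {"g1": 10, "g2": 20, "g3": 0, "g4": 100}
testes = {"g1": 5, "g2": 10, "g3": 7, "g4": 10}
print(sorted(nonzero_in_both(liver, testes)))  # g3 is dropped (zero in liver)
print(total_count_factor(liver, testes))       # 130 / 32, pulled up by the highly expressed g4
print(weighted_log_ratio_factor(liver, testes, {"g1": 1.0, "g2": 1.0, "g4": 0.0}))
```

The toy data show why the two diagonals can diverge: a single highly expressed gene (g4) dominates the total-count factor, while a factor guided only by selected genes ignores it.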

FigShare

Comparison of performance of various normalization methods.

Author: Burak Kutlu (68203)
Gustavo Glusman (5925)
Juan Caballero (480472)
Leroy Hood (4322)
Max Robinson (480473)

Each method is evaluated by the number of genes observed to be consistently expressed across samples (abscissa); different methods also yield different numbers of genes identified as specific to one sample. The numbers in the orange circles denote the number of housekeeping genes combined using the geNorm algorithm. The dashed arrows show one stochastic path of the ES from the data prior to normalization (white square, “None”) to the best approximation to the optimal solution (gray square, ES). Brown squares represent the results obtained via the TMM method, using each of the 16 samples as reference.
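The housekeeping-gene results above combine genes with geNorm. As it is commonly described, geNorm scores each candidate gene by a stability value M (the mean, over all other candidates, of the standard deviation across samples of the log2 expression ratio), repeatedly drops the least stable gene, and normalizes each sample by the geometric mean of the retained genes. The sketch below follows that description under assumed inputs (positive expression values, at least two samples); geNorm's pairwise-variation criterion for choosing how many genes to keep is omitted, so `keep` is supplied by hand.

```python
import math
import statistics
from typing import Dict, List

Expr = Dict[str, List[float]]  # gene -> positive expression values, one per sample


def genorm_m(expr: Expr) -> Dict[str, float]:
    """Stability M for each gene: mean over other genes k of stdev_samples(log2(expr_j / expr_k)).

    Lower M means a more stable gene. Requires at least two genes and two samples.
    """
    m = {}
    for j in expr:
        spreads = [
            statistics.stdev([math.log2(a / b) for a, b in zip(expr[j], expr[k])])
            for k in expr if k != j
        ]
        m[j] = statistics.mean(spreads)
    return m


def genorm_select(expr: Expr, keep: int = 2) -> List[str]:
    """Repeatedly drop the least stable gene (highest M) until `keep` genes remain."""
    remaining = dict(expr)
    while len(remaining) > keep:
        m = genorm_m(remaining)
        del remaining[max(m, key=m.get)]
    return sorted(remaining)


def normalization_factors(expr: Expr, genes: List[str]) -> List[float]:
    """Per-sample normalization factor: geometric mean of the selected reference genes."""
    n_samples = len(expr[genes[0]])
    return [statistics.geometric_mean([expr[g][i] for g in genes]) for i in range(n_samples)]


expr = {"h1": [1.0, 2.0, 4.0], "h2": [2.0, 4.0, 8.0], "h3": [5.0, 1.0, 9.0]}
best = genorm_select(expr, keep=2)
print(best)                               # h1 and h2 keep a constant ratio, h3 does not
print(normalization_factors(expr, best))  # doubles from sample to sample
```

`statistics.geometric_mean` requires Python 3.8 or later.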

FigShare

Additional file 4: of Novel metrics for quantifying bacterial genome composition skews

Author: Christopher Lausted (4354)
Gustavo Glusman (5925)
Lena Joesch-Cohen (5505923)
Max Robinson (480473)
Neda Jabbari (673910)

Table S3. Presence and absence of genes associated with replication, recombination, and repair. Data were retrieved from the KEGG Orthology database [45] for the homologous recombination (ko03440), mismatch repair (ko03430), DNA repair and recombination (ko03400), nucleotide excision repair (ko03420), and base excision repair (ko03410) pathways. Asterisks denote skew outliers. (XLSX 9 kb)

FigShare

IBS percentage in different relationships of simulated families.

Author: Chad Huff (209473)
Gustavo Glusman (5925)
Hong Li (20183)
Jared C. Roach (42648)
Juan Caballero (480472)

For this visualization, the sequencing error (SE) parameter was set to zero. (A) Distribution of P2 in an example sibling pair. Siblings have large portions of the genome in IBD2, which GRAB easily detects through a large number of windows with a very high P2 statistic. (B) Number of identity windows (IWs) between pairs of individuals, which decreases as the relationship degree increases. (C) Percentage of contiguous IWs. A contiguous IW is any IW adjacent to another IW. Unrelated individuals have fewer contiguous IWs than relatives. (D) Maximum length of a set of contiguous IWs. This length tends to be shorter in distant genetic relationships than in close ones. IT: identical twins. FS: full siblings. PO: parent-offspring. UN: unrelated individuals. UD: unknown distance.
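Panels C and D rest on a simple definition: a contiguous IW is any IW adjacent to another IW. A minimal sketch of both statistics follows, assuming a hypothetical input of one boolean per window (True if it is an identity window) listed in genomic order along a single chromosome, with length counted in windows rather than base pairs. It does not reproduce how GRAB itself calls IWs.

```python
from typing import List, Sequence


def contiguous_flags(is_iw: Sequence[bool]) -> List[bool]:
    """Mark each IW that has another IW immediately before or after it."""
    n = len(is_iw)
    return [
        bool(is_iw[i]) and ((i > 0 and bool(is_iw[i - 1])) or (i + 1 < n and bool(is_iw[i + 1])))
        for i in range(n)
    ]


def percent_contiguous(is_iw: Sequence[bool]) -> float:
    """Percentage of IWs that are contiguous (0.0 if there are no IWs)."""
    total = sum(bool(w) for w in is_iw)
    if total == 0:
        return 0.0
    return 100.0 * sum(contiguous_flags(is_iw)) / total


def max_contiguous_run(is_iw: Sequence[bool]) -> int:
    """Length, in windows, of the longest run of adjacent IWs.

    A lone IW is not contiguous, so runs shorter than 2 count as 0.
    """
    best = run = 0
    for flag in is_iw:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best if best >= 2 else 0


windows = [True, True, False, True, False, True, True, True]
print(percent_contiguous(windows))  # 5 of the 6 IWs touch another IW
print(max_contiguous_run(windows))  # longest run spans the last three windows
```

Treating a lone IW as a run of length 0 is a design choice that follows the caption's definition; counting it as 1 would be an equally plausible alternative.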

FigShare

Comparison of multiple relationship estimation methods on real WGS families.

Author: Chad Huff (209473)
Gustavo Glusman (5925)
Hong Li (20183)
Jared C. Roach (42648)
Juan Caballero (480472)

Values in the table are the percentage of correct predictions. Values in parentheses are the percentage of predictions within one degree of the true relationship.
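The two table values can be computed as below. This is a minimal sketch assuming that both the true and the predicted relationships are encoded as integer degrees (for example, 1 for parent-offspring or full siblings); how unrelated pairs or unknown distances were scored in the original table is not stated here and is left out.

```python
from typing import Sequence, Tuple


def prediction_accuracy(true_deg: Sequence[int], pred_deg: Sequence[int]) -> Tuple[float, float]:
    """Return (% exactly correct, % within one degree of the true relationship)."""
    if len(true_deg) != len(pred_deg) or not true_deg:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(true_deg)
    exact = sum(t == p for t, p in zip(true_deg, pred_deg))
    within_one = sum(abs(t - p) <= 1 for t, p in zip(true_deg, pred_deg))
    return 100.0 * exact / n, 100.0 * within_one / n


truth = [1, 1, 2, 3]
predicted = [1, 2, 2, 5]
print(prediction_accuracy(truth, predicted))  # 2 of 4 exact, 3 of 4 within one degree
```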

FigShare

Density distribution of Spearman correlations of sample rankings for some normalization methods.

Author: Burak Kutlu (68203)
Gustavo Glusman (5925)
Juan Caballero (480472)
Leroy Hood (4322)
Max Robinson (480473)


FigShare

Comparison between the scaling factors suggested by the different methods.

Author: Burak Kutlu (68203)
Gustavo Glusman (5925)
Juan Caballero (480472)
Leroy Hood (4322)
Max Robinson (480473)

Lower left: the resulting scaling factors for the heart sample. Upper right: Pairwise correlations between the methods, for all samples. Red shades denote high correlation values (above 0.75), blue denotes low correlation (or anticorrelation). The column to the right indicates the number of uniform genes identified by the method. The Quantile Normalization method is not included in this analysis since it does not produce scaling factors.
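The upper-right matrix correlates the per-sample scaling factors of every pair of methods. A minimal sketch follows, assuming Pearson correlation on the raw factors (the caption does not say which correlation or scale was used) and reusing the 0.75 cutoff for "high". The method names `A`, `B`, and `C` and their factor values are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def method_correlations(factors: Dict[str, List[float]],
                        high: float = 0.75) -> Dict[Tuple[str, str], Tuple[float, bool]]:
    """Correlate every pair of methods' scaling factors; flag correlations above `high`."""
    result = {}
    for a, b in itertools.combinations(sorted(factors), 2):
        r = pearson(factors[a], factors[b])
        result[(a, b)] = (r, r > high)
    return result


# One scaling factor per sample for each hypothetical method.
factors = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.1], "C": [3.0, 1.0, 2.0]}
for pair, (r, is_high) in method_correlations(factors).items():
    print(pair, round(r, 3), "high" if is_high else "low")
```

Methods that do not produce scaling factors, such as quantile normalization, simply have no entry in `factors`.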

FigShare

Conceptual taxonomy of scaling methods.

Author: Burak Kutlu (68203)
Gustavo Glusman (5925)
Juan Caballero (480472)
Leroy Hood (4322)
Max Robinson (480473)

Blue: published methods. Pink: variations on published methods. Red: novel methods. Dashed lines connect related methods.

FigShare

A simulated 26-member, 7-generation pedigree.

Author: Chad Huff (209473)
Gustavo Glusman (5925)
Hong Li (20183)
Jared C. Roach (42648)
Juan Caballero (480472)

Green symbols indicate founders that were sequenced by CGI, and purple symbols indicate children whose genotypes were simulated. The topology of the pedigree was chosen to enable testing of diverse relationship estimations.

FigShare