Concordance between lanes.
<p>Distributions of genotype concordance rates from same- and different-sample comparisons are non-overlapping. The box plot in (A) shows the distributions of concordance rates when using all callable positions for all combinations of pairs of the three samples being analyzed. The x-axis denotes each pair being compared (A, B, and C refer to the sample IDs in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0023683#pone-0023683-t001" target="_blank">Table 1</a>), and the y-axis represents the distribution of concordance rates for all pair-wise combinations of lanes representing the specific pair of samples on the x-axis. It is likely that the detected differences from same-sample comparisons (B–B, C–C, and A–A) arise solely from sequencing and genotyping error. The box plot in (B) is similar to (A), except that here only variant (nonreference) positions are considered. The symmetrical heat map in (C) summarizes the data from panel (A); the blue boxes represent low concordance rates and correspond to different-sample comparisons, while the yellow boxes along the diagonal represent high concordance rates and correspond to same-sample comparisons. Note that comparisons between samples B and C (gray boxes) show slightly higher concordance than the other different-sample comparisons, but remain clearly distinct from same-sample comparisons. This is expected given the known partial consanguinity between these individuals.</p>
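The concordance rate described in this caption can be illustrated with a minimal sketch. It assumes each lane's calls are held in a dictionary mapping a position to a genotype string; the function names, the data layout, and the rule that a position counts as "variant" when either lane calls a nonreference genotype are all assumptions for illustration, not the published method.

```python
def normalize(genotype):
    """Sort alleles so that 'GA' and 'AG' compare as equal."""
    return "".join(sorted(genotype))


def concordance_rate(lane_a, lane_b, reference=None):
    """Fraction of positions called in both lanes with identical genotypes.

    lane_a, lane_b: dict mapping position -> genotype string (hypothetical layout).
    reference: optional dict mapping position -> homozygous reference genotype.
               When given, only positions where at least one lane is
               nonreference are compared (an assumed definition of "variant").
    Returns None when there is nothing to compare.
    """
    # Only positions callable in both lanes can be compared.
    shared = lane_a.keys() & lane_b.keys()
    if reference is not None:
        shared = {
            pos for pos in shared
            if normalize(lane_a[pos]) != normalize(reference[pos])
            or normalize(lane_b[pos]) != normalize(reference[pos])
        }
    if not shared:
        return None
    matches = sum(
        normalize(lane_a[pos]) == normalize(lane_b[pos]) for pos in shared
    )
    return matches / len(shared)
```

Calling `concordance_rate(lane_a, lane_b)` corresponds to the all-callable-positions comparison of panel (A), and passing `reference` restricts it to variant positions as in panel (B).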
Overview of approach.
<p>Several lanes of HiSeq 2000 data are typically combined for a comprehensive genome analysis, giving a high depth of coverage (A) and making it possible to call genotypes accurately across the majority of the genome. In (B), two individual lanes of HiSeq 2000 data are depicted, each with a lower average depth of coverage. By chance, some regions of the genome have enough data to be genotyped in both lanes (shaded gray).</p>
Summary of data used in these analyses.
<p>Number of reads reflects the number of aligning reads after removing duplicate read pairs and filtering for low quality alignments (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0023683#s4" target="_blank">Methods</a>). Gender was determined by examining read coverage in specific representative regions of the X and Y chromosomes (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0023683#s4" target="_blank">Methods</a>). Number of genotypes called is from the autosomes only, which were used for downstream comparisons.</p>
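The coverage-based sex determination mentioned in this caption could be sketched roughly as follows. The caption does not give the regions or the cutoff, so the function name, the use of a Y-to-X depth ratio, and the default threshold of 0.3 are hypothetical choices made only for illustration.

```python
def infer_sex(x_depth, y_depth, min_y_to_x=0.3):
    """Guess sex from mean read depth in representative X and Y regions.

    x_depth, y_depth: mean depth over the chosen X and Y regions.
    min_y_to_x: hypothetical cutoff on the Y/X depth ratio; a sample with
                one X and one Y is expected near 1, a sample with two X
                and no Y near 0.
    """
    if x_depth <= 0:
        raise ValueError("X-region depth must be positive")
    return "male" if y_depth / x_depth >= min_y_to_x else "female"
```

A ratio is used rather than raw Y depth so that the call does not depend on how deeply a given lane was sequenced.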
Effect of data quantity on concordance rates.
<p>The total number of reads used in the analysis affects different-sample comparisons, but not same-sample comparisons. In (A), lane 7 of sample ID B was kept constant at 140 million reads (B7), and the amount of data for the other sample [either lane 8 of B (B8) or lane 7 of C (C7)] was varied between 40 million and 140 million reads (x-axis) in 20 million read increments. The y-axis represents the concordance rate between variant (nonreference) genotypes called in the two datasets. Note that for the same-sample comparison (red line), varying the number of reads used in the analysis does not substantially alter the concordance rate. However, this is not the case for the different-sample comparison (blue line), where the concordance rate diverges further from that of the same-sample comparison as more reads are used. In (B), a similar trend is observed when the reads in both samples are incremented simultaneously. Solid lines represent a LOESS smoothed fit to the data points.</p>
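The read-count titration in panel (A) can be pictured with a short sketch. It assumes reads are drawn at random without replacement from a lane; the function names and the fixed seed are illustrative assumptions, and the caption does not say how reads were actually selected.

```python
import random


def read_count_series(start=40_000_000, stop=140_000_000, step=20_000_000):
    """Read counts from `start` to `stop` inclusive, in `step` increments."""
    return list(range(start, stop + 1, step))


def subsample(reads, n, seed=0):
    """Draw n reads at random without replacement (assumed selection scheme).

    reads: list of read identifiers; n must not exceed len(reads).
    A seeded generator keeps the draw reproducible.
    """
    if n > len(reads):
        raise ValueError("cannot draw more reads than are available")
    return random.Random(seed).sample(reads, n)
```

With the defaults, `read_count_series()` yields the six x-axis values from 40 million to 140 million reads.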
Comparison of species-specific DHS to independently derived cells.
<p>Human DHS gains show a higher level of overlap than human DHS losses with DHS regions identified in (a) three independently analyzed human fibroblast cell lines and (b) 5 independently analyzed human LCL samples. Common DHS are also similarly detected.</p>
Comparison of DHS sites and DGE-seq data across species.
<p>(a) Analysis pipeline. DNase-seq sequences from each species were aligned to the native genome and lifted over to the human genome for analysis. Regions are filtered at various steps of the analysis to remove alignment and orthology artifacts (Materials and Methods). Correlation plots of DNase-seq signals (b) and DGE-seq expression signals (c) show that both chromatin and expression data from human (Hu), chimpanzee (Ch), and macaque (Ma) are most highly correlated between biological replicates from the same tissue within a single species. Additionally, the same cell type from different species is more similar than different cell types from the same species.</p>
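The pairwise comparisons behind such correlation plots can be sketched as below. The caption does not state which coefficient was used, so Pearson correlation is an assumption here, and the function name and input layout (two equal-length lists of per-region signal values) are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of signal values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares about the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Applying this to every pair of samples over the same set of regions gives the matrix that a correlation plot displays.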
Identification of species-specific differences in DHS sites.
<p>Species-specific DHS sites were identified by edgeR (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002789#s4" target="_blank">Materials and Methods</a>). Boxplots show the distribution of the number of reads per sample in 300 bp windows. For human DHS gains (a), the 3 human samples are all significantly more open than those of the other 2 species. Likewise, human DHS losses (b) show lower signal in human compared to both chimpanzee and macaque. A representative sampling of distributions from all DHS is shown in (c), as well as Common DHS sites (d) found in all three species that are matched for signal intensity compared to human DHS gains and human DHS losses. (e) Distribution of species-specific DHS Gains and DHS Losses relative to promoters, introns, 3′ UTR, and intergenic regions. (f) Representative screenshots of human-specific DHS Gains and Losses compared to a Common region.</p>
Comparison of human DHS site gains and losses to DNase-seq data from other human cell types.
<p>The log of the DNase-seq signal intensity value, defined as the maximum Parzen score (output of F-seq) for each of the coordinates represented along the x-axis, is shown as a heatmap in these figures. The color red represents a higher score, and thus a relatively higher DNase-seq signal, and the color blue represents a lower score. (a) 836 DHS sites were identified as differentially open (human DHS gain) in human fibroblasts compared to chimpanzee/macaque fibroblasts. These regions from Human Fibroblasts (Hu Fibro 1–3) were compared to DNase-seq data generated from 27 other human cell types (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002789#pgen.1002789.s019" target="_blank">Table S3</a>). Additional human skin fibroblast samples (listed in black) are highly similar, while some non-fibroblast cell types show less but substantial overlap and the remaining cell types show much less overlap. Only a small fraction of DHS sites were active in all 27 cell lines (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002789#pgen.1002789.s006" target="_blank">Figure S5</a>). Sites with evidence for positive selection are indicated in the horizontal bar above the heatmap. The distribution appears roughly uniform. (b) 286 DHS sites were identified as differentially closed (human DHS loss) compared to chimp and macaque fibroblasts. (c) DNase-seq signal values for Common regions representing DHS sites in all three species. More than 50% of Common regions are also DHS sites in other human tissues. (d, e, f) DNase-seq values for the same regions as (a, b, c), but the DNase data are from the orthologous regions in chimpanzee and macaque fibroblasts.</p>
Functional mutations associated with DHS gains and losses.
<p>(a–e) Scatterplots showing the enrichment of AP1 motif matches in species with increased hypersensitivity. Each “x” represents a single DHS site. (a–c) Positive values on each axis indicate better motif matches on the human branch. For these regions, points in the upper-right quadrant are regions where the AP1 motif scores better in human than in either chimp or macaque, while points in the lower-left quadrant are regions where the AP1 motif scores worse in human. The number of DHS sites in each of these quadrants is indicated. (d–e) For chimp gain and loss regions, positive values for each axis indicate a better motif match in the chimp branch. (f) The AP1 motif from JASPAR and an example alignment of a representative human gain region representing a point along the diagonal in the upper-right quadrant in panel a. (g) Boxplots summarizing the results from AP1 and three other motifs. The boxplots show the distribution of (combined) log-ratios (relative to the appropriate species). <i>P</i> values for differences relative to common regions are significant (asterisk) in all 4 comparisons: human DHS gains, <i>P</i><10<sup>−31</sup>; human DHS losses <i>P</i><10<sup>−3</sup>; chimp DHS gains, <i>P</i><10<sup>−13</sup>; chimp DHS losses, <i>P</i><10<sup>−8</sup> (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002789#s4" target="_blank">Materials and Methods</a>). In AP1, the significant trends illustrate the same principle observed in panels a–e. Most other transcription factors have plots that show no pattern in motif score among species, such as SP1 and SOX10 (Supplemental data file 3 in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002789#pgen.1002789.s001" target="_blank">Dataset S1</a>). ZEB1, a transcriptional repressor, displays an inverse relationship with hypersensitivity.</p>
Mutational Signatures of De-Differentiation in Functional Non-Coding Regions of Melanoma Genomes
<div><p>Much emphasis has been placed on the identification, functional characterization, and therapeutic potential of somatic variants in tumor genomes. However, the majority of somatic variants lie outside coding regions and their role in cancer progression remains to be determined. In order to establish a system to test the functional importance of non-coding somatic variants in cancer, we created a low-passage cell culture of a metastatic melanoma tumor sample. As a foundation for interpreting functional assays, we performed whole-genome sequencing and analysis of this cell culture, the metastatic tumor from which it was derived, and the patient-matched normal genomes. When comparing somatic mutations identified in the cell culture and tissue genomes, we observe concordance at the majority of single nucleotide variants, whereas copy number changes are more variable. To understand the functional impact of non-coding somatic variation, we leveraged functional data generated by the ENCODE Project Consortium. We analyzed regulatory regions derived from multiple different cell types and found that melanocyte-specific regions are among the most depleted for somatic mutation accumulation. Significant depletion in other cell types suggests the metastatic melanoma cells de-differentiated to a more basal regulatory state. Experimental identification of genome-wide regulatory sites in two different melanoma samples supports this observation. Together, these results show that mutation accumulation in metastatic melanoma is nonrandom across the genome and that a de-differentiated regulatory architecture is common among different samples. Our findings enable identification of the underlying genetic components of melanoma and define the differences between a tissue-derived tumor sample and the cell culture created from it. 
Such information helps establish a broader mechanistic understanding of the linkage between non-coding genomic variations and the cellular evolution of cancer.</p> </div>