A reanalysis of mouse ENCODE comparative gene expression data [v1; ref status: indexed, http://f1000r.es/5ez]
Recently, the Mouse ENCODE Consortium reported that comparative gene expression data from human and mouse tend to cluster by species rather than by tissue. This observation was surprising, as it contradicted much of the comparative gene regulatory data collected previously, as well as the common notion that major developmental pathways are highly conserved across a wide range of species, in particular across mammals. Here we show that the Mouse ENCODE gene expression data were collected using a flawed study design, which confounded sequencing batch (namely, the assignment of samples to sequencing flowcells and lanes) with species. When we account for the batch effect, the corrected comparative gene expression data from human and mouse tend to cluster by tissue, not by species.
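To make "accounting for the batch effect" concrete, here is a minimal sketch of a location-only batch adjustment: for each gene, the samples in every batch are shifted so that the batch mean matches the gene's overall mean. It is a deliberately simplified stand-in and not necessarily the correction used in the reanalysis (dedicated tools such as ComBat also model batch-specific variance). The function name `center_by_batch`, the lane labels, and the toy values are hypothetical.

```python
import numpy as np


def center_by_batch(expr, batches):
    """Location-only batch adjustment of a genes x samples log-expression matrix.

    For each gene, the samples in each batch are shifted so that the batch mean
    equals the gene's mean across all samples.
    """
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    corrected = expr.copy()
    grand_mean = expr.mean(axis=1, keepdims=True)  # per-gene mean over all samples
    for batch in np.unique(batches):
        cols = batches == batch  # boolean mask of the samples in this batch
        batch_mean = expr[:, cols].mean(axis=1, keepdims=True)
        corrected[:, cols] = expr[:, cols] - batch_mean + grand_mean
    return corrected


# Toy example (hypothetical values): two genes, four samples on two lanes,
# with every value on lane2 inflated by the same offset.
expr = [[1.0, 2.0, 4.0, 5.0],
        [0.0, 1.0, 3.0, 4.0]]
lanes = ["lane1", "lane1", "lane2", "lane2"]
print(center_by_batch(expr, lanes))
```

After the adjustment, samples 1 and 3 (and samples 2 and 4) have identical profiles, so the lane offset no longer drives clustering. An adjustment like this is only possible when batch is not perfectly confounded with the biological factor of interest: if each lane held samples from a single species, removing lane means would remove the species differences as well.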
Functional Characterization of Variations on Regulatory Motifs
Transcription factors (TFs) regulate gene expression through specific interactions with short promoter elements. The same regulatory protein may recognize a variety of related sequences. Moreover, even once such motifs are detected, it is hard to predict whether highly similar sequence motifs will be recognized by the same TF and regulate similar gene expression patterns, or will serve as binding sites for distinct regulatory factors. We developed computational measures to assess the functional implications of variations on regulatory motifs and to compare the functions of related sites. Specifically, we developed computational means for estimating the functional outcome of substituting a single position within a binding site, applied them to a collection of putative regulatory motifs, and predicted the effects of nucleotide variations within motifs on gene expression patterns. Where such predictions could be compared with suitable published experimental evidence, we found very good agreement. We further accumulated statistics from multiple substitutions across various binding sites in an attempt to deduce general properties of the nucleotide substitutions that are more likely to alter expression. We found that, compared with the remaining substitutions, substitutions involving adenine are more likely to retain the expression pattern, whereas substitutions involving guanine are more likely to alter expression. Our results should facilitate the prediction of the expression outcomes of binding site variations. One typical important implication i…
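The kind of general statistic described above can be illustrated with a simple tally: for each nucleotide, the fraction of substitutions involving it (as either the original or the replacement base) that altered the expression pattern. This is a hypothetical sketch; the record format, the function name `alteration_rate_by_nucleotide`, and the binary "altered" label are assumptions, and the study's own measure of functional change is not reproduced here.

```python
from collections import defaultdict


def alteration_rate_by_nucleotide(substitutions):
    """Fraction of substitutions involving each nucleotide that altered expression.

    substitutions: iterable of (original, replacement, altered) tuples,
    e.g. ("A", "G", True). A nucleotide counts as involved if it is either the
    original or the replacement base.
    """
    involved = defaultdict(int)
    altered = defaultdict(int)
    for original, replacement, changed in substitutions:
        for nt in {original, replacement}:  # a set, so each base is counted once
            involved[nt] += 1
            altered[nt] += int(changed)
    return {nt: altered[nt] / involved[nt] for nt in sorted(involved)}


# Hypothetical records, for illustration only (not data from the study).
records = [("A", "C", False), ("A", "T", False), ("G", "C", True),
           ("G", "T", True), ("C", "T", False), ("A", "G", True)]
rates = alteration_rate_by_nucleotide(records)
```

With these toy records, every substitution involving G altered expression, whereas only one in three involving A did.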
Classification performance of combined 100 nt single-read predictions, as compared to the best performing paired-end configurations.
<p>We combined predictions made for different 100 nt fragments of the same sequence by selecting the prediction with the highest confidence score at the genus level (or at the lowest common level available). We evaluated the performance, at the genus and family ranks (left and right panels, respectively), of combining fragments from the V3 and V4 regions (top and bottom panels, respectively) with fragments from each of the other regions examined, and compared it with the performance of the V3 and V4 100 nt paired-end configurations (indicated by arrows). We used the results of leave-k-out tests classifying the LTP sequences to determine confidence score thresholds for a set of desired false prediction rate (FPR) values (x axis), such that the FPR would be at most the desired value. We then used these thresholds to calculate the classification coverage of sequences from environmental (uncultured) bacteria that corresponds to each desired FPR (y axis). <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0053608#pone.0053608.s006" target="_blank">Figure S6</a> compares the performance of the combinations for the ranks order, class, and phylum.</p>
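The combination rule described in this caption can be sketched as follows. We assume, hypothetically, that each fragment's prediction is stored as a mapping from rank to a (taxon, confidence) pair; the `RANKS` ordering, the function name `combine_fragment_predictions`, and the example taxa and scores are illustrative, not taken from the study.

```python
RANKS = ["genus", "family", "order", "class", "phylum"]  # lowest to highest


def combine_fragment_predictions(predictions):
    """Choose one prediction for a sequence from the predictions for its fragments.

    predictions: list of dicts mapping rank -> (taxon, confidence).
    Returns the prediction with the highest confidence at the genus level, or at
    the lowest rank reported by every prediction when genus is not available.
    """
    for rank in RANKS:
        if all(rank in p for p in predictions):
            return max(predictions, key=lambda p: p[rank][1])
    raise ValueError("predictions share no common rank")


# Hypothetical predictions for a V3 fragment and a V4 fragment of one sequence.
v3 = {"genus": ("Bacteroides", 0.62), "family": ("Bacteroidaceae", 0.95)}
v4 = {"genus": ("Prevotella", 0.81), "family": ("Prevotellaceae", 0.90)}
best = combine_fragment_predictions([v3, v4])  # the V4 prediction (0.81 > 0.62)
```

If one fragment had no genus-level call, the comparison would fall back to family, the lowest rank both predictions report.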
Performance of different training sets in the classification of 100 nt reads from the V4 amplicon.
<p>Each panel compares the performance of the training sets (described in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0053608#pone-0053608-t001" target="_blank">Table 1</a>) for a different rank. We used the results of leave-k-out tests classifying the LTP sequences to determine confidence score thresholds for a set of desired false prediction rate (FPR) values (x axis), so that the FPR would be at most the desired value. We then used these thresholds to calculate the classification coverage of sequences from environmental (uncultured) bacteria that corresponds to the desired FPR (y axis).</p>
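The threshold-then-coverage procedure described in these captions can be sketched as below. The FPR definition used here (wrong predictions among all predictions at or above the threshold), the function names, and the toy scores are assumptions; the leave-k-out runs that produce the reference scores are not shown. Scanning candidate thresholds from low to high returns the lowest threshold consistent with the target FPR.

```python
import numpy as np


def threshold_for_fpr(confidences, correct, max_fpr):
    """Lowest confidence threshold whose false prediction rate (FPR) is <= max_fpr.

    confidences: confidence scores from leave-k-out classification of reference
        sequences with known taxonomy.
    correct: booleans, True where the predicted taxon matched the known one.
    FPR at threshold t = wrong predictions / all predictions with confidence >= t.
    Returns None if no candidate threshold meets the target.
    """
    conf = np.asarray(confidences, dtype=float)
    ok = np.asarray(correct, dtype=bool)
    for t in np.unique(conf):  # observed scores, in ascending order
        accepted = conf >= t  # never empty, since t is itself an observed score
        if np.mean(~ok[accepted]) <= max_fpr:
            return float(t)
    return None


def coverage(env_confidences, threshold):
    """Fraction of environmental sequences classified at or above the threshold."""
    return float(np.mean(np.asarray(env_confidences, dtype=float) >= threshold))


# Toy reference results and environmental scores (hypothetical values).
ref_conf = [0.30, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]
ref_correct = [False, False, True, True, False, True, True, True]
ct = threshold_for_fpr(ref_conf, ref_correct, max_fpr=0.05)  # 0.9
print(coverage([0.20, 0.85, 0.92, 0.97], ct))  # 0.5
```

Repeating this for each desired FPR on the x axis yields one coverage value per point on the y axis. The FPR need not fall monotonically as the threshold rises (in the toy data, a target of 0.2 is met at 0.6 but not at 0.8), which is why the sketch scans every candidate rather than bisecting.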
Classification performance of different experimental designs.
<p>Each panel compares the performance of different regions for a different combination of rank (genus or family) and sequencing strategy (100/120 nt single/paired-end reads). We used the results of leave-k-out tests classifying the LTP sequences to determine confidence score thresholds for a set of desired false prediction rate (FPR) values (x axis), so that the FPR would be at most the desired value (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0053608#pone.0053608.s010" target="_blank">Tables S4</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0053608#pone.0053608.s011" target="_blank">S5</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0053608#pone.0053608.s012" target="_blank">S6</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0053608#pone.0053608.s013" target="_blank">S7</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0053608#pone.0053608.s014" target="_blank">S8</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0053608#pone.0053608.s015" target="_blank">S9</a>, and <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0053608#pone.0053608.s016" target="_blank">S10</a>). We then used these thresholds to calculate the classification coverage of sequences from environmental (uncultured) bacteria that corresponds to the desired FPR (y axis). <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0053608#pone.0053608.s005" target="_blank">Figure S5</a> compares the performance of different regions across the same sequencing configurations for the ranks order, class, and phylum.</p>
Recommended experimental designs.
a<p>Primer would be used only for amplification, not for sequencing.</p>
b<p>The lowest confidence threshold (CT) that is consistent with an FPR of 5% (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0053608#s3" target="_blank">methods</a>).</p>
c<p>Coverage (in percent) observed in environmental sequences at the confidence threshold.</p>
d<p>Median number of predictions in the interval [CT, CT+4] was smaller than 10.</p>
e<p>Results for the 100 nt and 120 nt paired-end configurations were practically identical for this region, as we encountered few V3 amplicons longer than 100 nt (all of them in the environmental sequences).</p>
f<p>For these cases, CT is lower than that of one or more higher taxonomic levels, so a sequence can be classified at the current level but not at the higher levels. We find that classifying such sequences is associated with a high error rate; we therefore recommend excluding them, and have adjusted coverage accordingly.</p>
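The exclusion rule in footnote f can be sketched as follows: a sequence counts toward coverage at a given rank only if it also passes the thresholds of all higher ranks. The sketch assumes, hypothetically, that every environmental prediction carries a confidence score at each rank; the function name `adjusted_coverage`, the rank list, and the example values are illustrative. It returns a fraction; multiplying by 100 gives the percentage units used in footnote c.

```python
RANKS = ["genus", "family", "order", "class", "phylum"]  # lowest to highest


def adjusted_coverage(env_predictions, thresholds, rank):
    """Coverage at `rank`, excluding sequences that pass the threshold at `rank`
    but fail it at any higher rank.

    env_predictions: list of dicts mapping rank -> confidence score.
    thresholds: dict mapping rank -> confidence threshold (CT).
    """
    required = RANKS[RANKS.index(rank):]  # this rank and every rank above it
    passed = [all(p[r] >= thresholds[r] for r in required) for p in env_predictions]
    return sum(passed) / len(passed)


# Hypothetical thresholds in which the genus CT is lower than the family CT.
thresholds = {"genus": 0.5, "family": 0.7, "order": 0.6, "class": 0.5, "phylum": 0.4}
upper = {"order": 0.9, "class": 0.95, "phylum": 0.99}
env = [
    {"genus": 0.60, "family": 0.60, **upper},  # passes genus, fails family: excluded
    {"genus": 0.55, "family": 0.80, **upper},
    {"genus": 0.30, "family": 0.80, **upper},  # fails genus
    {"genus": 0.90, "family": 0.95, **upper},
]
print(adjusted_coverage(env, thresholds, "genus"))  # 0.5 (unadjusted: 0.75)
```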
Seasonal Variation in Human Gut Microbiome Composition
The composition of the human gut microbiome is influenced by many environmental factors. Diet is thought to be one of the most important determinants, though we have limited understanding of the extent to which dietary fluctuations alter variation in the gut microbiome between individuals. In this study, we examined variation in gut microbiome composition between winter and summer over the course of one year in 60 members of a founder population, the Hutterites. Because of their communal lifestyle, Hutterite diets are similar across individuals and remarkably stable throughout the year, with the exception that fresh produce is primarily served during the summer and autumn months. Our data indicate that despite overall gut microbiome stability within individuals over time, there are consistent and significant population-wide shifts in microbiome composition across seasons. We found seasonal differences in both (i) the abundance of particular taxa (false discovery rate <0.05) and (ii) overall gut microbiome diversity (by Shannon diversity; P = 0.001). It is likely that the dietary fluctuations between seasons with respect to produce availability explain, at least in part, these differences in microbiome composition. For example, high levels of produce containing complex carbohydrates consumed during the summer months might explain the increased abundance of Bacteroidetes, which include complex carbohydrate digesters, and the decreased levels of Actinobacteria, which have been negatively correlated with fiber content in food questionnaires. Our observations demonstrate the plastic nature of the human gut microbiome in response to variation in diet.
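For reference, the Shannon diversity index mentioned above is H = -Σ p_i ln(p_i), where p_i is the relative abundance of taxon i in a sample. A minimal sketch computing it from per-taxon read counts follows; the natural-log base and the function name `shannon_diversity` are assumptions (the abstract specifies neither), and the counts are made up.

```python
import math


def shannon_diversity(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical per-taxon read counts for two samples.
even = [25, 25, 25, 25]  # four equally abundant taxa: H = ln(4), about 1.386
skewed = [85, 5, 5, 5]   # same taxa, dominated by one: lower H
print(shannon_diversity(even), shannon_diversity(skewed))
```

Both richness and evenness raise H, so a seasonal shift toward fewer dominant taxa would appear as a lower index.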