    Biological interpretation of the chromatin models based on feature importance.

    <p>For each macro-category (on the far left), each dataset considered is indicated (stage, structure and reference publication are specified), followed by three distinct plots showing (left to right): (1) a box plot overlaid on a violin plot showing the distribution of the coefficients assigned to each particular feature by LASSO; (2) the selection probability as estimated by <i>Bootstrap LASSO</i> (darker shades of green indicating higher probability); (3) the feature importance estimated as mean decrease in accuracy by RF (separately for the positive and the negative classes, indicated by red and light blue bars, respectively).</p>
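The selection probability in panel (2) can be illustrated with a minimal sketch: refit a LASSO model on bootstrap resamples and record how often each coefficient is non-zero. Everything below is an assumption made for illustration, not the authors' implementation: the helper names (`lasso_fit`, `selection_probability`), the penalty `lam`, the toy data, and the use of a linear (rather than classification) LASSO fitted by coordinate descent.

```python
import random

def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=50):
    """Linear LASSO by cyclic coordinate descent on (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta

def selection_probability(X, y, lam, n_boot=50, seed=0):
    """Fraction of bootstrap resamples in which each feature gets a non-zero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        beta = lasso_fit([X[i] for i in sample], [y[i] for i in sample], lam)
        for j in range(p):
            counts[j] += beta[j] != 0.0
    return [c / n_boot for c in counts]

# Hypothetical toy data: only feature 0 carries signal, feature 1 is noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]
probs = selection_probability(X, y, lam=0.3)
```

In this toy setting the informative feature should be selected in essentially every resample, while the noise feature should rarely survive the penalty.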

    Limb-Enhancer Genie: An accessible resource of accurate enhancer predictions in the developing limb

    <div><p>Epigenomic mapping of enhancer-associated chromatin modifications facilitates the genome-wide discovery of tissue-specific enhancers <i>in vivo</i>. However, reliance on single chromatin marks leads to high rates of false-positive predictions. More sophisticated, integrative methods have been described, but commonly suffer from limited accessibility to the resulting predictions and reduced biological interpretability. Here we present the <u>L</u>imb-<u>E</u>nhancer <u>G</u>enie (LEG), a collection of highly accurate, genome-wide predictions of enhancers in the developing limb, available through a user-friendly online interface. We predict limb enhancers using a combination of >50 published limb-specific datasets and clusters of evolutionarily conserved transcription factor binding sites, taking advantage of the patterns observed at previously <i>in vivo</i> validated elements. By combining different statistical models, our approach outperforms current state-of-the-art methods and provides interpretable measures of feature importance. Our results indicate that including a previously unappreciated score that quantifies tissue-specific nuclease accessibility significantly improves prediction performance. We demonstrate the utility of our approach through <i>in vivo</i> validation of newly predicted elements. Moreover, we describe general features that can guide the type of datasets to include when predicting tissue-specific enhancers genome-wide, while providing an accessible resource to the general biological community and facilitating the functional interpretation of genetic studies of limb malformations.</p></div>

    LEG predicts <i>bona fide</i> limb-enhancers genome-wide.

    <p>(<b>A</b>) Overall enrichment scores (see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005720#sec004" target="_blank">Methods</a>) for the indicated functional terms based on the proximity of the newly predicted elements to the genes annotated within each category (see also <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005720#pcbi.1005720.s007" target="_blank">S6 Fig</a>). (<b>B</b>) UCSC genome browser [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005720#pcbi.1005720.ref025" target="_blank">25</a>] snapshots showing the landscape at two previously <i>in vivo</i> validated limb-enhancers that were not part of the training set but were identified in the top 10,000 predictions. The region on the left is the ZRS (ZPA Regulatory Sequence), a known regulatory element for <i>Shh</i> [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005720#pcbi.1005720.ref043" target="_blank">43</a>]; the one on the right is an intronic enhancer of <i>Tfap2a</i> [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005720#pcbi.1005720.ref044" target="_blank">44</a>]. (<b>C</b>) UCSC genome browser snapshot of the <i>Hand2</i> gene locus. The probability of being a limb-enhancer (Ridge model) along with the top 5,000 predictions from both the Ridge Regression (RR) and the Sum Of Ranks (SOR) combined models are shown. The four elements tested for activity in the developing limbs are highlighted in boxes (green for those showing activity in the limbs at E11.5, red if negative). <i>LacZ</i> reporter staining (blue) indicates enhancer activities in the fore- and hindlimb mesenchyme at E11.5. One representative whole mount picture is reported for each tested element. Pictures of a representative forelimb and hindlimb are provided for the validated enhancers. 
Reproducibility is indicated in brackets below each whole mount picture, along with the corresponding VISTA identifier. The ranks for both combined scores (RR, SOR) are also reported. Scale bar, 100 μm.</p>
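The caption refers to a Sum Of Ranks (SOR) combined model without defining it here. One plausible reading, sketched below under that assumption, is to rank every candidate region under each individual model and add the ranks, so that a lower total means a stronger combined prediction. The function names, the tie-breaking rule, and the toy scores are hypothetical.

```python
def ranks(scores):
    """Return ranks where 1 is the highest score; ties are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def sum_of_ranks(score_lists):
    """Add up, for each region, its rank under every model (lower is better)."""
    per_model = [ranks(scores) for scores in score_lists]
    return [sum(region_ranks) for region_ranks in zip(*per_model)]

# Hypothetical scores for four regions from two models.
model_a = [0.9, 0.2, 0.6, 0.4]
model_b = [0.7, 0.1, 0.8, 0.3]
combined = sum_of_ranks([model_a, model_b])
```

A rank-based combination of this kind does not require the individual model scores to share a common scale.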

    Overview of the approach.

    <p>(<b>A</b>) <i>In vivo</i> tested sequences from VISTA [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005720#pcbi.1005720.ref004" target="_blank">4</a>] considered in this study. For both limb-enhancers (left) and sequences not active in the developing limbs (right), the overlap with H3K27ac peaks and/or DNase I hypersensitive sites is shown as pie charts. (<b>B</b>) Schematic of the different classes of chromatin and sequence features considered in this study. (<b>C</b>) Summary of the machine learning strategy. After calculation of the relevant chromatin and sequence features for all observations, the data was partitioned into ten equally sized bins, retaining the original ratio of positive to negative observations. Training was performed using 10-fold cross-validation (CV), separately for each model (LASSO, RF, SVM) and each category of features (chromatin, sequence). The performance of these models, as well as of their combinations, was evaluated on the ten independent, non-overlapping test sets. Models were then trained using the entire set of observations, and genome-wide predictions were made available through a track hub (see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005720#sec004" target="_blank">Methods</a>) for the UCSC genome browser [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005720#pcbi.1005720.ref025" target="_blank">25</a>] and through a user-friendly web interface at <a href="http://leg.lbl.gov/" target="_blank">http://leg.lbl.gov/</a>.</p>
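The partitioning step in panel (C), ten bins that each keep the original ratio of positive to negative observations, corresponds to what is commonly called stratified k-fold cross-validation. The sketch below shows one simple way to build such bins; the function names, the round-robin assignment, and the toy labels are assumptions for illustration rather than the procedure used in the study.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=10, seed=0):
    """Assign observation indices to n_folds bins with a near-equal class ratio in each."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin so each bin gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds

def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one bin at a time."""
    for held_out, test in enumerate(folds):
        train = [i for k, fold in enumerate(folds) if k != held_out for i in fold]
        yield train, test

# Hypothetical toy labels: 30 positives (1) and 70 negatives (0).
labels = [1] * 30 + [0] * 70
folds = stratified_folds(labels)
splits = list(train_test_splits(folds))
```

Each observation appears in exactly one test bin, so the ten test sets are non-overlapping, matching the evaluation setup described in the caption.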

    Limb-specific chromatin features accurately predict limb-enhancers.

    <p>(<b>A</b>) Box plots showing the AUROC estimated on the ten leave-one-out test sets, considering an increasingly larger set of chromatin features (left to right, outliers not shown). (<b>B</b>) Same as (A) but showing the AUPRC. (<b>C</b>) UCSC genome browser snapshots indicating two representative loci. Validated limb enhancers (bright red elements) showed different features than nearby regions that tested negative <i>in vivo</i> (blue). In particular, they displayed a higher DNase I enrichment (compare <i>DNase I</i> to <i>Headless embryo</i>). <i>De novo</i> limb-enhancers predicted based on combined models are also shown (dark red).</p>
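For readers unfamiliar with the two metrics in panels (A) and (B), the sketch below computes them on a single test set. AUROC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative (ties count one half), and AUPRC is approximated by average precision, one common estimator. The function names and the toy scores are hypothetical, and the code assumes at least one positive and one negative observation.

```python
def auroc(scores, labels):
    """Fraction of positive/negative pairs in which the positive has the higher score."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    return total / hits

# Hypothetical predictions on one test set.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 1, 0, 0]
roc = auroc(scores, labels)
ap = average_precision(scores, labels)
```

Repeating this calculation on each of the ten test sets would give the ten values summarised by one box in the plots.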