
    Robust and predictive positional features that appear in at least two of the analysed groups.

    <p>For each feature, its effect along sequences is shown in a heat map (see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005734#pcbi.1005734.g003" target="_blank">Fig 3</a>), and summarised as a consensus effect (located above each of the heat maps) across several groups, chosen as the effect whose directionality and importance are confirmed by at least two groups. Horizontal axes show feature window position relative to the start AUG.</p>

    Performance of trained predictors.

    <p>(A) Cross-validation (CV) performance of models trained on all available native IRES sequences, shown for different combinations of <i>k</i>-mer lengths and of <i>k</i>-mer count (solid lines) or presence (dashed lines) features (left), with the selected combination marked with a circle. Scatter plot of predicted and true IRES activities for the selected model (middle), coloured according to the local density (blue to red as low to high density). The Receiver Operating Characteristic (ROC) curve and the area under the curve (AUC) for the selected combination (right). (B) CV performance of models trained for different groups of sequences. Only results for groups with models achieving sufficiently high performance are shown. (C) Training and test performance of the feature and <i>k</i>-mer length combination selected for the group of all native IRESs, evaluated using several metrics.</p>
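
The two evaluation ideas named in this caption, cross-validation and the area under the ROC curve, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `roc_auc` and `k_fold_indices` are hypothetical, the fold layout (contiguous, unshuffled) is an assumption, and the AUC is computed through its rank interpretation rather than by tracing the curve.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: 0/1 class labels (1 = active sequence); scores: predicted activities.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, n_folds):
    """Return a list of (train, test) index lists for contiguous, unshuffled folds."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for fold in range(n_folds):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(k_fold_indices(5, 2))
```

In practice each fold's model would be trained on the `train` indices and scored on the `test` indices, and the per-fold scores averaged; shuffling or stratifying the folds is a common refinement that this sketch omits.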

    Overview of the available data and our analysis approach.

    <p>(A) Schematic representation of the bicistronic reporter construct used in [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005734#pcbi.1005734.ref021" target="_blank">21</a>] with eGFP (green) expression used to measure IRES activity of variable sequences (gray), and constitutively expressed mRFP used to control for unique genomic integration. To capture context effects, in our analyses the assayed variable sequences (thick gray) were extended to include flanking regions (solid filling). (B) The available sequences can be divided into 7 groups based on their origin species and location within transcripts. The number of active sequences, i.e. sequences with IRES activity above background levels, and the total number of RNA sequences are shown for each group. (C) Sequences from each of the groups are represented as vectors of sequence <i>k</i>-mer features (UA in orange, AC in green), which are recorded globally and in windows (gray shading). From this large set of features, those unlikely to be predictive are removed based on their weak correlation with IRES activity. Surviving features are used to construct a reduced feature matrix. (D) The reduced feature matrix is used for Random Forest (RF) training. Each RF tree consists of decision nodes (coloured according to the variables selected by those nodes during training) and leaf nodes that predict IRES activity (coloured according to their prediction). RF trees are constructed by iteratively selecting, for each node, the variable and split that yield the highest reduction in weighted variance in the node's children; normalised variance reduction is shown for every node as a number. (E) Trained RFs are used to make IRES activity predictions for feature vectors <i>x</i> of unseen sequences by following each tree to the leaf node corresponding to <i>x</i> (path and leaves marked in red), and accumulating leaf node predictions to obtain the overall RF prediction <i>f</i>(<i>x</i>). (F) To select features that are most predictive of IRES activity, variance reduction values from (D) are accumulated per tree and averaged across trees to obtain <i>feature importance</i>. Normalised importance is also calculated for use in model interpretation. (G) To understand the effect of a feature (e.g. the AC <i>k</i>-mer), for each of its possible values <i>v</i> the expected prediction is plotted (blue curve). The resulting curve allows for characterising <i>v</i> either as having a positive (increasing curve, blue), or a negative (decreasing curve, red) effect on IRES activity. Expected predictions are approximated as the average of predictions made for training samples with the corresponding feature vector components substituted by value <i>v</i>.</p>
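
Two steps of this caption lend themselves to a short illustration: recording <i>k</i>-mer counts globally and in windows (panel C), and approximating the expected prediction for a feature value <i>v</i> by substituting <i>v</i> into every training sample and averaging the predictions (panel G). The Python sketch below is a rough illustration under stated assumptions, not the authors' implementation: all function names, the window length and the step size are hypothetical, and `predict` stands in for any trained model such as a Random Forest.

```python
def kmer_count(seq, kmer):
    """Count occurrences of kmer in seq, allowing overlaps."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)


def windowed_kmer_features(seq, kmer, window, step):
    """Return the global count of kmer plus one count per sliding window.

    `window` and `step` are assumed parameters; the caption does not give them.
    """
    features = {"global": kmer_count(seq, kmer)}
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        features[f"win_{start}"] = kmer_count(seq[start:start + window], kmer)
    return features


def partial_dependence(predict, samples, feature, values):
    """Approximate the expected prediction for each candidate feature value.

    For every v in `values`, copy each training sample, overwrite
    sample[feature] with v, and average the model predictions.
    """
    curve = []
    for v in values:
        total = 0.0
        for x in samples:
            x_sub = dict(x)      # leave the original sample untouched
            x_sub[feature] = v   # substitute the feature component by v
            total += predict(x_sub)
        curve.append(total / len(samples))
    return curve


if __name__ == "__main__":
    print(windowed_kmer_features("ACACUA", "AC", window=4, step=2))

    # Toy stand-in for a trained model: activity grows with the AC count.
    def toy_model(x):
        return 2 * x["AC"] + x["UA"]

    training = [{"AC": 0, "UA": 1}, {"AC": 3, "UA": 3}]
    print(partial_dependence(toy_model, training, "AC", [0, 1, 2]))
```

An increasing curve from `partial_dependence`, as with the toy model here, corresponds to what the caption calls a positive effect on IRES activity; a decreasing curve corresponds to a negative one.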

    Testing the effect of the number of C/U-rich elements on IRES activity using synthetic oligos.

    <p>(A) The TEV IRES element was placed in all possible combinations of 1-8 sites in predefined positions on two background sequences (native and synthetic; coloured lines) to generate synthetic oligos (gray blocks and lines), which were measured using the bicistronic IRES activity reporter assay. (B and C) Oligos were binned into four groups according to the number of placed elements: (left) the fraction of oligos with positive IRES activity out of the total designed oligos is shown for each bin; (right) box plots showing the expression levels of oligos with positive IRES activity in each bin. Results are shown for a synthetic background (B) and a native background from the human beta-globin gene (HBB) (C).</p>
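
The binning summarised in panels B and C can be sketched in Python as follows. This is only an illustration: the caption does not state the bin boundaries, so equal-width bins of two element counts (1-2, 3-4, 5-6, 7-8) are an assumption, as are the function names, the `background` threshold and the toy activity values.

```python
from collections import defaultdict


def bin_label(n_elements, bin_width=2):
    """Map an element count to a label such as '1-2' (bin width is an assumption)."""
    low = ((n_elements - 1) // bin_width) * bin_width + 1
    return f"{low}-{low + bin_width - 1}"


def fraction_active_per_bin(oligos, background=0.0):
    """Fraction of designed oligos in each bin with activity above background.

    oligos: iterable of (n_elements, activity) pairs, one per designed oligo.
    """
    active = defaultdict(int)
    total = defaultdict(int)
    for n_elements, activity in oligos:
        label = bin_label(n_elements)
        total[label] += 1
        if activity > background:
            active[label] += 1
    return {label: active[label] / total[label] for label in total}


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    toy = [(1, 0.0), (2, 5.0), (3, 7.0), (4, 9.0), (8, 0.0)]
    print(fraction_active_per_bin(toy))
```

The box plots on the right of each panel would then be drawn from the activities of only those oligos that pass the same threshold within each bin.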

    Summary of the sequence features associated with IRES activity.

    <p>(A) Illustration of the sequence features found by our models and their association with IRES activity: (left) <i>k</i>-mer sequence, (middle) the number of sites of a <i>k</i>-mer, and (right) the position of the <i>k</i>-mer relative to the AUG start codon. (B) Illustration of the different life cycles of (left) dsRNA/(+) ssRNA viruses and (right) retroviruses, which may have led to differences in their IRES sequence features. Retroviruses are integrated into the host genome and RNA-PolII transcribes their mRNA in the nucleus. Thus, their IRES elements are exposed to the nuclear environment, including mRNA-modifying enzymes (methylation, pseudouridylation, etc.) and nuclear-specific ITAFs that can shuttle with the mRNA to the cytoplasm to facilitate cap-independent recruitment of the ribosome. In contrast, dsRNA and (+) ssRNA viruses, which spend their entire replication cycle in the cytoplasm, are exposed to cytosolic factors, which in turn can facilitate cap-independent recruitment of the ribosome.</p>