Visualized Computational Predictions of Transcriptional Effects by Intronic Endogenous Retroviruses
<div><p>When endogenous retroviruses (ERVs) or other transposable elements (TEs) insert into an intron, the consequences for gene transcription can range from negligible to complete ablation of normal transcripts. With advances in sequencing technology, increasing numbers of insertionally polymorphic or private TE insertions are being identified in humans and mice, some of which could have a significant impact on host gene expression. Nevertheless, an efficient, low-cost approach to prioritizing their potential effects on gene transcription has been lacking. By building a computational model based on artificial neural networks (ANN), we demonstrate the feasibility of using machine-learning approaches to predict the likelihood that intronic ERV insertions will have major effects on gene transcription. We focus on two ERV families, Intracisternal A-type Particle (IAP) and Early Transposon (ETn)/MusD elements, which are responsible for the majority of ERV-induced mutations in mice. We trained the ANN model using properties associated with ERVs of these families known to cause germ-line mutations (positive cases) and properties associated with likely neutral ERVs of the same families (negative cases), and derived a set of prediction plots that visualize the likelihood that ERV insertions affect gene transcription. Our results show that the model has highly reliable predictive power, and offer a potential approach to computationally screening other types of TE insertions that may affect gene transcription or even cause disease.</p></div>
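The abstract does not specify the network architecture or training procedure. As a rough illustration of what an ANN classifier of this kind does, the sketch below trains a tiny multilayer perceptron (one hidden layer of sigmoid units, per-sample gradient descent on binary cross-entropy) that outputs a likelihood between 0 and 1. The class name `TinyMLP`, the layer sizes, the learning rate, and the two-feature toy vectors are all hypothetical; this is not the authors' model or data.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class TinyMLP:
    """Hypothetical one-hidden-layer perceptron with a sigmoid output in (0, 1)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def predict(self, x):
        """Predicted likelihood that x belongs to the positive class."""
        return self._forward(x)[1]

    def train(self, data, epochs=2000, lr=0.5):
        """Per-sample gradient descent on binary cross-entropy."""
        for _ in range(epochs):
            for x, target in data:
                h, y = self._forward(x)
                d_out = y - target  # dLoss/d(output pre-activation) for sigmoid + cross-entropy
                for j, hj in enumerate(h):
                    # Hidden-unit gradient uses the output weight before it is updated.
                    d_hid = d_out * self.w2[j] * hj * (1.0 - hj)
                    self.w2[j] -= lr * d_out * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_hid * xi
                    self.b1[j] -= lr * d_hid
                self.b2 -= lr * d_out

# Synthetic, linearly separable toy vectors (not data from the study).
toy_data = [([0.0, 0.1], 0), ([0.1, 0.0], 0), ([0.9, 1.0], 1), ([1.0, 0.9], 1)]
model = TinyMLP(n_in=2, n_hidden=3, seed=0)
model.train(toy_data)
print(f"{model.predict([0.95, 0.95]):.3f}")
```

In a real application, each input vector would encode scaled properties of an insertion, such as ERV family, orientation, intron size, or distance to the nearest exon (the quantities shown in the output space figure); which properties the published model used is not stated here.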
Comparison of potential factors linked to the likelihood that ERVs affect gene transcription.
<p>The name of each factor is given above its panel, and the average value of each factor is compared between the positive and negative datasets. Panels A and D compare proportions using bar plots, with p-values calculated using the ‘equality of proportions test’; panels B, C, E and F compare means using box plots, with p-values calculated using Student's <i>t</i>-test.</p>
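Both tests named in this caption can be computed with the Python standard library. The sketch below implements a pooled two-sample z-test for equal proportions (its square equals the uncorrected chi-squared statistic; R's `prop.test` applies a continuity correction by default, so its p-values will differ slightly) and the pooled-variance Student's t statistic. A t-test p-value requires the t distribution, for example via `scipy.stats.ttest_ind`, and is omitted here. Function names and the example numbers are illustrative only.

```python
import math
from statistics import mean, variance

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z-test for equal proportions, no continuity correction.

    Returns (z, two-sided p-value from the normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for standard normal Z
    return z, p_value

def student_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical counts and measurements, for illustration only.
z, p = two_proportion_z_test(x1=30, n1=50, x2=20, n2=80)
t, df = student_t_statistic([4.1, 5.0, 6.2, 5.5], [3.0, 3.8, 4.1, 2.9])
print(f"z={z:.3f} p={p:.4f}  t={t:.3f} df={df}")
```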
Selection of the cutoff threshold for discriminating positives and negatives.
<p>A) The distribution of predicted outcomes for the positive and negative datasets. Each bar corresponds to a given range of predicted likelihood that ERVs affect transcription, and the height of the bar represents the percentage of ERV insertions with a predicted value within that range. Green and blue bars represent positive and negative data, respectively. B) Performance of the trained MLP model, assessed by ROC curve analysis on the positive and negative datasets. AUC, area under the curve. As indicated by the two arrows, the model's true positive rate reaches 1 when the cutoff threshold is set at 0.4, and its false positive rate is 0 when the cutoff threshold is set at 0.8.</p>
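The quantities in panel B can be computed directly from the two sets of predicted scores. In this sketch, a score at or above the cutoff is called positive (an assumption; the caption does not say whether the comparison is strict), and AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative, which equals the area under the empirical ROC curve. The scores below are invented to mimic the pattern described in the caption; they are not taken from the figure.

```python
def rates_at_threshold(pos_scores, neg_scores, cutoff):
    """(true positive rate, false positive rate) when score >= cutoff is called positive."""
    tpr = sum(s >= cutoff for s in pos_scores) / len(pos_scores)
    fpr = sum(s >= cutoff for s in neg_scores) / len(neg_scores)
    return tpr, fpr

def roc_points(pos_scores, neg_scores):
    """Empirical ROC curve as (FPR, TPR) pairs, from the strictest cutoff to the loosest."""
    cutoffs = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for c in cutoffs:
        tpr, fpr = rates_at_threshold(pos_scores, neg_scores, c)
        points.append((fpr, tpr))
    return points

def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical predicted likelihoods.
positives = [0.45, 0.85, 0.95]
negatives = [0.05, 0.30, 0.60]
print(rates_at_threshold(positives, negatives, 0.4))  # every positive is caught
print(rates_at_threshold(positives, negatives, 0.8))  # no negative is called positive
print(auc(positives, negatives))
```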
Distribution of the predicted likelihood that polymorphic ERV insertions affect gene transcription.
<p>Each bar corresponds to a given range of predicted likelihood that ERVs affect transcription, and the height of the bar represents the percentage of ERV insertions with a predicted value within that range. Blue bars represent the distribution of all 134 polymorphic ERV insertions chosen for this study. Green bars represent the distribution of only those polymorphic ERV insertions reported in the literature to disrupt gene transcription. The ranges of predicted values (horizontal axis) are on a logarithmic scale to increase resolution at the low end of predicted values.</p>
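A histogram with logarithmic bins, as described above, can be produced by assigning each predicted value to a log10-spaced bin and reporting the percentage of insertions per bin. The lower bound of `1e-4`, one bin per decade, and the clamping of out-of-range values are assumptions for illustration; the caption does not give the actual bin edges.

```python
import math

def log_binned_percentages(values, low=1e-4, high=1.0, bins_per_decade=1):
    """Percentage of values falling in log10-spaced bins between low and high.

    Returns (bin_edges, percentages); values outside [low, high] are clamped
    into the first or last bin."""
    n_bins = round(math.log10(high / low) * bins_per_decade)
    edges = [low * 10 ** (k / bins_per_decade) for k in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        v = min(max(v, low), high)
        k = min(int(math.log10(v / low) * bins_per_decade), n_bins - 1)
        counts[k] += 1
    return edges, [100.0 * c / len(values) for c in counts]

# Hypothetical predicted likelihoods, one per decade.
edges, pct = log_binned_percentages([0.0005, 0.002, 0.05, 0.5])
print(pct)
```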
Output space analysis of MLP predictions.
<p>The theoretical output space of the MLP prediction model is plotted separately using <i>in silico</i> ERV insertions for A) IAP in sense, B) IAP in antisense, C) ETn in sense, and D) ETn in antisense orientation. Ranges of predicted output values are color-coded according to the rainbow scale at the bottom of the figure. In each panel, the main triangle plot can be viewed as a stack of ERV-containing introns aligned by their centers, with the horizontal axis showing the distance from a given ERV insertion to its nearest exon. The vertical axis represents intron size, with large introns at the bottom and small ones at the top. All known positive (cyan filled circles) and negative (white filled squares) cases used for ANN training are superimposed on each plot according to the type and orientation of the ERVs.</p>
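The triangular layout follows from simple geometry: if each intron of size L is centered, a site at offset x from the center lies L/2 - |x| from the nearest exon, so the widest rows belong to the largest introns. A minimal sketch of enumerating such in silico insertion sites and scoring each one is shown below. `predict` stands in for any trained model; `toy_predict`, the step size, and the intron sizes are arbitrary placeholders, not values from the study.

```python
def in_silico_sites(intron_sizes, step):
    """Yield (intron_size, offset_from_centre, distance_to_nearest_exon) for
    hypothetical insertion sites placed every `step` bp along each intron."""
    for size in intron_sizes:
        for pos in range(0, size + 1, step):
            offset = pos - size / 2          # position relative to the intron centre
            distance = min(pos, size - pos)  # bp to the closer flanking exon
            yield size, offset, distance

def output_space(intron_sizes, step, predict):
    """Score every in silico site with predict(intron_size, distance)."""
    return {
        (size, offset): predict(size, distance)
        for size, offset, distance in in_silico_sites(intron_sizes, step)
    }

def toy_predict(intron_size, distance):
    """Arbitrary placeholder scorer in (0, 1]; decreases with distance and intron size."""
    return 1.0 / (1.0 + distance / 1000 + intron_size / 100_000)

grid = output_space([2_000, 10_000, 50_000], step=500, predict=toy_predict)
print(len(grid), max(grid.values()))
```

Each key of `grid` corresponds to one cell of a panel (row = intron size, column = offset), so the dictionary can be passed to any plotting library to draw a colored triangle.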
Host Restriction and Silencing of ERVs/LTRs
<p>Blocks to various stages of the retroviral or LTR retroelement life cycle are depicted, as are silencing mechanisms affecting the activity of integrated elements. Examples of restriction genes and silencing mechanisms: receptor block, <i>Fv4</i>; uncoating block, <i>Trim5</i>; reverse transcription/trafficking block, <i>APOBEC3</i> and <i>Fv1</i>; transcription block, CpG methylation; and RNA processing block, <i>Nxf1</i> and RNAi. See text and <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.0020002#pgen-0020002-t002" target="_blank">Table 2</a> for more details and other examples. An ERV or LTR element within an intron is shown to illustrate common gene-disruptive effects of such sequences through the introduction of polyadenylation sites, promoters, and splice donor and acceptor sites. Spliced RNA is depicted with dashed lines. A normal gene transcript driven by the native promoter (P) is shown below the gene. A full-length retroviral transcript, which could be packaged for further rounds of retrotransposition or retroviral infection, is shown above the gene locus. Various potential aberrant or chimeric transcripts are also shown above the gene.</p>
Mutagenic Mechanisms of IAP and ETn Insertions
<p>IAP and ETn insertions were classified by their mechanism of gene disruption. Well-documented instances of aberrant transcription initiation (5' terminus) and polyadenylation (3' terminus) were counted, as were instances of aberrant splicing and exon skipping (internal disruption). Insertions that disrupt genes by multiple mechanisms (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.0020002#pgen-0020002-st001" target="_blank">Table S1</a>) were counted once in each relevant class.</p>
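The counting rule in this caption, where an insertion with several mechanisms contributes once to each relevant class, can be expressed with `collections.Counter`. Class totals can therefore sum to more than the number of insertions. The insertion identifiers and labels in the example are hypothetical, not entries from Table S1.

```python
from collections import Counter

def count_by_mechanism(insertions):
    """Count insertions per disruption class.

    `insertions` maps an insertion ID to the classes it belongs to. An insertion
    listed under several classes is counted once in each; repeated labels for
    the same insertion are counted only once."""
    counts = Counter()
    for classes in insertions.values():
        counts.update(set(classes))
    return counts

# Hypothetical insertions, for illustration only.
example = {
    "ins_A": ["5' terminus", "internal"],
    "ins_B": ["internal", "internal"],
    "ins_C": ["3' terminus", "internal"],
}
print(count_by_mechanism(example))
```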
Common Effects of ETn and IAP Insertions on Gene Expression
<div><p>(A) ETn effects on gene transcript processing. The most common patterns of aberrant transcript processing caused by ETns in gene introns are shown. The natural LTR polyadenylation (polyA) site and a second cryptic polyadenylation site in the internal region, along with four cryptic splice acceptors (SA) and a splice donor site (SD), are involved in most cases. The number of such cases is an underestimate, since several reports lack sufficient detail on aberrant transcripts. In some cases, several aberrant forms have been found. Boxes denote gene exons, thin lines denote introns, and thick lines denote spliced mRNAs, with direction of transcription from left to right. For clarity, cryptic splice acceptor sites in the 3' LTR are not shown, since no documented splicing events involving these sites were found. Intronic mutagenic ETns are most often found in the same orientation as the affected gene (15 of 16 cases).</p><p>(B) IAP promoter effects on gene transcription. Ectopic gene expression driven by an antisense promoter in the 5' LTR of an IAP has been reported in eight cases. In some cases, the IAP is located a significant distance upstream of the gene.</p></div>
A novel isoform of IL-33 revealed by screening for transposable element promoted genes in human colorectal cancer - Fig 1
<p>A) Comparison of the numbers of TE-initiated chimeric transcripts between normal and cancer samples based on stringent thresholds (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0180659#sec002" target="_blank">Methods</a>). The total number of such transcripts for each TE class was adjusted by its genomic coverage and also normalized by the expected expression based on all chimeric transcripts in normal samples (red dotted line). Each box spans the interquartile range (the middle 50% of values) of a sample group, and samples whose numbers of chimeric transcripts lie more than one interquartile range beyond the edge of the box are shown as outliers. P-values are based on t-tests. B) Similar plot for the three major ERV classes. C) Total numbers of LTR-initiated chimeric transcripts in normal and cancer samples from each individual patient based on stringent thresholds. The cancer and normal samples of each individual are shown as side-by-side bars in blue and orange, respectively. Bar height shows the total number of chimeras in each sample, corrected for library size.</p>
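Two computations in this caption lend themselves to a short sketch: the normalization of chimeric-transcript counts and the outlier rule for the box plots. The normalization below (count per base pair of genomic coverage, divided by an expected per-bp rate) is one plausible reading of the caption, not the paper's exact formula. Quartiles come from `statistics.quantiles`, whose default interpolation may differ from the plotting software actually used. The outlier fence defaults to one interquartile range, as stated in the caption; many box-plot tools default to 1.5.

```python
from statistics import quantiles

def coverage_normalized(count, genomic_coverage_bp, expected_per_bp):
    """Count per bp of genomic coverage relative to an expected per-bp rate
    (1.0 means as expected). Hypothetical reading of the caption's normalization."""
    return (count / genomic_coverage_bp) / expected_per_bp

def iqr_outliers(values, k=1.0):
    """Values lying more than k * IQR below the first quartile or above the third."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

# Hypothetical per-sample values, for illustration only.
print(coverage_normalized(50, 1000, 0.025))
print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```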
LS513 or HCT116 cells were cultured to confluence and switched to serum-free medium, and the confluent monolayers were either “wounded” with a pipette tip (wound healing assay, WHA) or left unwounded (control, C).
<p>After 24 h, conditioned medium was collected for TCA precipitation of released proteins. Alternatively, LS513 or HUVEC cells were cultured to confluence and lysates were prepared using standard procedures as a positive control. Media precipitates and lysates were analysed by Western blotting for expression of released or endogenous IL-33 isoforms. The blot membrane was subsequently stained with amido black to show protein loading in each lane. Double arrow denotes LTR-IL-33; single arrow denotes a cleavage product.</p>