
    Chromatin Landscapes of Retroviral and Transposon Integration Profiles

    <div><p>The ability of retroviruses and transposons to insert their genetic material into host DNA makes them widely used tools in molecular biology, cancer research and gene therapy. However, these systems have biases that may strongly affect research outcomes. To address this issue, we generated very large datasets of unselected integrations in the mouse genome for the Sleeping Beauty (SB) and piggyBac (PB) transposons, and the Mouse Mammary Tumor Virus (MMTV). We analyzed (epi)genomic features to generate bias maps at both local and genome-wide scales. MMTV showed a remarkably uniform distribution of integrations across the genome. More distinct preferences were observed for the two transposons, with PB showing a remarkable resemblance to the bias profiles of the Murine Leukemia Virus. Furthermore, we present a model in which target site selection is directed at multiple scales. At a large scale, target site selection is similar across systems and is defined by domain-oriented features, namely expression of proximal genes, proximity to CpG islands and to genic features, chromatin compaction and replication timing. Notable differences between the systems are mainly observed at smaller scales, and are directed by a diverse range of features. To study the effect of these biases on integration sites occupied under selective pressure, we turned to insertional mutagenesis (IM) screens. In IM screens, putative cancer genes are identified by finding frequently targeted genomic regions, or Common Integration Sites (CISs). Within three recently completed IM screens, we identified 7%–33% putative false positive CISs, which are likely not the result of the oncogenic selection process. Moreover, the results indicate that PB, compared to SB, is better suited to tagging oncogenes.</p></div>

    Unselected integration profiles and CIS designation.

    <p>A) The bias of unselected integrations relative to CIS integrations, on a scale from blue (more CIS integrations) to red (more unselected integrations). B) log2 ratio of activating CISs and repressing CISs. A CIS is activating if it is not within a gene, or if it is within a gene and 90% homogeneous with regard to orientation relative to that gene. Otherwise it is repressing. C) Bias of unselected integrations for CIS regions in a (i) genome-wide background, (ii) genic background (+/−100 kb), and (iii) intergenic background (whole genome except genes +/−100 kb), as measured by the log2 ratio of observed (unselected integrations) and expected (matched controls). D) CIS integration counts vs. unselected integration counts. CISs are annotated with the nearest TSS. Note that a single gene can be associated with multiple CISs. Spurious CISs were determined by a one-sided binomial test of whether the CIS contained more CIS integrations than unselected integrations (FDR-corrected).</p>
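
The one-sided binomial test with FDR correction mentioned in panel D can be sketched as follows. This is a minimal illustration and not the authors' implementation: the null probability `p_null`, the function names, and the use of Benjamini-Hochberg as the FDR procedure are all assumptions, and the significance threshold is deliberately left out because the caption does not preserve it.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); exact sum, suitable for small counts only."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def cis_pvalues(counts, p_null):
    """One-sided p-value per CIS.

    counts: list of (cis_integrations, unselected_integrations) pairs.
    p_null: assumed probability that an integration in the region comes from
            the screen rather than the unselected set (hypothetical parameter).
    """
    return [binom_upper_tail(k_cis, k_cis + k_unsel, p_null)
            for k_cis, k_unsel in counts]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For realistic integration counts, a library routine such as `scipy.stats.binomtest` with `alternative="greater"` would be the more practical choice, since the exact sum above is only intended for small examples.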

    Bootstrapped Markov blanket discovery.

    <p>Bayesian network inference (BNI) is performed on 400 bootstraps of size 20000. One axis represents the fraction of bootstraps in which a feature occurs in the Markov blanket of integration proximity in the resulting Bayesian network, i.e. the confidence we have in an edge. The other axis represents the mean conditional mutual information (CMI) of integration proximity with a feature across all Markov blankets in which this feature occurs, i.e. the strength of an edge. Note that features that do not occur in the Markov blanket of any bootstrap, i.e. are never considered relevant for integration proximity by the BNI approach, are not shown in this figure.</p>
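
The two quantities plotted in this figure, edge confidence and edge strength, can be illustrated with a short sketch. It assumes discretized variables, a single conditioning variable, and a plug-in (empirical frequency) estimator of CMI. The Bayesian network inference step itself is not shown, and the function names are hypothetical rather than taken from the source.

```python
from collections import Counter
from math import log2

def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in bits from paired discrete samples."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), n_xyz in c_xyz.items():
        # p(x,y,z) * log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) ), written with counts
        cmi += (n_xyz / n) * log2(n_xyz * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)]))
    return cmi

def blanket_confidence(blankets, feature):
    """Fraction of bootstrap Markov blankets that contain the given feature."""
    return sum(feature in blanket for blanket in blankets) / len(blankets)
```

Here `blankets` would hold one set of feature names per bootstrap, so `blanket_confidence` corresponds to the confidence axis and the mean of `conditional_mutual_information` over the bootstraps containing a feature corresponds to the strength axis.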

    Scale-based analysis of integration bias.

    <p>A) Association of the unselected integration profiles with various genome-wide features across different scales. The measure of association is a normalized <i>t</i>-score (see Material and Methods), computed on rank-normalized feature values and visualized on a blue-gray-red scale from negative to positive <i>t</i>-scores. Associations that are not significant after FDR correction are white. A positive (negative) <i>t</i>-score for a certain scale and feature means that, for that particular feature, the mean values in a 200 bp window around the integrations are on average higher (lower) than the mean values in a 200 bp window around the points at the given scale distance upstream and downstream from the integration (see Material and Methods). The dendrogram shows a hierarchical clustering of the profiles using the Euclidean distance measure and Ward linkage. B) The rank-transformed smallest scale at which significance is achieved, with a scale going from white (small scale) to black (large scale). A feature is called a ‘macrofeature’ if its smallest significant scale is larger than the mean rank-normalized smallest significant scale across features, in both systems. C) Features associated with transcriptional repression and/or activation, based on published literature.</p>

    Influence of gene expression on integration bias.

    <p>For each of the systems SB, PB, and MMTV, the unselected integrations are divided into genic integrations, integrations occurring within 1 kb upstream of the TSS (TSS_upstream), and integrations occurring within 1 kb downstream of the TSS (TSS_downstream). Genes are divided into 5 groups, based on expression level. The sizes of these groups are indicated along the axis. For each pair of gene expression level and system, the number of observed integrations is counted and compared to the number of expected integrations.</p>

    Unselected integration profiles with respect to TAD - TAD boundary interface.

    <p>One axis represents genomic distance from the interface. The other axis represents the log2 ratio of the observed number of integrations versus the expected number of integrations.</p>
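
The log2 observed-versus-expected ratio used here, and in the other bias figures, can be sketched as a simple binned computation. The sketch assumes that expected counts come from a set of control positions, that the control counts are rescaled to the size of the observed set, and that a pseudocount guards against empty bins. The function name, the pseudocount, and the binning scheme are illustrative assumptions, not details given in the source.

```python
from math import log2

def log2_enrichment(observed, expected, bin_edges, pseudocount=1.0):
    """Per-bin log2((observed + pc) / (scaled expected + pc)).

    observed, expected: signed distances (e.g. bp from a boundary interface).
    bin_edges: ascending edges; bin i covers [bin_edges[i], bin_edges[i + 1]).
    """
    def bin_counts(positions):
        counts = [0] * (len(bin_edges) - 1)
        for pos in positions:
            for i in range(len(counts)):
                if bin_edges[i] <= pos < bin_edges[i + 1]:
                    counts[i] += 1
                    break
        return counts

    obs = bin_counts(observed)
    exp = bin_counts(expected)
    # Rescale control counts so both sets have the same total size.
    scale = len(observed) / len(expected)
    return [log2((o + pseudocount) / (e * scale + pseudocount))
            for o, e in zip(obs, exp)]
```

Positive values indicate bins with more integrations than the controls predict, and negative values indicate depletion.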

    Hierarchical model of integration target site selection.

    <p>On a large scale, target site selection is directed by macrofeatures for all three systems in similar ways. Differences between the systems are determined by microfeatures.</p>