    Deciphering the Code for Retroviral Integration Target Site Selection

    Upon cell invasion, retroviruses generate a DNA copy of their RNA genome and integrate retroviral cDNA within host chromosomal DNA. Integration occurs throughout the host cell genome, but target site selection is not random. Each subgroup of retrovirus is distinguished from the others by attraction to particular features on chromosomes. Despite extensive efforts to identify host factors that interact with retrovirion components or chromosome features predictive of integration, little is known about how integration sites are selected. We attempted to identify markers predictive of retroviral integration by exploiting Precision-Recall methods for extracting information from highly skewed datasets to derive robust and discriminating measures of association. ChIPSeq datasets for more than 60 factors were compared with 14 retroviral integration datasets. When compared with MLV, PERV, or XMRV integration sites, strong association was observed with STAT1, acetylation of H3 and H4 at several positions, and methylation of H2AZ, H3K4, and K9. By combining peaks from ChIPSeq datasets, a supermarker was identified that localized within 2 kb of 75% of MLV proviruses and detected differences in integration preferences among different cell types. The supermarker predicted the likelihood of integration within specific chromosomal regions in a cell-type-specific manner, yielding probabilities for integration into proto-oncogene LMO2 identical to experimentally determined values. The supermarker thus identifies chromosomal features highly favored for retroviral integration, provides clues to the mechanism by which retrovirus integration sites are selected, and offers a tool for predicting cell-type-specific proto-oncogene activation by retroviruses.
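
    As a rough illustration of the kind of precision-recall comparison the abstract mentions, the sketch below scores one marker against one set of integration sites. It is not the authors' method: the function names, the single-chromosome coordinate lists, and the definitions used here (precision as the fraction of marker peaks with an integration site within the window, recall as the fraction of integration sites with a peak within the window) are all assumptions made for this example. Only the 2 kb window comes from the abstract.

```python
from bisect import bisect_left


def near_any(position, sorted_sites, window):
    """Return True if any entry of sorted_sites lies within `window` bp of position."""
    i = bisect_left(sorted_sites, position - window)
    return i < len(sorted_sites) and sorted_sites[i] <= position + window


def precision_recall(marker_peaks, integration_sites, window=2000):
    """Hypothetical scoring of one marker on one chromosome (assumed definitions)."""
    peaks = sorted(marker_peaks)
    sites = sorted(integration_sites)
    # Assumed precision: share of marker peaks with an integration site nearby.
    peaks_hit = sum(near_any(p, sites, window) for p in peaks)
    # Assumed recall: share of integration sites with a marker peak nearby.
    sites_hit = sum(near_any(s, peaks, window) for s in sites)
    precision = peaks_hit / len(peaks) if peaks else 0.0
    recall = sites_hit / len(sites) if sites else 0.0
    return precision, recall


# Made-up coordinates, for illustration only.
toy_peaks = [1000, 50000, 120000, 300000]
toy_sites = [1500, 51000, 119000, 200000]
toy_precision, toy_recall = precision_recall(toy_peaks, toy_sites)
```

    Reporting both values matters for skewed data: a marker with very many peaks can sit near most integration sites (high recall) while most of its peaks have no integration site nearby (low precision).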

    Computational grammars for interrogation of genomes

    Antibiotic resistance genes are embedded in mobile genetic elements (MGEs) that spread genes between organisms, even of different species. MGEs are large structures that consist of genes and protein interaction sites. Although a considerable number of microbial DNA sequences have been published, searching for multi-resistant MGEs remains largely a manual task. This usually involves BLAST searches and a combination of keyword-based searches through sequence annotations and the literature. Using computational grammars, we can automate the recognition of arbitrarily complex sequence structures. In this chapter, we describe computational grammars, showing how they can be used to automate MGE annotation, and give examples of the annotation enabled by such grammars. 16 page(s)
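
    To make the idea of grammar-based recognition concrete, here is a minimal sketch of a recursive-descent recognizer over a list of annotated feature labels. The grammar itself (element -> 'boundary' cassette+ 'boundary'; cassette -> 'site' 'gene') and the labels are hypothetical and chosen only for illustration; they are not taken from the chapter and do not describe any real MGE.

```python
def parse_cassette(tokens, i):
    """cassette -> 'site' 'gene'. Return the index after the match, or None."""
    if tokens[i:i + 2] == ["site", "gene"]:
        return i + 2
    return None


def parse_element(tokens, i):
    """element -> 'boundary' cassette+ 'boundary'. Return end index or None."""
    if i >= len(tokens) or tokens[i] != "boundary":
        return None
    j = parse_cassette(tokens, i + 1)  # at least one cassette is required
    if j is None:
        return None
    while True:  # consume any further cassettes
        k = parse_cassette(tokens, j)
        if k is None:
            break
        j = k
    if j < len(tokens) and tokens[j] == "boundary":
        return j + 1
    return None


def find_elements(tokens):
    """Scan left to right; return (start, end) spans of non-overlapping matches."""
    spans = []
    i = 0
    while i < len(tokens):
        end = parse_element(tokens, i)
        if end is None:
            i += 1
        else:
            spans.append((i, end))
            i = end
    return spans


# Made-up annotation track, for illustration only.
toy_track = ["gene", "boundary", "site", "gene", "site", "gene",
             "boundary", "gene", "boundary", "gene", "boundary"]
toy_spans = find_elements(toy_track)
```

    Each grammar rule maps to one function, so a more complex structure is described by adding rules rather than by rewriting the search, which is the property that makes such an approach attractive for replacing manual keyword searches.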