
    Evolutionary Conservation of Orthoretroviral Long Terminal Repeats (LTRs) and ab initio Detection of Single LTRs in Genomic Data

    Background: Retroviral LTRs, paired or single, influence the transcription of both retroviral and non-retroviral genomic sequences. Vertebrate genomes contain many thousands of endogenous retroviruses (ERVs) and their LTRs. Single LTRs are difficult to detect in genomic sequences without recourse to repetitiveness or to their presence in a proviral structure. Understanding LTR structure improves understanding of LTR function and of functional genomics. Here we develop models of orthoretroviral LTRs that are useful for detection in genomes and for structural analysis. Principal Findings: Although mutated, ERV LTRs are more numerous and diverse than exogenous retroviral (XRV) LTRs. Hidden Markov models (HMMs), and alignments based on them, were created for HML (human MMTV-like), general beta-, gamma- and lentiretrovirus-like LTRs, plus a general vertebrate LTR model. The training sets were XRV LTRs and RepBase LTR consensuses. The HML HMM was the most sensitive, detecting 87% of the HML LTRs in human chromosome 19 at 96% specificity. Combining all HMMs with a low cutoff, for screening, detected 71% of all LTRs found by RepeatMasker in chromosome 19. HMM consensus sequences had a conserved modular LTR structure. Target site duplications (TG-CA), TATA (occasionally absent), an AATAAA box and a T-rich region were prominent features. Most of the conservation was located in, or adjacent to, R and U5, with evidence for stem loops. Several of the long HML LTRs contained long ORFs inserted after the second A-rich module. HMM consensus alignment allowed comparison of functional features such as transcriptional start sites (sense and antisense) between XRVs and ERVs. Conclusion: The modular, conserved and redundant orthoretroviral LTR structure, with three A-rich regions, is reminiscent of structurally relaxed Giardia promoters.
The five HMMs provided novel, broad-range, repeat-independent ab initio LTR detection, with prospects for greater generalisation, and insight into LTR structure, which may aid the development of LTR-targeted pharmaceuticals.
Peer reviewed
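The screening strategy above, pooling hits from several HMMs at a low cutoff and scoring them against RepeatMasker annotations, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the hit tuple layout, the score cutoff, and the overlap-based definitions of sensitivity and specificity are assumptions, and the helper names `pool_hits` and `evaluate` are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical layout: a hit is (model_name, start, end, score); intervals are
# 0-based, half-open (start, end) coordinates on a single chromosome.
Hit = Tuple[str, int, int, float]
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def pool_hits(hits: Iterable[Hit], cutoff: float) -> List[Interval]:
    """Keep hits from any model scoring >= cutoff, merging overlapping hits."""
    kept = sorted((start, end) for _, start, end, score in hits if score >= cutoff)
    merged: List[Interval] = []
    for start, end in kept:
        if merged and start < merged[-1][1]:
            # Overlaps the previous pooled hit: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def evaluate(predicted: List[Interval], reference: List[Interval]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) under simple overlap-based definitions.

    sensitivity: fraction of reference LTRs overlapped by at least one prediction.
    specificity: fraction of predictions overlapping at least one reference LTR
    (an assumed, precision-like definition; the paper's may differ).
    """
    found = sum(any(overlaps(r, p) for p in predicted) for r in reference)
    true_hits = sum(any(overlaps(p, r) for r in reference) for p in predicted)
    sensitivity = found / len(reference) if reference else 0.0
    specificity = true_hits / len(predicted) if predicted else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    reference = [(100, 500), (2000, 2400), (9000, 9300), (15000, 15450)]
    hits = [
        ("HML", 110, 490, 25.0),
        ("gamma", 2050, 2380, 12.0),
        ("beta", 2100, 2390, 9.5),        # overlaps the gamma hit, so it is merged
        ("lenti", 6000, 6200, 8.0),       # no reference LTR at this position
        ("vertebrate", 9010, 9290, 4.0),  # below the cutoff, so it is dropped
    ]
    pooled = pool_hits(hits, cutoff=5.0)
    print(pooled)                       # [(110, 490), (2050, 2390), (6000, 6200)]
    print(evaluate(pooled, reference))  # (0.5, 0.6666666666666666)
```

In the toy data the dropped low-scoring hit lies on a true LTR, which is why lowering the cutoff for screening trades some specificity for sensitivity.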

    Hybridization properties of long nucleic acid probes for detection of variable target sequences, and development of a hybridization prediction algorithm

    One of the main problems in nucleic acid-based techniques for detecting infectious agents, such as influenza viruses, is nucleic acid sequence variation. DNA probes 70 nt long, some including the nucleotide analog deoxyribose-inosine (dInosine), were analyzed for hybridization tolerance to different amounts and distributions of mismatching bases (e.g. synonymous mutations) in target DNA. Microsphere-linked 70-mer probes were hybridized in 3 M TMAC buffer to biotinylated single-stranded (ss) DNA for subsequent analysis in a Luminex® system. Mismatches that interrupted contiguous matching stretches of 6 nt or longer had a strong impact on hybridization: contiguous matching stretches are more important than the same number of matching nucleotides split by mismatches into several regions. dInosine substitutions at mismatching positions, but not 5-nitroindole substitutions, stabilized hybridization remarkably well, comparably to N (4-fold) wobbles at the same positions. In contrast to shorter probes, 70-nt probes with judiciously placed dInosine substitutions and/or wobble positions were remarkably mismatch tolerant while preserving specificity. An algorithm, NucZip, was constructed to model the nucleation and zipping phases of hybridization, integrating both local and distant binding contributions. It predicted hybridization more accurately than previous algorithms and has the potential to guide the design of variation-tolerant yet specific probes.
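The finding that contiguous matching stretches of 6 nt or more matter more than the same number of scattered matches can be illustrated with a toy scorer. This is not NucZip, which models nucleation and zipping; it only measures how much of a probe lies in long uninterrupted matching runs. It assumes gap-free, pre-aligned sequences written so that identical letters denote pairing, and it treats inosine (`I`) and `N` as matching any base, a simplification since inosine does not pair equally well with all four bases. The function names are hypothetical.

```python
from typing import List, Tuple

# Probe letters assumed to pair with any target base (a simplification for dI).
WILDCARDS = {"I", "N"}


def match_runs(probe: str, target: str) -> List[Tuple[int, int]]:
    """Return (start, length) for every maximal run of matching positions."""
    if len(probe) != len(target):
        raise ValueError("probe and target must be pre-aligned to equal length")
    runs: List[Tuple[int, int]] = []
    start = None
    for i, (p, t) in enumerate(zip(probe.upper(), target.upper())):
        ok = p in WILDCARDS or p == t
        if ok and start is None:
            start = i                       # a matching run begins
        elif not ok and start is not None:
            runs.append((start, i - start))  # a mismatch ends the run
            start = None
    if start is not None:
        runs.append((start, len(probe) - start))
    return runs


def long_run_fraction(probe: str, target: str, min_run: int = 6) -> float:
    """Fraction of probe positions lying in matching runs of at least min_run nt."""
    covered = sum(length for _, length in match_runs(probe, target) if length >= min_run)
    return covered / len(probe)


if __name__ == "__main__":
    target = "ACGT" * 5
    clustered = target[:9] + "GA" + target[11:]                       # 2 adjacent mismatches
    scattered = target[:5] + "G" + target[6:14] + "C" + target[15:]  # 2 spread-out mismatches
    inosine = target[:5] + "I" + target[6:14] + "I" + target[15:]    # dI at the same positions
    for name, probe in [("clustered", clustered), ("scattered", scattered), ("inosine", inosine)]:
        print(name, match_runs(probe, target), long_run_fraction(probe, target))
```

Both mismatched probes match 18 of 20 positions, but only the clustered one keeps most of its matches in long runs, and the inosine substitutions restore a single uninterrupted run.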

    The First Sequenced Carnivore Genome Shows Complex Host-Endogenous Retrovirus Relationships

    Host-retrovirus interactions influence the genomic landscape and have contributed substantially to mammalian genome evolution. To gain further insights, we analyzed a female boxer (Canis familiaris) genome for the complexity and integration pattern of canine endogenous retroviruses (CfERV). Intriguingly, this first in-depth analysis of a carnivore species identified 407 CfERV proviruses, which represent only 0.15% of the dog genome. In comparison, the same detection criteria identified about six times more HERV proviruses in the human genome, which has been estimated to contain a total of 8% retroviral DNA including solitary LTRs. These differences between man and dog are likely due to different mechanisms to purge, restrict and protect their genomes against retroviruses. A novel group of gammaretrovirus-like CfERVs with high similarity to HERV-Fc1 was found to have potential for active retrotransposition, and possibly for lateral transmission between dog and human as a result of close interactions during at least 10,000 years. The CfERV integration landscape showed a non-uniform intra- and inter-chromosomal distribution. As in other species, ERV density varied between regions: some chromosomal regions were essentially devoid of CfERVs, whereas others had large numbers of integrations, consistent with distinct selective pressures at different loci. Most CfERVs were integrated in antisense orientation within 100 kb of annotated protein-coding genes. This integration pattern provides evidence for selection against CfERVs in sense orientation relative to chromosomal genes. In conclusion, this ERV analysis of the first carnivorous species supports the notion that different mammals interact distinctively with endogenous retroviruses, and suggests that retroviral lateral transmissions between dog and human may have occurred.
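Classifying insertions as sense or antisense relative to nearby genes, as in the 100 kb analysis above, might look like the minimal sketch below. The feature tuple layout, the rule of using only the single nearest gene inside the window, and all example coordinates are hypothetical; the abstract does not state the study's exact criteria.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical layout: (chromosome, start, end, strand), 0-based half-open, strand "+" or "-".
Feature = Tuple[str, int, int, str]


def gap(a: Feature, b: Feature) -> int:
    """Distance in bp between two features on one chromosome (0 if they overlap)."""
    return max(0, max(a[1], b[1]) - min(a[2], b[2]))


def nearest_gene(erv: Feature, genes: List[Feature], window: int) -> Optional[Feature]:
    """Closest gene on the same chromosome within `window` bp, or None."""
    candidates = [g for g in genes if g[0] == erv[0] and gap(erv, g) <= window]
    return min(candidates, key=lambda g: gap(erv, g), default=None)


def orientation_counts(ervs: List[Feature], genes: List[Feature], window: int = 100_000) -> Counter:
    """Tally ERVs as sense, antisense, or without a gene inside the window."""
    counts: Counter = Counter()
    for erv in ervs:
        gene = nearest_gene(erv, genes, window)
        if gene is None:
            counts["no gene within window"] += 1
        elif gene[3] == erv[3]:
            counts["sense"] += 1
        else:
            counts["antisense"] += 1
    return counts


if __name__ == "__main__":
    genes = [("chr1", 50_000, 80_000, "+"), ("chr1", 500_000, 520_000, "-"),
             ("chr2", 10_000, 30_000, "+")]
    ervs = [
        ("chr1", 60_000, 69_000, "-"),    # inside a "+" gene -> antisense
        ("chr1", 150_000, 158_000, "+"),  # 70 kb downstream of a "+" gene -> sense
        ("chr1", 700_000, 709_000, "-"),  # 180 kb from the nearest gene -> outside window
        ("chr2", 35_000, 44_000, "-"),    # 5 kb from a "+" gene -> antisense
    ]
    print(orientation_counts(ervs, genes))
```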

    Retroviral Long Terminal Repeats: Structure, Detection and Phylogeny

    Long terminal repeats (LTRs) are non-coding repeats flanking the protein-coding genes of LTR retrotransposons. The variability of LTRs makes them challenging to study. Hidden Markov models (HMMs), probabilistic models widely used in pattern recognition, are useful for dealing with this variability. The main aim of this work was to study LTRs of retroviruses and LTR retrotransposons using HMMs. Paper I describes the methodology of HMM modelling applied to different groups of LTRs from exogenous retroviruses (XRVs) and endogenous retroviruses (ERVs). The detection capabilities of the HMMs were assessed and found to be high for homogeneous groups of LTRs. The alignments generated by the HMMs displayed conserved motifs, some of which could be related to known functions of XRVs. The common features of the different groups of retroviral LTRs were investigated by combining them into a single alignment. These features were the short inverted terminal repeats TG and CA, and three AT-rich stretches, which provide retroviruses with TATA boxes and AATAAA polyadenylation signals. In Paper II, phylogenetic trees of three groups of retroviral LTRs were constructed using HMM-based alignments. The LTR trees were consistent with trees based on other retroviral genes, suggesting co-evolution between LTRs and these genes. In Paper III, the methods of Papers I and II were extended to LTRs from other retrotransposon groups, covering much of the diversity of all known LTRs. For the first time, an LTR phylogeny could be achieved. There was no major disagreement between the LTR tree and trees based on three different domains of the Pol gene. The conserved LTR structure of Paper I was found to apply to all LTRs. Putative integrase recognition motifs extended up to 12 bp beyond the short inverted repeats TG/CA. Paper IV is a review article describing the use of sequence similarity and structural markers for the taxonomy of ERVs.
ERVs were originally classified into three classes according to the length of the target site duplication. While this classification is useful, it does not include all ERVs. A naming convention based on previous ERV and XRV nomenclature, but taking newer information into account, is advocated in order to provide a practical yet coherent scheme for dealing with new, unclassified ERV sequences. Paper V gives an overview of bioinformatics tools for studying ERVs and retroviral evolution before and after endogenization. It gives some examples of recent integrations in vertebrate genomes and discusses the pathogenicity of human ERVs, including their possible relation to cancers. In conclusion, HMMs were able to successfully detect and align LTRs. Progress was made in understanding their conserved structure and phylogeny. The methods developed in this thesis could be applied to other kinds of non-coding DNA sequence elements.
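To illustrate how an HMM can pick out AT-rich stretches such as those carrying TATA boxes and AATAAA signals, here is a toy two-state HMM (background vs. AT-rich) decoded with the Viterbi algorithm in log space. The states and all probabilities are invented for illustration, and the input is assumed to contain only A, C, G and T; the thesis used profile HMMs trained on LTR groups, which this sketch does not reproduce.

```python
import math
from typing import Dict, List, Tuple

# Invented parameters for a two-state model: "bg" (background) and "at" (AT-rich).
STATES = ("bg", "at")
START = {"bg": 0.9, "at": 0.1}
TRANS = {"bg": {"bg": 0.95, "at": 0.05}, "at": {"bg": 0.10, "at": 0.90}}
EMIT = {
    "bg": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "at": {"A": 0.40, "C": 0.10, "G": 0.10, "T": 0.40},
}


def viterbi(seq: str) -> List[str]:
    """Most probable state path for seq (A/C/G/T only), computed in log space."""
    seq = seq.upper()
    if not seq:
        return []
    log = math.log
    # score[s] = best log-probability of any path that ends in state s so far
    score = {s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}
    back: List[Dict[str, str]] = []
    for base in seq[1:]:
        pointers: Dict[str, str] = {}
        new_score: Dict[str, float] = {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + log(TRANS[r][s]))
            pointers[s] = prev
            new_score[s] = score[prev] + log(TRANS[prev][s]) + log(EMIT[s][base])
        back.append(pointers)
        score = new_score
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def at_rich_segments(seq: str) -> List[Tuple[int, int]]:
    """Half-open (start, end) intervals decoded as the AT-rich state."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, s in enumerate(viterbi(seq)):
        if s == "at" and start is None:
            start = i
        elif s != "at" and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(seq)))
    return segments


if __name__ == "__main__":
    flank = "GCGCGCGCGC"
    at_block = "TATAAATAAATTTATATAAA"  # made-up AT-rich stretch containing TATA and AATAAA
    print(at_rich_segments(flank + at_block + flank))  # [(10, 30)]
```

Working in log space avoids numerical underflow on long sequences, which is why Viterbi implementations conventionally sum log-probabilities rather than multiply probabilities.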

    RetroTector online, a rational tool for analysis of retroviral elements in small and medium size vertebrate genomic sequences

    BACKGROUND: The rapid accumulation of genomic information in databases necessitates fast and specific algorithms for extracting biologically meaningful information. More or less complete retroviral sequences, also called proviral or endogenous retroviral sequences (ERVs), constitute at least 5% of vertebrate genomes. After infecting the host, these retroviruses integrated into germ-line cells and have since been carried in genomes for at least several hundred million years. A better understanding of the structure and function of these sequences can have profound biological and medical consequences. METHODS: RetroTector (ReTe) is a platform-independent Java program for identification and characterization of proviral sequences in vertebrate genomes. The full ReTe requires a local installation with a MySQL database. Although not overly complicated, the installation may take some time. A "light" version of ReTe (RetroTector online; ROL), which does not require specific installation procedures, is provided via the World Wide Web. RESULT: ROL (http://www.fysiologi.neuro.uu.se/jbgs/) was implemented under the Batchelor web interface (A Lövgren et al). It accepts sequences of 5 to 10,000 kilobases, submitted as GenBank accession numbers, as files or as cut-and-pasted FASTA. Up to ten submissions can be made simultaneously, allowing batch analysis of ≤ 100 megabases. Jobs are shown in a list specific to the submitter's IP address. Results are text files, which can be viewed with the program RetroTectorViewer.jar (at the same site); it has the full graphical capabilities of the basic ReTe program. A detailed analysis of any retroviral sequences found in the submitted sequence is presented graphically and can be exported in standard formats. With the current server, a complete analysis of a 1 megabase sequence takes 10 minutes. Nonretroviral repetitive sequences in the submitted sequence can be masked using host-genome-specific "brooms", which increase specificity.
DISCUSSION: Proviral sequences can be hard to recognize, especially if the integration occurred many millions of years ago. Precise delineation of LTR, gag, pro, pol and env can be difficult and may require manual work. ROL is a way of simplifying these tasks. CONCLUSION: ROL provides (1) annotation and presentation of known retroviral sequences and (2) detection of proviral chains in unknown genomic sequences, with up to 100 Mbase per submission.
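The submission limits quoted above can be checked before uploading with a small helper. This is a hypothetical sketch, not part of ROL: it assumes each sequence counts as one submission and that run time scales linearly from the quoted 10 minutes per megabase, and it does not call any ROL interface.

```python
from typing import Dict, List

# Limits as quoted in the abstract; treating each sequence as one submission is an assumption.
MIN_LEN_BP = 5_000            # 5 kilobases
MAX_LEN_BP = 10_000_000       # 10,000 kilobases
MAX_SUBMISSIONS = 10
MINUTES_PER_MB = 10.0         # from "1 megabase ... 10 minutes"; linear scaling assumed


def check_batch(lengths_bp: List[int]) -> Dict[str, object]:
    """Validate a planned batch and estimate its sequential run time (hypothetical helper)."""
    problems: List[str] = []
    if len(lengths_bp) > MAX_SUBMISSIONS:
        problems.append(f"{len(lengths_bp)} submissions exceed the limit of {MAX_SUBMISSIONS}")
    for i, n in enumerate(lengths_bp):
        if not MIN_LEN_BP <= n <= MAX_LEN_BP:
            problems.append(f"sequence {i} is {n} bp, outside {MIN_LEN_BP}-{MAX_LEN_BP} bp")
    total_mb = sum(lengths_bp) / 1_000_000
    return {
        "ok": not problems,
        "problems": problems,
        "total_mb": total_mb,
        "est_minutes_sequential": total_mb * MINUTES_PER_MB,
    }


if __name__ == "__main__":
    print(check_batch([1_000_000, 2_500_000]))
    print(check_batch([1_000] + [10_000_000] * 10))
```

Ten sequences at the 10,000 kb maximum reach exactly the 100 megabase batch ceiling stated in the abstract.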

    Conserved structure and inferred evolutionary history of long terminal repeats (LTRs)

    Background: Long terminal repeats (LTRs, consisting of U3-R-U5 portions) are important elements of retroviruses and related retrotransposons. They are difficult to analyse because of their variability. The aim was to obtain a more comprehensive view of the structure, diversity and phylogeny of LTRs than hitherto possible. Results: Hidden Markov models (HMMs) were created for 11 clades of LTRs belonging to Retroviridae (class III retroviruses), animal Metaviridae (Gypsy/Ty3) elements and plant Pseudoviridae (Copia/Ty1) elements, complementing our work with Orthoretrovirus HMMs. The great variation in LTR length of plant Metaviridae, and the few divergent animal Pseudoviridae, prevented building HMMs for either of these groups. Animal Metaviridae LTRs had the same conserved motifs as retroviral LTRs, confirming that the two groups are closely related. The conserved motifs were the short inverted repeats (SIRs); the integrase recognition signals (5' TGTTRNR ... YNYAACA 3'); the polyadenylation signal or AATAAA motif; a GT-rich stretch downstream of the polyadenylation signal; and a less conserved AT-rich stretch corresponding to the core promoter element, the TATA box. Plant Pseudoviridae LTRs differed slightly in having a conserved TATA box, TATATA, but no conserved polyadenylation signal, plus a much shorter R region. The sensitivity of the HMMs for detection in genomic sequences was around 50% for most models, at a relatively high specificity, suitable for genome screening. The HMMs yielded consensus sequences, which were aligned by creating an HMM (a 'Superviterbi' alignment). This yielded a phylogenetic tree that was compared with a Pol-based tree. Both LTR and Pol trees supported monophyly of retroviruses. In both, Pseudoviridae was ancestral to all other LTR retrotransposons. However, the LTR trees showed the chromovirus portion of Metaviridae clustering together with Pseudoviridae, dividing Metaviridae into two portions with distinct phylogeny.
Conclusion: The HMMs clearly demonstrated a unitary conserved structure of LTRs, supporting the view that they arose once during evolution. We attempted to follow the evolution of LTRs by tracing their functional foundations, that is, the acquisition of RNase H, a combined promoter/polyadenylation site, integrase, hairpin priming and the primer binding site (PBS). Available information did not support a simple evolutionary chain of events.
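The integrase recognition signals above are written with IUPAC ambiguity codes (R = A/G, Y = C/T, N = any base), which translate directly into regular-expression character classes. The sketch below assumes the 5' signal must sit at the very start of the LTR and the 3' signal at the very end; the helper names and the toy sequence are hypothetical. It also checks that YNYAACA is the reverse complement of TGTTRNR, consistent with the two ends forming short inverted repeats.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def iupac_to_regex(motif: str) -> str:
    """Expand an IUPAC motif into an equivalent regular expression."""
    return "".join(IUPAC[c] for c in motif.upper())


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence that may contain IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


FIVE_PRIME = "TGTTRNR"
THREE_PRIME = "YNYAACA"
FIVE_RE = re.compile(iupac_to_regex(FIVE_PRIME))
THREE_RE = re.compile(iupac_to_regex(THREE_PRIME))


def has_integrase_signals(ltr: str) -> bool:
    """True if the LTR starts with TGTTRNR and ends with YNYAACA (exact ends assumed)."""
    ltr = ltr.upper()
    return (FIVE_RE.match(ltr) is not None
            and THREE_RE.fullmatch(ltr[-len(THREE_PRIME):]) is not None)


if __name__ == "__main__":
    print(reverse_complement(FIVE_PRIME) == THREE_PRIME)  # True
    toy_ltr = "TGTTGCG" + "A" * 50 + "CGCAACA"   # made-up sequence, not a real LTR
    print(has_integrase_signals(toy_ltr))            # True
    print(has_integrase_signals("A" + toy_ltr[1:]))  # False
```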