68 research outputs found

    A jumping profile Hidden Markov Model and applications to recombination sites in HIV and HCV genomes

    BACKGROUND: Jumping alignments have recently been proposed as a strategy to search a given multiple sequence alignment A against a database. Instead of comparing a database sequence S to the multiple alignment or profile as a whole, S is compared and aligned to individual sequences from A. Within this alignment, S can jump between different sequences from A, so different parts of S can be aligned to different sequences from the input multiple alignment. This approach is particularly useful for dealing with recombination events. RESULTS: We developed a jumping profile Hidden Markov Model (jpHMM), a probabilistic generalization of the jumping-alignment approach. Given a partition of the aligned input sequence family into known sequence subtypes, our model can jump between states corresponding to these different subtypes, depending on which subtype is locally most similar to a database sequence. Jumps between different subtypes are indicative of intersubtype recombinations. We applied our method to a large set of genome sequences from human immunodeficiency virus (HIV) and hepatitis C virus (HCV) as well as to simulated recombined genome sequences. CONCLUSION: Our results demonstrate that jumps in our jumping profile HMM often correspond to recombination breakpoints; our approach can therefore be used to detect recombinations in genomic sequences. The recombination breakpoints identified by jpHMM were found to be significantly more accurate than breakpoints defined by traditional methods based on comparing single representative sequences.
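The jumping idea can be illustrated with a much-reduced model. The sketch below is not the jpHMM implementation: it assumes an ungapped alignment (the insert and delete states of a full profile HMM are omitted), one state per subtype whose emissions are per-column nucleotide frequencies, and a single hypothetical jump probability `jump_prob`. Viterbi decoding assigns the most probable subtype to each query position, and positions where the decoded subtype changes are reported as candidate breakpoints. All function names are illustrative.

```python
import math

ALPHABET = "ACGT"


def build_profile(seqs, pseudocount=1.0):
    """Per-column nucleotide frequencies for one subtype (equal-length, ungapped sequences)."""
    profile = []
    for i in range(len(seqs[0])):
        counts = {b: pseudocount for b in ALPHABET}
        for s in seqs:
            if s[i] in counts:
                counts[s[i]] += 1
        total = sum(counts.values())
        profile.append({b: c / total for b, c in counts.items()})
    return profile


def viterbi_subtypes(query, profiles, jump_prob=1e-3):
    """Most probable subtype label per query position under a simple jumping model.

    One state per subtype; staying costs log(1 - jump_prob), jumping to any
    other subtype costs log(jump_prob / (K - 1)).
    """
    names = list(profiles)
    k = len(names)
    if k < 2:
        raise ValueError("need at least two subtypes")
    stay = math.log(1.0 - jump_prob)
    jump = math.log(jump_prob / (k - 1))

    def emit(name, i):
        # Characters outside ACGT (e.g. N) get a neutral probability of 1/4.
        return math.log(profiles[name][i].get(query[i], 0.25))

    def trans(prev, cur):
        return stay if prev == cur else jump

    score = {n: math.log(1.0 / k) + emit(n, 0) for n in names}
    back = []  # back[i - 1][n]: best predecessor of state n at position i
    for i in range(1, len(query)):
        ptr = {n: max(names, key=lambda p: score[p] + trans(p, n)) for n in names}
        score = {n: score[ptr[n]] + trans(ptr[n], n) + emit(n, i) for n in names}
        back.append(ptr)
    # Trace back from the best final state.
    state = max(names, key=score.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def breakpoints(path):
    """Positions where the decoded subtype changes (candidate recombination breakpoints)."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


# Toy example: two hypothetical "subtypes" and a query whose halves match each one.
profiles = {
    "A": build_profile(["A" * 20, "A" * 20]),
    "B": build_profile(["C" * 20, "C" * 20]),
}
path = viterbi_subtypes("A" * 10 + "C" * 10, profiles)
print(breakpoints(path))  # [10]
```

The jump penalty trades off against emission gain: with these toy profiles, a switch only pays off when the query matches the other subtype for long enough (here ten positions), which is why very short mosaic segments can be missed by this kind of model.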

    jpHMM at GOBICS: a web server to detect genomic recombinations in HIV-1

    Detecting recombinations in the genome sequence of human immunodeficiency virus (HIV-1) is crucial for epidemiological studies and for vaccine development. Herein, we present a web server for subtyping and localization of phylogenetic breakpoints in HIV-1. Our software is based on a jumping profile Hidden Markov Model (jpHMM), a probabilistic generalization of the jumping-alignment approach proposed by Spang et al. The input data for our server is a partial or complete genome sequence from HIV-1; our tool assigns regions of the input sequence to known subtypes of HIV-1 and predicts phylogenetic breakpoints. jpHMM is available online at

    BF Integrase Genes of HIV-1 Circulating in São Paulo, Brazil, with a Recurrent Recombination Region

    Although some studies have shown diversity in HIV integrase (IN) genes, none has focused specifically on how the gene evolves in epidemics in the context of recombination. The IN genes of 157 HIV-1 integrase inhibitor-naïve patients from São Paulo State, Brazil, were sequenced, yielding 128 of subtype B (23 of which were found in non-B genomes), 17 of subtype F (8 of which were found in recombinant genomes), 11 BF recombinant integrases, and 1 of subtype C. Crucially, we found that 4 BF recombinant viruses shared a recurrent recombination breakpoint region between positions 4900 and 4924 (relative to HXB2) that includes 2 gRNA loops, where the reverse transcriptase (RT) may stutter. Since these recombinants had independent phylogenetic origins, we argue that these results suggest a possible recombination hotspot not previously observed in BF CRFs in particular, or in any other HIV-1 CRF in general. Additionally, 40% of the drug-naïve and 45% of the drug-treated patients had at least 1 raltegravir (RAL) or elvitegravir (EVG) resistance-associated amino acid change, but no major resistance mutations were found, in line with other studies. Importantly, V151I was the most common minor resistance mutation among B, F, and BF IN genes. Most codon sites of the IN genes had higher rates of synonymous (dS) than nonsynonymous substitutions, indicative of strong negative selection. Nevertheless, several codon sites, mainly in subtype B, were found to be under positive selection. Consequently, we observed higher genetic diversity in the B portions of the mosaics, possibly due to the more recent introduction of subtype F on top of an ongoing subtype B epidemic and a fast spread of subtype F alleles among the B population.
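The selection statements in this abstract rest on comparing nonsynonymous (d_N) and synonymous (d_S) substitution rates per site. The abstract does not name the estimator that was used, so only the standard interpretation of the ratio is shown here:

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{negative (purifying) selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}
```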

    jpHMM: Improving the reliability of recombination prediction in HIV-1

    Previously, we developed the jumping profile hidden Markov model (jpHMM), a new method to detect recombinations in HIV-1 genomes. The jpHMM predicts recombination breakpoints in a query sequence and assigns to each position of the sequence one of the major HIV-1 subtypes. Since incorrect subtype assignment or recombination prediction may lead to wrong conclusions in epidemiological or vaccine research, information about the reliability of the predicted parental subtypes and breakpoint positions is valuable. For this reason, we extended the output of jpHMM to include such information in terms of ‘uncertainty’ regions in the recombination prediction and an interval estimate of the breakpoint. Both types of information are computed based on the posterior probabilities of the subtypes at each query sequence position. Our results show that this extension strongly improves the reliability of the jpHMM recombination prediction. The jpHMM is available online at http://jphmm.gobics.de/
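The abstract only states that the uncertainty information is derived from per-position subtype posteriors. A minimal sketch of that idea, assuming a plain (ungapped) HMM with one state per subtype, precomputed per-position emission probabilities, and a hypothetical posterior cutoff `threshold`: a scaled forward-backward pass yields P(subtype at position i | query), and maximal runs of positions where no subtype reaches the cutoff are reported as uncertain. Function names and the cutoff are assumptions, not jpHMM's actual procedure, and the breakpoint interval estimate is not reproduced.

```python
def posteriors(emissions, jump_prob=1e-3):
    """Forward-backward posteriors P(subtype at i | query) for a simple jumping HMM.

    emissions[i][s] is P(query[i] | subtype s), e.g. taken from a per-subtype profile.
    """
    names = list(emissions[0])
    k = len(names)
    if k < 2:
        raise ValueError("need at least two subtypes")

    def trans(a, b):
        return 1.0 - jump_prob if a == b else jump_prob / (k - 1)

    def normalize(d):
        z = sum(d.values())
        return {s: v / z for s, v in d.items()}

    n = len(emissions)
    # Forward pass, rescaled at each position to avoid underflow.
    fwd = [normalize({s: emissions[0][s] / k for s in names})]
    for i in range(1, n):
        fwd.append(normalize({
            s: emissions[i][s] * sum(fwd[-1][p] * trans(p, s) for p in names)
            for s in names
        }))
    # Backward pass, also rescaled per position.
    bwd = [None] * n
    bwd[-1] = {s: 1.0 for s in names}
    for i in range(n - 2, -1, -1):
        bwd[i] = normalize({
            s: sum(trans(s, t) * emissions[i + 1][t] * bwd[i + 1][t] for t in names)
            for s in names
        })
    # Per-position scale factors cancel when forward * backward is renormalized.
    return [normalize({s: fwd[i][s] * bwd[i][s] for s in names}) for i in range(n)]


def uncertain_regions(post, threshold=0.9):
    """Maximal runs (start, end) where the best subtype's posterior is below threshold."""
    regions, start = [], None
    for i, p in enumerate(post):
        if max(p.values()) < threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(post) - 1))
    return regions


# Toy query: 20 positions favouring subtype A, then 20 favouring subtype B.
strong_a = {"A": 0.5, "B": 1 / 6}
strong_b = {"A": 1 / 6, "B": 0.5}
post = posteriors([strong_a] * 20 + [strong_b] * 20)
print(uncertain_regions(post))  # [(19, 20)]
```

Far from the switch the posterior of the decoded subtype is close to 1, while the two positions flanking it stay near 0.75, so they are flagged; a lower cutoff would shrink or remove the uncertain region.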

    Virology

    Lack of a consistent and reliable genotyping system can critically impede HIV genomic research on pathogenesis, fitness, virulence, drug resistance, and genomic-based healthcare and treatment. At present, the extent of mis-genotyping, i.e., background noise in molecular genotyping, and its impact on epidemic surveillance are unknown. Here, for the first time, we present a comprehensive assessment of HIV genotyping quality. HIV sequence data were retrieved from worldwide published records and subjected to a systematic genotyping assessment pipeline. Mis-genotyped cases occurred at a rate of 4.6% globally, with some heterogeneity across regions and high-risk populations. The results also revealed a consistent mis-genotyping pattern in gp120 in all studied populations except men who have sex with men. Our study also suggests novel virus diversity among the mis-genotyped cases. Finally, this study re-emphasizes the importance of implementing a standardized genotyping pipeline to avoid genotyping disparities and to advance our understanding of virus evolution in various epidemiological settings.

    Detection of viral sequence fragments of HIV-1 subfamilies yet unknown

    Background: Methods for determining whether a particular HIV-1 sequence stems, completely or in part, from some unknown HIV-1 subtype are important for the design of vaccines and molecular detection systems, as well as for epidemiological monitoring. Nevertheless, only a single algorithm, the Branching Index (BI), has been developed for this task so far. Moving a sliding window along the genome of a query sequence, the BI computes a ratio quantifying how closely the query sequence clusters with a subtype clade. In its current version, however, the BI does not provide predicted boundaries of unknown fragments. Results: We have developed Unknown Subtype Finder (USF), an algorithm based on a probabilistic model that automatically determines which parts of an input sequence originate from a subtype yet unknown. The underlying model is based on a simple profile hidden Markov model (pHMM) for each known subtype and an additional pHMM for an unknown subtype. The emission probabilities of the latter are estimated from the emission frequencies of the known subtypes by means of a (position-wise) probabilistic model for the emergence of new subtypes. We have applied USF to SIV and HIV-1 sequences previously classified as having emerged from an unknown subtype. Moreover, we have evaluated its performance on artificial HIV-1 recombinants and non-recombinant HIV-1 sequences and compared the results with those of the BI. Conclusions: Our results demonstrate that USF is suitable for detecting segments in HIV-1 sequences stemming from yet unknown subtypes. A comparison of USF with the BI shows that our algorithm performs as well as or better than the BI.

    Classification of HIV-1 Sequences Using Profile Hidden Markov Models

    Accurate classification of HIV-1 subtypes is essential for studying the dynamic spatial distribution pattern of HIV-1 subtypes and also for developing effective methods of treatment that can be targeted to attack specific subtypes. We propose a classification method based on profile hidden Markov models that can accurately identify an unknown strain. We show that a standard method, which relies on the construction of a positive training set only to capture unique features associated with a particular subtype, can accurately classify sequences belonging to all subtypes except B and D. We point out the drawbacks of the standard method: namely, an arbitrary choice of threshold to distinguish between true positives and true negatives, and the inability to discriminate between closely related subtypes. We then propose an improved classification method based on the construction of both a positive and a negative training set to improve the ability to discriminate between closely related subtypes such as B and D. Finally, we show how the improved method can be used to accurately determine the subtype composition of circulating recombinant forms (CRFs) of the virus, which are made up of two or more subtypes. Our method provides a simple and highly accurate alternative to other classification methods and will be useful for accurately annotating newly sequenced HIV-1 strains.
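The abstract does not spell out how the negative training set enters the decision. One natural reading, sketched here under strong simplifying assumptions (ungapped, position-independent profiles instead of full profile HMMs, and hypothetical function names), scores a query by the log-odds of a profile trained on the target subtype against a profile trained on a closely related subtype, so that zero becomes a natural decision boundary instead of a hand-picked threshold. This is an illustration of the idea, not the paper's method.

```python
import math

ALPHABET = "ACGT"


def build_profile(seqs, pseudocount=1.0):
    """Per-column nucleotide frequencies from equal-length, ungapped training sequences."""
    profile = []
    for i in range(len(seqs[0])):
        counts = {b: pseudocount for b in ALPHABET}
        for s in seqs:
            if s[i] in counts:
                counts[s[i]] += 1
        total = sum(counts.values())
        profile.append({b: c / total for b, c in counts.items()})
    return profile


def log_likelihood(seq, profile):
    """log P(seq | profile), treating columns as independent; non-ACGT symbols get 1/4."""
    return sum(math.log(profile[i].get(c, 0.25)) for i, c in enumerate(seq))


def log_odds(seq, positive, negative):
    """Score > 0 means the query is more probable under the positive (target-subtype) model."""
    return log_likelihood(seq, positive) - log_likelihood(seq, negative)


# Hypothetical toy training sets for a target subtype and a closely related one.
positive = build_profile(["ACGTAC", "ACGTAA"])
negative = build_profile(["TCGTTC", "TCGTTA"])
print(round(log_odds("ACGTAC", positive, negative), 3))  # 2.197
```

Columns where both training sets agree contribute nothing to the score, so the decision is driven only by the positions that actually separate the two subtypes, which is the property that matters for closely related pairs such as B and D.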

    Recent advances in inferring viral diversity from high-throughput sequencing data

    Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have facilitated the assessment of the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging due to short, error-prone reads. In order to account for uncertainties originating from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus populations. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. Challenges posed by aligning reads, as well as the impact of reference biases on diversity estimates, are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue to be complemented by technological developments. ISSN:0168-170

    Characterization and frequency of a newly identified HIV-1 BF1 intersubtype circulating recombinant form in São Paulo, Brazil

    Background: HIV circulating recombinant forms (CRFs) play an important role in the global and regional HIV epidemics, particularly in regions where multiple subtypes are circulating. To date, more than 40 CRFs are recognized worldwide, five of which currently circulate in Brazil. Here, we report the characterization of near full-length genome (NFLG) sequences of six phylogenetically related HIV-1 BF1 intersubtype recombinants (five from this study and one from previously published sequences) representing CRF46_BF1. Methods: Initially, we selected 36 samples from 888 adult patients residing in São Paulo who had previously been diagnosed as infected with subclade F1 based on sequencing of a pol subgenomic fragment. Proviral DNA integrated in peripheral blood mononuclear cells (PBMC) was amplified from purified genomic DNA of all 36 blood samples as five overlapping PCR fragments, followed by direct sequencing. Sequence data were obtained from the five fragments that showed identical genomic structure, and phylogenetic trees were constructed and compared with previously published sequences. Genuine subclade F1 sequences and any other sequences that exhibited unique mosaic structures were omitted from further analysis. Results: Of the 36 samples analyzed, only six sequences, inferred from the pol region as subclade F1, displayed identical BF1 mosaic genomes with a single intersubtype breakpoint at the nef-U3 overlap (HXB2 positions 9347-9365; LTR region). Five of these isolates formed a tight cluster in phylogenetic trees from different subclade F1 fragment regions, which we now designate CRF46_BF1. According to our estimate, the new CRF accounts for 0.56% of the HIV-1 strains circulating in São Paulo. Comparison with previously published sequences revealed an additional five isolates that share an identical mosaic structure with those reported in our study. Despite sharing a similar recombinant structure, only one of these sequences appeared to originate from the same CRF46_BF1 ancestor. Conclusion: We identified a new circulating recombinant form, designated CRF46_BF1, with a single intersubtype breakpoint at the nef-LTR U3 overlap. Given the biological importance of the LTR U3 region, intersubtype recombination in this region could play an important role in HIV evolution, with critical consequences for the development of efficient genetic vaccines.
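The reported prevalence can be checked against the sample counts. Assuming the estimate is the five clustered CRF46_BF1 isolates from this study divided by the 888 screened patients (the abstract does not state the formula), the arithmetic is:

```latex
\frac{5}{888} \approx 0.00563 \approx 0.56\,\%
```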