7 research outputs found

    jpHMM at GOBICS: a web server to detect genomic recombinations in HIV-1

    Get PDF
    Detecting recombinations in the genome sequence of human immunodeficiency virus (HIV-1) is crucial for epidemiological studies and for vaccine development. Herein, we present a web server for subtyping and localization of phylogenetic breakpoints in HIV-1. Our software is based on a jumping profile Hidden Markov Model (jpHMM), a probabilistic generalization of the jumping-alignment approach proposed by Spang et al. The input data for our server is a partial or complete genome sequence from HIV-1; our tool assigns regions of the input sequence to known subtypes of HIV-1 and predicts phylogenetic breakpoints. jpHMM is available online at

    A novel approach to remote homology detection: jumping alignments

    Get PDF
    Spang R, Rehmsmeier M, Stoye J. A novel approach to remote homology detection: jumping alignments. Journal of Computational Biology. 2002;9(5):747-760.We describe a new algorithm for protein classification and the detection of remote homologs. The rationale is to exploit both vertical and horizontal information of a multiple alignment in a well-balanced manner. This is in contrast to established methods such as profiles and profile hidden Markov models which focus on vertical information as they model the columns of the alignment independently and to family pairwise search which focuses on horizontal information as it treats given sequences separately. In our setting, we want to select from a given database of "candidate sequences" those proteins that belong to a given superfamily. In order to do so, each candidate sequence is separately tested against a multiple alignment of the known members of the superfamily by means of a new jumping alignment algorithm. This algorithm is an extension of the Smith-Waterman algorithm and computes a local alignment of a single sequence and a multiple alignment. In contrast to traditional methods, however, this alignment is not based on a summary of the individual columns of the multiple alignment. Rather, the candidate sequence is at each position aligned to one sequence of the multiple alignment, called the "reference sequence". In addition, the reference sequence may change within the alignment, while each such jump is penalized. To evaluate the discriminative quality of the jumping alignment algorithm, we compare it to profiles, profile hidden Markov models, and family pairwise search on a subset of the SCOP database of protein domains. The discriminative quality is assessed by median false positive counts (med-FP-counts). For moderate med-FP-counts, the number of successful searches with our method is considerably higher than with the competing methods

    A jumping profile Hidden Markov Model and applications to recombination sites in HIV and HCV genomes

    Get PDF
    BACKGROUND: Jumping alignments have recently been proposed as a strategy to search a given multiple sequence alignment A against a database. Instead of comparing a database sequence S to the multiple alignment or profile as a whole, S is compared and aligned to individual sequences from A. Within this alignment, S can jump between different sequences from A, so different parts of S can be aligned to different sequences from the input multiple alignment. This approach is particularly useful for dealing with recombination events. RESULTS: We developed a jumping profile Hidden Markov Model (jpHMM), a probabilistic generalization of the jumping-alignment approach. Given a partition of the aligned input sequence family into known sequence subtypes, our model can jump between states corresponding to these different subtypes, depending on which subtype is locally most similar to a database sequence. Jumps between different subtypes are indicative of intersubtype recombinations. We applied our method to a large set of genome sequences from human immunodeficiency virus (HIV) and hepatitis C virus (HCV) as well as to simulated recombined genome sequences. CONCLUSION: Our results demonstrate that jumps in our jumping profile HMM often correspond to recombination breakpoints; our approach can therefore be used to detect recombinations in genomic sequences. The recombination breakpoints identified by jpHMM were found to be significantly more accurate than breakpoints defined by traditional methods based on comparing single representative sequences

    Sequence Database Search Using Jumping Alignments

    No full text
    Spang R, Rehmsmeier M, Stoye J. Sequence Database Search Using Jumping Alignments. In: Proc. of ISMB 2000. 2000: 367-375

    Sequence Database Search Using Jumping Alignments

    No full text
    We describe a new algorithm for amino acid sequence classication and the detection of remote homologues. The rationale is to exploit both vertical and horizontal information of a multiple alignment in a well balanced manner. This is in contrast to established methods like proles and hidden Markov models which focus on vertical information as they model the columns of the alignment independently. In our setting, we want to select from a given database of \candidate sequences" those proteins that belong to a given superfamily. In order to do so, each candidate sequence is separately tested against a multiple alignment of the known members of the superfamily by means of a new jumping alignment algorithm. This algorithm is an extension of the Smith{Waterman algorithm and computes a local alignment of a single sequence and a multiple alignment. In contrast to traditional methods, however, this alignment is not based on a summary of the individual columns of the multipl..

    Analysis of recombination in molecular sequence data

    Get PDF
    We present the new and fast method Recco for analyzing a multiple alignment regarding recombination. Recco is based on a dynamic program that explains one sequence in the alignment with the other sequences using mutation and recombination. The dynamic program allows for an intuitive visualization of the optimal solution and also introduces a parameter α controlling the number of recombinations in the solution. Recco performs a parametric analysis regarding α and orders all pareto-optimal solutions by increasing number of recombinations. α is also directly related to the Savings value, a quantitative and intuitive measure for the preference of recombination in the solution. The Savings value and the solutions have a simple interpretation regarding the ancestry of the sequences in the alignment and it is usually easy to understand the output of the method. The distribution of the Savings value for non-recombining alignments is estimated by processing column permutations of the alignment and p-values are provided for recombination in the alignment, in a sequence and at a breakpoint position. Recco also uses the p-values to suggest a single solution, or recombinant structure, for the explained sequence. Recco is validated on a large set of simulated alignments and has a recombination detection performance superior to all current methods. The analysis of real alignments confirmed that Recco is among the best methods for recombination analysis and further supported that Recco is very intuitive compared to other methods.Wir präsentieren Recco, eine neue und schnelle Methode zur Analyse von Rekombinationen in multiplen Alignments. Recco basiert auf einem dynamischen Programm, welches eine Sequenz im Alignment durch die anderen Sequenzen im Alignment rekonstruiert, wobei die Operatoren Mutation und Rekombination erlaubt sind. Das dynamische Programm ermöglicht eine intuitive Visualisierung der optimalen Lösung und besitzt einen Parameter α, welcher die Anzahl der Rekombinationsereignisse in der optimalen Lösung steuert. Recco fĂĽhrt eine parametrische Analyse bezĂĽglich des Parameters α durch, so dass alle pareto-optimalen Lösungen nach der Anzahl ihrer Rekombinationsereignisse sortiert werden können. α steht auch direkt in Beziehung mit dem sogenannten Savings-Wert, der die Neigung zum EinfĂĽgen von Rekombinationsereignissen in die optimale Lösung quantitativ und intuitiv bemisst. Der Savings-Wert und die optimalen Lösungen haben eine einfache Interpretation bezĂĽglich der Historie der Sequenzen im Alignment, so dass es in der Regel leicht fällt, die Ausgabe von Recco zu verstehen. Recco schätzt die Verteilung des Savings-Werts fĂĽr Alignments ohne Rekombinationen durch einen Permutationstest, der auf Spaltenpermutationen basiert. Dieses Verfahren resultiert in p-Werten fĂĽr Rekombination im Alignment, in einer Sequenz und an jeder Position im Alignment. Basierend auf diesen p-Werten schlägt Recco eine optimale Lösung vor, als Schätzer fĂĽr die rekombinante Struktur der erklärten Sequenz. Recco wurde auf einem groĂźen Datensatz simulierter Alignments getestet und erzielte auf diesem Datensatz eine bessere VorhersagegĂĽte in Bezug auf das Erkennen von Alignments mit Rekombination als alle anderen aktuellen Verfahren. Die Analyse von realen Datensätzen bestätigte, dass Recco zu den besten Methoden fĂĽr die Rekombinationsanalyse gehört und im Vergleich zu anderen Methoden oft leichter verständliche Resultate liefert
    corecore