9 research outputs found

Automatic detection of anchor points for multiple sequence alignment

Author: Corel Eduardo
Devauchelle Claudine
Pitschi Florian
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Determining beforehand specific positions to align (anchor points) has proved valuable for the accuracy of automated multiple sequence alignment (MSA) software. This feature can be used manually to include biological expertise, or automatically, usually by pairwise similarity searches. Multiple local similarities are expected to be more adequate, as they are more biologically relevant. However, even good multiple local similarities can prove incompatible with the ordering of an alignment. Results: We use a recently developed algorithm to detect multiple local similarities, which returns subsets of positions in the sequences sharing similar contexts of appearance. In this paper, we first describe how to obtain, with the help of this method, subsets of positions that could form partial columns in an alignment. We next introduce a graph-theoretic algorithm to detect (and remove) positions in the partial columns that are inconsistent with a multiple alignment. For the time being, partial columns can be used as guides by only a few MSA programs: ClustalW 2.0, DIALIGN 2 and T-Coffee. We test the effect of introducing these columns on the popular benchmark BAliBASE 3. Conclusions: We show that the inclusion of our partial alignment columns, as anchor points, improves on the whole the accuracy of the aligner ClustalW on the benchmark BAliBASE 3.
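
The abstract does not spell out the graph-theoretic algorithm, so the following is only a minimal sketch of one common way to phrase the consistency question, not the authors' method. It assumes each partial column is a mapping from a sequence identifier to a position, builds a "must come before" graph between columns, and reports the set as inconsistent when that graph contains a cycle. The function name `columns_are_consistent` and the data layout are hypothetical, and the sketch only detects inconsistency; it does not remove positions.

```python
from collections import defaultdict


def columns_are_consistent(columns):
    """Return True if the partial columns admit a single left-to-right order.

    columns: list of dicts mapping sequence id -> position (assumed layout).
    Simplification: two columns using the same position of the same
    sequence are not treated specially here.
    """
    n = len(columns)
    edges = defaultdict(set)
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            for seq, pos_a in columns[a].items():
                pos_b = columns[b].get(seq)
                # Column a must precede column b if it does so in any sequence.
                if pos_b is not None and pos_a < pos_b:
                    edges[a].add(b)

    # Depth-first search with three colours to detect a directed cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    state = [WHITE] * n

    def has_cycle(u):
        state[u] = GREY
        for v in edges[u]:
            if state[v] == GREY or (state[v] == WHITE and has_cycle(v)):
                return True
        state[u] = BLACK
        return False

    return not any(state[u] == WHITE and has_cycle(u) for u in range(n))


# Two columns that cross each other cannot both appear in one alignment.
crossing = [{"s1": 3, "s2": 9}, {"s1": 7, "s2": 5}]
print(columns_are_consistent(crossing))
```

A cycle can also arise among three or more partial columns that are pairwise compatible, which is why the check is done on the whole graph rather than pair by pair.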

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MS4 - Multi-Scale Selector of Sequence Signatures: An alignment-free method for classification of biological sequences

Author: Corel Eduardo
Devauchelle Claudine
Didier Gilles
Grasseau Gilles
Laprevotte Ivan
Pitschi Florian
Publication venue: BioMed Central
Publication date: 01/01/2010

HAL Evry

Crossref

HAL-UNICE

Springer - Publisher Connector

HAL AMU

PubMed Central

HAL Descartes

Comparing sequences without using alignments: application to HIV/SIV subtyping

Author: Debomy Laurent
Devauchelle Claudine
Didier Gilles
Grossmann Alexander
Laprevotte Ivan
Pupin Maude
Zhang Ming
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: In general, the construction of trees is based on sequence alignments. This procedure, however, leads to loss of information when parts of sequence alignments (for instance ambiguous regions) are deleted before tree building. To overcome this difficulty, one of us previously introduced a new and rapid algorithm that calculates dissimilarity matrices between sequences without preliminary alignment. RESULTS: In this paper, HIV (Human Immunodeficiency Virus) and SIV (Simian Immunodeficiency Virus) sequence data are used to evaluate this method. The program produces tree topologies that are identical to those obtained by a combination of standard methods detailed in the HIV Sequence Compendium. Manual alignment editing is not necessary at any stage. Furthermore, only one user-specified parameter is needed for constructing trees. CONCLUSION: The extensive tests on HIV/SIV subtyping showed that the virus classifications produced by our method are in good agreement with our best taxonomic knowledge, even in non-coding LTR (Long Terminal Repeat) regions that are not tractable by regular alignment methods due to frequent duplications/insertions/deletions. Our method, however, is not limited to HIV/SIV subtyping. It provides an alternative tree construction without a time-consuming aligning procedure.
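
The abstract does not describe how the dissimilarities are computed, so the sketch below swaps in a generic technique, k-mer composition distance, purely to illustrate what an alignment-free dissimilarity matrix looks like. It is not the authors' algorithm. The function names, the choice of total variation distance, and the parameter `k` are all assumptions of this sketch, and every sequence is assumed to be at least `k` characters long.

```python
from collections import Counter
from itertools import combinations


def kmer_profile(seq, k):
    """Relative frequency of each overlapping k-mer in seq (len(seq) >= k)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def dissimilarity(p, q):
    """Total variation distance between two k-mer profiles, in [0, 1]."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


def dissimilarity_matrix(seqs, k=3):
    """Pairwise dissimilarities for a dict of name -> sequence, no alignment."""
    names = list(seqs)
    profiles = {name: kmer_profile(seqs[name], k) for name in names}
    matrix = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = dissimilarity(profiles[a], profiles[b])
        matrix[a][b] = matrix[b][a] = d
    return matrix


# Toy sequences (made up for illustration, not HIV/SIV data).
toy = {"x": "ACGTACGTAC", "y": "ACGTACGAAC", "z": "TTTTTTTTTT"}
for name, row in dissimilarity_matrix(toy).items():
    print(name, row)
```

A matrix of this kind can then be passed to any distance-based tree-building method, which is the role the dissimilarity matrix plays in the abstract.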

HAL - Lille 3

HAL Evry

HAL AMU

Springer - Publisher Connector

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

HAL Descartes

ProdInra

Variable length local decoding and alignment-free sequence comparison

Author: Corel Eduardo
Didier Gilles
Grossmann Alex
Landès-Devauchelle Claudine
Laprevotte Ivan
Publication venue: Elsevier B.V.
Publication date: 01/01/2012

We present the variable length local decoding, a method which augments the alphabet of a sequence or a set of sequences. Roughly speaking, the approach distinguishes several types of symbols/nucleotides according to their contexts in the sequences. These contexts have variable lengths and are defined from a prefix code. We first give an original algorithm computing the decoding with a complexity linear both in time and memory space. Next, the approach is applied to alignment-free sequence comparison. We give a heuristic way to select context lengths relevant to this question. The comparison of sequences itself is based on the composition in “augmented” symbols of their variable length local decodings. The results of this comparison are illustrated on a biological alignment.

HAL Evry

Elsevier - Publisher Connector

HAL-UNICE

HAL AMU

HAL Descartes

Rate matrices for analyzing large families of protein sequences

Author: Devauchelle Claudine
Grossmann Alex
Holschneider Matthias
Hénaut Alain
Monnerot Monique
Risler Jean-Loup
Torrésani Bruno
Publication venue: Mary Ann Liebert Inc
Publication date: 01/01/2004

We propose and study a new approach for the analysis of families of protein sequences. This method is related to the LogDet distances used in phylogenetic reconstructions; it can be viewed as an attempt to embed these distances into a multidimensional framework. The proposed method starts by associating a Markov matrix to each pairwise alignment deduced from a given multiple alignment. The central objects under consideration here are matrix-valued logarithms L of these Markov matrices, which exist under conditions that are compatible with fairly large divergence between the sequences. These logarithms allow us to compare data from a family of aligned proteins with simple models (in particular, continuous reversible Markov models) and to test the adequacy of such models. If one neglects fluctuations arising from the finite length of sequences, any continuous reversible Markov model with a single rate matrix Q over an arbitrary tree predicts that all the observed matrices L are multiples of Q. Our method exploits this fact, without relying on any tree estimation. We test this prediction on a family of proteins encoded by the mitochondrial genome of 26 multicellular animals, which include vertebrates, arthropods, echinoderms, molluscs, and nematodes. A principal component analysis of the observed matrices L shows that a single rate model can be used as a rough approximation to the data, but that systematic deviations from any such model are unmistakable and related to the evolutionary history of the species under consideration.
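
The prediction that every observed logarithm L is a multiple of Q can be checked numerically on a toy model. The sketch below assumes a made-up symmetric 3-state rate matrix (symmetric, hence reversible with a uniform stationary distribution), builds the Markov matrix P = exp(tQ) with a Taylor series, and recovers log(P) with the series for log(I + X). The helper names `expm` and `logm`, the example matrix, and the divergence times are all assumptions for illustration. The logarithm series only converges when P is close enough to the identity, which holds for the small values of t used here.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def expm(q, t, terms=40):
    """exp(t * Q) by truncated Taylor series (adequate for small t * Q)."""
    n = len(q)
    tq = [[t * x for x in row] for row in q]
    result = identity(n)
    term = identity(n)
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, tq)]
        result = [[r + x for r, x in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    return result


def logm(p, terms=200):
    """log(P) by the series for log(I + X), X = P - I; needs P near I."""
    n = len(p)
    x = [[p[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    power = identity(n)
    result = [[0.0] * n for _ in range(n)]
    for k in range(1, terms):
        power = matmul(power, x)
        sign = 1.0 if k % 2 == 1 else -1.0
        result = [[r + sign * v / k for r, v in zip(rr, pr)]
                  for rr, pr in zip(result, power)]
    return result


# Hypothetical rate matrix: rows sum to zero, off-diagonal entries positive.
Q_example = [[-0.3, 0.2, 0.1],
             [0.2, -0.3, 0.1],
             [0.1, 0.1, -0.2]]

for t in (0.5, 1.5):
    L = logm(expm(Q_example, t))
    # Each entry-wise ratio L[i][j] / Q[i][j] should be close to t.
    ratios = [L[i][j] / Q_example[i][j] for i in range(3) for j in range(3)]
    print(t, min(ratios), max(ratios))
```

With real alignments the matrices P are estimated from finite sequences, so the ratios scatter around a common value instead of agreeing to rounding error; the abstract's principal component analysis addresses that scatter.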

HAL AMU

Hal-Diderot