5 research outputs found

Efficient and accurate P-value computation for Position Weight Matrices

Author: A Liefooghe
C Pizzi
E Wingender
G Bejerano
GE Crooks
GZ Hertz
H Huang
Hélène Touzet
J Zhang
Jean-Stéphane Varré
JM Claverie
K Malde
M Beckstette
M Garey
R Staden
S Mount
S Rahmann
TD Wu
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract

Background: Position Weight Matrices (PWMs) are probabilistic representations of signals in sequences. They are widely used to model approximate patterns in DNA or in protein sequences. Using PWMs requires knowing the statistical significance of a word according to its score. This is done by defining the P-value of a score, which is the probability that the background model can achieve a score larger than or equal to the observed value. This gives rise to the following problem: given a P-value, find the corresponding score threshold. Existing methods rely on dynamic programming or probability generating functions. For many examples of PWMs, they fail to give accurate results in a reasonable amount of time.

Results: The contribution of this paper is twofold. First, we study the theoretical complexity of the problem and prove that it is NP-hard. Then, we describe a novel algorithm that solves the P-value problem efficiently. The main idea is to use a series of discretized score distributions that improves the final result step by step until a convergence criterion is met. Moreover, the algorithm is capable of calculating the exact P-value without any error, even for matrices with non-integer coefficient values. The same approach is also used to devise an accurate algorithm for the reverse problem: finding the P-value for a given score. Both methods are implemented in a software tool called TFM-PVALUE, which is freely available.

Conclusion: We have tested TFM-PVALUE on a large set of PWMs representing transcription factor binding sites. Experimental results show that it achieves better performance in terms of computational time and precision than existing tools.
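The P-value definition in the abstract above can be made concrete with a small sketch. The code below is not the TFM-PVALUE algorithm; it only illustrates the two baseline ideas the abstract mentions, namely summing the background probability of every word that reaches a score, and dynamic programming over a discretized score distribution. It assumes an independent, identically distributed background model, and the function names, the toy matrix, and the `granularity` parameter are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import product


def pvalue_brute_force(pwm, background, threshold):
    """Sum the background probability of every word scoring >= threshold.

    Enumerates all |alphabet|^m words, so it is only usable for short matrices.
    """
    letters = list(background)
    total = 0.0
    for word in product(letters, repeat=len(pwm)):
        score = sum(col[a] for col, a in zip(pwm, word))
        if score >= threshold:
            prob = 1.0
            for a in word:
                prob *= background[a]
            total += prob
    return total


def pvalue_discretized(pwm, background, threshold, granularity=1.0):
    """Dynamic programming over scores rounded to multiples of `granularity`.

    The result is exact only when every coefficient is a multiple of
    `granularity`; otherwise rounding introduces an approximation error.
    """
    dist = {0: 1.0}  # maps discretized partial score -> probability
    for col in pwm:
        nxt = defaultdict(float)
        for s, p in dist.items():
            for a, q in background.items():
                nxt[s + round(col[a] / granularity)] += p * q
        dist = nxt
    return sum(p for s, p in dist.items() if s * granularity >= threshold)


# Hypothetical toy data: uniform DNA background and a 3-column integer matrix.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
TOY_PWM = [
    {"A": 2, "C": -1, "G": -1, "T": 0},
    {"A": -1, "C": 2, "G": 0, "T": -1},
    {"A": 0, "C": -1, "G": 2, "T": -1},
]

for t in (2, 4, 6):
    print(t, pvalue_brute_force(TOY_PWM, BACKGROUND, t),
          pvalue_discretized(TOY_PWM, BACKGROUND, t))
```

On an integer-valued matrix such as this toy one, the two functions agree. With non-integer coefficients, shrinking `granularity` trades running time for precision, which is the trade-off the abstract says the refinement scheme is designed to handle.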

HAL - Lille 3

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

CLUSS: Clustering of protein sequences based on a new similarity measure

Author: A Krause
Abdellali Kelil
AJ Enright
Alain Fleury
C Notredame
D Higgins
ELL Sonnhammer
F Titgemeyer
G Reinert
G Yona
H Lodish
IV Tetko
J Felsenstein
J Heringa
J Rocha
JD Thompson
JH Ward
JS Varré
K Katoh
K Sjölander
M Ike
M Kimura
MO Dayhoff
MY Leung
N Côté
N Wicker
P Pipenbacher
R Jothi
RC Edgar
RO Duda
Ryszard Brzezinski
S Fanning
S Henikoff
S Karlin
S Vinga
SF Altschul
Shengrui Wang
T Fukamizo
T Ishimizu
V Batagelj
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract

Background: The rapid burgeoning of available protein data makes the use of clustering within families of proteins increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge to help researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects the biological reality, and an accurate estimate of protein sequence similarity is crucial to the building of such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is especially designed for application to non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families. In the rest of the paper, we use the term "phylogenetic" in the sense of "relatedness of biological functions".

Results: To show the effectiveness of CLUSS, we performed an extensive clustering on the COG database. To demonstrate its ability to deal with hard-to-align sequences, we tested it on the GH2 family. In addition, we carried out experimental comparisons of CLUSS with a variety of mainstream algorithms. These comparisons were made on hard-to-align and easy-to-align protein sequences. The results of these experiments show the superiority of CLUSS in yielding clusters of proteins with similar functional activity.

Conclusion: We have developed an effective method and tool for clustering protein sequences to meet the needs of biologists in terms of phylogenetic analysis and prediction of biological functions. Compared to existing clustering methods, CLUSS more accurately highlights the functional characteristics of the clustered families. It provides biologists with a new and plausible instrument for the analysis of protein sequences, especially those that cause problems for alignment-dependent algorithms.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Structural and Content Diversity of Mitochondrial Genome in Beet: A Comparative Genomic Analysis

Author: A. Courseaux
A. Darracq
Allen
Alverson
Anisimova
B. Vacherie
Barr
Budar
Charlesworth
Chase
Cho
Cuguen
Darling
Darracq
Desplanque
Ducos
Dufaÿ
Fénart
Gascuel
Gray
Huson
J. S. Varré
Kawanishi
Kosakovsky Pond
Kubo
L. Maréchal-Drouard
Lartillot
Lowe
Lynch
Matsunaga
Moison
Mower
Nishizawa
Noe
Onodera
P. Lenoble
P. Saumitou-Laprade
P. Touzet
Palmer
Parkinson
Rozas
S. Oztas
Satoh
Schmidt
Scotti
Sloan
Touzet
Tsukihara
V. Barbe
V. Castric
Vallenet
Xu
Yamamoto
Yang
Zhu
Publication venue: Oxford University Press
Publication date: 01/01/2011

Despite their monophyletic origin, mitochondrial (mt) genomes of plants and animals have followed contrasting evolutionary paths over time. Animal mt genomes are generally small, compact, and exhibit high mutation rates, whereas plant mt genomes exhibit low mutation rates, little compactness, larger sizes, and highly rearranged structures. We present the (nearly) whole sequences of five new mt genomes in the Beta genus: four from Beta vulgaris and one from B. macrocarpa, a sister species belonging to the same Beta section. We pooled our results with two previously sequenced genomes of B. vulgaris and studied genome diversity at the species level with an emphasis on cytoplasmic male-sterilizing (CMS) genomes. We showed that, contrary to what was previously assumed, all three CMS genomes belong to a single sterile lineage. In addition, the CMSs seem to have undergone an acceleration of the rates of substitution and rearrangement. This study suggests that male sterility emergence might have been favored by faster rates of evolution, unless CMS itself caused faster evolution.

HAL - Lille 3

HAL Evry

Crossref

INRIA a CCSD electronic archive server

PubMed Central

HAL-CEA

Segment Match Refinement and Applications

Author: A. L. Delcher
E. W. Myers
J. D. Kececioglu
J.-S. Varré
P. A. Pevzner
S. Batzoglou
S. F. Altschul
S. Kurtz
S. Schwartz
W. J. Wilbur
Publication venue: Springer Science and Business Media LLC

Crossref

Multichromosomal structure and foreign tracts in the Ombrophytum subterraneum (Balanophoraceae) mitochondrial genome

Crossref