7 research outputs found

Efficient seeding techniques for protein similarity search

Author: Furletova Eugenia
Gambin Anna
Kucherov Gregory
Lasota Slawomir
Noé Laurent
Roytberg Mikhail
Szczurek Ewa
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform an analysis of seeds built over those alphabets and compare them with the standard Blastp seeding method [2,3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in Blastp and vector seeds, our seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix.

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

Efficient seeding techniques for protein similarity search

Author: Roytberg Mikhail
Gambin Anna
Noé Laurent
Lasota Slawomir
Furletova Eugenia
Szczurek Ewa
Kucherov Gregory
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

arXiv.org e-Print Archive

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

Optimal neighborhood indexing for protein similarity search

Author: D Lipman
DG Brown
Dominique Lavenier
Gregory Kucherov
J Henikoff
JL Hennessy
L Murphy
L Noé
Laurent Noé
M Crochemore
M Li
M Roytberg
Mathieu Giraud
MP Styczynski
N Cannata
P Peterlongo
Pierre Peterlongo
R Edgar
S Altschul
S Henikoff
S Karlin
T Li
Van Hoa Nguyen
VH Nguyen
WJ Kent
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Similarity inference, one of the main bioinformatics tasks, has to face an exponential growth of biological data. A classical approach used to cope with this data flow involves heuristics with large seed indexes. To speed up this technique, the index can be enhanced by storing additional information that limits the number of random memory accesses. However, this improvement leads to a larger index that may become a bottleneck. In the case of protein similarity search, we propose to decrease the index size by reducing the amino acid alphabet.
Results: The paper presents two main contributions. First, we show that an optimal neighborhood indexing combining an alphabet reduction and a longer neighborhood leads to a 35% reduction of the memory involved in the process, without sacrificing the quality of results or the computational time. Second, our approach led us to develop a new kind of substitution score matrices and their associated e-value parameters. In contrast to usual matrices, these matrices are rectangular, since they compare amino acid groups from different alphabets. We describe the method used for computing those matrices and provide some typical examples that can be used in such comparisons. Supplementary data can be found on the website http://bioinfo.lifl.fr/reblosum.
Conclusions: We propose a practical index size reduction of the neighborhood data that does not negatively affect the performance of large-scale search in protein sequences. Such an index can be used in any study involving large protein data. Moreover, rectangular substitution score matrices and their associated statistical parameters can have applications in any study involving an alphabet reduction.

Springer - Publisher Connector

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

HAL-Rennes 1
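
The alphabet-reduction idea in the abstract above can be illustrated with a minimal sketch. It assumes a hypothetical 8-group reduction of the 20 amino acids and a simple fixed-width bit packing; the grouping, the names `GROUPS`, `bits_per_residue` and `pack_neighborhood`, and the packing layout are illustrative assumptions, not the alphabets or index format used in the paper.

```python
import math

# Hypothetical 8-group reduction of the 20 amino acids (illustration only;
# not one of the reduced alphabets studied in the paper).
GROUPS = ["ILMV", "FWY", "KRH", "DE", "STNQ", "AG", "C", "P"]
REDUCE = {aa: gid for gid, group in enumerate(GROUPS) for aa in group}


def bits_per_residue(alphabet_size):
    """Number of bits needed to store one letter of the given alphabet."""
    return max(1, math.ceil(math.log2(alphabet_size)))


def pack_neighborhood(residues, table=REDUCE, alphabet_size=len(GROUPS)):
    """Pack a neighborhood (string of residues) into one integer,
    using a fixed number of bits per reduced letter."""
    width = bits_per_residue(alphabet_size)
    code = 0
    for aa in residues:
        code = (code << width) | table[aa]
    return code


# Full alphabet: 5 bits per residue; reduced alphabet: 3 bits per residue.
full_bits = 4 * bits_per_residue(20)              # 4-residue neighborhood
reduced_bits = 6 * bits_per_residue(len(GROUPS))  # longer, 6-residue neighborhood
```

Under these assumptions a 6-residue neighborhood over the reduced alphabet takes 18 bits, fewer than the 20 bits of a 4-residue neighborhood over the full alphabet. This is the kind of trade-off between alphabet size and neighborhood length that the abstract describes; the actual figures in the paper come from its own alphabets and index layout.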

On subset seeds for protein alignment

Author: Furletova Eugenia
Gambin Anna
Kucherov Gregory
Lasota Slawomir
Noé Laurent
Roytberg Mikhail A.
Szczurek Ewa
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2009

We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard BLASTP seeding method [2], [3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in BLASTP and vector seeds, our seeds show a similar or even better performance than BLASTP on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we perform a large-scale benchmarking of our seeds against several main databases of protein alignments. Here again, the results show a comparable or better performance of our seeds vs. BLASTP. Comment: IEEE/ACM Transactions on Computational Biology and Bioinformatics (2009)

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

MPG.PuRe
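
As a rough illustration of the subset-seed formalism mentioned in the abstract above, the sketch below treats each seed letter as a partition of the amino acids into groups: two words produce a hit when, at every seed position, their residues fall into the same group. The seed letters `#`, `@` and `_`, the grouping behind `@`, and the function names are hypothetical choices for illustration; they are not the seed alphabets designed in the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def partition(groups):
    """Map each amino acid to a group id; residues not listed in any
    group are placed in their own singleton group."""
    table = {}
    for gid, group in enumerate(groups):
        for aa in group:
            table[aa] = gid
    next_id = len(groups)
    for aa in AMINO_ACIDS:
        if aa not in table:
            table[aa] = next_id
            next_id += 1
    return table


# Hypothetical seed alphabet (illustration only).
SEED_LETTERS = {
    "#": partition([]),                                        # exact match only
    "@": partition(["ILMV", "FWY", "KR", "DE", "ST", "NQ"]),  # assumed grouping
    "_": partition([AMINO_ACIDS]),                             # joker: anything matches
}


def seed_key(word, seed):
    """Hashable key of a word under a seed: one group id per position."""
    if len(word) != len(seed):
        raise ValueError("word and seed must have the same length")
    return tuple(SEED_LETTERS[s][aa] for s, aa in zip(seed, word))


def seed_hit(a, b, seed):
    """Two words hit if they have the same key under the seed."""
    return seed_key(a, seed) == seed_key(b, seed)
```

Because a hit reduces to equality of keys, words can be indexed directly in a hash table, which is one way to read the "less costly to implement" remark; a score-threshold scheme such as the one used by BLASTP needs the summed substitution scores instead.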

Fine-grained parallelization of similarity search between protein sequences

Author: Lavenier Dominique
Nguyen Van Hoa
Publication venue: HAL CCSD
Publication date: 01/01/2008

This report presents the implementation of a protein sequence comparison algorithm specifically designed to speed up its time-consuming part on parallel hardware such as SSE instructions, multicore architectures or graphics boards. Three programs have been developed: PLAST-P, TPLAST-N and PLAST-X. They provide results equivalent to those of the NCBI BLAST family programs (BLAST-P, TBLAST-N and BLAST-X), with a speed-up factor ranging from 5 to 10.

HAL-CentraleSupelec

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

HAL-Rennes 1

Protein similarity search with subset seeds on a dedicated reconfigurable hardware

Author: Dominique Lavenier
Gilles Georges
Gregory Kucherov
Julien Jacques
Laurent Noé
Mathieu Giraud
Pierre Peterlongo
Publication venue: Springer
Publication date: 07/09/2007

With a sharp increase of available DNA and protein sequence data, new precise and fast similarity search methods are needed for large-scale genome and proteome comparisons. Modern seed-based techniques of similarity search (spaced seeds, multiple seeds, subset seeds) provide a better sensitivity/specificity ratio. We present an implementation of such a seed-based technique on parallel specialized hardware embedding a reconfigurable architecture (FPGA), where the FPGA is tightly connected to large-capacity Flash memories. This parallel system allows large databases to be fully indexed and rapidly accessed. Compared to traditional approaches represented by the Blastp software, we obtain both a significant speed-up and better results. To the best of our knowledge, this is the first attempt to exploit efficient seed-based algorithms for parallelizing the sequence similarity search.

HAL-CentraleSupelec

CiteSeerX

HAL - Lille 3

INRIA a CCSD electronic archive server

HAL-Rennes 1

Reading the reads: analysis of sequencing data (original title: Lire les lectures : analyse de données de séquençage)

Author: Peterlongo Pierre
Publication venue: HAL CCSD
Publication date: 25/01/2016

All the work presented in this HDR (habilitation thesis) concerns the exploitation of high-throughput sequencing data in the absence of a closely related, high-quality reference genome. In the first chapter, we propose new approaches for extracting biological variants of interest from such sequencing data. In the second chapter, we present methods for comparing sequencing datasets. Finally, in the third chapter, we propose a method that is a preliminary step toward better "assemblies" of these sequencing data.

HAL-CentraleSupelec

Thèses en Ligne

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1