Search CORE

Seamless Coarse Grained Parallelism Integration in Intensive Bioinformatics Workflows

Author: Moreews Francois
Lavenier Dominique
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/05/2016
Field of study

To be easily constructed, shared and maintained, complex in silico bioinformatics analysis are structured as workflows. Furthermore, the growth of computational power and storage demand from this domain, requires workflows to be efficiently executed. However, workflow performances usually rely on the ability of the designer to extract potential parallelism. But atomic bioinformatics tasks do not often exhibit direct parallelism which may appears later in the workflow design process. In this paper, we propose a Model-Driven Architecture approach for capturing the complete design process of bioinformatics workflows. More precisely, two workflow models are specified: the first one, called design model, graphically captures a low throughput prototype. The second one, called execution model, specifies multiple levels of coarse grained parallelism. The execution model is automatically generated from the design model using annotation derived from the EDAM ontology. These annotations describe the data types connecting differents elementary tasks. The execution model can then be interpreted by a workflow engine and executed on hardware having intensive computation facility

Almae Matris Studiorum Campus

Quality metrics for benchmarking sequences comparison tools

Author: Drezen Erwan
Lavenier Dominique
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2014
Field of study

International audienceComparing sequences is a daily task in bioinformatics and many software try to fulfill this need by proposing fast execution times and accurate results. Introducing a new software in this field requires to compare it to recognized tools with the help of well defined metrics. A set of quality metrics is proposed that enables a systematic approach for comparing alignment tools. These metrics have been implemented in a dedicated software, allowing to produce textual and graphical benchmark artifacts

Crossref

GASSST: global alignment short sequence search tool

Author: Lavenier Dominique
Rizk Guillaume
Publication venue: Oxford University Press
Publication date: 01/01/2010
Field of study

Motivation: The rapid development of next-generation sequencing technologies able to produce huge amounts of sequence data is leading to a wide range of new applications. This triggers the need for fast and accurate alignment software. Common techniques often restrict indels in the alignment to improve speed, whereas more flexible aligners are too slow for large-scale applications. Moreover, many current aligners are becoming inefficient as generated reads grow ever larger. Our goal with our new aligner GASSST (Global Alignment Short Sequence Search Tool) is thus 2-fold—achieving high performance with no restrictions on the number of indels with a design that is still effective on long reads

CiteSeerX

PubMed Central

Un Réseau systolique intégré pour la correction de fautes de frappe

Author: Lavenier Dominique
Publication venue: HAL CCSD
Publication date: 01/01/1992
Field of study

Ce rapport présente la réalisation d'un circuit VLSI spécialisé pour la correction de fautes de frappe. L'architecture du circuit est basée sur une structure réguliere, un réseau systolique bidimensionnel de 69 processeurs. La méthodologie suivie pendant la conception du circuit tire profit de cette regularité, notamment pendant les phases de validation

arXiv.org e-Print Archive

Multiple Comparative Metagenomics using Multiset k-mer Counting

Author: Benoit Gaëtan
Drezen Erwan
Lavenier Dominique
Lemaitre Claire
Mariadassou Mahendra
Peterlongo Pierre
Schbath Sophie
Publication venue
Publication date: 28/04/2016
Field of study

Background. Large scale metagenomic projects aim to extract biodiversity knowledge between different environmental conditions. Current methods for comparing microbial communities face important limitations. Those based on taxonomical or functional assignation rely on a small subset of the sequences that can be associated to known organisms. On the other hand, de novo methods, that compare the whole sets of sequences, either do not scale up on ambitious metagenomic projects or do not provide precise and exhaustive results. Methods. These limitations motivated the development of a new de novo metagenomic comparative method, called Simka. This method computes a large collection of standard ecological distances by replacing species counts by k-mer counts. Simka scales-up today's metagenomic projects thanks to a new parallel k-mer counting strategy on multiple datasets. Results. Experiments on public Human Microbiome Project datasets demonstrate that Simka captures the essential underlying biological structure. Simka was able to compute in a few hours both qualitative and quantitative ecological distances on hundreds of metagenomic samples (690 samples, 32 billions of reads). We also demonstrate that analyzing metagenomes at the k-mer level is highly correlated with extremely precise de novo comparison techniques which rely on all-versus-all sequences alignment strategy or which are based on taxonomic profiling

Crossref

Directory of Open Access Journals

A Component model for synchronous VLSI system design

Author: Lavenier Dominique
Mcconnell Roderick
Publication venue: HAL CCSD
Publication date: 01/01/1994
Field of study

Disponible dans les fichiers attachés à ce documen

Localized genome assembly from reads to scaffolds: practical traversal of the paired string graph

Author: Chikhi Rayan
Lavenier Dominique
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

International audienceNext-generation de novo short reads assemblers typically use the following strategy: (1) assemble unpaired reads using heuristics leading to contigs; (2) order contigs from paired reads information to produce scaffolds. We propose to unify these two steps by introducing localized assembly: direct construction of scaffolds from reads. To this end, the paired string graph structure is introduced, along with a formal framework for building scaffolds as paths of reads. This framework leads to the design of a novel greedy algorithm for memory-efficient, parallel assembly of paired reads. A prototype implementation of the algorithm has been developed and applied to the assembly of simulated and experimental short reads. Our experiments show that our methods yields longer scaffolds than recent assemblers, and is capable of assembling diploid genomes significantly better than other greedy methods

CiteSeerX