Comparative analysis of RNA genes: the caRNAc software

Author: Touzet Helene
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

RNA genes are ubiquitous in the cell and are involved in a number of biochemical processes. Since there is a close relationship between function and structure, software tools that predict the secondary structure of non-coding RNAs from the base sequence are very helpful. In this article, we focus on the inference of a conserved secondary structure for a group of homologous RNA sequences. We present the caRNAc software, which enables the analysis of families of homologous sequences without prior alignment. The method relies on both comparative analysis and thermodynamic information.

HAL - Lille 3

INRIA a CCSD electronic archive server

CG-seq: a toolbox for automatic annotation of genomes by comparative analysis

Author: De Monte Antoine
Grenier-Boley Benjamin
Touzet Helene
Publication venue: HAL CCSD
Publication date: 29/10/2010

CG-seq is a software pipeline that identifies functional regions, such as noncoding RNAs or protein-coding genes, in a genomic sequence by comparative analysis and multispecies comparison. It takes as input a genomic sequence to annotate and a set of reference sequences from a variety of species to compare against it, and it returns a list of candidate regions with their annotation. The pipeline combines several external software components for sequence analysis with new modules developed specifically for this purpose. CG-seq is distributed under the GPL licence and can be downloaded from http://bioinfo.lifl.fr/CGseq. It is available for command-line use or with a graphical user interface. A web version can also be run from the same website for input data of limited length.

HAL - Lille 3

INRIA a CCSD electronic archive server

Biomanycores, open-source parallel code for many-core bioinformatics

Author: Berthelot Jean-Frédéric
Deltel Charles
Giraud Mathieu
Janot Stéphane
Jourdan Laetitia
Lavenier Dominique
Touzet Helene
Varré Jean-Stéphane
Publication venue: HAL CCSD
Publication date: 01/01/2011

Biomanycores is a collection of bioinformatics tools designed to bridge the gap between research in OpenCL/CUDA high-performance computing on GPUs and other "manycore processors" and practicing bioinformaticians and biologists.

HAL-CentraleSupelec

HAL - Lille 3

INRIA a CCSD electronic archive server

HAL-Rennes 1

DiNAMO: Exact method for degenerate IUPAC motifs discovery, characterization of sequence-specific errors

Author: Buisine Marie-Pierre
Figeac Martin
Leclerc Julie
Noé Laurent
Richard Hugues
Saad Chadi
Touzet Helene
Publication venue: HAL CCSD
Publication date: 03/07/2017

Next-generation sequencing technologies are still associated with relatively high error rates, about 1%, which correspond to thousands of errors at the scale of a complete genome. Each region therefore needs to be sequenced several times, and variants are usually filtered based on depth criteria. The significant number of artifacts that remain in spite of those filters shows the limit of conventional approaches and indicates that some sequencing artifacts are recurrent. This recurrence suggests that sequencing errors can depend on the upstream nucleotide sequence context. Our goal is to search for overrepresented motifs that tend to induce sequencing errors. Previous studies showed that some motifs, such as GGT [1,2], induce sequencing errors in the Illumina technologies. However, these studies were dedicated to exact motifs and did not take approximate motifs into account, which limits the statistical power of such approaches. On the other hand, some tools, such as FIRE [3], DREME [4] and Discrover [5], were developed to search for degenerate motifs over the 15-letter IUPAC alphabet in the context of ChIP-seq studies. However, these tools use greedy algorithms, implying a lack of sensitivity. We therefore developed an exact algorithm that searches for degenerate motifs by enumerating all possible IUPAC motifs. This algorithm is based on mutual information and uses hash tables with a graph data structure to store the motifs. It is independent of the sequencing technology. Experimental results on real data show that there are many overrepresented motifs upstream of sequencing artifacts. The artifacts are identified through the strand bias between forward and reverse reads. The homopolymer CCC, of length 3, seems to be sufficient to induce errors on IonTorrent. On Illumina, motifs are mainly composed of GGC followed by GGT (for example, TGGCNGGT) or of homopolymers. We have also noticed a drop in base quality after the detected motifs.
Our exact algorithm requires less than one minute (Intel Core i5-4570 CPU, 3.20 GHz) and less than 2 GB of RAM to search for fully degenerate motifs of length 6 on a dataset of approximately 24,000 sequences, extracted from 11 exomes sequenced on IonTorrent Proton.
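
To make the two ingredients named in the abstract more concrete (degenerate IUPAC motifs and mutual information), here is a minimal Python sketch. It is not the DiNAMO implementation: the function names, the presence/absence scoring and the positive/negative split of the sequences are assumptions made for illustration, and the hash-table and graph structures mentioned above are not reproduced.

```python
from math import log2

# Standard 15-letter IUPAC nucleotide alphabet: each code stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def matches_at(motif, seq, pos):
    """True if the degenerate motif matches seq starting at index pos."""
    if pos + len(motif) > len(seq):
        return False
    return all(seq[pos + i] in IUPAC[code] for i, code in enumerate(motif))

def contains(motif, seq):
    """True if the motif occurs anywhere in seq."""
    return any(matches_at(motif, seq, p) for p in range(len(seq) - len(motif) + 1))

def mutual_information(motif, positives, negatives):
    """Mutual information (in bits) between motif presence and class label.

    Hypothetical scoring: `positives` are sequences upstream of an artifact,
    `negatives` are control sequences.
    """
    n = len(positives) + len(negatives)
    hits_pos = sum(contains(motif, s) for s in positives)
    hits_neg = sum(contains(motif, s) for s in negatives)
    counts = {
        (True, True): hits_pos,
        (True, False): hits_neg,
        (False, True): len(positives) - hits_pos,
        (False, False): len(negatives) - hits_neg,
    }
    mi = 0.0
    for (present, label), c in counts.items():
        if c == 0:
            continue  # a zero cell contributes nothing to the sum
        p_joint = c / n
        p_present = (counts[(present, True)] + counts[(present, False)]) / n
        p_label = (counts[(True, label)] + counts[(False, label)]) / n
        mi += p_joint * log2(p_joint / (p_present * p_label))
    return mi
```

With this scoring, a motif that appears in every positive sequence and in no negative one reaches the maximum of 1 bit for balanced classes, while a motif such as N, which matches everywhere, scores 0.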

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

SortMeRNA: a new software to filter total RNA for metatranscriptomic or RNA analysis

Author: Kopylova Evguenia
Noé Laurent
Touzet Helene
Publication venue: HAL CCSD
Publication date: 03/07/2012

HAL - Lille 3

INRIA a CCSD electronic archive server

Analysis of tree edit distance algorithms

Author: Dulucq Serge (LaBRI, Université Bordeaux I)
Touzet Helene
Publication venue: Springer-Verlag
Publication date

In this article, we study the behaviour of dynamic programming methods for the tree edit distance problem, such as [4] and [2]. We show that those two algorithms may be described in a more general framework of cover strategies. This analysis allows us to define a new tree edit distance algorithm, that is optimal for cover strategies.
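
As background for the dynamic programming methods mentioned above, the following Python sketch computes the unit-cost edit distance between two ordered trees with a memoized forest recursion. It is an illustration only: the tuple representation and function names are assumptions, and the sketch always decomposes on the rightmost root, which is one fixed choice and does not reproduce the cover strategies studied in the article.

```python
from functools import lru_cache

# A tree is (label, children) where children is a tuple of trees;
# a forest is a tuple of trees. Tuples keep everything hashable for the cache.

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f and not g:
        return 0
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (label_f, kids_f), (label_g, kids_g) = f[-1], g[-1]
    return min(
        # delete the rightmost root of f; its children take its place
        forest_distance(f[:-1] + kids_f, g) + 1,
        # insert the rightmost root of g
        forest_distance(f, g[:-1] + kids_g) + 1,
        # map the two rightmost roots onto each other (relabel if they differ)
        forest_distance(kids_f, kids_g)
        + forest_distance(f[:-1], g[:-1])
        + (label_f != label_g),
    )

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))
```

For example, `tree_edit_distance(("a", (("b", ()), ("c", ()))), ("a", (("b", ()),)))` deletes the single node `c`. The memoization table is indexed by pairs of subforests, and the number of distinct subforests visited depends on which root is removed at each step.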

CiteSeerX

Self-Overlapping Occurrences and Knuth-Morris-Pratt Algorithm for Weighted Matching

Author: Liefooghe Aude
Touzet Helene
Varré Jean-Stéphane
Publication venue: HAL CCSD
Publication date: 01/04/2009

Position Weight Matrices are broadly used probabilistic motif models. In this paper, we address the problem of identifying and characterizing potential overlaps between occurrences of such a motif. This has useful applications to the statistics of the number of occurrences, and to weighted pattern matching with an extension of the well-known Knuth-Morris-Pratt algorithm.
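
The two notions the abstract combines can be sketched in Python as follows. The matrix, threshold and function names are hypothetical, the scan is a naive one, and the failure function is the classic Knuth-Morris-Pratt table for exact patterns, not the weighted extension described in the paper.

```python
def pwm_score(pwm, word):
    """Score of a word: sum of the matrix entries selected by its letters."""
    return sum(column[letter] for column, letter in zip(pwm, word))

def pwm_occurrences(pwm, text, threshold):
    """Start positions whose window scores at least the threshold (naive scan)."""
    m = len(pwm)
    return [i for i in range(len(text) - m + 1)
            if pwm_score(pwm, text[i:i + m]) >= threshold]

def failure_function(pattern):
    """Classic KMP table: fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

# Toy matrix of length 2 that favours the letter A at both positions.
column = {"A": 2, "C": -1, "G": -1, "T": -1}
toy_pwm = [column, column]
overlapping = pwm_occurrences(toy_pwm, "AAAC", threshold=4)
```

In the toy example the windows starting at positions 0 and 1 both reach the threshold and share a letter, which is the kind of self-overlap between occurrences that the paper sets out to characterize. For exact patterns the same information is carried by the failure table, for example `failure_function("ABAB")`.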

HAL - Lille 3

INRIA a CCSD electronic archive server

Algorithms with Polynomial Interpretation Termination Proof

Author: Bonfante Guillaume
Cichon Adam
Marion Jean-Yves
Touzet Helene
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2001

We study the effect of polynomial interpretation termination proofs of deterministic (resp. non-deterministic) algorithms defined by confluent (resp. non-confluent) rewrite systems over data structures which include strings, lists and trees, and we classify them according to the interpretations of the constructors. This leads to the definition of six function classes which turn out to be exactly the deterministic (resp. non-deterministic) polynomial time, linear exponential time and linear doubly exponential time computable functions when the class is based on confluent (resp. non-confluent) rewrite systems. We also obtain a characterisation of the linear space computable functions. Finally, we demonstrate that functions with exponential interpretation termination proofs are super-elementary.
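
For readers unfamiliar with the technique, a standard textbook-style example of a polynomial interpretation termination proof is given below. The rewrite system and the interpretation are chosen here for illustration and are not taken from the article: each symbol is mapped to a polynomial that is strictly increasing in every argument, and termination follows once every rule strictly decreases under that mapping.

```latex
% Rewrite system for addition on unary naturals (illustrative example)
\begin{align*}
\mathrm{add}(0, y) &\to y \\
\mathrm{add}(s(x), y) &\to s(\mathrm{add}(x, y))
\end{align*}
% Candidate polynomial interpretation over the positive integers
\begin{align*}
[0] &= 1, & [s](x) &= x + 1, & [\mathrm{add}](x, y) &= 2x + y
\end{align*}
% Each rule strictly decreases under the interpretation
\begin{align*}
[\mathrm{add}(0, y)] &= 2 + y \;>\; y \\
[\mathrm{add}(s(x), y)] &= 2(x + 1) + y = 2x + y + 2 \;>\; 2x + y + 1 = [s(\mathrm{add}(x, y))]
\end{align*}
```

Here the constructor s is interpreted by a polynomial of the form x + c; the shape of the constructor interpretations is the criterion by which the article classifies such proofs.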

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server