Search CORE

8 research outputs found

The Gapped-Factor Tree

Author: Allali Julien
Peterlongo Pierre
Sagot Marie-France
Publication venue: HAL CCSD
Publication date: 01/01/2006

We present a data structure to index a specific kind of factor (that is, substring) called a gapped-factor. A gapped-factor is a factor containing a gap that is ignored during indexing. The data structure is based on the suffix tree and indexes all the gapped-factors of a text for a fixed gap size, and only those. It is constructed online in linear time and space. Such a data structure may play an important role in various pattern matching and motif inference problems, for instance in text filtration.
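To make the definition concrete, here is a minimal Python sketch of what indexing gapped-factors with a fixed gap size means. It is not the suffix-tree-based, linear-time online construction described in the abstract: it is a naive dictionary index, and the function name and the parameters `left_len`, `gap` and `right_len` (fixed lengths for the two parts around the gap) are assumptions made for illustration.

```python
from collections import defaultdict


def gapped_factor_index(text, left_len, gap, right_len):
    """Map each (left, right) pair to the start positions where it occurs.

    Assumed model: a gapped-factor is text[i:i+left_len], then `gap`
    ignored characters, then `right_len` more characters.
    """
    index = defaultdict(list)
    span = left_len + gap + right_len
    for i in range(len(text) - span + 1):
        left = text[i:i + left_len]
        right = text[i + left_len + gap:i + span]
        # The characters inside the gap are deliberately not part of the key.
        index[(left, right)].append(i)
    return dict(index)


index = gapped_factor_index("ACGTACCTAC", left_len=2, gap=1, right_len=2)
print(index[("AC", "TA")])  # occurrences whose gap characters differ share one key
```

In this example the pair `("AC", "TA")` occurs at two positions whose gap characters differ (`G` and `C`), which is the sense in which the gap is ignored. The sketch takes time proportional to the text length times the key length, so it only illustrates what is indexed, not how the tree achieves linear time.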

CiteSeerX

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Browsing repeats in genomes: Pygram and an application to non-coding region analysis

Author: Durand Patrick
Mahé Frédéric
Nicolas Jacques
Valin Anne-Sophie
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: A large number of studies on genome sequences have revealed the major role played by repeated sequences in the structure, function, dynamics and evolution of genomes. In-depth repeat analysis requires specialized methods, including visualization techniques, to achieve optimum exploratory power. RESULTS: This article presents Pygram, a new visualization application for investigating the organization of repeated sequences in complete genome sequences. The application projects data from a repeat index file on the analysed sequences, and by combining this principle with a query system, is capable of locating repeated sequences with specific properties. In short, Pygram provides an efficient, graphical browser for studying repeats. Implementation of the complete configuration is illustrated in an analysis of CRISPR structures in Archaea genomes and the detection of horizontal transfer between Archaea and Viruses. CONCLUSION: By proposing a new visualization environment to analyse repeated sequences, this application aims to increase the efficiency of laboratories involved in investigating repeat organization in single genomes or across several genomes.

HAL-CentraleSupelec

Springer - Publisher Connector

INRIA a CCSD electronic archive server

PubMed Central

HAL-INSU

HAL-Rennes 1

QuadStack: An Efficient Representation and Direct Rendering of Layered Datasets

Author: Benes Bedrich
Bittner Jirí
Graciano Alejandro
Pospísil Adam
Rueda Antonio J.
Publication venue: IEEE
Publication date: 01/09/2021

We introduce QuadStack, a novel algorithm for volumetric data compression and direct rendering. Our algorithm exploits the data redundancy often found in layered datasets, which are common in science and engineering fields such as geology, biology, mechanical engineering, and medicine. QuadStack first compresses the volumetric data into vertical stacks, which are then compressed into a quadtree that identifies and represents the layered structures at the internal nodes. The associated data (color, material, density, etc.) and the shape of these layer structures are decoupled and encoded independently, leading to high compression rates (4× to 54× of the original voxel model memory footprint in our experiments). We also introduce an algorithm for value retrieval from the QuadStack representation and show that access has logarithmic complexity. Because of the fast access, QuadStack is suitable for efficient data representation and direct rendering. We show that our GPU implementation performs comparably in speed with state-of-the-art algorithms (18-79 MRays/s in our implementation) while maintaining a significantly smaller memory footprint.
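The first step mentioned in the abstract, compressing a column of voxels into a vertical stack of layers, can be sketched in Python as follows. This is only an illustration under stated assumptions: it run-length encodes a single column and looks values up by binary search, and it does not reproduce the quadtree over stacks, the decoupled encoding of data and shape, or the GPU renderer of QuadStack. The function names and the sample column are hypothetical.

```python
from bisect import bisect_right
from itertools import groupby


def compress_column(values):
    """Run-length encode one vertical column of voxel values, bottom to top.

    Returns (tops, labels), where tops[j] is the exclusive upper height of
    run j and labels[j] is the value stored in that run.
    """
    tops, labels, height = [], [], 0
    for label, run in groupby(values):
        height += len(list(run))
        tops.append(height)
        labels.append(label)
    return tops, labels


def value_at(stack, z):
    """Return the value at height z by binary search over the run boundaries."""
    tops, labels = stack
    if not tops or not 0 <= z < tops[-1]:
        raise IndexError("height outside the column")
    return labels[bisect_right(tops, z)]


# Hypothetical layered column: 4 voxels of rock, 2 of sand, 3 of soil.
column = ["rock"] * 4 + ["sand"] * 2 + ["soil"] * 3
stack = compress_column(column)
print(stack, value_at(stack, 4))
```

Here nine voxels collapse to three runs, and a lookup costs a binary search over the number of runs rather than a scan of the column. How the paper obtains its logarithmic access bound for the full representation is not shown by this sketch.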

RUJA (Repositorio Institucional de la Universidad de Jaén)

Algorithms for the analysis of molecular sequences

Author: Vayani Fatima
Publication date: 01/12/2019

King's Research Portal

Lire les lectures : analyse de données de séquençage (Reading the reads: analysis of sequencing data)

Author: Peterlongo Pierre
Publication venue: HAL CCSD
Publication date: 25/01/2016

All the work presented in this HDR (habilitation thesis) concerns the use of high-throughput sequencing data in the absence of a closely related, good-quality reference genome. In the first chapter, we propose new approaches for extracting biological variants of interest from these sequencing data. In the second chapter, we present methods for comparing sequencing datasets. Finally, in the third chapter, we propose a preliminary method toward better "assemblies" of these sequencing data.

HAL-CentraleSupelec

Thèses en Ligne

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1