8 research outputs found

    The Gapped-Factor Tree

    We present a data structure for indexing a specific kind of factor (that is, substring) called a gapped-factor. A gapped-factor is a factor containing a gap that is ignored during indexing. The data structure is based on the suffix tree and indexes all the gapped-factors of a text for a fixed gap size, and only those. It is constructed online in linear time and space. Such a data structure may play an important role in various pattern matching and motif inference problems, for instance in text filtration.
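    To make the notion of a gapped-factor concrete, here is a minimal sketch that indexes them with a plain hash map. It is not the suffix-tree-based structure from the abstract and makes no claim to online or linear-time construction. The function name and the parameters `left_len`, `gap`, and `right_len` are assumptions introduced for illustration.

```python
from collections import defaultdict


def index_gapped_factors(text, left_len, gap, right_len):
    """Naive index of gapped-factors (hypothetical helper, not the paper's tree).

    Each key is a (left, right) pair of substrings separated by `gap`
    characters that are ignored; each value lists the start positions
    in `text` where that pair occurs.
    """
    span = left_len + gap + right_len
    index = defaultdict(list)
    for start in range(len(text) - span + 1):
        left = text[start:start + left_len]
        right_start = start + left_len + gap
        right = text[right_start:right_start + right_len]
        index[(left, right)].append(start)
    return dict(index)


# "AC?TA" occurs at positions 0 and 4, with different characters in the gap.
index = index_gapped_factors("ACGTACCTAC", left_len=2, gap=1, right_len=2)
print(index[("AC", "TA")])
```

    Because the gap characters never enter the key, two occurrences that differ only inside the gap share the same entry.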

    Browsing repeats in genomes: Pygram and an application to non-coding region analysis

    BACKGROUND: A large number of studies on genome sequences have revealed the major role played by repeated sequences in the structure, function, dynamics and evolution of genomes. In-depth repeat analysis requires specialized methods, including visualization techniques, to achieve optimum exploratory power. RESULTS: This article presents Pygram, a new visualization application for investigating the organization of repeated sequences in complete genome sequences. The application projects data from a repeat index file onto the analysed sequences and, by combining this principle with a query system, is capable of locating repeated sequences with specific properties. In short, Pygram provides an efficient, graphical browser for studying repeats. Implementation of the complete configuration is illustrated in an analysis of CRISPR structures in Archaea genomes and the detection of horizontal transfer between Archaea and viruses. CONCLUSION: By proposing a new visualization environment for analysing repeated sequences, this application aims to increase the efficiency of laboratories involved in investigating repeat organization in single genomes or across several genomes.

    QuadStack: An Efficient Representation and Direct Rendering of Layered Datasets

    We introduce QuadStack, a novel algorithm for volumetric data compression and direct rendering. Our algorithm exploits the data redundancy often found in layered datasets, which are common in science and engineering fields such as geology, biology, mechanical engineering, and medicine. QuadStack first compresses the volumetric data into vertical stacks, which are then compressed into a quadtree that identifies and represents the layered structures at the internal nodes. The associated data (color, material, density, etc.) and the shape of these layer structures are decoupled and encoded independently, leading to high compression rates (4× to 54× relative to the memory footprint of the original voxel model in our experiments). We also introduce an algorithm for retrieving values from the QuadStack representation and show that access has logarithmic complexity. Because of the fast access, QuadStack is suitable for efficient data representation and direct rendering. We show that our GPU implementation performs comparably in speed with state-of-the-art algorithms (18-79 MRays/s in our implementation) while maintaining a significantly smaller memory footprint.
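    As a rough illustration of the first step only, the sketch below compresses a single vertical column of voxels into a stack of layers and retrieves a value by binary search. Representing a stack as run-length intervals, and the names `compress_column` and `lookup`, are assumptions made for this example; the quadtree over stacks and the actual QuadStack encoding are not shown.

```python
from bisect import bisect_right


def compress_column(voxels):
    """Collapse a vertical column into layers (hypothetical helper).

    Returns two parallel lists: the height at which each layer starts
    and the value stored in that layer.
    """
    starts, values = [], []
    for z, value in enumerate(voxels):
        if not values or values[-1] != value:
            starts.append(z)
            values.append(value)
    return starts, values


def lookup(starts, values, z):
    """Return the value at height z via binary search over layer starts."""
    return values[bisect_right(starts, z) - 1]


column = ["rock"] * 4 + ["sand"] * 3 + ["air"] * 5
starts, values = compress_column(column)
print(starts, values)             # 12 voxels stored as 3 layers
print(lookup(starts, values, 5))  # height 5 falls inside the sand layer
```

    In this simplified setting a lookup costs time logarithmic in the number of layers of one column, which is only loosely related to the logarithmic access bound stated in the abstract.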

    Algorithms for the analysis of molecular sequences


    Lire les lectures : analyse de données de séquençage (Reading the reads: analysis of sequencing data)

    All the work presented in this HDR (habilitation thesis) concerns the exploitation of high-throughput sequencing data in the absence of a close, high-quality reference genome. In the first chapter, we propose new approaches for extracting biological variants of interest from such sequencing data. In the second chapter, we present methods for comparing sequencing datasets. Finally, in the third chapter, we propose a method that is a preliminary step toward better "assemblies" of these sequencing data.

    A first approach to finding common motifs with gaps
