85 research outputs found

    On the Representability of Complete Genomes by Multiple Competing Finite-Context (Markov) Models

    A finite-context (Markov) model of order k yields the probability distribution of the next symbol in a sequence of symbols, given the recent past up to depth k. Markov modeling has long been applied to DNA sequences, for example to find gene-coding regions. With the first studies came the discovery that DNA sequences are non-stationary: distinct regions require distinct model orders. Since then, Markov and hidden Markov models have been extensively used to describe the gene structure of prokaryotes and eukaryotes. However, to our knowledge, a comprehensive study of the potential of Markov models to describe complete genomes is still lacking. We address this gap in this paper. Our approach relies on (i) multiple competing Markov models of different orders, (ii) careful programming techniques that allow orders as large as sixteen, (iii) adequate handling of inverted repeats, and (iv) probability estimates suited to the wide range of context depths used. To measure how well a model fits the data at a particular position in the sequence, we use the negative logarithm of the probability estimate at that position. This measure yields information profiles of the sequence, which are of independent interest. The average over the entire sequence, which amounts to the average number of bits per base needed to describe the sequence, is used as a global performance measure. Our main conclusion is that, from the probabilistic or information-theoretic point of view and according to this performance measure, multiple competing Markov models explain entire genomes almost as well as, or even better than, state-of-the-art DNA compression methods such as XM, which rely on very different statistical models. This is surprising, because Markov models are local (short-range), in contrast with the statistical models underlying other methods, which exploit the extensive data repetitions in DNA sequences and therefore have a non-local character.
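
    The quantities named in this abstract (an order-k finite-context model, the per-position negative log-probability that forms an information profile, and the average bits per base) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the additive-smoothing estimator (parameter alpha), the sliding-window rule used to choose between competing models (parameter window), and all function names are assumptions made here for illustration, and inverted repeats are not handled.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"
INDEX = {s: i for i, s in enumerate(ALPHABET)}


def information_profile(seq, k, alpha=1.0):
    """Per-position cost in bits under an adaptive order-k finite-context model.

    Counts are updated only after each symbol has been scored, so the
    estimate at position i depends on seq[:i] alone. The additive-smoothing
    estimator (alpha) is an assumption, not the estimator from the paper.
    """
    counts = defaultdict(lambda: [0] * len(ALPHABET))
    profile = []
    for i, sym in enumerate(seq):
        ctx = seq[max(0, i - k):i]  # up to k preceding symbols
        c = counts[ctx]
        p = (c[INDEX[sym]] + alpha) / (sum(c) + alpha * len(ALPHABET))
        profile.append(-math.log2(p))  # negative log-probability, in bits
        c[INDEX[sym]] += 1
    return profile


def bits_per_base(profile):
    """Average of the information profile over the whole sequence."""
    return sum(profile) / len(profile)


def competing_profile(seq, orders, window=16, alpha=1.0):
    """Score each position with whichever model did best on the recent past.

    Hypothetical selection rule: the model with the lowest total cost over
    the preceding `window` positions is used (ties go to the first order
    listed). The choice uses past data only, so it stays causal.
    """
    profiles = [information_profile(seq, k, alpha) for k in orders]
    out = []
    for i in range(len(seq)):
        lo = max(0, i - window)
        best = min(range(len(orders)), key=lambda m: sum(profiles[m][lo:i]))
        out.append(profiles[best][i])
    return out


if __name__ == "__main__":
    demo = "ACGT" * 50  # toy periodic sequence, not real genomic data
    for k in (0, 2):
        print(k, round(bits_per_base(information_profile(demo, k)), 3))
    print("competing", round(bits_per_base(competing_profile(demo, [0, 2])), 3))
```

    On a strongly periodic toy sequence the higher-order model needs fewer bits per base than the order-0 model, which is the kind of comparison the global performance measure is meant to support.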

    FactSage thermochemical software and databases, 2010–2016

    The FactSage computer package consists of a series of information, calculation and manipulation modules that enable one to access and manipulate compound and solution databases. With the various modules running under Microsoft Windows®, one can perform a wide variety of thermochemical calculations and generate tables, graphs and figures of interest to chemical and physical metallurgists, chemical engineers, corrosion engineers, inorganic chemists, geochemists, ceramists, electrochemists, environmentalists, etc. This paper presents a summary of the developments in the FactSage thermochemical software and databases during the last six years. Particular emphasis is placed on the new databases and on developments in calculating and manipulating phase diagrams.

    SIC: a tool to detect short inverted segments in a biological sequence


    Composition corporelle et caractéristiques biologiques des muscles chez les bovins en croissance et à l’engrais (Body composition and biological characteristics of muscles in growing and fattening cattle)

    This paper gives an overview of what various research teams have learnt about the biological laws governing variation in the body composition of cattle and about the effects of the zootechnical factors that influence it (breed, sex, slaughter weight, feeding level, type of feed and growth factors). The biological characteristics of muscle tissue, which determine meat quality, and their variation under the influence of the same factors are also addressed.