
    The Fixed Landscape Inference MethOd (flimo): a versatile alternative to Approximate Bayesian Computation, faster by several orders of magnitude

    Modelling in biology must adapt to increasingly complex and massive data. The efficiency of the inference algorithms used to estimate model parameters is therefore called into question. Many of these algorithms rely on stochastic optimization processes that require significant computing time. We introduce the Fixed Landscape Inference MethOd (flimo), a new likelihood-free inference method for continuous state-space stochastic models. It applies deterministic gradient-based optimization algorithms to obtain a point estimate of the parameters, minimizing the difference between the data and simulations according to prescribed summary statistics. In this sense, it is analogous to Approximate Bayesian Computation (ABC). Like ABC, it can also provide an approximation of the distribution of the parameters. Three applications are presented: a standard theoretical example, the inference of the parameters of g-and-k distributions; a population genetics problem that is less simple than it appears, the inference of a selective value from time series in a Wright-Fisher model; and simulations from a Ricker model, representing chaotic population dynamics. In the first two applications, the results show a drastic reduction in the computational time needed for the inference phase compared with the other methods, while maintaining equivalent accuracy. Even when likelihood-based methods are applicable, the simplicity and efficiency of flimo make it a compelling alternative. Implementations in Julia and R are available at https://metabarcoding.org/flimo. To run flimo, the user need only be able to simulate data according to the chosen model.
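
    The method's name and the abstract suggest a core idea that a short example can make concrete. Freeze the random draws used by the simulator, so that the distance between simulated and observed summary statistics becomes a deterministic function of the parameters. Then minimize that distance with an ordinary gradient method. The Python sketch below applies this idea to the g-and-k example under several assumptions of ours:
    - Only A and B are inferred; g = 0.5 and k = 0.1 are held fixed.
    - The summary statistics are the sorted samples.
    - Gradients come from finite differences.
    - Plain gradient descent with a backtracking line search stands in for whatever optimizer flimo actually uses.
    This is not the code of the Julia or R packages.

```python
import math
import random

# Assumption: g and k are known and fixed; only theta = (A, B) is inferred.
G, K, C = 0.5, 0.1, 0.8  # C = 0.8 is the conventional choice for g-and-k


def gk_quantile(z, A, B):
    """g-and-k quantile function evaluated at a standard normal draw z."""
    return A + B * (1 + C * math.tanh(G * z / 2)) * (1 + z * z) ** K * z


def simulate(theta, z):
    """Map standard normal draws z to g-and-k samples for theta = (A, B)."""
    A, B = theta
    return [gk_quantile(zi, A, B) for zi in z]


def make_distance(observed, z_fixed):
    """Mean squared gap between sorted observed and sorted simulated samples.

    z_fixed is drawn once and reused at every call, so the distance is a
    deterministic function of theta (a "fixed landscape")."""
    obs = sorted(observed)

    def distance(theta):
        sim = sorted(simulate(theta, z_fixed))
        return sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs)

    return distance


def gradient(f, theta, h=1e-6):
    """Central finite-difference gradient of f at theta."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def minimize(f, theta0, max_iter=500, tol=1e-12):
    """Gradient descent with a backtracking (Armijo) line search."""
    theta, fx = list(theta0), f(theta0)
    for _ in range(max_iter):
        grad = gradient(f, theta)
        g2 = sum(gi * gi for gi in grad)
        if g2 < tol:
            break
        step = 1.0
        while step > 1e-12:
            cand = [t - step * gi for t, gi in zip(theta, grad)]
            fc = f(cand)
            if fc <= fx - 0.5 * step * g2:  # sufficient decrease reached
                break
            step /= 2
        else:
            break  # no step size gave a sufficient decrease: stop
        theta, fx = cand, fc
    return theta, fx


# Stand-in "observed" data: simulated with A = 3, B = 1 under another seed.
rng = random.Random(1)
observed = simulate((3.0, 1.0), [rng.gauss(0, 1) for _ in range(1000)])
# The landscape: standard normal draws sampled once, then frozen.
rng_fixed = random.Random(2)
z_fixed = [rng_fixed.gauss(0, 1) for _ in range(1000)]

distance = make_distance(observed, z_fixed)
theta_hat, d_hat = minimize(distance, [0.0, 2.0])
print("A, B estimates:", theta_hat)
```

    Because the draws never change between evaluations, the optimizer sees a smooth, reproducible objective rather than a noisy one. This is what allows a deterministic gradient method to be used here.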

    Transcriptome response to pollutants and insecticides in the dengue vector Aedes aegypti using next-generation sequencing technology

    Background: The control of mosquitoes transmitting infectious diseases relies mainly on chemical insecticides. However, mosquito control programs are now threatened by the emergence of insecticide resistance. Until now, most research efforts have focused on elucidating the molecular basis of inherited resistance. Less attention has been paid to the short-term response of mosquitoes to insecticides and pollutants, which could have a significant impact on insecticide efficacy. Here, a combination of LongSAGE and Solexa sequencing was used to perform a deep transcriptome analysis of larvae of the dengue vector Aedes aegypti exposed for 48 h to sub-lethal doses of three chemical insecticides and three anthropogenic pollutants.
    Results: Thirty million 20 bp cDNA tags were sequenced, mapped to the mosquito genome and clustered, representing 6850 known genes and 4868 additional clusters not located within predicted genes. Mosquitoes exposed to insecticides or anthropogenic pollutants showed considerable changes in their transcriptome. Genes encoding cuticular proteins, transporters, and enzymes involved in the mitochondrial respiratory chain and in detoxification processes were particularly affected. Genes and molecular mechanisms potentially involved in the xenobiotic response and insecticide tolerance were identified.
    Conclusions: The method used in the present study appears to be a powerful approach for investigating fine transcriptome variations in genome-sequenced organisms and can provide useful information for the detection of novel transcripts. At the biological level, despite low concentrations and no apparent phenotypic effects, the significant impact of these xenobiotics on mosquito transcriptomes raises important questions about the 'hidden impact' of anthropogenic pollutants on ecosystems and its consequences for vector control.
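
    Tag-based methods such as LongSAGE measure expression by counting how often each short tag is sequenced. The sketch below illustrates only that counting idea on toy reads. Several simplifications are our own assumptions:
    - A read's first 20 bases are taken as its tag.
    - The helper names, the tags-per-million normalization and the pseudocount are hypothetical.
    - The genome mapping and clustering steps of the study are not reproduced.

```python
import math
from collections import Counter


def tag_counts(reads, tag_len=20):
    """Count each read's first tag_len bases as one occurrence of that tag
    (a simplification: real LongSAGE tags are anchored at a restriction site)."""
    return Counter(r[:tag_len] for r in reads if len(r) >= tag_len)


def per_million(counts):
    """Scale raw tag counts to tags per million to correct for library size."""
    total = sum(counts.values())
    return {tag: 1e6 * n / total for tag, n in counts.items()}


def log2_fold_change(control, treated, pseudo=1.0):
    """log2(treated / control) per tag on normalized counts, with a pseudocount."""
    c, t = per_million(control), per_million(treated)
    return {tag: math.log2((t.get(tag, 0.0) + pseudo) / (c.get(tag, 0.0) + pseudo))
            for tag in set(c) | set(t)}


control = tag_counts(["A" * 20] * 3 + ["C" * 20] + ["ACG"])  # the short read is ignored
treated = tag_counts(["A" * 20] + ["C" * 20] * 3)
fc = log2_fold_change(control, treated)
print(fc["C" * 20] > 0, fc["A" * 20] < 0)  # True True
```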

    The GC-heterogeneity of teleost fishes

    Background: One of the most striking features of mammalian and bird chromosomes is the variation in guanine-cytosine (GC) content that occurs over scales of hundreds of kilobases to megabases; this is known as the "isochore" structure. Among other vertebrates, the presence of isochores depends on the taxon: isochores are clearly present in crocodiles and turtles, but fish genomes appear very homogeneous in GC content. This has suggested a single origin of isochores after the divergence between Sarcopterygii and Actinopterygii, but before that between Sauropsida and mammals. Over more than 30 years of analysis, isochore characteristics have been studied and many important biological properties have been associated with the isochore structure of the human genome. For instance, genes are more compact, and gene density is highest, in GC-rich isochores.
    Results: This paper shows the existence, in teleost fish genomes, of a "GC segmentation" that shares some of the characteristics of isochores, even though these genomes are particularly homogeneous in GC content. The complete genomes of T. nigroviridis and D. rerio are now available, which made it possible to check whether a mosaic structure associated with isochore properties can be found in these fishes. In this study, hidden Markov models were trained on fish genes (T. nigroviridis and D. rerio) classified according to the isochore class of their human orthologs. A clear segmentation of these genomes was detected.
    Conclusion: GC content is an excellent indicator of isochores in heterogeneous genomes such as those of mammals. The segmentations we obtained were well correlated with GC content and with other properties associated with GC content, such as gene density, the number of exons per gene and intron length. GC content is therefore the main property for detecting isochores, but other biological properties must also be taken into account. This method makes it possible to detect isochores in homogeneous genomes.
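
    To show how a hidden Markov model can segment a sequence by GC content, here is a hypothetical two-state sketch using the Viterbi algorithm. State 0 is GC-poor and state 1 is GC-rich. The emission probabilities (35% and 55% GC) and the self-transition probability are made-up values. This is not the model trained on fish genes in the study.

```python
import math


def viterbi_gc(seq, p_gc=(0.35, 0.55), stay=0.999):
    """Most probable state path of a two-state HMM over an uppercase DNA sequence.

    p_gc[s] is state s's probability of emitting G or C, split evenly between
    the two bases; A and T share the remainder (hypothetical parameters)."""
    states = (0, 1)
    log_trans = [[math.log(stay), math.log(1 - stay)],
                 [math.log(1 - stay), math.log(stay)]]

    def log_emit(s, base):
        p = p_gc[s] if base in "GC" else 1 - p_gc[s]
        return math.log(p / 2)

    # Equal initial probabilities for both states.
    score = [math.log(0.5) + log_emit(s, seq[0]) for s in states]
    backpointers = []
    for base in seq[1:]:
        new_score, ptr = [], []
        for s in states:
            prev = max(states, key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[prev] + log_trans[prev][s] + log_emit(s, base))
            ptr.append(prev)
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]


seq = "AT" * 500 + "GC" * 500
path = viterbi_gc(seq)
print(path.index(1))  # 1000: the GC-rich segment starts where the GC block begins
```

    The high self-transition probability makes state changes costly, so the path switches only where the composition changes durably rather than at every G or C.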

    Development of an Arabis alpina genomic contig sequence data set and application to single nucleotide polymorphisms discovery

    The alpine plant Arabis alpina is an emerging model in ecological genomics that is well suited to identifying the genes involved in local adaptation to contrasting environmental conditions, a subject that remains poorly understood at the molecular level. This study presents the assembly of a pool of A. alpina genomic fragments using next-generation sequencing technologies. These contigs cover 172 Mb of the A. alpina genome (i.e. 50% of the genome) and were shown to contain sequences giving positive hits against 96% of the 458 CEGMA core genes (Core Eukaryotic Genes Mapping Approach), a set of highly conserved eukaryotic genes. Regions with high nucleotide sequence identity to 77% of the genes of the close relative Arabidopsis thaliana were found, with an unbiased distribution across the different functional categories of A. thaliana genes. This new resource was tested using a resequencing assay to identify polymorphic sites. Sixteen samples were successfully analysed and 127,041 single-nucleotide polymorphisms were identified. This contig data set will contribute to improving our understanding of the ecology of Arabis alpina and thus constitutes an important resource for future ecological genomic studies.
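
    The abstract does not describe the variant-calling criteria. The toy sketch below therefore only illustrates the general idea of flagging polymorphic sites from reads aligned to a reference contig. The pileup format, the minimum depth and the minor-allele-frequency threshold are all hypothetical.

```python
from collections import Counter


def call_snps(pileup, min_depth=8, min_minor_freq=0.2):
    """Flag candidate SNPs from a pileup.

    pileup maps a contig position to the string of bases observed there across
    aligned reads. A site is reported when it is covered by at least min_depth
    reads and its second most frequent base reaches min_minor_freq
    (both thresholds are hypothetical)."""
    snps = {}
    for pos, bases in pileup.items():
        counts = Counter(b for b in bases.upper() if b in "ACGT")
        depth = sum(counts.values())
        if depth < min_depth or len(counts) < 2:
            continue  # too shallow, or monomorphic
        (major, _), (minor, n_minor) = counts.most_common(2)
        if n_minor / depth >= min_minor_freq:
            snps[pos] = (major, minor, n_minor / depth)
    return snps


pileup = {100: "AAAAAGGGGG", 101: "CCCCCCCCCC", 102: "TTTTTTTTTA", 103: "AAG"}
print(call_snps(pileup))  # {100: ('A', 'G', 0.5)}
```

    The depth filter guards against calling variants from a handful of reads. The frequency filter discards sites whose minor base is rare enough to be a sequencing error.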

    Modelling the length distribution of exons by sums of geometric laws. Analysis of the structure of genes and G+C influence

    Mathematical and computational methods are essential for gene identification, and more realistic modelling is necessary to better understand genome organization and gene expression. Hidden Markov models are one of the methods widely used for such identification. These models are quite efficient for gene localization, but they imply that the lengths of all regions are geometrically distributed. However, in the human genome, the length distribution of exons does not follow a geometric law. To address this problem, we propose to represent the length distribution of exons by sums of geometric distributions with equal or different parameters. The resulting model has relatively few parameters and fits exon lengths very well. Moreover, we propose a data processing method, based on a discrimination technique between hidden Markov models, which makes it possible to study the structure of coding genes in detail. Our model describes known differences in gene organization between isochore classes and reveals some specific characteristics of intronless genes and a break in the homogeneity of the first coding exons. The use of hidden Markov models with complex states therefore seems to be a promising new approach for modelling the organization of a large genome.
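
    A short example shows why a sum of geometric laws can fit lengths that a single geometric law cannot. Let G_1, ..., G_r be independent geometric lengths on {1, 2, ...}. Their sum is no longer monotone decreasing and can be bell-shaped. With equal parameters p it is a negative binomial law, P(L = n) = C(n-1, r-1) p^r (1-p)^(n-r) for n >= r. The sketch below computes the distribution of such a sum by convolution. The function names and parameter values are illustrative and are not fitted to exon data.

```python
import math


def geometric_pmf(p, max_len):
    """P(G = n) for n = 0..max_len, with G geometric on {1, 2, ...}."""
    return [0.0] + [p * (1 - p) ** (n - 1) for n in range(1, max_len + 1)]


def convolve(a, b, max_len):
    """Distribution of the sum of two independent lengths, truncated at max_len."""
    out = [0.0] * (max_len + 1)
    for i, ai in enumerate(a):
        for j in range(max_len + 1 - i):
            out[i + j] += ai * b[j]
    return out


def sum_of_geometrics_pmf(ps, max_len):
    """PMF of G_1 + ... + G_r with independent G_i ~ Geometric(ps[i])."""
    pmf = [1.0] + [0.0] * max_len  # start from a point mass at length 0
    for p in ps:
        pmf = convolve(pmf, geometric_pmf(p, max_len), max_len)
    return pmf


pmf = sum_of_geometrics_pmf([0.05, 0.05, 0.05], 400)
mode = max(range(len(pmf)), key=pmf.__getitem__)
print(mode)  # well away from 1, unlike a single geometric law
```

    Passing different values in `ps` (for example `[0.1, 0.02]`) gives the unequal-parameter case. There is no simple closed form then, but the convolution still applies unchanged.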

    Analysis of the structure of genes using hidden Markov models

    Background: To perform genome analysis, semi-Markov models have been widely developed and are efficient for detecting protein-coding genes. The trade-off is a strong increase in the complexity of most algorithms involved in estimating and using these models. Hidden Markov models (HMMs) are effective tools for detecting series of statistically homogeneous structures, but they are not well suited to analysing complex structures. Numerous methodological difficulties are encountered when using HMMs to distinguish genes from transposons or retroviruses, or to determine the isochore classes of genes. The aim of this paper is to analyse these methodological difficulties and to suggest new tools for the exploration of genome data.
    Results: We show that hidden Markov models can be used to analyse complex gene structures with bell-shaped length distributions, by modelling these lengths as sums of geometric laws. Macro-states thus model the distributions of the lengths of the regions. Variou
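
    One way to build a macro-state is to chain sub-states left to right, each with a self-transition. The number of positions spent in the chain is then a sum of geometric laws, so an ordinary HMM can reproduce bell-shaped lengths without semi-Markov machinery. The sketch below builds such a chain and computes the exact length distribution by propagating state probabilities. The function names, the exit-state convention and the parameters are hypothetical.

```python
def macro_state_transitions(ps):
    """Transition matrix of a left-to-right chain of len(ps) sub-states plus an exit state."""
    r = len(ps)
    T = [[0.0] * (r + 1) for _ in range(r + 1)]
    for i, p in enumerate(ps):
        T[i][i] = 1 - p  # stay in sub-state i
        T[i][i + 1] = p  # move to the next sub-state (or exit after the last one)
    T[r][r] = 1.0        # the exit state is absorbing
    return T


def duration_pmf(ps, max_len):
    """P(the macro-state covers exactly n positions) for n = 0..max_len."""
    T = macro_state_transitions(ps)
    r = len(ps)
    dist = [1.0] + [0.0] * r  # at position 1 the chain is always in the first sub-state
    pmf = [0.0] * (max_len + 1)
    for n in range(1, max_len + 1):
        # Leave the macro-state right after position n, from its last sub-state.
        pmf[n] = dist[r - 1] * ps[r - 1]
        # Propagate the sub-state probabilities to position n + 1.
        dist = [sum(dist[i] * T[i][j] for i in range(r + 1)) for j in range(r + 1)]
    return pmf


T = macro_state_transitions([0.1, 0.1, 0.1])
pmf = duration_pmf([0.1, 0.1, 0.1], 200)
print(sum(pmf))  # close to 1; the remaining mass lies beyond 200 positions
```

    Each added sub-state costs one extra HMM state. Standard forward and Viterbi recursions keep their usual complexity, which is the kind of trade-off against semi-Markov models discussed above.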

    A Markovian Approach to the Analysis of the Structure of Genes

    The sequencing of the complete human genome has yielded a sequence of three billion nucleotide pairs. We propose a data processing method, based on discrimination between different hidden Markov models (HMMs), which makes it possible to study coding gene structure in detail. We show that the exon length distribution is well represented by sums of geometric distributions with equal or different parameters. This approach avoids the use of semi-Markov models. Moreover, HMMs reveal a break in the homogeneity of the structure of the first coding exons, around position 80, while internal and terminal exons are homogeneous. An explanation of this difference could be the presence of a signal peptide at the beginning of some initial exons. We plan to apply this approach to longer genomic regions, with a preliminary global analysis of the sequences followed by studies at finer scales. We propose here the use of HMMs as a data analysis approach to reveal new structural properties of genomes.
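
    Discriminating between HMMs amounts to asking which of several competing models best explains the same sequence. A common way to do this, assumed here rather than taken from the paper, is to compute each model's log-likelihood with the forward algorithm and keep the highest. The two toy models and their parameters below are hypothetical.

```python
import math


def forward_loglik(seq, init, trans, emit):
    """Scaled forward algorithm: log P(seq | HMM).

    init[s] is the initial probability of state s, trans[r][s] the transition
    probability r -> s, and emit[s] maps each symbol to its emission probability."""
    states = range(len(init))
    alpha = [init[s] * emit[s][seq[0]] for s in states]
    total = sum(alpha)
    loglik = math.log(total)
    alpha = [a / total for a in alpha]  # rescale to avoid underflow
    for sym in seq[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in states) * emit[s][sym] for s in states]
        total = sum(alpha)
        loglik += math.log(total)
        alpha = [a / total for a in alpha]
    return loglik


def classify(seq, models):
    """Name of the model with the highest log-likelihood for seq."""
    return max(models, key=lambda name: forward_loglik(seq, *models[name]))


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
sticky = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "gc_rich": ([0.5, 0.5], sticky, [{"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}, uniform]),
    "at_rich": ([0.5, 0.5], sticky, [{"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}, uniform]),
}
print(classify("GCGCGGCCGC", models), classify("ATTATAATAT", models))  # gc_rich at_rich
```

    Rescaling alpha at each position and summing the logs of the scaling factors gives the exact log-likelihood while avoiding underflow on long sequences.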