104 research outputs found

    Compound Poisson Approximation and Testing for Gene Clusters with Multigene Families

    We present in this article a compound Poisson approximation for computing probabilities involved in significance tests for conserved genomic regions between different species. We consider the case when the conserved genomic regions are found by the reference region approach. An important aspect of our computations is that we take into account the existence of multigene families. We obtain convergence results for the error of our approximation by using the Stein-Chen method for compound Poisson approximation.

    Simultaneous transcriptome profiling of Trypanosoma cruzi parasites and their human host cells.

    The genome of the kinetoplastid parasite Trypanosoma cruzi, causative agent of Chagas disease, was published nine years ago, yet a systematic and comprehensive analysis of the transcriptomes of the parasite and the human host has not been conducted. The parasite responds rapidly to transmission between arthropod vectors and mammalian hosts by undergoing complex cellular differentiation processes that are not well understood. In this study, we generated the first transcriptome map for both T. cruzi and infected human host cells across the infection cycle, including time points of 4, 6, 12, 24, 48 and 72 hours post invasion, using next-generation RNA sequencing (RNA-Seq). We also captured the transcriptome of the parasite in its bloodstream form (trypomastigote) and its replicative form inside the insect vector (epimastigote). We successfully mapped transcribed regions for the pathogen at single-nucleotide resolution on a genomic scale and characterized the RNA processing (trans-splicing and polyadenylation) events across its various developmental stages. Here we report the prevalent heterogeneity of RNA processing sites across the genome. We also note that different primary sites are preferred in different developmental stages, which points to a potential mode of posttranscriptional regulation that may contribute to the survival of the parasite across different environments. Our work has significantly enhanced the current genome annotation of T. cruzi. In addition, using the T. cruzi and human genome sequences as references, we explored these data with informatics tools to identify genes with significant regulation and successfully profiled gene expression from both species simultaneously.
We examined the subsets of differentially expressed genes both in the parasite and the host cell over the course of the infection to understand the mechanisms of invasion and intracellular survival strategy as well as host-pathogen interactions. T. cruzi genes that were significantly regulated during the infection process might serve as new targets for drug development, whereas human genes that were significantly regulated might signal the immunoinflammatory response triggered by the manipulation of the parasite. Furthermore, we investigated the gene expression patterns of T. cruzi across its different developmental stages, clustered genes with similar patterns, and identified possible sequence motifs in coexpressed gene clusters.

    Biological Systems Workbook: Data modelling and simulations at molecular level

    Experiments and theoretical simulations across the different fields of biology now produce huge quantities of data, and the results are often stored in biological databases that grow at a vertiginous rate every year. There is therefore increasing research interest in applying mathematical and physical models able to produce reliable predictions and explanations, in order to understand and rationalize that information. These investigations are helping to answer biological questions and to advance solutions to problems faced by our society. In this Biological Systems Workbook, we aim to introduce the basic pieces that allow life to take place, from the 3D structural point of view. We will start by learning how to look at the 3D structure of molecules, studying small organic molecules used as drugs. Meanwhile, we will learn some methods that help us to generate models of these structures. Then we will move to more complex natural organic molecules such as lipids or carbohydrates, learning how to estimate and reproduce their dynamics. Later, we will review the structure of more complex macromolecules such as proteins or DNA. Throughout this process, we will refer to different computational tools and databases that will help us to search, analyze and model the different molecular systems studied in this course.

    Characterizing freshwater macroinvertebrates of Bangladesh using metagenetic techniques

    The degradation of freshwater ecosystems has become a global concern; in particular, the critical condition of rivers in Bangladesh demands a monitoring programme based on the assessment of bioindicator organisms. Macroinvertebrates are prominent bioindicators and are widely used for assessing the health of aquatic ecosystems. Recent technological advances have enabled routine assessment with the genomic characterization of macroinvertebrates using different metagenetic techniques such as DNA barcoding for individual specimen identification, metabarcoding for multi-species identification of bulk samples and mitochondrial metagenomics for extraction of mitogenomes from mixed samples. In this thesis, I commence by generating Cytochrome Oxidase subunit I (COI) barcodes for Bangladeshi freshwater macroinvertebrates belonging to the Ephemeroptera, Plecoptera, Trichoptera, Coleoptera, Hemiptera, Odonata, Diptera, Gastropoda and Bivalvia. These barcodes can be used as a DNA reference library for species identification in metabarcoding of macroinvertebrates. I also aim to explore complete mitogenomes from selected macroinvertebrates using a mitochondrial metagenomic pipeline. I carry out phylogenetic analysis with protein-coding genes that reveals the evolutionary relationships of Bangladeshi macroinvertebrate lineages and also supports deeper-level identification of barcodes by placing them into the phylogenetic tree (chapter 2). In chapter 3, I assess some methodological aspects of the metabarcoding pipeline required for diversity estimation from complex bulk samples of macroinvertebrates in large-scale biomonitoring programmes. These include preparation of bulk macroinvertebrate samples, optimization of the procedure of homogenization of samples required for DNA extraction, strategies for DNA pooling from these extracts, choice of robust universal primers, and viable OTU clustering for reliable diversity estimation.
The results have implications for the optimization and standardization of these steps in metabarcoding of freshwater macroinvertebrates. In chapter 4, I apply the metabarcoding technique to establish the macroinvertebrate diversity and the impact of various types of anthropogenic disturbances on the freshwater macroinvertebrates in highland and lowland rivers. The results document high diversity, local endemicity and pronounced responses to disturbance in largely unexplored but threatened habitats of Bangladesh. My investigations demonstrate the viability of metagenetic techniques for applied conservation management as a step towards building a biomonitoring system in freshwater ecosystems globally.

    MODELING HETEROTACHY IN PHYLOGENETICS

    Heterotachy, substitution rate variation across sites and time, has been shown to be a frequent phenomenon in real data. Failure to model heterotachy could potentially cause phylogenetic artefacts. Currently, several models handle heterotachy: the mixture branch length (MBL) model and several variant forms of the covarion model. In this project, our objective is to find a model that efficiently handles heterotachous signals in the data, and thereby improves phylogenetic inference. In order to achieve our goal, two individual studies were conducted. In the first study, we compare the MBL, covarion and homotachous models using AIC, BIC and cross-validation. Based on our results, we conclude that the MBL model, in which sites have different branch lengths along the entire tree, is an over-parameterized model. Real data indicate that the heterotachous signals which interfere with phylogenetic inference are generally limited to a small area of the tree. In the second study, we relax the assumption of the homogeneity of the covarion parameters over sites, and develop a mixture covarion model using a Dirichlet process. In order to evaluate different heterogeneous models, we design several posterior predictive discrepancy tests to study different aspects of molecular evolution using stochastic mappings. The posterior predictive discrepancy tests demonstrate that the covarion mixture +Γ model is able to adequately model the substitution variation within and among sites.
Our research permits a detailed view of heterotachy in real datasets and gives directions for future heterotachous models. The posterior predictive discrepancy tests provide diagnostic tools to assess models in detail. Furthermore, both of our studies reveal the non-specificity of heterogeneous models. Our studies strongly suggest that different heterogeneous features in the data should be handled simultaneously.

    Development of mathematical methods for modeling biological systems


    Modélisations de maladies des motoneurones en utilisant le poisson zébré

    Hereditary spastic paraplegias (HSP) are a group of heterogeneous neurodegenerative diseases affecting upper motor neurons and causing progressive gait dysfunction; more than 60 genes have been linked to this disease. Amyotrophic lateral sclerosis (ALS), in contrast, is a late-onset progressive neurodegenerative disorder that affects both upper and lower motor neurons, leading to muscle atrophy with spasticity and death in two to five years due to respiratory failure. These two motor neuron disorders, while separate, share common genes and pathological mechanisms; increasing our knowledge about their similarities and differences can therefore help us better understand each of them individually. In order to study these two diseases, we used previously characterized zebrafish models and developed new ones to deepen our understanding of the pathophysiological mechanisms of HSP and ALS. In the first part of this thesis, we identified ER stress as a new pathological mechanism at play in HSP due to spastin loss-of-function and showed that ER stress modulators are able to rescue the locomotor phenotype.
We also identified a new gene causative of HSP, CAPN1 (SPG76), provided in vivo validation of its loss-of-function pathogenicity and identified microtubule network disorganization as one of the main defects. In the second part of this thesis, we generated several new zebrafish models to study ALS. Two transgenic lines expressing either a wild-type or a mutant TDP-43 protein under the control of an inducible promoter allowed us to recapitulate previous findings obtained with mRNA injections and identify transcriptomic changes due to the mutant protein that are in line with recent transcriptomic data obtained in mouse models. We also generated two new lines with knock-in of ALS-causative point mutations in the tardbp and fus zebrafish endogenous genes using the CRISPR/Cas9 technology. These results underscore the value of the zebrafish model to study motor neuron disorders and their pathophysiological mechanisms, and open new therapeutic avenues.

    Molecular ecological characterization of a honey bee ectoparasitic mite, Tropilaelaps mercedesae.

    Tropilaelaps mercedesae (small mite) is one of two major honey bee ectoparasitic mite species responsible for the colony losses of Apis mellifera in Asia. Although T. mercedesae mites are still restricted to Asia (except Japan), they may spread worldwide due to the ever-increasing global trade of live honey bees (e.g., Varroa destructor). Understanding the ecological characteristics of T. mercedesae at the molecular level could potentially improve management and control programs. However, the molecular and genomic characterization of T. mercedesae remains poorly studied, and no genes have been deposited in GenBank to date. Therefore, I conducted T. mercedesae genome and transcriptome sequencing. By comparing the T. mercedesae genome with those of other arthropods, I have gained new insights into the evolution of Parasitiformes and the evolutionary changes associated with the specific habitats and life history of honey bee ectoparasitic mites that could potentially improve the control programs for T. mercedesae. Finally, characterization of the T. mercedesae transient receptor potential channel, subfamily A, member 1 (TmTRPA1) would also help us to develop a novel control method for T. mercedesae.

    Pathogen-Mediated Evolution of Immunogenetic Variation in Plains Zebra (Equus quagga) of Southern Africa

    Investigating patterns of variability in functional protein-coding genes is fundamental to identifying the basis for population and species adaptation and ultimately, for predicting evolutionary potential in the face of environmental change. The Major Histocompatibility Complex (MHC), a family of immune genes, has been one of the most emphasized gene systems for studying selection and adaptation in vertebrates due to its significance in pathogen recognition and consequently, in eliciting host immune response. Pathogen evasion of host resistance is thought to be the primary mechanism preserving extreme levels of MHC polymorphism and shaping immunogenetic patterns across host populations and species. In this thesis, I examined the evolution of two equine MHC genes, DRA and DQA, over the history of the genus Equus and across free-ranging plains zebra (E. quagga) populations of southern Africa: Etosha National Park (ENP), Namibia and Kruger National Park (KNP), South Africa. Furthermore, I evaluated the relationships between the DRA locus and parasite intensity in E. quagga of ENP, to elucidate the mechanisms by which parasites have shaped diversity at the MHC. In equids, the full extent of diversity and selection on the MHC in wild populations is unknown. Therefore, in this study, I molecularly characterized MHC diversity and selection across equid species to shed light on its mode of evolution in Equus and to identify specific sites under positive selection. Both the DRA and DQA exhibited a high degree of polymorphism and more intriguingly, greater allelic diversity was observed at the DRA than has previously been shown in any other vertebrate taxon. Global selection analyses of both loci indicated that the majority of codon sites are under purifying selection which may be explained by functional constraints on the protein. 
However, maximum likelihood based codon models of selection, allowing for heterogeneity in selection across codons, suggested that selective pressures varied across sites. Furthermore, at the DQA locus, all sites predicted to be under positive selection were antigen binding sites, implying that a few selected amino acid residues may play a significant role in equid immune function. Observations of trans-species polymorphisms and elevated genetic diversity were concordant with the hypothesis that balancing selection is acting on these genes. Over the past half century, the role of neutral versus selective processes in shaping genetic diversity has been at the center of an ongoing dialogue among evolutionary biologists. To determine the relative influence of demography versus selection on the DRA and DQA loci, I contrasted diversity patterns of neutral and MHC data across the E. quagga populations of ENP and KNP. Neutrality tests, along with observations of elevated diversity and low differentiation across populations relative to nuclear intron data, provided further evidence for balancing selection at these loci among E. quagga populations. However, at the DRA locus, differentiation was comparable to results at microsatellite loci. Furthermore, zebra in ENP exhibited reduced levels of diversity relative to KNP due to a highly skewed allele frequency distribution that could not be explained by demography. These findings were indicative of spatially heterogeneous selection and suggested directional selection and local adaptation at the DRA locus. There still remains a great deal of discussion over the mechanisms by which pathogens preserve immune gene diversity. The leading hypotheses that have been predominantly considered are: (i) heterozygote advantage (i.e. overdominant selection), (ii) rare allele advantage (i.e. frequency-dependent selection), and (iii) spatiotemporally fluctuating selection. 
An increasing number of studies have investigated MHC-parasite relationships to reconcile this debate, with conflicting results. To elucidate the mechanism driving the population-level patterns of diversity at the DRA locus, I examined relationships between this locus and both gastrointestinal (GI) and ectoparasite intensity in plains zebra of ENP. I discovered antagonistic pleiotropic effects of particular DRA alleles, with rare alleles predicting increased GI parasitism and common alleles associated with higher tick burdens. These results supported a frequency-dependent process and because maladaptive ‘susceptibility alleles’ were found at reduced frequencies, suggested that GI parasites exert strong selective pressure at this locus. Furthermore, heterozygote advantage also played a role in decreasing GI parasite burden, but only when a common allele was paired with a more divergent allele, implying that frequency-dependent and overdominant selection are acting in synchrony. These results indicated that an immunogenetic tradeoff may modulate resistance/susceptibility to parasites in this system, such that with MHC-based resistance to GI parasitism, a fitness cost is incurred to the host in the form of increased ectoparasite susceptibility. It is also suggested that these selective mechanisms are not mutually exclusive. In conclusion, these results provided species and population-level evidence for selection on the equid MHC, and highlighted the complexity in which selection operates in natural systems. In addition to heterogeneity in selective pressures at the molecular-level (across a gene region), selection likely varies spatiotemporally across populations due to fluctuations in pathogen regimes. Furthermore, pleiotropic effects of multiple pathogens can obscure our ability to understand adaptive processes. 
Given the level of complexity with which selection operates, I emphasize the necessity of incorporating multiple lines of evidence, using both neutral and adaptive data, to illuminate how selection operates. Finally, I also highlight the importance of considering the selective effects of multiple pathogens on host immunogenetics to better understand MHC function and adaptation.