
    Detecting Introgression in Anopheles Mosquito Genomes using a Reconciliation-Based Approach

    Introgression is an important evolutionary mechanism in insect and animal evolution. Current methods for detecting introgression rely on the analysis of phylogenetic incongruence, using either statistical tests based on expected phylogenetic patterns in small phylogenies or probabilistic modeling in a phylogenetic network context. Introgression leaves a phylogenetic signal similar to horizontal gene transfer, and it has been suggested that its detection can also be approached through the gene tree / species tree reconciliation framework, which accounts jointly for other evolutionary mechanisms such as gene duplication and gene loss. However, so far the use of a reconciliation-based approach to detect introgression has not been investigated in large datasets. In this work, we apply this principle to a large dataset of Anopheles mosquito genomes. Our reconciliation-based approach recovers the extensive introgression that occurs in the gambiae complex, although with some variations compared to previous reports. Our analysis also suggests a possible ancient introgression event involving the ancestor of An. christyi.

    Analyse bioinformatique des événements de transferts horizontaux entre espèces de drosophiles et lien avec la régulation des éléments transposables

    Transposable elements (TEs) are repeated DNA sequences that are able to move (transpose) within their host genome. To counteract the negative effects of TE activity, host genomes carry mechanisms that regulate transposition. Once a TE is regulated, the progressive accumulation of mutations in its sequence inevitably leads to the definitive loss of its transposition capacity. My work during this thesis was to better understand the success and maintenance of these peculiar repeated sequences, through the study of horizontal transfers (HTs) of TEs, which enable them to escape host regulation mechanisms, and the study of this regulation. The first part of my thesis concerns the study of HTs between two closely related Drosophila species. I developed a new bioinformatic method for detecting horizontally transferred sequences between two eukaryotic genomes, which allowed me to detect numerous HTs of TEs. The development of this method led me to work on the one-sided multiple testing problem, for which I developed a new procedure to control the expected false discovery rate (FDR). The second part of my thesis focuses on the regulation of TEs by the small RNA pathway, an RNA interference mechanism. For this study, I analyzed sequencing data of small RNAs and total RNAs from different populations of D. simulans. For this work, I developed an analysis pipeline to study differences in expression between repeated sequences. Some features of the small RNA dataset required the development of a new procedure to parse them. This procedure was extended and implemented in software for the quality control of next-generation sequencing data.

    Analysis of anopheline mosquito behavior and identification of vector control targets in the post-genomic era

    Thesis advisor: Marc A.T. Muskavitch. The protozoan Plasmodium falciparum, the mosquito-borne pathogen that causes human malaria, remains one of the most difficult infectious parasites to combat and control. Malaria eradication campaigns have succeeded, in most instances, at the level of vector control, rather than from initiatives that have attempted to decrease malaria burden by targeting parasites. The rapid evolution and spread of insecticide-resistant mosquitoes is threatening our ability to combat vectors and control malaria. Therefore, the development, procurement and distribution of new methods of vector control are paramount. Two aspects of vector biology that can be exploited toward these ends are vector behaviors and vector-specific insecticide targets. In this thesis, I describe three aspects of vector biology with potential for the development of improved means of vector control: photopreference behavior, long non-coding RNA (lncRNA) targets and epigenetic gene ensemble targets. My studies of photopreference have revealed that specific mosquito species within the genus Anopheles, An. gambiae and An. stephensi, exhibit different photopreference behaviors, and that each gender of mosquito in these species exhibits distinct light-dependent resting behaviors. These inter-specific behavioral differences may be affected by differing numbers of long-wavelength sensing Opsin genes in each species, and my findings regarding species-specific photopreferences suggest that some behavioral interventions may need to be tailored for specific vector mosquito species. Based on the advancement of next-generation sequencing technologies and the generation by others of assembled genomes of many anopheline mosquito species, I have identified a comprehensive set of approximately 3,000 lncRNAs and find that RNA secondary structures are notably conserved within the gambiae species complex.
As lncRNAs and epigenetic modifiers cooperate to modulate epigenetic regulation, I have also analyzed the conservation of epigenetic gene ensembles across a number of anopheline species, based on identification of homologous epigenetic ensemble genes in An. gambiae compared to Drosophila melanogaster. Further analyses of these ensembles illustrate that these epigenetic genes are highly stable among many anopheline species, in that I detect only eight gene family expansion or contraction events among 169 epigenetic ensemble genes within a set of 12 anopheline species. My hope is that my findings will enable deeper investigations of many behavioral and epigenetic processes in Anopheles gambiae and other anopheline vector mosquitoes and thereby enable the development of new, more effective means of vector and malaria control. Thesis (PhD), Boston College, 2015. Submitted to: Boston College. Graduate School of Arts and Sciences. Discipline: Biology.

    Genetic variation within the Daphnia pulex genome

    Genetic variation within the diploid Daphnia pulex genome was examined using a high-quality de novo assembly and shotgun reads from two distinct D. pulex clones. Patterns of variation and divergence at single nucleotides were examined in physical and functional regions of the genome using comparative assembly output and available annotations. Additionally, mitochondrial genomes of the same D. pulex clones were assembled and compared for patterns of divergence and substitutional biases. Intron presence/absence polymorphisms were identified computationally and verified experimentally. Finally, gene duplicate demographics were examined for patterns of divergence and estimates of gene birth rates.

    Phylogenetics in the Genomic Era

    Molecular phylogenetics was born in the middle of the 20th century, when the advent of protein and DNA sequencing offered a novel way to study the evolutionary relationships between living organisms. The first 50 years of the discipline can be seen as a long quest for resolving power. The goal – reconstructing the tree of life – seemed to be unreachable, the methods were heavily debated, and the data limiting. Maybe for these reasons, even the relevance of the whole approach was repeatedly questioned, as part of the so-called molecules versus morphology debate. Controversies often crystalized around long-standing conundrums, such as the origin of land plants, the diversification of placental mammals, or the prokaryote/eukaryote divide. Some of these questions were resolved as gene and species samples increased in size. Over the years, molecular phylogenetics has gradually evolved from a brilliant, revolutionary idea to a mature research field centred on the problem of reliably building trees. This logical progression was abruptly interrupted in the late 2000s. High-throughput sequencing arose and the field suddenly moved into something entirely different. Access to genome-scale data profoundly reshaped the methodological challenges, while opening an amazing range of new application perspectives. Phylogenetics left the realm of systematics to occupy a central place in one of the most exciting research fields of this century – genomics. This is what this book is about: how we do trees, and what we do with trees, in the current phylogenomic era. One obvious, practical consequence of the transition to genome-scale data is that the most widely used tree-building methods, which are based on probabilistic models of sequence evolution, require intensive algorithmic optimization to be applicable to current datasets. 
This problem is considered in Part 1 of the book, which includes a general introduction to Markov models (Chapter 1.1) and a detailed description of how to optimally design and implement Maximum Likelihood (Chapter 1.2) and Bayesian (Chapter 1.4) phylogenetic inference methods. The importance of the computational aspects of modern phylogenomics is such that efficient software development is a major activity of numerous research groups in the field. We acknowledge this and have included seven "How to" chapters presenting recent updates of major phylogenomic tools – RAxML (Chapter 1.3), PhyloBayes (Chapter 1.5), MACSE (Chapter 2.3), Bgee (Chapter 4.3), RevBayes (Chapter 5.2), Beagle (Chapter 5.4), and BPP (Chapter 5.6). Genome-scale data sets are so large that statistical power, which had been the main limiting factor of phylogenetic inference during previous decades, is no longer a major issue. Massive data sets instead tend to amplify the signal they deliver – be it biological or artefactual – so that bias and inconsistency, instead of sampling variance, are the main problems with phylogenetic inference in the genomic era. Part 2 covers the issues of data quality and model adequacy in phylogenomics. Chapter 2.1 provides an overview of current practice and makes recommendations on how to avoid the more common biases. Two chapters review the challenges and limitations of two key steps of phylogenomic analysis pipelines, sequence alignment (Chapter 2.2) and orthology prediction (Chapter 2.4), which largely determine the reliability of downstream inferences. The performance of tree building methods is also the subject of Chapter 2.5, in which a new approach is introduced to assess the quality of gene trees based on their ability to correctly predict ancestral gene order. Analyses of multiple genes typically recover multiple, distinct trees. 
Maybe the biggest conceptual advance induced by the phylogenetic to phylogenomic transition is the suggestion that one should not simply aim to reconstruct “the” species tree, but rather to be prepared to make sense of forests of gene trees. Chapter 3.1 reviews the numerous reasons why gene trees can differ from each other and from the species tree, and what the implications are for phylogenetic inference. Chapter 3.2 focuses on gene trees/species trees reconciliation methods that account for gene duplication/loss and horizontal gene transfer among lineages. Incomplete lineage sorting is another major source of phylogenetic incongruence among loci, which recently gained attention and is covered by Chapter 3.3. Chapter 3.4 concludes this part by taking a user’s perspective and examining the pros and cons of concatenation versus separate analysis of gene sequence alignments. Modern genomics is comparative and phylogenetic methods are key to a wide range of questions and analyses relevant to the study of molecular evolution. This is covered by Part 4. We argue that genome annotation, either structural or functional, can only be properly achieved in a phylogenetic context. Chapters 4.1 and 4.2 review the power of these approaches and their connections with the study of gene function. Molecular substitution rates play a key role in our understanding of the prevalence of nearly neutral versus adaptive molecular evolution, and the influence of species traits on genome dynamics (Chapter 4.4). The analysis of substitution rates, and particularly the detection of positive selection, requires sophisticated methods and models of coding sequence evolution (Chapter 4.5). Phylogenomics also offers a unique opportunity to explore evolutionary convergence at a molecular level, thus addressing the long-standing question of predictability versus contingency in evolution (Chapter 4.6). 
The development of phylogenomics, as reviewed in Parts 1 through 4, has resulted in a powerful conceptual and methodological corpus, which is often reused for addressing problems of interest to biologists from other fields. Part 5 illustrates this application potential via three selected examples. Chapter 5.1 addresses the link between phylogenomics and palaeontology; i.e., how to optimally combine molecular and fossil data for estimating divergence times. Chapter 5.3 emphasizes the importance of the phylogenomic approach in virology and its potential to trace the origin and spread of infectious diseases in space and time. Finally, Chapter 5.5 recalls why phylogenomic methods and the multi-species coalescent model are key in addressing the problem of species delimitation – one of the major goals of taxonomy. It is hard to predict where phylogenomics as a discipline will stand in even 10 years. Maybe a novel technological revolution will bring it to yet another level? We strongly believe, however, that tree thinking will remain pivotal in the treatment and interpretation of the deluge of genomic data to come. Perhaps a prefiguration of the future of our field is provided by the daily monitoring of the current Covid-19 outbreak via the phylogenetic analysis of coronavirus genomic data in quasi real time – a topic of major societal importance, contemporary to the publication of this book, in which phylogenomics is instrumental in helping to fight disease

    The role of visual adaptation in cichlid fish speciation

    D. Shane Wright (1), Ole Seehausen (2), Ton G.G. Groothuis (1), Martine E. Maan (1). (1) University of Groningen; GELIFES; EGDB. (2) Department of Fish Ecology & Evolution, EAWAG Centre for Ecology, Evolution and Biogeochemistry, Kastanienbaum, and Institute of Ecology and Evolution, Aquatic Ecology, University of Bern. In less than 15,000 years, Lake Victoria cichlid fishes have radiated into as many as 500 different species. Ecological and sexual selection are thought to contribute to this ongoing speciation process, but genetic differentiation remains low. However, recent work in visual pigment genes, opsins, has shown more diversity. Unlike neighboring Lakes Malawi and Tanganyika, Lake Victoria is highly turbid, resulting in a long wavelength shift in the light spectrum with increasing depth, providing an environmental gradient for exploring divergent coevolution in sensory systems and colour signals via sensory drive. Pundamilia pundamilia and Pundamilia nyererei are two sympatric species found at rocky islands across southern portions of Lake Victoria, differing in male colouration and the depth at which they reside. Previous work has shown species differentiation in colour discrimination, corresponding to divergent female preferences for conspecific male colouration. A mechanistic link between colour vision and preference would provide a rapid route to reproductive isolation between divergently adapting populations. This link is tested by experimental manipulation of colour vision: raising both species and their hybrids under light conditions mimicking shallow and deep habitats. We quantify the expression of retinal opsins and test behaviours important for speciation: mate choice, habitat preference, and foraging performance.

    Evolutionary genomics : statistical and computational methods

    This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.