
    Divergent evolution in the genomes of closely related lacertids, Lacerta viridis and L. bilineata, and implications for speciation

    Lacerta viridis and Lacerta bilineata are sister species of European green lizards (eastern and western clades, respectively) that, until recently, were grouped together as the L. viridis complex. Genetic incompatibilities were observed between lacertid populations through crossing experiments, which led to the delineation of two separate species within the L. viridis complex. The population history of these sister species and processes driving divergence are unknown. We constructed the first high-quality de novo genome assemblies for both L. viridis and L. bilineata through Illumina and PacBio sequencing, with annotation support provided from transcriptome sequencing of several tissues. To estimate gene flow between the two species and identify factors involved in reproductive isolation, we studied their evolutionary history, identified genomic rearrangements, detected signatures of selection on non-coding RNA, and on protein-coding genes. Here we show that gene flow was primarily unidirectional from L. bilineata to L. viridis after their split at least 1.15 million years ago. We detected positive selection of the non-coding repertoire; mutations in transcription factors; accumulation of divergence through inversions; selection on genes involved in neural development, reproduction, and behavior, as well as in ultraviolet-response, possibly driven by sexual selection, whose contribution to reproductive isolation between these lacertid species needs to be further evaluated. The combination of short and long sequence reads resulted in one of the most complete lizard genome assemblies. The characterization of a diverse array of genomic features provided valuable insights into the demographic history of divergence among European green lizards, as well as key species differences, some of which are candidates that could have played a role in speciation. 
In addition, our study generated valuable genomic resources that can be used to address conservation-related issues in lacertids.

    Introducing evolutionary biologists to the analysis of big data: guidelines to organize extended bioinformatics training courses

    Research in evolutionary biology has been progressively influenced by big data such as massive genome and transcriptome sequencing data, scalar measurements of several phenotypes on tens to thousands of individuals, as well as from collecting worldwide environmental data at an increasingly detailed scale. The handling and analysis of such data require computational skills that usually exceed the abilities of most traditionally trained evolutionary biologists. Here we discuss the advantages, challenges and considerations for organizing and running bioinformatics training courses of 2–3 weeks in length to introduce evolutionary biologists to the computational analysis of big data. Extended courses have the advantage of offering trainees the opportunity to learn a more comprehensive set of complementary topics and skills and allowing for more time to practice newly acquired competences. Many organizational aspects are common to any course, such as the need to define precise learning objectives and the selection of appropriate and highly motivated instructors and trainees, among others. However, other features assume particular importance in extended bioinformatics training courses. To successfully implement a learning-by-doing philosophy, sufficient and enthusiastic teaching assistants (TAs) are necessary to offer prompt help to trainees. Further, a good balance between theoretical background and practice time needs to be provided, and the schedule should include enough flexibility for extra review sessions or further discussions if desired. A final project enables trainees to apply their newly learned skills to real data or case studies of their interest. To promote a friendly atmosphere throughout the course and to build a close-knit community after the course, allow time for some scientific discussions and social activities. In addition, to not exhaust trainees and TAs, some leisure time needs to be organized. 
Finally, all organization should be done while keeping the budget within fair limits. In order to create a sustainable course that constantly improves and adapts to the trainees’ needs, gathering short- and long-term feedback after the end of the course is important. Based on our experience we have collected a set of recommendations to effectively organize and run extended bioinformatics training courses for evolutionary biologists, which we here want to share with the community. They offer a complementary way for the practical teaching of modern evolutionary biology and reaching out to the biological community. Peer reviewed.

    High quality gene annotation for deep phylogenetic analysis

    Gene prediction in newly sequenced genomes is a known challenge. Although sophisticated comparative pipelines are available, computationally derived gene models are often less than perfect. This is particularly true when multiple very similar paralogs are present. The issue is aggravated further when genomes are assembled only at a preliminary draft level to contigs or short scaffolds rather than to chromosomes. However, these genomes deliver valuable information for studying gene families. High accuracy models of protein-coding genes are needed in particular for phylogenetics and for the analysis of gene family histories. In this dissertation, I established a tool, the ExonMatchSolver-pipeline (EMS-pipeline), that can assist the assembly of genes distributed across multiple fragments (e.g. contigs). The tool in particular tackles the problem of identifying those coding exon groups that belong to the same paralogous genes in a fragmented genome assembly. The EMS-pipeline accommodates a homology search step with a protein input set consisting of several highly similar paralogs as query. The core of the pipeline uses an Integer Linear Programming implementation to solve the paralog-to-contig assignment problem. An extension to the initial implementation estimates the number of paralogs encoded in the target genome and can handle several paralogs that are situated on the same genomic fragment. The EMS-pipeline was successfully applied to simulated data, several showcase examples and to deuterostome genomes in a large scale study on the evolution of the arrestin protein family. Especially at high genome fragmentation levels, the tool outperformed a naive assignment method. Arrestins are key signaling transducers that bind to activated and phosphorylated G protein-coupled receptors and can mediate their endocytosis into the cell. 
The refined annotations of arrestins resulting from the application of the EMS-pipeline are more complete and accurate in comparison to a conventional database search strategy. With the applied strategy it was possible to map the duplication- and deletion history of arrestin paralogs including tandem duplications, pseudogenizations and the formation of retrogenes in detail. My results support the emergence of the four arrestin paralogs from a visual and a non-visual proto-arrestin. Surprisingly, the visual ARR3 was lost in the mammalian clades afrotherians and xenarthrans. Segmental duplications in specific clades and the 3R-WGD in the teleost stem lineage, on the other hand, must have given rise to new paralogs that show signatures of diversification in functional elements important for receptor binding and phosphate sensing. The four vertebrate orthology groups show an interesting pattern of divergence of three endocytosis motifs: the minor and major clathrin binding site and the adapter protein-2 (AP-2) binding motif. Identification of such signatures, of residues that determine specificity between paralogs and are positively selected after duplication was made possible by high quality alignments obtained by genome inquiries, dense species sampling and consideration of fragmented loci from poorly assembled genomes in the framework of the EMS-pipeline, that was established in this dissertation.

    Contents:
    1 Introduction
    1.1 Basics and definitions
    1.1.1 What is a gene?
    1.1.2 What is a tree in phylogenetics?
    1.1.3 What are paralogs and orthologs?
    1.1.4 Central dogma in molecular biology: From DNA to protein
    1.2 Gene duplications as evolutionary playground
    1.2.1 Mechanisms of gene duplication
    1.2.2 Evolutionary fate of duplicated genes
    1.3 Identification and annotation of protein homologs
    1.3.1 Challenges of existing resources
    1.3.2 Similarity search approaches without consideration of the gene structure
    1.3.3 Gene structure aware gene annotation approaches
    1.3.4 Graph-based inference of orthology relationships
    1.3.5 Chance and challenge of fragmented assemblies
    1.4 Applied phylogenetic methods
    1.4.1 Phylogenetic inference in a nutshell
    1.4.2 Inference of natural selection in inter-species data sets
    1.4.3 Detection of specificity determining positions
    1.5 Multi-talents in cell signaling: The cytosolic arrestin proteins
    1.5.1 Functions of arrestins in cell signaling
    1.5.2 Arrestin activation by GPCR binding
    1.5.3 Functions of arrestins in cellular trafficking
    1.5.4 Evolution of arrestins
    2 The ExonMatchSolver-pipeline
    2.1 Motivation
    2.2 Methods
    2.2.1 Pipeline overview
    2.2.2 Exon assembly as an assignment problem
    2.2.3 Solving the Paralog-to-Contig Assignment Problem
    2.2.4 Post-processing
    2.2.5 Implementation and usage
    2.2.6 Performance assessment by simulations
    2.3 Results
    2.3.1 Performance on simulated data
    2.3.2 Performance on real data - Two Showcase Examples
    2.4 Discussion
    3 Evolution of the arrestin protein family in deuterostomes
    3.1 Motivation
    3.2 Material and Methods
    3.2.1 Database scan
    3.2.2 Detailed gene annotation
    3.2.3 Data resources used in the current study
    3.2.4 Alignment and building of phylogenetic trees
    3.2.5 Identification of specificity determining positions
    3.2.6 Testing for natural selection
    3.2.7 Assessment of conservation
    3.2.8 Parsimonious reconstruction of exon gain and loss events
    3.3 Results
    3.3.1 Evolution of the arrestin fold family based on database inquiries
    3.3.2 The refined arrestin annotations are more complete than database entries
    3.3.3 Arrestin paralog gain and loss patterns based on the refined annotations
    3.3.4 Evolution of arrestin functional elements
    3.4 Discussion
    3.4.1 Limitation of arrestin database annotations
    3.4.2 Arrestins in early vertebrate evolution
    3.4.3 Sub- and neofunctionalization as consequence of the 3R-WGD
    3.4.4 Independent arrestin duplications in deuterostomes
    3.4.5 Loss of arrestin paralogs in different vertebrate orders
    3.4.6 Previously unknown interaction partners and isoforms
    4 Improvements on the ExonMatchSolver-pipeline
    4.1 Motivation
    4.2 Methods
    4.2.1 Estimation of the paralog number
    4.2.2 Subdivision of gene loci on the same contig
    4.2.3 Implementation details
    4.2.4 Assessment of the ExonMatchSolver-pipeline Version 2
    4.3 Results
    4.4 Discussion
    5 Conclusion and Outlook
    A Additional figures
    B Additional tables
    C CV
    Bibliography

    The paralog-to-contig assignment problem: High quality gene models from fragmented assemblies

    Background: The accurate annotation of genes in newly sequenced genomes remains a challenge. Although sophisticated comparative pipelines are available, computationally derived gene models are often less than perfect. This is particularly true when multiple similar paralogs are present. The issue is aggravated further when genomes are assembled only at a preliminary draft level to contigs or short scaffolds. However, these genomes deliver valuable information for studying gene families. High accuracy models of protein-coding genes are needed in particular for phylogenetics and for the analysis of gene family histories. Results: We present a pipeline, ExonMatchSolver, that is designed to help the user to produce and curate high quality models of the protein-coding part of genes. The tool in particular tackles the problem of identifying those coding exon groups that belong to the same paralogous genes in a fragmented genome assembly. This paralog-to-contig assignment problem is shown to be NP-complete. It is phrased and solved as an Integer Linear Programming problem. Conclusions: The ExonMatchSolver-pipeline can be employed to build highly accurate models of protein-coding genes even when spanning several genomic fragments. This sets the stage for a better understanding of the evolutionary history within particular gene families which possess a large number of paralogs and in which frequent gene duplication events occurred.
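
    To make the assignment idea in the abstract above concrete, here is a minimal sketch of a paralog-to-contig assignment on toy data. It is not the ExonMatchSolver implementation: the constraint set (each contig goes to at most one paralog, and no exon of a paralog is covered twice), the function name assign_contigs, and all scores are assumptions made for illustration. Exhaustive search stands in for an Integer Linear Programming solver, which is only workable for tiny inputs.

```python
from itertools import product

def assign_contigs(scores, exons_on_contig, paralogs):
    """Toy paralog-to-contig assignment by exhaustive search.

    scores[(contig, paralog)] is a hypothetical match score;
    exons_on_contig[contig] is the set of exon indices found on that contig.
    Assumed constraints: a contig is assigned to at most one paralog, and
    two contigs assigned to the same paralog must not carry the same exon.
    """
    contigs = sorted(exons_on_contig)
    best_score, best_assignment = float("-inf"), {}
    # Enumerate every choice "contig -> paralog or None" (exponential cost).
    for choice in product([None, *paralogs], repeat=len(contigs)):
        covered = {p: set() for p in paralogs}
        total, feasible = 0, True
        for contig, paralog in zip(contigs, choice):
            if paralog is None:
                continue  # contig left unassigned
            exons = exons_on_contig[contig]
            if covered[paralog] & exons:
                feasible = False  # same exon would be used twice
                break
            covered[paralog] |= exons
            total += scores[(contig, paralog)]
        if feasible and total > best_score:
            best_score = total
            best_assignment = {
                c: p for c, p in zip(contigs, choice) if p is not None
            }
    return best_score, best_assignment

# Invented example: two paralogs, four contigs carrying exons 1-2 or exon 3.
exons_on_contig = {"c1": {1, 2}, "c2": {3}, "c3": {1, 2}, "c4": {3}}
scores = {
    ("c1", "A"): 90, ("c1", "B"): 60,
    ("c2", "A"): 40, ("c2", "B"): 45,
    ("c3", "A"): 70, ("c3", "B"): 80,
    ("c4", "A"): 50, ("c4", "B"): 30,
}
best_score, assignment = assign_contigs(scores, exons_on_contig, ["A", "B"])
print(best_score, assignment)
```

    In this toy case the contigs c1 and c3 carry the same exons, so they cannot both go to the same paralog, and the search has to trade off the per-contig scores jointly rather than picking the best paralog for each contig independently.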

    Positive Selection in Gene Regulatory Factors Suggests Adaptive Pleiotropic Changes During Human Evolution

    Gene regulatory factors (GRFs), such as transcription factors, co-factors and histone-modifying enzymes, play many important roles in modifying gene expression in biological processes. They have also been proposed to underlie speciation and adaptation. To investigate potential contributions of GRFs to primate evolution, we analyzed GRF genes in 27 publicly available primate genomes. Genes coding for zinc finger (ZNF) proteins, especially ZNFs with a Krüppel-associated box (KRAB) domain were the most abundant TFs in all genomes. Gene numbers per TF family differed between all species. To detect signs of positive selection in GRF genes we investigated more than 3,000 human GRFs with their more than 70,000 orthologs in 26 non-human primates. We implemented two independent tests for positive selection, the branch-site-model of the PAML suite and aBSREL of the HyPhy suite, focusing on the human and great ape branch. Our workflow included rigorous procedures to reduce the number of false positives: excluding distantly similar orthologs, manual corrections of alignments, and considering only genes and sites detected by both tests for positive selection. Furthermore, we verified the candidate sites for selection by investigating their variation within human and non-human great ape population data. In order to approximately assign a date to positively selected sites in the human lineage, we analyzed archaic human genomes. Our work revealed with high confidence five GRFs that have been positively selected on the human lineage and one GRF that has been positively selected on the great ape lineage. These GRFs are scattered on different chromosomes and have been previously linked to diverse functions. For some of them a role in speciation and/or adaptation can be proposed based on the expression pattern or association with human diseases, but it seems that they all contributed independently to human evolution. 
Four of the positively selected GRFs are KRAB-ZNF proteins that induce changes in target gene co-expression and/or act through an arms race with transposable elements. Since each positively selected GRF contains several sites with evidence for positive selection, we suggest that these GRFs contributed pleiotropically to phenotypic adaptations in humans.

    Uncovering missing pieces: duplication and deletion history of arrestins in deuterostomes

    Background: The cytosolic arrestin proteins mediate desensitization of activated G protein-coupled receptors (GPCRs) via competition with G proteins for the active phosphorylated receptors. Arrestins in active, including receptor-bound, conformation are also transducers of signaling. Therefore, this protein family is an attractive therapeutic target. The signaling outcome is believed to be a result of structural and sequence-dependent interactions of arrestins with GPCRs and other protein partners. Here we elucidated the detailed evolution of arrestins in deuterostomes. Results: Identity and number of arrestin paralogs were determined searching deuterostome genomes and gene expression data. In contrast to standard gene prediction methods, our strategy first detects exons situated on different scaffolds and then solves the problem of assigning them to the correct gene. This increases both the completeness and the accuracy of the annotation in comparison to conventional database search strategies applied by the community. The employed strategy enabled us to map in detail the duplication- and deletion history of arrestin paralogs including tandem duplications, pseudogenizations and the formation of retrogenes. The two rounds of whole genome duplications in the vertebrate stem lineage gave rise to four arrestin paralogs. Surprisingly, visual arrestin ARR3 was lost in the mammalian clades Afrotheria and Xenarthra. Duplications in specific clades, on the other hand, must have given rise to new paralogs that show signatures of diversification in functional elements important for receptor binding and phosphate sensing. Conclusion: The current study traces the functional evolution of deuterostome arrestins in unprecedented detail. Based on a precise re-annotation of the exon-intron structure at nucleotide resolution, we infer the gain and loss of paralogs and patterns of conservation, co-variation and selection.

    Differential transcriptional responses to Ebola and Marburg virus infection in bat and human cells

    The unprecedented outbreak of Ebola in West Africa resulted in over 28,000 cases and 11,000 deaths, underlining the need for a better understanding of the biology of this highly pathogenic virus to develop specific counter strategies. Two filoviruses, the Ebola and Marburg viruses, result in a severe and often fatal infection in humans. However, bats are natural hosts and survive filovirus infections without obvious symptoms. The molecular basis of this striking difference in the response to filovirus infections is not well understood. We report a systematic overview of differentially expressed genes, activity motifs and pathways in human and bat cells infected with the Ebola and Marburg viruses, and we demonstrate that the replication of filoviruses is more rapid in human cells than in bat cells. We also found that the most strongly regulated genes upon filovirus infection are chemokine ligands and transcription factors. We observed a strong induction of the JAK/STAT pathway, of several genes encoding inhibitors of MAP kinases (DUSP genes) and of PPP1R15A, which is involved in ER stress-induced cell death. We used comparative transcriptomics to provide a data resource that can be used to identify cellular responses that might allow bats to survive filovirus infections.
