28 research outputs found

    Comparative genomics of the major parasitic worms

    Parasitic nematodes (roundworms) and platyhelminths (flatworms) cause debilitating chronic infections of humans and animals, decimate crop production and are a major impediment to socioeconomic development. Here we report a broad comparative study of 81 genomes of parasitic and non-parasitic worms. We have identified gene family births and hundreds of expanded gene families at key nodes in the phylogeny that are relevant to parasitism. Examples include gene families that modulate host immune responses, enable parasite migration through host tissues or allow the parasite to feed. We reveal extensive lineage-specific differences in core metabolism and protein families historically targeted for drug development. From an in silico screen, we have identified and prioritized new potential drug targets and compounds for testing. This comparative genomics resource provides a much-needed boost for the research community to understand and combat parasitic worms.

    Virtual genome walking across the 32 Gb Ambystoma mexicanum genome; assembling gene models and intronic sequence

    Large, repeat-rich genomes present challenges for assembly using short-read technologies. The 32 Gb axolotl genome is estimated to contain ~19 Gb of repetitive DNA, making an assembly from short reads alone effectively impossible. Indeed, this model species has been sequenced to 20× coverage, but the reads could not be conventionally assembled. Using an alternative strategy, we have assembled subsets of these reads into scaffolds describing over 19,000 gene models. We call this method Virtual Genome Walking, as it locally assembles whole-genome reads based on a reference transcriptome, identifying exons and iteratively extending them into surrounding genomic sequence. These assemblies are then linked and refined to generate gene models including upstream and downstream genomic, and intronic, sequence. Our assemblies are validated by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. Our analyses of axolotl intron length, intron-exon structure, repeat content and synteny provide novel insights into the genic structure of this model species. This resource will enable new experimental approaches in axolotl, such as ChIP-Seq and CRISPR, and aid in future whole-genome sequencing efforts. The assembled sequences and annotations presented here are freely available for download from https://tinyurl.com/y8gydc6n. The software pipeline is available from https://github.com/LooseLab/iterassemble
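
    The idea of seeding on an exon and iteratively extending it into flanking sequence can be illustrated with a toy sketch. This is not the iterassemble pipeline: it assumes error-free, single-strand reads, exact suffix/prefix overlaps, and a repeat-free toy sequence, and every name in it (`extend_right`, `extend_left`, `walk`, `GENOME`, `READS`, `SEED`) is hypothetical.

```python
def extend_right(contig, reads, min_overlap=4):
    """Return contig extended by the read that adds the most bases on the right."""
    best = contig
    for read in reads:
        # The read must overhang by at least one base, so cap the overlap.
        max_len = min(len(contig), len(read) - 1)
        for olen in range(max_len, min_overlap - 1, -1):
            if contig.endswith(read[:olen]):
                candidate = contig + read[olen:]
                if len(candidate) > len(best):
                    best = candidate
                break  # longest overlap for this read found
    return best


def extend_left(contig, reads, min_overlap=4):
    """Left extension is right extension on reversed strings."""
    reversed_reads = [read[::-1] for read in reads]
    return extend_right(contig[::-1], reversed_reads, min_overlap)[::-1]


def walk(seed, reads, min_overlap=4, max_rounds=100):
    """Extend the seed in both directions until no read adds new sequence."""
    contig = seed
    for _ in range(max_rounds):
        extended = extend_right(contig, reads, min_overlap)
        extended = extend_left(extended, reads, min_overlap)
        if extended == contig:
            break
        contig = extended
    return contig


# Toy data (assumption): a 24-base "genome" with no repeated 4-mers,
# sampled as overlapping 8-base reads; the seed stands in for an exon.
GENOME = "ATGCCGTAAGCTTGGACTCAGTTC"
READS = [GENOME[i:i + 8] for i in range(0, len(GENOME) - 7, 2)]
SEED = GENOME[9:14]

contig = walk(SEED, READS)
print(contig)
```

    In a real repeat-rich genome, exact greedy extension like this would stall or misjoin at repeats, which is why the sketch only conveys the seed-and-extend loop and not how the published method handles such cases.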

    Near-future CO2 levels impair the olfactory system of a marine fish

    This is the author accepted manuscript. The final version is available from Springer Nature via the DOI in this record. Data availability: All raw sequence data are accessible at the NCBI Sequence Read Archive through accession number SRP097118. Water chemistry, behaviour and electrophysiology data are available through Pangaea (https://doi.pangaea.de/10.1594/PANGAEA.884674). Survival of marine fishes that are exposed to elevated near-future CO2 levels is threatened by their altered responses to sensory cues. Here we demonstrate a physiological and molecular mechanism in the olfactory system that helps to explain altered behaviour under elevated CO2. We combine electrophysiology measurements and transcriptomics with behavioural experiments to investigate how elevated CO2 affects the olfactory system of European sea bass (Dicentrarchus labrax). When exposed to elevated CO2 (approximately 1,000 µatm), fish must be up to 42% closer to an odour source for detection, compared with current CO2 levels (around 400 µatm), decreasing their chances of detecting food or predators. Compromised olfaction correlated with the suppression of the transcription of genes involved in synaptic strength, cell excitability and wiring of the olfactory system in response to sustained exposure to elevated CO2 levels. Our findings complement the previously proposed impairment of γ-aminobutyric acid receptors, and indicate that both the olfactory system and central brain function are compromised by elevated CO2 levels. This study was supported by grants from Association of European Marine Biology Laboratories (227799), the Natural Environment Research Council (R.W.W.; NE/H017402/1), the Biotechnology and Biological Sciences Research Council (R.W.W.; BB/D005108/1), Fundação para a Ciência e Tecnologia (Portuguese Science Ministry) (UID/Multi/04326/2013) and a Royal Society Newton International Fellowship to C.S.P. C.S.P. is also a beneficiary of a Starting Grant from AXA

    Genomes of cryptic chimpanzee Plasmodium species reveal key evolutionary events leading to human malaria

    African apes harbour at least six Plasmodium species of the subgenus Laverania, one of which gave rise to human Plasmodium falciparum. Here we use a selective amplification strategy to sequence the genome of chimpanzee parasites classified as Plasmodium reichenowi and Plasmodium gaboni based on the subgenomic fragments. Genome-wide analyses show that these parasites indeed represent distinct species, with no evidence of cross-species mating. Both P. reichenowi and P. gaboni are 10-fold more diverse than P. falciparum, indicating a very recent origin of the human parasite. We also find a remarkable Laverania-specific expansion of a multigene family involved in erythrocyte remodelling, and show that a short region on chromosome 4, which encodes two essential invasion genes, was horizontally transferred into a recent P. falciparum ancestor. Our results validate the selective amplification strategy for characterizing cryptic pathogen species, and reveal evolutionary events that likely predisposed the precursor of P. falciparum to colonize humans.

    Scalable workflows and reproducible data analysis for genomics

    Biological, clinical, and pharmacological research now often involves analyses of genomes, transcriptomes, proteomes, and interactomes, within and between individuals and across species. Because of the large data volumes, the analysis and integration of data generated by such high-throughput technologies have become computationally intensive, and analysis can no longer happen on a typical desktop computer. In this chapter we show how to describe and execute the same analysis using a number of workflow systems and how these follow different approaches to tackle execution and reproducibility issues. We show how any researcher can create a reusable and reproducible bioinformatics pipeline that can be deployed and run anywhere. We show how to create a scalable, reusable, and shareable workflow using four different workflow engines: the Common Workflow Language (CWL), Guix Workflow Language (GWL), Snakemake, and Nextflow, each of which can be run in parallel. We show how to bundle a number of tools used in evolutionary biology by using Debian, GNU Guix, and Bioconda software distributions, along with the use of container systems, such as Docker, GNU Guix, and Singularity. Together these distributions represent the overall majority of software packages relevant for biology, including PAML, Muscle, MAFFT, MrBayes, and BLAST. By bundling software in lightweight containers, they can be deployed on a desktop, in the cloud, and, increasingly, on compute clusters. By bundling software through these public software distributions, and by creating reproducible and shareable pipelines using these workflow engines, bioinformaticians not only spend less time reinventing the wheel but also get closer to the ideal of making science reproducible. The examples in this chapter allow a quick comparison of different solutions.

    Perspectives on automated composition of workflows in the life sciences

    Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have brought the long-standing vision of automated workflow composition back into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the “big picture” of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
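
    The six life cycle stages listed in the abstract can be written down as an ordered data structure. The sketch below is only an illustration: the names `WorkflowStage` and `next_stage` are hypothetical, and it assumes a strictly linear progression from one stage to the next, which the abstract does not state explicitly.

```python
from enum import IntEnum


class WorkflowStage(IntEnum):
    """The six stages named in the abstract, in their listed order."""
    SCIENTIFIC_QUESTION = 1   # scientific question or hypothesis
    CONCEPTUAL_WORKFLOW = 2
    ABSTRACT_WORKFLOW = 3
    CONCRETE_WORKFLOW = 4
    PRODUCTION_WORKFLOW = 5
    SCIENTIFIC_RESULTS = 6


def next_stage(stage):
    """Return the following stage, or None after the final stage.

    Assumption: transitions are linear; real workflow development
    may loop back to earlier stages.
    """
    if stage is WorkflowStage.SCIENTIFIC_RESULTS:
        return None
    return WorkflowStage(stage + 1)


# Walk the life cycle from the first stage to the last.
stage = WorkflowStage.SCIENTIFIC_QUESTION
path = []
while stage is not None:
    path.append(stage.name)
    stage = next_stage(stage)
print(" -> ".join(path))
```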