20 research outputs found

    A new Plasmodium vivax reference sequence with improved assembly of the subtelomeres reveals an abundance of pir genes

    Get PDF
    Plasmodium vivax is now the predominant cause of malaria in the Asia-Pacific, South America and Horn of Africa. Laboratory studies of this species are constrained by the inability to maintain the parasite in continuous ex vivo culture, but genomic approaches provide an alternative and complementary avenue to investigate the parasite's biology and epidemiology. To date, molecular studies of P. vivax have relied on the Salvador-I reference genome sequence, derived from a monkey-adapted strain from South America. However, the Salvador-I reference remains highly fragmented with over 2500 unassembled scaffolds.Ā  Using high-depth Illumina sequence data, we assembled and annotated a new reference sequence, PvP01, sourced directly from a patient from Papua Indonesia. Draft assemblies of isolates from China (PvC01) and Thailand (PvT01) were also prepared for comparative purposes. The quality of the PvP01 assembly is improved greatly over Salvador-I, with fragmentation reduced to 226 scaffolds. Detailed manual curation has ensured highly comprehensive annotation, with functions attributed to 58% core genes in PvP01 versus 38% in Salvador-I. The assemblies of PvP01, PvC01 and PvT01 are larger than that of Salvador-I (28-30 versus 27 Mb), owing to improved assembly of the subtelomeres.Ā  An extensive repertoire of over 1200 Plasmodium interspersed repeat (pir) genes were identified in PvP01 compared to 346 in Salvador-I, suggesting a vital role in parasite survival or development. The manually curated PvP01 reference and PvC01 and PvT01 draft assemblies are important new resources to study vivax malaria. PvP01 is maintained at GeneDB and ongoing curation will ensure continual improvements in assembly and annotation quality

    Companion: a web server for annotation and analysis of parasite genomes

    Get PDF
    Currently available sequencing technologies enable quick and economical sequencing of many new eukaryotic parasite (apicomplexan or kinetoplastid) species or strains. Compared to SNP calling approaches, de novo assembly of these genomes enables researchers to additionally determine insertion, deletion and recombination events as well as to detect complex sequence diversity, such as that seen in variable multigene families. However, there currently are no automated eukaryotic annotation pipelines offering the required range of results to facilitate such analyses. A suitable pipeline needs to perform evidence-supported gene finding as well as functional annotation and pseudogene detection up to the generation of output ready to be submitted to a public database. Moreover, no current tool includes quick yet informative comparative analyses and a first pass visualization of both annotation and analysis results. To overcome those needs we have developed the Companion web server (http://companion.sanger.ac.uk) providing parasite genome annotation as a service using a reference-based approach. We demonstrate the use and performance of Companion by annotating two Leishmania and Plasmodium genomes as typical parasite cases and evaluate the results compared to manually annotated references

    An expressed, endogenous nodavirus-like element captured by a retrotransposon in the genome of the plant parasitic nematode Bursaphelenchus xylophilus

    Get PDF
    Recently, nematode viruses infecting Caenorhabditis elegans have been reported from the family Nodaviridae, the first nematode viruses described. Here, we report the observation of a novel endogenous viral element (EVE) in the genome of Bursaphelenchus xylophilus, a plant parasitic nematode unrelated to other nematodes from which viruses have been characterised. This element derives from a different clade of nodaviruses to the previously reported nematode viruses. This represents the first endogenous nodavirus sequence, the first nematode endogenous viral element, and significantly extends our knowledge of the potential diversity of the Nodaviridae. A search for endogenous elements related to the Nodaviridae did not reveal any elements in other available nematode genomes. Further surveillance for endogenous viral elements is warranted as our knowledge of nematode genome diversity, and in particular of free-living nematodes, expands

    Fine-grained annotation and classification of de novo predicted LTR retrotransposons

    Get PDF
    Long terminal repeat (LTR) retrotransposons and endogenous retroviruses (ERVs) are transposable elements in eukaryotic genomes well suited for computational identification. De novo identification tools determine the position of potential LTR retrotransposon or ERV insertions in genomic sequences. For further analysis, it is desirable to obtain an annotation of the internal structure of such candidates. This article presents LTRdigest, a novel software tool for automated annotation of internal features of putative LTR retrotransposons. It uses local alignment and hidden Markov model-based algorithms to detect retrotransposon-associated protein domains as well as primer binding sites and polypurine tracts. As an example, we used LTRdigest results to identify 88 (near) full-length ERVs in the chromosome 4 sequence of Mus musculus, separating them from truncated insertions and other repeats. Furthermore, we propose a work flow for the use of LTRdigest in de novo LTR retrotransposon classification and perform an exemplary de novo analysis on the Drosophila melanogaster genome as a proof of concept. Using a new method solely based on the annotations generated by LTRdigest, 518 potential LTR retrotransposons were automatically assigned to 62 candidate groups. Representative sequences from 41 of these 62 groups were matched to reference sequences with >80% global sequence similarity

    Community-driven development for computational biology at Sprints, Hackathons and Codefests

    Get PDF
    Background: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights in a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together. Results: This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled "unconferences" (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities. Conclusions: Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects

    EuPathDB: the eukaryotic pathogen genomics database resource

    Get PDF
    The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of hostā€“pathogen interactions

    Multi-locus analysis resolves the epidemic finch strain of Trichomonas gallinae and suggests introgression from divergent trichomonads

    Get PDF
    In Europe, Trichomonas gallinae recently emerged as a cause of epidemic disease in songbirds. A highly virulent and clonal strain of the parasite, first found in the UK, has become the predominant strain there and spread to continental Europe. Discriminating this epidemic strain of T. gallinae from other strains necessitated development of multi-locus sequence typing (MLST). Development of the MLST was facilitated by the assembly and annotation of a 54.7 Mb draft genome of a cloned stabilate of the A1 European finch epidemic strain (isolated from Greenfinch, Carduelis chloris, XT-1081/07 in 2007) containing 21,924 protein coding genes. This enabled construction of a robust 19 locus MLST based on existing typing loci for Trichomonas vaginalis and T. gallinae. Our MLST has the sensitivity to discriminate strains within existing genotypes confidently, and resolves the American finch A1 genotype from the epidemic European finch A1 genotype. Interestingly, one isolate we obtained from a captive black-naped fruit dove Ptilinopsus melanospilus, was not truly T. Ā¬Ā¬Ā¬gallinae but a hybrid of T. gallinae with a distant trichomonad lineage. Phylogenetic analysis of the individual loci in this fruit dove provides evidence of gene flow between distant trichomonad lineages at two of the 19 loci examined and may provide precedence for the emergence of other hybrid trichomonad genomes including T. vaginalis

    <it>LTRsift</it>: a graphical user interface for semi-automatic classification and postprocessing of <it>de novo</it> detected LTR retrotransposons

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Long terminal repeat (LTR) retrotransposons are a class of eukaryotic mobile elements characterized by a distinctive sequence similarity-based structure. Hence they are well suited for computational identification. Current software allows for a comprehensive genome-wide <it>de novo</it> detection of such elements. The obvious next step is the classification of newly detected candidates resulting in (super-)families. Such a <it>de novo</it> classification approach based on sequence-based clustering of transposon features has been proposed before, resulting in a preliminary assignment of candidates to families as a basis for subsequent manual refinement. However, such a classification workflow is typically split across a heterogeneous set of glue scripts and generic software (for example, spreadsheets), making it tedious for a human expert to inspect, curate and export the putative families produced by the workflow.</p> <p>Results</p> <p>We have developed <it>LTRsift</it>, an interactive graphical software tool for semi-automatic postprocessing of <it>de novo</it> predicted LTR retrotransposon annotations. Its user-friendly interface offers customizable filtering and classification functionality, displaying the putative candidate groups, their members and their internal structure in a hierarchical fashion. To ease manual work, it also supports graphical user interface-driven reassignment, splitting and further annotation of candidates. Export of grouped candidate sets in standard formats is possible. In two case studies, we demonstrate how <it>LTRsift</it> can be employed in the context of a genome-wide LTR retrotransposon survey effort.</p> <p>Conclusions</p> <p><it>LTRsift</it> is a useful and convenient tool for semi-automated classification of newly detected LTR retrotransposons based on their internal features. Its efficient implementation allows for convenient and seamless filtering and classification in an integrated environment. Developed for life scientists, it is helpful in postprocessing and refining the output of software for predicting LTR retrotransposons up to the stage of preparing full-length reference sequence libraries. The <it>LTRsift</it> software is freely available at <url>http://www.zbh.uni-hamburg.de/LTRsift</url> under an open-source license.</p
    corecore