34 research outputs found

    Discovery and assembly of repeat family pseudomolecules from sparse genomic sequence data using the Assisted Automated Assembler of Repeat Families (AAARF) algorithm

    Background: Higher eukaryotic genomes are typically large, complex and filled with both genes and multiple classes of repetitive DNA. The repetitive DNAs, primarily transposable elements, are a rapidly evolving genome component that can provide the raw material for novel selected functions and also indicate the mechanisms and history of genome evolution in any ancestral lineage. Despite their abundance, universality and significance, studies of genomic repeat content have been largely limited to analyses of the repeats in fully sequenced genomes. Results: In order to facilitate a broader range of repeat analyses, the Assisted Automated Assembler of Repeat Families algorithm has been developed. This program, written in PERL and with numerous adjustable parameters, identifies sequence overlaps in small shotgun sequence datasets and walks them out to create long pseudomolecules representing the most abundant repeats in any genome. Testing of this program in maize indicated that it found and assembled all of the major repeats in one or more pseudomolecules, including coverage of the major Long Terminal Repeat retrotransposon families. Both Sanger sequence and 454 datasets were appropriate. Conclusion: These results now indicate that hundreds of higher eukaryotic genomes can be efficiently characterized for the nature, abundance and evolution of their major repetitive DNA components.

    MaHPIC malaria systems biology data from Plasmodium cynomolgi sporozoite longitudinal infections in macaques

    Plasmodium cynomolgi causes zoonotic malarial infections in Southeast Asia and this parasite species is important as a model for Plasmodium vivax and Plasmodium ovale. Each of these species produces hypnozoites in the liver, which can cause relapsing infections in the blood. Here we present methods and data generated from iterative longitudinal systems biology infection experiments designed and performed by the Malaria Host-Pathogen Interaction Center (MaHPIC) to delve deeper into the biology, pathogenesis, and immune responses of P. cynomolgi in the Macaca mulatta host. Infections were initiated by sporozoite inoculation. Blood and bone marrow samples were collected at defined timepoints for biological and computational experiments and integrative analyses revolving around primary illness, relapse illness, and subsequent disease and immune response patterns. Parasitological, clinical, haematological, immune response, and -omic datasets (transcriptomics, proteomics, metabolomics, and lipidomics) including metadata and computational results have been deposited in public repositories. The scope and depth of these datasets are unprecedented in studies of malaria, and they are projected to be a F.A.I.R., reliable data resource for decades.

    VEuPathDB: the eukaryotic pathogen, vector and host bioinformatics resource center in 2023.

    The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB, https://veupathdb.org) is a Bioinformatics Resource Center funded by the National Institutes of Health with additional funding from the Wellcome Trust. VEuPathDB supports >600 organisms that comprise invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Since 2004, VEuPathDB has analyzed omics data from the public domain using contemporary bioinformatic workflows, including orthology predictions via OrthoMCL, and integrated the analysis results with analysis tools, visualizations, and advanced search capabilities. The unique data mining platform coupled with >3000 pre-analyzed data sets facilitates the exploration of pertinent omics data in support of hypothesis driven research. Comparisons are easily made across data sets, data types and organisms. A Galaxy workspace offers the opportunity for the analysis of private large-scale datasets and for porting to VEuPathDB for comparisons with integrated data. The MapVEu tool provides a platform for exploration of spatially resolved data such as vector surveillance and insecticide resistance monitoring. To address the growing body of omics data and advances in laboratory techniques, VEuPathDB has added several new data types, searches and features, improved the Galaxy workspace environment, redesigned the MapVEu interface and updated the infrastructure to accommodate these changes.

    A Survey of Innovation through Duplication in the Reduced Genomes of Twelve Parasites

    We characterize the prevalence, distribution, divergence, and putative functions of detectable two-copy paralogs and segmental duplications in the Apicomplexa, a phylum of parasitic protists. Apicomplexans are mostly obligate intracellular parasites responsible for human and animal diseases (e.g. malaria and toxoplasmosis). Gene loss is a major force in the phylum. Genomes are small and protein-encoding gene repertoires are reduced. Despite this genomic streamlining, duplications and gene family amplifications are present. The potential for innovation introduced by duplications is of particular interest. We compared genomes of twelve apicomplexans across four lineages and used orthology and genome cartography to map distributions of duplications against genome architectures. Segmental duplications appear limited to five species. Where present, they correspond to regions enriched for multi-copy and species-specific genes, pointing toward roles in adaptation and innovation. We found a phylum-wide association of duplications with dynamic chromosome regions and syntenic breakpoints. Trends in the distribution of duplicated genes indicate that recent, species-specific duplicates are often tandem while most others have been dispersed by genome rearrangements. These trends show a relationship between genome architecture and gene duplication. Functional analysis reveals: proteases, which are vital to a parasitic lifecycle, to be prominent in putative recent duplications; a pair of paralogous genes in Toxoplasma gondii previously shown to produce the rate-limiting step in dopamine synthesis in mammalian cells, a possible link to the modification of host behavior; and phylum-wide differences in expression and subcellular localization, indicative of modes of divergence. We have uncovered trends in multiple modes of duplicate divergence including sequence, intron content, expression, subcellular localization, and functions of putative recent duplicates that highlight the role of duplications in the continuum of forces that have shaped these genomes.

    Duplicate gene distribution with respect to genome architecture.

    Colored circles represent the chromosomes and contigs in each genome (one color gradient/genome). Each species’ genome is labeled with the genus species abbreviation and chromosome/contig number. Small unassembled contigs appear as black lines at the end of the respective genome sequence. Species are grouped based on genome size and karyotype. Tick marks = 1 Mb in A and 100 kb in B. Arcs connect two-copy paralog loci. All arcs have two ends; tandem copies may appear as a single line. Paralog start and stop coordinates on chromosomes/contigs are expanded for visualization. Arc colors identify ortholog copy number in the closest relative(s). Black = two-copy, Green = species-specific, Red = one copy, Blue = more than two copies. Pf = Plasmodium falciparum, Pv = P. vivax, Tg = Toxoplasma gondii, Tp = Theileria parva, Cp = Cryptosporidium parvum. Red arrows in B indicate the fatty acid synthase and polyketide synthase genes.

    Scale, scope, and outcome of detected apicomplexan innovative duplication.

    Cladogram branch colors indicate major lineages. Strains follow species names. Genome sizes and protein-encoding gene counts are below each species name. Circles located on each branch contain counts, results, and trends for detected measures of innovation and are only present if data were available. For ‘Two-Copy Gene Duplication’, numbers are for total duplicate pairs or pairs with detected differences. ‘Species-Centric Duplications’ includes two categories: ‘species-specific pairs’ and ‘pairs with single copy ortholog’. ‘Differential Expression’ circles indicate detected differences over pairs with available data. A 60% cutoff was used to identify the major trend for ‘Distribution’ and ‘Intron Number’. Numbers in white circles are of unique genes in segmental duplications. White circles are scaled versions of distributions. Red ‘X’s = ‘none detected’.