
    The Echinococcus canadensis (G7) genome: A key knowledge of parasitic platyhelminth human diseases

    Background: The parasite Echinococcus canadensis (G7) (phylum Platyhelminthes, class Cestoda) is one of the causative agents of echinococcosis. Echinococcosis is a worldwide chronic zoonosis affecting humans as well as domestic and wild mammals, which has been reported as a prioritized neglected disease by the World Health Organisation. No genomic data, comparative genomic analyses or efficient therapeutic and diagnostic tools are available for this severe disease. The information presented in this study will help to understand the peculiar biological characters and to design species-specific control tools. Results: We sequenced, assembled and annotated the 115-Mb genome of E. canadensis (G7). Comparative genomic analyses using whole genome data of three Echinococcus species not only confirmed the status of E. canadensis (G7) as a separate species but also demonstrated a high nucleotide sequence divergence in relation to E. granulosus (G1). The E. canadensis (G7) genome contains 11,449 genes with a core set of 881 orthologs shared among five cestode species. Comparative genomics revealed that there are more single nucleotide polymorphisms (SNPs) between E. canadensis (G7) and E. granulosus (G1) than between E. canadensis (G7) and E. multilocularis. This result was unexpected since E. canadensis (G7) and E. granulosus (G1) were considered to belong to the species complex E. granulosus sensu lato. We described SNPs in known drug targets and metabolism genes in the E. canadensis (G7) genome. Regarding gene regulation, we analysed three particular features: CpG island distribution along the three Echinococcus genomes, DNA methylation system and small RNA pathway. The results suggest the occurrence of yet unknown gene regulation mechanisms in Echinococcus. Conclusions: This is the first work that addresses Echinococcus comparative genomics. 
The resources presented here will promote the study of mechanisms of parasite development as well as new tools for drug discovery. The availability of a high-quality genome assembly is critical for fully exploring the biology of a pathogenic organism. The E. canadensis (G7) genome presented in this study provides a unique opportunity to address the genetic diversity among the genus Echinococcus and its particular developmental features. At present, there is no unequivocal taxonomic classification of Echinococcus species; however, the genome-wide SNP analysis performed here revealed the phylogenetic distance among these three Echinococcus species. Additional cestode genomes need to be sequenced to be able to resolve their phylogeny.
Author affiliations: Maldonado, Lucas Luciano; Macchiaroli, Natalia; Cucher, Marcela Alejandra; Camicia, Federico; Fox, Adolfo; Rosenzvit, Mara Cecilia; Kamenetzky, Laura: Consejo Nacional de Investigaciones Científicas y Técnicas, Oficina de Coordinación Administrativa Houssay, Instituto de Investigaciones en Microbiología y Parasitología Médica; Universidad de Buenos Aires, Facultad de Medicina, Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina. Assis, Juliana; Gomes Araújo, Flávio M.; Salim, Anna C. M.: Fundación Oswaldo Cruz; Brazil. Oliveira, Guilherme: Instituto Tecnológico Vale; Brazil, and Fundación Oswaldo Cruz; Brazil.

    Formation of regulatory modules by local sequence duplication

    Turnover of regulatory sequence and function is an important part of molecular evolution. But what are the modes of sequence evolution leading to rapid formation and loss of regulatory sites? Here, we show that a large fraction of neighboring transcription factor binding sites in the fly genome have formed from a common sequence origin by local duplications. This mode of evolution is found to produce regulatory information: duplications can seed new sites in the neighborhood of existing sites. Duplicate seeds evolve subsequently by point mutations, often towards binding a different factor than their ancestral neighbor sites. These results are based on a statistical analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome, and a comparison set of intergenic regulatory sequence in Saccharomyces cerevisiae. In fly regulatory modules, pairs of binding sites show significantly enhanced sequence similarity up to distances of about 50 bp. We analyze these data in terms of an evolutionary model with two distinct modes of site formation: (i) evolution from independent sequence origin and (ii) divergent evolution following duplication of a common ancestor sequence. Our results suggest that pervasive formation of binding sites by local sequence duplications distinguishes the complex regulatory architecture of higher eukaryotes from the simpler architecture of unicellular organisms

    101 Dothideomycetes genomes: A test case for predicting lifestyles and emergence of pathogens.

    Dothideomycetes is the largest class of kingdom Fungi and comprises an incredible diversity of lifestyles, many of which have evolved multiple times. Plant pathogens represent a major ecological niche of the class Dothideomycetes and they are known to infect most major food crops and feedstocks for biomass and biofuel production. Studying the ecology and evolution of Dothideomycetes has significant implications for our fundamental understanding of fungal evolution, their adaptation to stress and host specificity, and practical implications with regard to the effects of climate change and on the food, feed, and livestock elements of the agro-economy. In this study, we present the first large-scale, whole-genome comparison of 101 Dothideomycetes introducing 55 newly sequenced species. The availability of whole-genome data produced a high-confidence phylogeny leading to reclassification of 25 organisms, provided a clearer picture of the relationships among the various families, and indicated that pathogenicity evolved multiple times within this class. We also identified gene family expansions and contractions across the Dothideomycetes phylogeny linked to ecological niches providing insights into genome evolution and adaptation across this group. Using machine-learning methods we classified fungi into lifestyle classes with >95 % accuracy and identified a small number of gene families that positively correlated with these distinctions. This can become a valuable tool for genome-based prediction of species lifestyle, especially for rarely seen and poorly studied species

    C-terminal motif prediction in eukaryotic proteomes using comparative genomics and statistical over-representation across protein families

    Background: The carboxy termini of proteins are a frequent site of activity for a variety of biologically important functions, ranging from post-translational modification to protein targeting. Several short peptide motifs involved in protein sorting roles and dependent upon their proximity to the C-terminus for proper function have already been characterized. As a limited number of such motifs have been identified, the potential exists for genome-wide statistical analysis and comparative genomics to reveal novel peptide signatures functioning in a C-terminal dependent manner. We have applied a novel methodology to the prediction of C-terminal-anchored peptide motifs involving a simple z-statistic and several techniques for improving the signal-to-noise ratio. Results: We examined the statistical over-representation of position-specific C-terminal tripeptides in 7 eukaryotic proteomes. Sequence randomization models and simple-sequence masking were applied to the successful reduction of background noise. Similarly, as C-terminal homology among members of large protein families may artificially inflate tripeptide counts in an irrelevant and obfuscating manner, gene-family clustering was performed prior to the analysis in order to assess tripeptide over-representation across protein families as opposed to across all proteins. Finally, comparative genomics was used to identify tripeptides significantly occurring in multiple species. This approach has been able to predict, to our knowledge, all C-terminally anchored targeting motifs present in the literature. These include the PTS1 peroxisomal targeting signal (SKL*), the ER-retention signal (K/HDEL*), the ER-retrieval signal for membrane bound proteins (KKxx*), the prenylation signal (CC*) and the CaaX box prenylation motif. In addition to a high statistical over-representation of these known motifs, a collection of significant tripeptides with a high propensity for biological function exists between species, among kingdoms and across eukaryotes. Motifs of note include a serine-acidic peptide (DSD*) as well as several lysine enriched motifs found in nearly all eukaryotic genomes examined. Conclusion: We have successfully generated a high confidence representation of eukaryotic motifs anchored at the C-terminus. A high incidence of true-positives in our results suggests that several previously unidentified tripeptide patterns are strong candidates for representing novel peptide motifs of a widely employed nature in the C-terminal biology of eukaryotes. Our application of comparative genomics, statistical over-representation and the adjustment for protein family homology has generated several hypotheses concerning the C-terminal topology as it pertains to sorting and potential protein interaction signals. This approach to background reduction could be expanded for application to protein motif prediction in the protein interior. A parallel N-terminal analysis is presented as supplementary data.
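To make the idea of a z-statistic for C-terminal tripeptides concrete, here is a minimal Python sketch. It is not the authors' method: the null model (independent residues drawn from the overall amino-acid composition of the input) and the function name `cterm_tripeptide_z` are assumptions chosen for illustration, and the sketch omits the sequence randomization, simple-sequence masking and gene-family clustering steps described in the abstract.

```python
from collections import Counter
from math import sqrt


def cterm_tripeptide_z(proteins):
    """Return {tripeptide: z-score} for the last three residues of each protein.

    Null model (an assumption, not taken from the paper): residues are
    independent and drawn from the overall composition of the input set,
    so the count of a given C-terminal tripeptide is binomial.
    """
    seqs = [p for p in proteins if len(p) >= 3]
    n = len(seqs)
    total = sum(len(p) for p in seqs)
    freq = Counter("".join(seqs))              # residue counts over all sequences
    observed = Counter(p[-3:] for p in seqs)   # C-terminal tripeptide counts
    z = {}
    for tri, obs in observed.items():
        p = 1.0
        for aa in tri:
            p *= freq[aa] / total              # probability under the null
        expected = n * p
        sd = sqrt(n * p * (1 - p))             # binomial standard deviation
        z[tri] = (obs - expected) / sd if sd > 0 else 0.0
    return z
```

On a toy input in which three of four sequences end in SKL, the SKL tripeptide receives a much larger z-score than a tripeptide built from abundant residues, which is the kind of signal the over-representation analysis looks for.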

    Characterisation of the African Swine Fever Virus Transcription System

    African Swine Fever Virus (ASFV) causes lethal haemorrhagic fever in domestic pigs, presenting the largest global threat to animal farming on record. Despite its impact, the mechanisms and regulation of ASFV gene expression were poorly understood. In order to fill this gap in our knowledge, I have investigated diverse aspects of ASFV transcription and report in my thesis (i) transcriptome analyses, (ii) the expression, purification and biochemical analyses of recombinant transcription factors, and (iii) computational characterisation of ASFV-RNA polymerase (RNAP) subunits and transcription initiation factors. We have generated the first genome-wide transcriptomic landscape of ASFV during infection, using a complement of RNA-based Next Generation Sequencing techniques. We have mapped the ASFV gene transcription start sites, termination sites, and quantified transcript abundance during the early and late stages of infection. We have demonstrated viral gene expression patterns, which are facilitated by newly identified promoter motifs, and shared across lab-attenuated and pathogenic strains (BA71V and Georgia 2007/1, respectively). We have also demonstrated that ASFV uses a polyT (polyU in the RNA) terminator motif genome-wide, and examined how late infection alters transcription termination patterns. We identified a conserved early promoter motif in ASFV, similar to that used by the heterodimeric VACV early transcription factor (VETF), for which ASFV also encodes homologs. We therefore co-expressed and purified the ASFV VETF subunits (D6 and A7) recombinantly, using a baculovirus-insect cell expression system. We demonstrated that this large (284 kDa) ASFV-D6-A7 complex specifically binds to early but not late promoter templates, though with some differences to its VACV counterpart, likely due to the ASFV proteins encoding additional domains. 
We have therefore demonstrated that baculovirus-insect cell expression is viable for co-expressing large ASFV complexes, and laid the groundwork for applying this system to express the 8-subunit ASFV-RNAP.

    Genome Biol.

    With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome

    Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

    The organization and mining of malaria genomic and post-genomic data is highly motivated by the necessity to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space designed from the genomic data from Plasmodium falciparum, but using also the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences including compositionally atypical malaria sequences, 2) the high throughput reconstruction of molecular phylogenies, 3) the representation of biological processes particularly metabolic pathways, 4) the versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progress toward a grid-enabled chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journal

    Computational prediction of transcription-factor binding site locations

    Identifying genomic locations of transcription-factor binding sites, particularly in higher eukaryotic genomes, has been an enormous challenge. Various experimental and computational approaches have been used to detect these sites; methods involving computational comparisons of related genomes have been particularly successful

    A dedicated haem lyase is required for the maturation of a novel bacterial cytochrome c with unconventional covalent haem binding

    In bacterial c-type cytochromes, the haem cofactor is covalently attached via two cysteine residues organized in a haem c-binding motif. Here, a novel octa-haem c protein, MccA, is described that contains only seven conventional haem c-binding motifs (CXXCH), in addition to several single cysteine residues and a conserved CH signature. Mass spectrometric analysis of purified MccA from Wolinella succinogenes suggests that two of the single cysteine residues are actually part of an unprecedented CX15CH sequence involved in haem c binding. Spectroscopic characterization of MccA identified an unusual high-potential haem c with a red-shifted absorption maximum, not unlike that of certain eukaryotic cytochromes c that exceptionally bind haem via only one thioether bridge. A haem lyase gene was found to be specifically required for the maturation of MccA in W. succinogenes. Equivalent haem lyase-encoding genes belonging to either the bacterial cytochrome c biogenesis system I or II are present in the vicinity of every known mccA gene suggesting a dedicated cytochrome c maturation pathway. The results necessitate reconsideration of computer-based prediction of putative haem c-binding motifs in bacterial proteomes
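The closing point about computer-based motif prediction can be illustrated with a short Python sketch. It scans a protein sequence for the conventional CXXCH motif and for the CX15CH pattern named in the abstract; the function name `find_haem_motifs` and the toy sequence in the usage note are hypothetical, and a pattern match is only a candidate site, not evidence of haem attachment.

```python
import re

# Lookaheads allow overlapping matches; '.' stands for any residue (X).
CONVENTIONAL = re.compile(r"(?=(C..CH))")      # CXXCH
EXTENDED = re.compile(r"(?=(C.{15}CH))")       # CX15CH, as named in the abstract


def find_haem_motifs(seq):
    """Return sorted (start, kind) pairs (0-based) for both motif types."""
    hits = [(m.start(), "CXXCH") for m in CONVENTIONAL.finditer(seq)]
    hits += [(m.start(), "CX15CH") for m in EXTENDED.finditer(seq)]
    return sorted(hits)
```

For the made-up sequence `"MACAACHK" + "C" + "A" * 15 + "CH" + "G"`, a scan restricted to CXXCH would report one site, whereas the extended pattern also picks up the second candidate, which is why a CXXCH-only search would under-count haem sites in a protein like MccA.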

    An Integrated Approach to Identifying Cis-Regulatory Modules in the Human Genome

    In eukaryotic genomes, it is challenging to accurately determine target sites of transcription factors (TFs) by only using sequence information. Previous efforts were made to tackle this task by considering the fact that TF binding sites tend to be more conserved than other functional sites and the binding sites of several TFs are often clustered. Recently, ChIP-chip and ChIP-sequencing data have accumulated that identify TF binding sites and survey the chromatin modification patterns at regulatory elements such as promoters and enhancers. We propose here a hidden Markov model (HMM) to incorporate sequence motif information, TF-DNA interaction data and chromatin modification patterns to precisely identify cis-regulatory modules (CRMs). We conducted ChIP-chip experiments on four TFs, CREB, E2F1, MAX, and YY1 in 1% of the human genome. We then trained the HMM to identify the labels of the CRMs by incorporating the sequence motifs recognized by these TFs and the ChIP-chip ratio. Chromatin modification data was used to predict the functional sites and to further remove false positives. Cross-validation showed that our integrated HMM had a performance superior to other existing methods on predicting CRMs. Incorporating histone signature information successfully penalized false predictions and improved overall performance. The dataset we used and the software are available at http://nash.ucsd.edu/CIS/
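As a rough illustration of how an HMM can label stretches of sequence as CRM or background, the Python sketch below runs Viterbi decoding on a two-state model. Everything in it is an assumption for illustration: the state names, the discretised "low"/"high" observation symbols (standing in for a binned ChIP-chip ratio) and all probabilities are invented, and the published model additionally uses motif and chromatin modification data that are not represented here.

```python
from math import log

# Hypothetical two-state model; every number below is made up.
STATES = ("background", "CRM")
START = {"background": 0.9, "CRM": 0.1}
TRANS = {
    "background": {"background": 0.9, "CRM": 0.1},
    "CRM": {"background": 0.2, "CRM": 0.8},
}
EMIT = {
    "background": {"low": 0.8, "high": 0.2},
    "CRM": {"low": 0.3, "high": 0.7},
}


def viterbi(observations):
    """Most probable state path for a sequence of 'low'/'high' symbols."""
    if not observations:
        return []
    # Log-probabilities avoid numerical underflow on long sequences.
    score = {s: log(START[s]) + log(EMIT[s][observations[0]]) for s in STATES}
    back = []
    for obs in observations[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + log(TRANS[r][s]))
            score[s] = prev[best] + log(TRANS[best][s]) + log(EMIT[s][obs])
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With these toy parameters, a run of "high" symbols flanked by "low" symbols is decoded as a CRM island inside background, because staying in the CRM state across the run is more probable than explaining each "high" as background noise.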