46 research outputs found

    CodingQuarry: Highly accurate hidden Markov model gene prediction in fungal genomes using RNA-seq transcripts

    Background: The impact of gene annotation quality on functional and comparative genomics makes gene prediction an important process, particularly in non-model species, including many fungi. Sets of homologous protein sequences are rarely complete with respect to the fungal species of interest and are often small or unreliable, especially when closely related species have not been sequenced or annotated in detail. In these cases, protein homology-based evidence fails to correctly annotate many genes, or significantly improve ab initio predictions. Generalised hidden Markov models (GHMM) have proven to be invaluable tools in gene annotation and, recently, RNA-seq has emerged as a cost-effective means to significantly improve the quality of automated gene annotation. As these methods do not require sets of homologous proteins, improving gene prediction from these resources is of benefit to fungal researchers. While many pipelines now incorporate RNA-seq data in training GHMMs, there has been relatively little investigation into additionally combining RNA-seq data at the point of prediction, and room for improvement in this area motivates this study. Results: CodingQuarry is a highly accurate, self-training GHMM fungal gene predictor designed to work with assembled, aligned RNA-seq transcripts. RNA-seq data informs annotations both during gene-model training and in prediction. Our approach capitalises on the high quality of fungal transcript assemblies by incorporating predictions made directly from transcript sequences. Correct predictions are made despite transcript assembly problems, including those caused by overlap between the transcripts of adjacent gene loci. Stringent benchmarking against high-confidence annotation subsets showed CodingQuarry predicted 91.3% of Schizosaccharomyces pombe genes and 90.4% of Saccharomyces cerevisiae genes perfectly. 
These results are 4-5% better than those of AUGUSTUS, the next best performing RNA-seq driven gene predictor tested. Comparisons against whole genome Sc. pombe and S. cerevisiae annotations further substantiate a 4-5% improvement in the number of correctly predicted genes. Conclusions: We demonstrate the success of a novel method of incorporating RNA-seq data into GHMM fungal gene prediction. This shows that a high quality annotation can be achieved without relying on protein homology or a training set of genes. CodingQuarry is freely available (https://sourceforge.net/projects/codingquarry/), and suitable for incorporation into genome annotation pipelines

    Computational Studies and Biosynthesis of Natural Products with Promising Anticancer Properties

    We present an overview of computational approaches for the prediction of metabolic pathways by which plants biosynthesise compounds, with a focus on selected very promising anticancer secondary metabolites from floral sources. We also provide an overview of databases for the retrieval of useful genomic data, discussing the strengths and limitations of selected prediction software and the main computational tools (and methods), which could be employed for the investigation of the uncharted routes towards the biosynthesis of some of the identified anticancer metabolites from plant sources, eventually using specific examples to address some knowledge gaps when using these approaches

    GENOMICS BASED APPROACHES TO FUNGAL EVOLUTION

    Advances in DNA sequencing and data analysis make it possible to address questions in population genetics and evolution at the genomic level. Fungi are excellent subjects for such studies, because they are found in diverse environments, have short generation times, can be maintained in culture and have relatively small genomes. My research employed genetic approaches using a variety of sequencing technologies and methods of analysis to explore questions in fungal evolution. In one study, I explored the genetics behind differences in thermotolerance between isolates of Neurospora discreta from Alaska and New Mexico. Isolates from the two states exhibited differences in maximal growth temperature, with New Mexico isolates being substantially more thermotolerant than isolates from Alaska. Genomic scale comparisons of progeny from crosses between isolates from New Mexico and Alaska indicated that two regions, one on chromosome III and another on chromosome I, are responsible for differences in thermotolerance. Examination of these regions revealed numerous differences between the New Mexico and Alaska isolates at nucleotide and amino-acid levels; and it identified candidate genes for being important for differences in maximal growth temperatures. In a second study, I explored the genomic differences between pathogenic and endophytic isolates in the genus Monosporascus. Culture and sequence-based surveys of root associating fungi at the Sevilleta National Wildlife Refuge (SNWR) revealed the ubiquitous presence of members of this genus. Although M. cannonballus is known as a severe pathogen of melon roots in agricultural settings, all of the host plants associating with Monosporascus species in natural settings appeared to be disease free. Complete genome sequences were obtained from three M. cannonballus isolates, an M. ibericus isolate and six SNWR isolates. 
    Comparative genome analyses revealed that 1) isolates of Monosporascus possess genomes that are more than twice the size of those typical for members of the Sordariomycetes, while having typical numbers of protein-coding genes; 2) isolates from diverse grasses, trees and forbs include lineages closely related to previously described species, including M. cannonballus, in addition to novel lineages; and 3) species of Monosporascus and other Xylariales lack mating-type gene regions typical of other members of the Pezizomycotina

    Whole genome sequence and diversity in multigene families of Babesia ovis

    Ovine babesiosis, caused by Babesia ovis, is an acute, lethal, and endemic disease worldwide and causes huge economic losses to the animal industry. Pathogen genome sequences can be utilized for selecting diagnostic markers, drug targets, and antigens for vaccine development; however, those for B. ovis have not been available so far. In this study, we obtained a draft genome sequence for B. ovis isolated from an infected sheep in Turkey. The genome size was 7.81 Mbp with 3,419 protein-coding genes. It consisted of 41 contigs, and the N50 was 526 Kbp. There were 259 orthologs identified among eight Babesia spp., Plasmodium falciparum, and Toxoplasma gondii. A phylogeny was estimated on the basis of the orthologs, which showed B. ovis to be closest to B. bovis. In addition, 43 ves genes were identified using a hidden Markov model (HMM). They formed a cluster distinct from the ves multigene families of other Babesia spp. but showed certain similarities to those of B. bovis, B. caballi, and Babesia sp. Xinjiang, which is consistent with the phylogeny. Comparative genomics of B. ovis and B. bovis elucidated uniquely evolved genes in these species, which may account for the adaptation

    Intra-species genomic variation in the pine pathogen Fusarium circinatum

    Fusarium circinatum is an important global pathogen of pine trees. Genome plasticity has been observed in different isolates of the fungus, but no genome comparisons are available. To address this gap, we sequenced and assembled to chromosome level five isolates of F. circinatum. These genomes were analysed together with previously published genomes of F. circinatum isolates, FSP34 and KS17. Multi-sample variant calling identified a total of 461,683 micro variants (SNPs and small indels) and a total of 1828 macro structural variants of which 1717 were copy number variants and 111 were inversions. The variant density was higher on the sub-telomeric regions of chromosomes. Variant annotation revealed that genes involved in transcription, transport, metabolism and transmembrane proteins were overrepresented in gene sets that were affected by high impact variants. A core genome representing genomic elements that were conserved in all the isolates and a non-redundant pangenome representing all genomic elements are presented. Whole genome alignments showed that an average of 93% of the genomic elements were present in all isolates. The results of this study reveal that some genomic elements are not conserved within the isolates and some variants are high impact. The described genome-scale variations will help to inform novel disease management strategies against the pathogen.
    Data availability statement: The Whole Genome Shotgun project for Fusarium circinatum CMWF1803 has been deposited at DDBJ/ENA/GenBank under the accession JAEHFH000000000. The version described in this paper is version JAEHFH010000000. The Whole Genome Shotgun project for Fusarium circinatum CMWF560 has been deposited at DDBJ/ENA/GenBank under the accession JAEHFI000000000. The version described in this paper is version JAEHFI010000000. The Whole Genome Shotgun project for Fusarium circinatum CMWF567 has been deposited at DDBJ/ENA/GenBank under the accession JADZLS000000000. The version described in this paper is version JADZLS010000000. The Whole Genome Shotgun project for Fusarium circinatum UG27 has been deposited at DDBJ/ENA/GenBank under the accession JAELVK000000000. The version described in this paper is version JAELVK010000000. The Whole Genome Shotgun project for Fusarium circinatum UG10 has been deposited at DDBJ/ENA/GenBank under the accession JAGJRQ000000000. The version described in this paper is version JAGJRQ010000000.
    The South African Department of Science and Innovation’s South African Research Chair Initiative and the DSI-NRF Centre of Excellence in Plant Health Biotechnology at the Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria.
    http://www.mdpi.com/journal/jof
    Biochemistry; Forestry and Agricultural Biotechnology Institute (FABI); Genetics; Microbiology and Plant Patholog

    Draft genome sequence of Annulohypoxylon stygium, Aspergillus mulundensis, Berkeleyomyces basicola (syn. Thielaviopsis basicola), Ceratocystis smalleyi, two Cercospora beticola strains, Coleophoma cylindrospora, Fusarium fracticaudum, Phialophora cf. hyalina, and Morchella septimelata

    Draft genomes of the species Annulohypoxylon stygium, Aspergillus mulundensis, Berkeleyomyces basicola (syn. Thielaviopsis basicola), Ceratocystis smalleyi, two Cercospora beticola strains, Coleophoma cylindrospora, Fusarium fracticaudum, Phialophora cf. hyalina and Morchella septimelata are presented. Both mating types (MAT1-1 and MAT1-2) of Cercospora beticola are included. Two strains of Coleophoma cylindrospora that produce sulfated homotyrosine echinocandin variants, FR209602, FR220897 and FR220899 are presented. The sequencing of Aspergillus mulundensis, Coleophoma cylindrospora and Phialophora cf. hyalina has enabled mapping of the gene clusters encoding the chemical diversity from the echinocandin pathways, providing data that reveals the complexity of secondary metabolism in these different species. Overall these genomes provide a valuable resource for understanding the molecular processes underlying pathogenicity (in some cases), biology and toxin production of these economically important fungi

    Plant Growth Promotion, Phytohormone Production and Genomics of the Rhizosphere-Associated Microalga, Micractinium rhizosphaerae sp. nov.

    Funding Information: This work was funded by Fundação para a Ciência e Tecnologia/Ministério da Ciência, Tecnologia e Ensino Superior (FCT/MCTES, Portugal) through national funds to iNOVA4Health (UIDB/04462/2020 and UIDP/04462/2020) and the Associate Laboratory LS4FUTURE (LA/P/0087/2020). Funding Information: F.Q.-N. and P.R.B. acknowledge receiving a PhD fellowship from FCT (2022.10633.BD; 2021.07927.BD514, respectively). Publisher Copyright: © 2023 by the authors.
    Microalgae are important members of the soil and plant microbiomes, playing key roles in the maintenance of soil and plant health as well as in the promotion of plant growth. However, not much is understood regarding the potential of different microalgae strains in augmenting plant growth, or the mechanisms involved in such activities. In this work, the functional and genomic characterization of strain NFX-FRZ, a eukaryotic microalga belonging to the Micractinium genus that was isolated from the rhizosphere of a plant growing in a natural environment in Portugal, is presented and analyzed. The results obtained demonstrate that strain NFX-FRZ (i) belongs to a novel species, termed Micractinium rhizosphaerae sp. nov.; (ii) can effectively bind to tomato plant tissues and promote plant growth; (iii) can synthesize a wide range of plant growth-promoting compounds, including phytohormones such as indole-3-acetic acid, salicylic acid, jasmonic acid and abscisic acid; and (iv) contains multiple genes involved in phytohormone biosynthesis and signaling. This study provides new insights regarding the relevance of eukaryotic microalgae as plant growth-promoting agents and helps to build a foundation for future studies regarding the origin and evolution of phytohormone biosynthesis and signaling, as well as other plant colonization and plant growth-promoting mechanisms in soil/plant-associated Micractinium.

    The gene-rich genome of the scallop Pecten maximus.

    BACKGROUND: The king scallop, Pecten maximus, is distributed in shallow waters along the Atlantic coast of Europe. It forms the basis of a valuable commercial fishery and plays a key role in coastal ecosystems and food webs. Like other filter feeding bivalves it can accumulate potent phytotoxins, to which it has evolved some immunity. The molecular origins of this immunity are of interest to evolutionary biologists, pharmaceutical companies, and fisheries management. FINDINGS: Here we report the genome assembly of this species, conducted as part of the Wellcome Sanger 25 Genomes Project. This genome was assembled from PacBio reads and scaffolded with 10X Chromium and Hi-C data. Its 3,983 scaffolds have an N50 of 44.8 Mb (longest scaffold 60.1 Mb), with 92% of the assembly sequence contained in 19 scaffolds, corresponding to the 19 chromosomes found in this species. The total assembly spans 918.3 Mb and is the best-scaffolded marine bivalve genome published to date, exhibiting 95.5% recovery of the metazoan BUSCO set. Gene annotation resulted in 67,741 gene models. Analysis of gene content revealed large numbers of gene duplicates, as previously seen in bivalves, with little gene loss, in comparison with the sequenced genomes of other marine bivalve species. CONCLUSIONS: The genome assembly of P. maximus and its annotated gene set provide a high-quality platform for studies on such disparate topics as shell biomineralization, pigmentation, vision, and resistance to algal toxins. As a result of our findings we highlight the sodium channel gene Nav1, known to confer resistance to saxitoxin and tetrodotoxin, as a candidate for further studies investigating immunity to domoic acid
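
The N50 figure quoted in the abstract above is a length-weighted median of scaffold sizes: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly size. The following is a minimal sketch of that calculation, not the authors' pipeline. The function name `n50` and the toy lengths are hypothetical and are not data from the P. maximus assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy scaffold lengths (arbitrary units), for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Because the statistic is weighted by length, a few very long scaffolds can yield a high N50 even when the assembly also contains thousands of short ones, which is why the abstract reports the number of scaffolds alongside it.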

    Molecular Basis of Pathogenesis and Host Determination in Cercospora sojina: from Phenotypic to Genotypic Patterns

    Frogeye leaf spot (FLS), caused by Cercospora sojina, is an important and recurrent disease of soybean in many production regions. Genetic resistance is potentially one of the most cost-effective and sustainable strategies to control FLS. However, C. sojina has already demonstrated the ability to overcome resistance conveyed by single R-genes (resistance genes) of soybeans, followed by the emergence of new physiological races. Although understanding population genomics and the virulence gene inventories in fungal plant pathogens is extremely important to improve disease control measures, studies regarding host specificity and pathogenesis in C. sojina are very limited. Therefore, the overarching goal of this study was to elucidate the genetic and molecular basis of race specificity, and pathogenesis in general, in C. sojina. To this end, a bulk-sequencing analysis was performed on two subcollections of C. sojina classified by differential infection responses (virulence or avirulence) on cultivars Blackhawk and Hood followed by mapping to the recently assembled C. sojina strain 2.2.3 reference genome. From the 18004 SNPs identified among the two subcollections, 75 SNPs showed an Fst > 0.2 and were localized within three distinct loci of the C. sojina genome, which harbored genes implicated in oxidative stress and pathogenesis. Unusual genomic architectures were also observed in these regions, possibly resulting from InDels or duplications in the C. sojina genome. Further SNP annotation analysis also identified candidate effector genes under positive selection pressure (dN/dS > 1.0), including two genes potentially restricted to the Cercospora genus. Intriguingly, C. cf. flagellaris isolates causing FLS-like lesions and C. sojina isolates virulent on cultivar Davis were also identified within the collection of fungal isolates, which underscores the importance of better understanding host specificity in C. sojina and Cercospora spp. in general. 
Altogether, this study provided key resources to unravel the genetics and genomics of race specificity and pathogenesis in C. sojina, and augmented long-term efforts to improve FLS resistance in soybeans through breeding and genetic engineering approaches