2 research outputs found

    Targeted metatranscriptomics of compost derived consortia reveals a GH11 exerting an unusual exo-1,4-β-xylanase activity

    Background: Using globally abundant crop residues as a carbon source for energy generation and renewable chemicals production stands out as a promising way to reduce the current dependency on fossil fuels. In nature, such as in compost habitats, microbial communities efficiently degrade the available plant biomass using a diverse set of synergistic enzymes. However, deconstruction of lignocellulose remains a challenge for industry due to the recalcitrant nature of the substrate and the inefficiency of the available enzyme systems, making the economic production of lignocellulosic biofuels difficult. Metatranscriptomic studies of microbial communities can unveil the metabolic functions employed by lignocellulolytic consortia and identify new biocatalysts that could improve industrial lignocellulose conversion. Results: In this study, a microbial community from compost was grown in minimal medium with sugarcane bagasse as the sole carbon source. Solid-state nuclear magnetic resonance was used to monitor lignocellulose degradation; analysis of metatranscriptomic data led to the selection and functional characterization of several target genes, revealing the first glycoside hydrolase from Carbohydrate Active Enzyme family 11 with exo-1,4-β-xylanase activity. The xylanase crystal structure was resolved at 1.76 Å, revealing the structural basis of exo-xylanase activity. Supplementation of a commercial cellulolytic enzyme cocktail with the xylanase improved Avicel hydrolysis in the presence of inhibitory xylooligomers. Conclusions: This study demonstrated that composting microbiomes continue to be an excellent source of biotechnologically important enzymes by unveiling the diversity of enzymes involved in in situ lignocellulose degradation.

    Improving algorithms of gene prediction in prokaryotic genomes, metagenomes, and eukaryotic transcriptomes

    Next-generation sequencing has generated an enormous amount of DNA and RNA sequences that potentially carry volumes of genetic information, e.g. protein-coding genes. The thesis is divided into three main parts describing i) GeneMarkS-2, ii) GeneMarkS-T, and iii) MetaGeneTack. In prokaryotic genomes, ab initio gene finders can predict genes with high accuracy. However, the error rate is not negligible and is largely species-specific. Most errors in gene prediction are made in genes located in genomic regions with atypical GC composition, e.g. genes in pathogenicity islands. We describe a new algorithm, GeneMarkS-2, that uses local GC-specific heuristic models for scoring individual ORFs in the first step of analysis. Predicted atypical genes are retained and serve as ‘external’ evidence in subsequent runs of self-training. GeneMarkS-2 also controls the quality of the training process by effectively selecting optimal orders of the Markov chain models as well as duration parameters in the hidden semi-Markov model. GeneMarkS-2 has shown significantly improved accuracy compared with other state-of-the-art gene prediction tools. Massively parallel sequencing of RNA transcripts by next-generation technology (RNA-Seq) provides a large number of RNA reads that can be assembled into a full transcriptome. We have developed a new tool, GeneMarkS-T, for ab initio identification of protein-coding regions in RNA transcripts. Unsupervised estimation of the algorithm's parameters makes several steps in conventional gene prediction protocols unnecessary, most importantly the manual curation of training sets. We have demonstrated that GeneMarkS-T self-training is robust to errors in assembled transcripts, and that the accuracy of GeneMarkS-T in identifying protein-coding regions and, particularly, in predicting gene starts compares favorably with that of other existing methods.
Frameshift (FS) prediction is important for the analysis and biological interpretation of metagenomic sequences. Reads in metagenomic samples are prone to sequencing errors. Insertion and deletion errors that change the coding frame impair the accurate identification of protein-coding genes. Accurate frameshift prediction requires a sufficient amount of data to estimate the parameters of species-specific statistical models of protein-coding and non-coding regions. However, such data are not available; all we have is metagenomic sequences of unknown origin. The challenge of ab initio FS detection is, therefore, twofold: (i) to find a way to infer the necessary model parameters and (ii) to identify the positions of frameshifts (if any). We describe a new tool, MetaGeneTack, which uses a heuristic method to estimate the parameters of the sequence models used in the FS detection algorithm. Tests on several test sets showed that the FS detection performance of MetaGeneTack is comparable to or better than that of the earlier developed program FragGeneScan.
    Ph.D.