35 research outputs found

    Linking enzyme sequence to function using conserved property difference locator to identify and annotate positions likely to control specific functionality

    BACKGROUND: Families of homologous enzymes evolved from common progenitors. The availability of multiple sequences representing each activity presents an opportunity for extracting information specifying the functionality of individual homologs. We present a straightforward method for the identification of residues likely to determine class-specific functionality in which multiple sequence alignments are converted to an annotated graphical form by the Conserved Property Difference Locator (CPDL) program. RESULTS: Three test cases, each comprising two groups of functionally distinct homologs, are presented. Of the test cases, one is a membrane enzyme family and two are soluble enzyme families. The desaturase/hydroxylase data were used to design and test the CPDL algorithm because a comparative sequence approach had been successfully applied to manipulate the specificity of these enzymes. The other two cases, ATP/GTP cyclases and MurD/MurE synthases, were chosen because they are well characterized structurally and biochemically. For the desaturase/hydroxylase enzymes, the ATP/GTP cyclases and the MurD/MurE synthases, groups of 8 (of ~400), 4 (of ~150) and 10 (of >400) residues of interest, respectively, were identified that contain empirically defined specificity-determining positions. CONCLUSION: CPDL consistently identifies positions near enzyme active sites that include those predicted from structural and/or biochemical studies to be important for specificity and/or function. This suggests that CPDL will have broad utility for the identification of potential class-determining residues based on multiple sequence analysis of groups of homologous proteins. Because the method is sequence based rather than structure based, it is equally well suited for designing structure-function experiments to investigate membrane and soluble proteins.
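
    The core idea in this abstract, comparing two groups of aligned homologs column by column, can be illustrated with a minimal sketch. It is not the CPDL program: the abstract does not say which residue properties CPDL uses or how it scores conservation, so the four property classes in `PROPERTY_CLASS`, the strict all-or-nothing conservation rule, and the function name `property_difference_positions` are all assumptions made for illustration.

```python
# Hypothetical residue property classes (an assumption; the abstract does not
# specify the properties that CPDL actually compares).
PROPERTY_CLASS = {}
for _cls, _residues in {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}.items():
    for _res in _residues:
        PROPERTY_CLASS[_res] = _cls


def property_difference_positions(group_a, group_b):
    """Return 0-based alignment columns where each group is conserved in a
    single property class and the two groups' classes differ."""
    sequences = list(group_a) + list(group_b)
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must share one alignment length")

    hits = []
    for col in range(length):
        classes_a = {PROPERTY_CLASS.get(seq[col]) for seq in group_a}
        classes_b = {PROPERTY_CLASS.get(seq[col]) for seq in group_b}
        # Gaps and unknown residues map to None and disqualify the column.
        if None in classes_a or None in classes_b:
            continue
        # Conserved within each group, but different between the groups.
        if len(classes_a) == 1 and len(classes_b) == 1 and classes_a != classes_b:
            hits.append(col)
    return hits


# Toy alignment: only column 2 is negative in one group and positive in the other.
toy_hits = property_difference_positions(["MKDL", "MRDI"], ["MKKL", "MRRV"])
```

    In this toy input, columns 0, 1 and 3 share a property class across both groups, so only column 2 is reported. A practical tool would likely tolerate partial conservation and gaps rather than require strict agreement.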

    Bioprospecting metagenomes: glycosyl hydrolases for converting biomass

    Throughout immeasurable time, microorganisms evolved and accumulated remarkable physiological and functional heterogeneity, and now constitute the major reserve for genetic diversity on earth. Using metagenomics, namely genetic material recovered directly from environmental samples, this biogenetic diversification can be accessed without the need to cultivate cells. Accordingly, microbial communities and their metagenomes, isolated from biotopes with high turnover rates of recalcitrant biomass, such as lignocellulosic plant cell walls, have become a major resource for bioprospecting; furthermore, this material is a major asset in the search for new biocatalysts (enzymes) for various industrial processes, including the production of biofuels from plant feedstocks. However, despite the contributions from metagenomics technologies consequent upon the discovery of novel enzymes, this relatively new enterprise requires major improvements. In this review, we compare function-based metagenome screening and sequence-based metagenome data mining, discussing the advantages and limitations of both methods. We also describe the unusual enzymes discovered via metagenomics approaches, and discuss the future prospects for metagenome technologies.

    A clustering property of highly-degenerate transcription factor binding sites in the mammalian genome

    Transcription factor binding sites (TFBSs) are short DNA sequences interacting with transcription factors (TFs), which regulate gene expression. Due to the relatively short length of such binding sites, it is largely unclear how the specificity of protein–DNA interaction is achieved. Here, we have performed a genome-wide analysis of TFBS-like sequences for the transcriptional repressor, RE1 Silencing Transcription Factor (REST), as well as for several other representative mammalian TFs (c-myc, p53, HNF-1 and CREB). We find a nonrandom distribution of inexact sites for these TFs, referred to as highly-degenerate TFBSs, that are enriched around the cognate binding sites. Comparisons among human, mouse and rat orthologous promoters reveal that these highly-degenerate sites are conserved significantly more than expected by random chance, suggesting their positive selection during evolution. We propose that this arrangement provides a favorable genomic landscape for functional target site selection.

    Widespread polycistronic gene expression in green algae

    Polycistronic gene expression, common in prokaryotes, was thought to be extremely rare in eukaryotes. The development of long-read sequencing of full-length transcript isomers (Iso-Seq) has facilitated a reexamination of that dogma. Using Iso-Seq, we discovered hundreds of examples of polycistronic expression of nuclear genes in two divergent species of green algae: Chlamydomonas reinhardtii and Chromochloris zofingiensis. Here, we employ a range of independent approaches to validate that multiple proteins are translated from a common transcript for hundreds of loci. A chromatin immunoprecipitation analysis using trimethylation of lysine 4 on histone H3 marks confirmed that transcription begins exclusively at the upstream gene. Quantification of polyadenylated [poly(A)] tails and poly(A) signal sequences confirmed that transcription ends exclusively after the downstream gene. Coexpression analysis found nearly perfect correlation for open reading frames (ORFs) within polycistronic loci, consistent with expression in a shared transcript. For many polycistronic loci, terminal peptides from both ORFs were identified from proteomics datasets, consistent with independent translation. Synthetic polycistronic gene pairs were transcribed and translated in vitro to recapitulate the production of two distinct proteins from a common transcript. The relative abundance of these two proteins can be modified by altering the Kozak-like sequence of the upstream gene. Replacement of the ORFs with selectable markers or reporters allows production of such heterologous proteins, speaking to utility in synthetic biology approaches. Conservation of a significant number of polycistronic gene pairs between C. reinhardtii, C. zofingiensis, and five other species suggests that this mechanism may be evolutionarily ancient and biologically important in the green algal lineage.

    Differential binding of Escherichia coli McrA protein to DNA sequences that contain the dinucleotide m5CpG

    The Escherichia coli McrA protein, a putative C5-methylcytosine/C5-hydroxyl methylcytosine-specific nuclease, binds DNA with symmetrically methylated HpaII sequences (Cm5CGG), but its precise recognition sequence remains undefined. To determine McrA’s binding specificity, we cloned and expressed recombinant McrA with a C-terminal StrepII tag (rMcrA-S) to facilitate protein purification and affinity capture of human DNA fragments with m5C residues. Sequence analysis of a subset of these fragments and electrophoretic mobility shift assays with model methylated and unmethylated oligonucleotides suggest that N(Y > R) m5CGR is the canonical binding site for rMcrA-S. In addition to binding HpaII-methylated double-stranded DNA, rMcrA-S binds DNA containing a single, hemimethylated HpaII site; however, it does not bind if A, C, T or U is placed across from the m5C residue, but does if I is opposite the m5C. These results provide the first systematic analysis of McrA’s in vitro binding specificity.

    The Metagenome of an Anaerobic Microbial Community Decomposing Poplar Wood Chips

    This study describes the composition and metabolic potential of a lignocellulosic biomass degrading community that decays poplar wood chips under anaerobic conditions. We examined the community that developed on poplar biomass in a non-aerated bioreactor over the course of a year, with no microbial inoculation other than the naturally occurring organisms on the woody material. The composition of this community contrasts in important ways with biomass-degrading communities associated with higher organisms, which have evolved over millions of years into a symbiotic relationship. Both mammalian and insect hosts provide partial size reduction, chemical treatments (low or high pH environments), and complex enzymatic ‘secretomes’ that improve microbial access to cell wall polymers. We hypothesized that in order to efficiently degrade coarse untreated biomass, a spontaneously assembled free-living community must both employ alternative strategies, such as enzymatic lignin depolymerization, for accessing hemicellulose and cellulose and have a much broader metabolic potential than host-associated communities. This would suggest that such a community would make a valuable resource for finding new catalytic functions involved in biomass decomposition and gaining new insight into the poorly understood process of anaerobic lignin depolymerization. Therefore, in addition to determining the major players in this community, our work specifically aimed at identifying functions potentially involved in the depolymerization of cellulose, hemicelluloses, and lignin, and at assigning specific roles to the prevalent community members in the collaborative process of biomass decomposition. A bacterium similar to Magnetospirillum was identified among the dominant community members, which could play a key role in the anaerobic breakdown of aromatic compounds. We suggest that these compounds are released from the lignin fraction in poplar hardwood during the decay process, which would point to lignin-modification or depolymerization under anaerobic conditions.

    Cell context dependent p53 genome-wide binding patterns and enrichment at repeats.

    The ability of p53 to elicit stress-specific and cell type-specific responses is well recognized, but how that specificity is established remains to be defined. Whether upon activation p53 binds to its genomic targets in a cell type- and stress type-dependent manner is still an open question. Here we show that p53 binding to the human genome is selective and cell context-dependent. We mapped the genomic binding sites for the endogenous wild-type p53 protein in the human cancer cell line HCT116 and compared them to those we previously determined in the normal cell line IMR90. We report distinct p53 genome-wide binding landscapes in two different cell lines, analyzed under the same treatment and experimental conditions, using the same ChIP-seq approach. This is evidence for cell context-dependent p53 genomic binding. The observed differences affect the distribution of p53 binding sites with respect to major genomic and epigenomic elements (promoter regions, CpG islands and repeats). We correlated the positions of the high-confidence p53 ChIP-seq peaks with the annotated human repeats (UCSC Human Genome Browser) and observed both common and cell line-specific trends. In HCT116, p53 binding was specifically enriched at LINE repeats, compared to IMR90 cells. The p53 genome-wide binding patterns in HCT116 and IMR90 likely reflect the different epigenetic landscapes in these two cell lines, resulting from cancer-associated changes (accumulated in HCT116) superimposed on tissue-specific differences (HCT116 has epithelial, while IMR90 has mesenchymal origin). Our data support the model for p53 binding to the human genome in a highly selective manner, mobilizing distinct sets of genes, contributing to distinct pathways.

    Distribution of p53 ChIP-seq peaks on chromosome 6 in the cell lines HCT116 (this study) and IMR90 [16].

    Plotted is peak frequency (per Mb) normalized by number of peaks in chromosome, smoothed by a Gaussian kernel density, peak-height weighted.
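
    One plausible reading of this caption is a height-weighted Gaussian kernel density evaluated along the chromosome, rescaled to peaks per Mb and divided by the number of peaks. The sketch below follows that reading only; the caption gives no bandwidth, grid, or exact normalization, so the parameter `bandwidth_bp`, the division by the raw peak count, and the function name `smoothed_peak_density` are assumptions.

```python
import math


def smoothed_peak_density(positions_bp, heights, grid_bp, bandwidth_bp):
    """Height-weighted Gaussian kernel density of peaks along a chromosome.

    Returns one value per grid point, in weighted peaks per Mb divided by the
    number of peaks (an assumed normalization; the caption is not explicit).
    """
    if len(positions_bp) != len(heights):
        raise ValueError("positions_bp and heights must have the same length")
    if not positions_bp:
        raise ValueError("at least one peak is required")

    n_peaks = len(positions_bp)
    # Normalizing constant of a Gaussian with standard deviation bandwidth_bp.
    norm = 1.0 / (bandwidth_bp * math.sqrt(2.0 * math.pi))

    density = []
    for x in grid_bp:
        per_bp = sum(
            h * norm * math.exp(-0.5 * ((x - p) / bandwidth_bp) ** 2)
            for p, h in zip(positions_bp, heights)
        )
        # Convert from per-bp to per-Mb, then normalize by the peak count.
        density.append(per_bp * 1e6 / n_peaks)
    return density


# Toy example: one peak of height 2 at 1 Mb, evaluated at three grid points.
toy_density = smoothed_peak_density(
    [1_000_000], [2.0], [900_000, 1_000_000, 1_100_000], 100_000
)
```

    The toy call places a single peak at 1 Mb, so the curve is symmetric around that position and highest at it. A wider `bandwidth_bp` flattens the curve, which is why the choice of bandwidth matters when comparing cell lines.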

    Sequencing logos depicting the statistically significantly enriched motifs identified in the high-confidence p53 ChIP-seq peaks in HCT116, in IMR90 [16], and in a set of 168 p53 reference sites (p53 REF SET), see Materials and Methods.

    Analysis was done with the MEME suite [42], using the programs MEME-ChIP [48] and DREME [46]. Shown is the distribution of the identified motifs in 100 nt windows centered at the peak maxima, reported by CentriMo [43].