39 research outputs found

    Characterizing protein-ligand binding using atomistic simulation and machine learning: Application to drug resistance in HIV-1 protease

    Over the past several decades, atomistic simulations of biomolecules, whether carried out using molecular dynamics or Monte Carlo techniques, have provided detailed insights into their function. Comparing the results of such simulations for a few closely related systems has guided our understanding of the mechanisms by which changes like ligand binding or mutation can alter function. The general problem of detecting and interpreting such mechanisms from simulations of many related systems, however, remains a challenge. This problem is addressed here by applying supervised and unsupervised machine learning techniques to a variety of thermodynamic observables extracted from molecular dynamics simulations of different systems. As an important test case, these methods are applied to understanding the evasion by HIV-1 protease of darunavir, a potent inhibitor to which resistance can develop via the simultaneous mutation of multiple amino acids. Complex mutational patterns have been observed among resistant strains, presenting a challenge to developing a mechanistic picture of resistance in the protease. In order to dissect these patterns and gain mechanistic insight on the role of specific mutations, molecular dynamics simulations were carried out on a collection of HIV-1 protease variants, chosen to include highly resistant strains and susceptible controls, in complex with darunavir. Using a machine learning approach that takes advantage of the hierarchical nature in the relationships among sequence, structure and function, an integrative analysis of these trajectories reveals key details of the resistance mechanism, including changes in protein structure, hydrogen bonding and protein-ligand contacts

    Identification of functional modules that correlate with phenotypic difference: the influence of network topology

    A gene set enrichment analysis method for including network topology in the identification of genes involved in phenotypic alterations is described. Classifications: Genome studies, Method

    A User's Guide to the Encyclopedia of DNA Elements (ENCODE)

    The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome

    Functional analysis of transcription factor binding sites in human promoters

    BACKGROUND: The binding of transcription factors to specific locations in the genome is integral to the orchestration of transcriptional regulation in cells. To characterize transcription factor binding site function on a large scale, we predicted and mutagenized 455 binding sites in human promoters. We carried out functional tests on these sites in four different immortalized human cell lines using transient transfections with a luciferase reporter assay, primarily for the transcription factors CTCF, GABP, GATA2, E2F, STAT, and YY1. RESULTS: In each cell line, between 36% and 49% of binding sites made a functional contribution to the promoter activity; the overall rate for observing function in any of the cell lines was 70%. Transcription factor binding resulted in transcriptional repression in more than a third of functional sites. When compared with predicted binding sites whose function was not experimentally verified, the functional binding sites had higher conservation and were located closer to transcriptional start sites (TSSs). Among functional sites, repressive sites tended to be located further from TSSs than were activating sites. Our data provide significant insight into the functional characteristics of YY1 binding sites, most notably the detection of distinct activating and repressing classes of YY1 binding sites. Repressing sites were located closer to, and often overlapped with, translational start sites and presented a distinctive variation on the canonical YY1 binding motif. CONCLUSIONS: The genomic properties that we found to associate with functional TF binding sites on promoters -- conservation, TSS proximity, motifs and their variations -- point the way to improved accuracy in future TFBS predictions

    Genome-wide co-occupancy of AML1-ETO and N-CoR defines the t(8;21) AML signature in leukemic cells

    BACKGROUND: Many leukemias result from chromosomal rearrangements. The t(8;21) chromosomal translocation produces AML1-ETO, an oncogenic fusion protein that compromises the function of AML1, a transcription factor critical for myeloid cell differentiation. Because of the pressing need for new therapies in the treatment of acute myeloid leukemia, we investigated the genome-wide occupancy of AML1-ETO in leukemic cells to discover novel regulatory mechanisms involving AML1-ETO-bound genes. RESULTS: We report the co-localization of AML1-ETO with the N-CoR co-repressor to be primarily on genomic regions distal to transcriptional start sites (TSSs). These regions exhibit over-representation of the motif for PU.1, a key hematopoietic regulator and member of the ETS family of transcription factors. A significant discovery of our study is that genes co-occupied by AML1-ETO and N-CoR (e.g., TYROBP and LAPTM5) are associated with the leukemic phenotype, as determined by analyses of gene ontology and by the observation that these genes are predominantly up-regulated upon AML1-ETO depletion. In contrast, the AML1-ETO/p300 gene network is less responsive to AML1-ETO depletion and less associated with the differentiation block characteristic of leukemic cells. Furthermore, a substantial fraction of AML1-ETO/p300 co-localization occurs near TSSs in promoter regions associated with transcriptionally active loci. CONCLUSIONS: Our findings establish a novel and dominant t(8;21) AML leukemia signature characterized by occupancy of AML1-ETO/N-CoR at promoter-distal genomic regions enriched in motifs for myeloid differentiation factors, thus providing mechanistic insight into the leukemic phenotype

    Genomic occupancy of Runx2 with global expression profiling identifies a novel dimension to control of osteoblastogenesis

    BACKGROUND: Osteogenesis is a highly regulated developmental process and continues during the turnover and repair of mature bone. Runx2, the master regulator of osteoblastogenesis, directs a transcriptional program essential for bone formation through genetic and epigenetic mechanisms. While individual Runx2 gene targets have been identified, further insights into the broad spectrum of Runx2 functions required for osteogenesis are needed. RESULTS: By performing genome-wide characterization of Runx2 binding at the three major stages of osteoblast differentiation--proliferation, matrix deposition and mineralization--we identify Runx2-dependent regulatory networks driving bone formation. Using chromatin immunoprecipitation followed by high-throughput sequencing over the course of these stages, we identify approximately 80,000 significantly enriched regions of Runx2 binding throughout the mouse genome. These binding events exhibit distinct patterns during osteogenesis, and are associated with proximal promoters and also non-promoter regions: upstream, introns, exons, transcription termination site regions, and intergenic regions. These peaks were partitioned into clusters that are associated with genes in complex biological processes that support bone formation. Using Affymetrix expression profiling of differentiating osteoblasts depleted of Runx2, we identify novel Runx2 targets including Ezh2, a critical epigenetic regulator; Crabp2, a retinoic acid signaling component; Adamts4 and Tnfrsf19, two remodelers of the extracellular matrix. We demonstrate by luciferase assays that these novel biological targets are regulated by Runx2 occupancy at non-promoter regions. CONCLUSIONS: Our data establish that Runx2 interactions with chromatin across the genome reveal novel genes, pathways and transcriptional mechanisms that contribute to the regulation of osteoblastogenesis

    The bone-specific Runx2-P1 promoter displays conserved three-dimensional chromatin structure with the syntenic Supt3h promoter

    Three-dimensional organization of chromatin is fundamental for transcriptional regulation. Tissue-specific transcriptional programs are orchestrated by transcription factors and epigenetic regulators. The RUNX2 transcription factor is required for differentiation of precursor cells into mature osteoblasts. Although organization and control of the bone-specific Runx2-P1 promoter have been studied extensively, long-range regulation has not been explored. In this study, we investigated higher-order organization of the Runx2-P1 promoter during osteoblast differentiation. Mining the ENCODE database revealed interactions between Runx2-P1 and Supt3h promoters in several non-mesenchymal human cell lines. Supt3h is a ubiquitously expressed gene located within the first intron of Runx2. These two genes show shared synteny across species from humans to sponges. Chromosome conformation capture analysis in the murine pre-osteoblastic MC3T3-E1 cell line revealed increased contact frequency between Runx2-P1 and Supt3h promoters during differentiation. This increase was accompanied by enhanced DNaseI hypersensitivity along with RUNX2 and CTCF binding at the Supt3h promoter. Furthermore, interplasmid-3C and luciferase reporter assays showed that the Supt3h promoter can modulate Runx2-P1 activity via direct association. Taken together, our data demonstrate physical proximity between Runx2-P1 and Supt3h promoters, consistent with their syntenic nature. Importantly, we identify the Supt3h promoter as a potential regulator of the bone-specific Runx2-P1 promoter.

    An integrated encyclopedia of DNA elements in the human genome

    The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research

    Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors

    Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has become the dominant technique for mapping transcription factor (TF) binding regions genome-wide. We performed an integrative analysis centered around 457 ChIP-seq data sets on 119 human TFs generated by the ENCODE Consortium. We identified highly enriched sequence motifs in most data sets, revealing new motifs and validating known ones. The motif sites (TF binding sites) are highly conserved evolutionarily and show distinct footprints upon DNase I digestion. We frequently detected secondary motifs in addition to the canonical motifs of the TFs, indicating tethered binding and cobinding between multiple TFs. We observed significant position and orientation preferences between many cobinding TFs. Genes specifically expressed in a cell line are often associated with a greater occurrence of nearby TF binding in that cell line. We observed cell-line-specific secondary motifs that mediate the binding of the histone deacetylase HDAC2 and the enhancer-binding protein EP300. TF binding sites are located in GC-rich, nucleosome-depleted, and DNase I sensitive regions, flanked by well-positioned nucleosomes, and many of these features show cell type specificity. The GC-richness may be beneficial for regulating TF binding because, when unoccupied by a TF, these regions are occupied by nucleosomes in vivo. We present the results of our analysis in a TF-centric web repository Factorbook (http://factorbook.org) and will continually update this repository as more ENCODE data are generated