Size does not matter: a molecular insight into the biological activity of chemical fragments utilizing computational approaches.
Masters Degree. University of KwaZulu-Natal, Durban. Insight into the functional and physiological state of a drug target is of essential importance in the drug discovery process. Given the lack of emerging three-dimensional (3D) drug targets, we propose the integration of homology modeling, which may aid in the accurate yet efficient construction of 3D protein structures. In this study we present the applications of homology modeling in drug discovery, together with a conclusive route map and detailed technical guideline that can be utilised to obtain the most accurate model. Even with the presence of available drug targets and substantial advancements in the field of drug discovery, the prevalence of incurable diseases remains at an all-time high. In this study we explore the biological activity of chemically derived fragments from natural products utilising a range of computational approaches and implement their use in a new route towards innovative drug discovery. A potential avenue recently proposed by organic chemists, referred to as the reduce to maximum concept, entails reducing the size of a chemical compound to obtain a structural analog with retained or enhanced biological activity, better synthetic approachability and reduced toxicity, suggesting that size may not in fact matter. Molecular dynamics simulations and toxicity profiling were performed comparatively on the natural compound Anguinomycin D and its derived analog SB 640, each in complex with the CRM1 protein, which plays a key role in cancer pathogenesis. Each system was studied post-dynamically to compare the structural dynamics adopted by the parent compound with those exhibited by the analog. Despite being reduced in size by 60%, the analog SB 640 displayed attractive pharmacophore properties overall, including a minimal reduction in binding affinity, enhanced synthetic approachability and reduced toxicity in comparison to the parent compound.
The potent CRM1 inhibitor Leptomycin B (LMB) displayed substantial inhibition of the CRM1 export protein by binding to four of the PKIα NES residues (Φ0, Φ1, Φ2, Φ3, and Φ4) present within the hydrophobic binding groove of CRM1. Although drastically reduced in size and lacking the polyketide chain present in the parent compound Anguinomycin D and in LMB, the analog SB 640 displaced three of these essential NES residues. The potential therapeutic activity of the structural analog remains evident; however, the application of this approach in drug design remains ambiguous as to which chemical fragments must be retained or truncated to ensure retained or enhanced pharmacophore properties. In this study we aimed to address this question using thermodynamic calculations, by incorporating an MM/GBSA per-residue energy contribution footprint from molecular dynamics simulations. The footprint was generated for each system, Anguinomycin D and the analog SB 640, each in complex with the CRM1 protein. Each system formed interactions with the conserved active site residues Leu 536, Thr 575, Val 576 and Lys 579. These residues were highlighted as the most energetically favourable amino acid residues, contributing substantially to the total binding free energy. This implies a conserved selectivity and binding mode adopted by both compounds, despite the omission in the analog SB 640 of the prominent polyketide chain present in the parent compound. The strategic computational approach presented in this study could serve as a beneficial tool to enhance novel drug discovery. This work contributes to the understanding of the phenomena underlying the reduction in the size of a chemical compound to obtain the most beneficial pharmacokinetic properties and could contribute to the design of potent analog inhibitors for a range of drug targets implicated in disease.
Analysis, design and "in silico" evaluation of e-selectin antagonists
E-selectin is a member of a family of cell-adhesion proteins that plays a crucial role in many physiological processes and diseases [1], in particular in the early phases of the inflammatory response. Its role is to promote the tethering and rolling of leukocytes along the endothelial surface [2]. These steps are followed by integrin-mediated firm adhesion and final transendothelial migration. Control of the leukocyte-endothelial cell adhesion process may therefore be useful in cases where excessive recruitment of leukocytes contributes to acute or chronic diseases such as stroke, reperfusion injury, psoriasis or rheumatoid arthritis [3]. In this work, efforts to develop in silico-based protocols to study the interaction between E-selectin and its ligands are presented. To this end, different protocols had to be developed and validated. In particular, a new procedure for the analysis of the conformational preferences of E-selectin antagonists was established and the results were compared to those obtained with the MC(JBW)/SD approach, which had already demonstrated its validity in the past [161,168]. The comparison revealed that the two methods prefer different orientations of the sialic acid moiety of sLex (3) (torsions Φ3 and Ψ3, Figure A), which reflects the contrasting opinions on the conformation adopted by sLex (3) in solution [150–168]. A more detailed analysis revealed that both approaches probably deliver only a partially correct view and that in solution sLex (3) exists as a mixture of low-energy conformers, and not, as supposed to date [150–154,161–163], as a population of a single conformer.
In addition, a docking routine was established and the impact of different partial-charge methods and of explicit solvation on the binding mode was studied.
MD simulations provided insight into the dynamical character of the protein-ligand interactions. In particular, the observations from an atomic-force microscopy study [350], which described the interactions between the carboxylic group of sLex and Arg97, and between the 3- and 4-hydroxyls of fucose and the calcium ion, as the two main energy barriers for the dissociation of the protein-ligand complex, were confirmed by our MD investigations: these two contacts always lasted longer than any other in the MD simulation.
QSAR models with Quasar [270–272,351] and Raptor [315,316,335] were successfully derived and will permit a semi-quantitative in silico estimation of the binding affinity of ligands designed in the future.
Finally, the developed protocols and models were applied to the development of new E-selectin antagonists. Unfortunately, to date only limited biological data are available to evaluate our design strategies. However, the impact of the ligand's pre-organization on the binding affinity could be established at least for the Lex core of sLex (3). The importance of the exo-anomeric effect, of the steric compression, and of the hydrophobic interaction between the methyl group of fucose and the β-face of galactose was thereby clearly demonstrated.
Identifying prospective inhibitors against LdtMt5 from Mycobacterium tuberculosis as a potential drug target.
Masters Degree. University of KwaZulu-Natal, Durban. Tuberculosis (TB), caused by the bacterium Mycobacterium tuberculosis (M.tb), has resulted in an unprecedented number of deaths over centuries. L,D-transpeptidase enzymes are known to play a crucial role in the biosynthesis of the cell wall, which confers resistance to most antibiotics. These enzymes catalyze the 3→3 peptidoglycan cross-links of the M.tb cell wall. Specific β-lactam antibiotics (carbapenems) have been reported to inhibit cell wall polymerization of M.tb, and they inactivate L,D-transpeptidases through acylation. L,D-transpeptidase 5 (LdtMt5) is a unique paralog and a vital protein in maintaining the integrity of the cell wall, specifically in peptidoglycan metabolism, which makes it an important protein target. Carbapenems inhibit LdtMt2 but do not show reasonable inhibitory activity against LdtMt5. We therefore sought to perform virtual screening to identify potential inhibitors of LdtMt5, to investigate their affinity, and to calculate the binding free energies between LdtMt5 and the potential inhibitors. Furthermore, we sought to investigate the nature of the transition state involved in the catalytic reaction mechanism; to determine the activation free energies of the mechanism using ONIOM through the thermodynamics and energetics of the reaction path; and lastly to express and purify LdtMt5 and perform inhibition studies on it.
A total of 12766 compounds from the ZINC database were computationally screened to identify potential leads against LdtMt5. Docking was performed using two different software programs. Molecular dynamics (MD) simulations were subsequently performed on compounds obtained through virtual screening. Density functional theory (DFT) calculations were then carried out to understand the catalytic mechanism of LdtMt5 with respect to β-lactam derivatives using a hybrid ONIOM quantum mechanics/molecular mechanics (QM/MM) method. LdtMt5 complexes with six selected β-lactam compounds were evaluated. A lyophilised pET28a-LdtMt5 was used to transform E. coli strain BL21 (DE3), and SDS-PAGE was used to verify the purity, molecular weight and protein profile. Finally, an in vitro binding thermodynamics analysis using isothermal titration calorimetry (ITC) was performed on a single compound (the strongest binder) from the final set, in a bid to further validate the calculated binding energy values.
A number of compounds from four different antimicrobial classes (n = 98) were obtained from the virtual screening, and those with docking scores ranging from -7.2 to -9.9 kcal mol-1 were considered for MD analysis (n = 37). A final set of 10 compounds from four antibiotic classes, which exhibited the greatest affinity, was selected and their Molecular Mechanics/Generalized Born Surface Area (MM-GBSA) binding free energies (ΔGbind) were characterised. The calculated binding free energies ranged from -30.68 to -48.52 kcal mol-1. The β-lactam class of compounds demonstrated the highest ΔGbind and also the greatest number of potential inhibitors. The DFT activation energies (ΔG#) obtained for the acylation of LdtMt5 by the six selected β-lactams were calculated as 13.67, 20.90, 22.88, 24.29, 27.86 and 28.26 kcal mol-1. The ΔG# results from the 6-membered ring transition state (TS) revealed that all six selected β-lactams were thermodynamically more favourable than previously calculated activation energy values for imipenem and meropenem complexed with LdtMt5. The results are also comparable to those observed for LdtMt2; however, for compound 1 the values are considerably lower than those obtained for meropenem and imipenem in complex with LdtMt2, suggesting in theory that compound 1 is a more potent inhibitor of LdtMt5. We also report the successful expression and purification of LdtMt5; however, the molecule selected for the in vitro inhibition study gave a poor result. On further review, we concluded that the main cause of this outcome was the relatively low solubility of the compound.
The outcome of this study provides insight into the design of potential novel leads for LdtMt5. Our screening obtained ten novel compounds from four different antimicrobial classes. We suggest that further in vitro binding thermodynamics analysis of the novel compounds from the four classes, including the carbapenems, be performed to evaluate inhibition of LdtMt5 by these compounds. If the experimental observations suggest binding affinity to the protein, catalytic mechanistic studies can be undertaken. These results will also be used to verify or modify our computational model.
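To give a feel for what the quoted activation free energies mean kinetically, the sketch below converts them into rate constants with the Eyring equation, k = (kB T / h) exp(-ΔG#/RT). This is an illustration, not part of the thesis workflow: it assumes a temperature of 298.15 K, a transmission coefficient of 1, and that each barrier can be treated as a single rate-limiting step.

```python
import math

# Physical constants.
K_B = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J*s
R_KCAL = 1.987204e-3    # gas constant, kcal/(mol*K)


def eyring_rate(dg_kcal_mol, temperature_k=298.15):
    """Return the Eyring rate constant (1/s) for an activation free energy.

    Assumes a transmission coefficient of 1 (an assumption of this sketch).
    """
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-dg_kcal_mol / (R_KCAL * temperature_k))


# Activation free energies quoted in the abstract, in kcal/mol.
barriers = [13.67, 20.90, 22.88, 24.29, 27.86, 28.26]
rates = [eyring_rate(dg) for dg in barriers]

for dg, k in zip(barriers, rates):
    print(f"dG# = {dg:5.2f} kcal/mol -> k = {k:.3e} 1/s")
```

Because the barrier enters exponentially, the roughly 7 kcal mol-1 gap between the lowest value and the next one corresponds to a difference of several orders of magnitude in the estimated rate.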
Novel Strategies for Model-Building of G Protein-Coupled Receptors
The G protein-coupled receptors still constitute the most densely populated protein family, encompassing numerous disease-relevant drug targets. Consequently, medicinal chemistry is expected to pursue targets from that protein family, in that hits need to be generated and subsequently optimized towards viable clinical candidates for a variety of therapeutic areas. For the purpose of rationalizing structure-activity relationships within such optimization programs, structural information derived from the ligand's as well as the macromolecule's perspective is essential. While it is relatively straightforward to define pharmacophore hypotheses based on comparative modelling of structurally and biologically characterized low-molecular weight ligands, a deeper understanding of the underlying molecular recognition event remains challenging, since the amount of experimentally derived structural data available on GPCRs is extremely scarce when compared to, e.g., soluble enzymes.
In this context, the protein modelling methodologies introduced, developed, optimized, and applied in this thesis provide structural models that are capable of assisting in the development of structural hypotheses on ligand-receptor complexes. As such they provide a valuable structural framework not only for a more detailed insight into ligand-GPCR interaction, but also for guiding the design process towards next-generation compounds which should display enhanced affinity.
The model building procedure developed in this thesis systematically follows a hierarchical approach, sequentially generating a 1D topology, followed by a 2D topology that is finally converted into a 3D topology. The determination of a 1D topology is based on a compartmentalization of the linear amino acid sequence of a GPCR of interest into the extracellular, intracellular, and transmembrane sequence stretches. Chapter 3 of this study elaborates on the strengths and weaknesses of applying automated prediction tools for the purpose of identifying the transmembrane sequence domains. Once a 1D topology has been derived, a type of in-plane projection structure for the seven transmembrane helices can be derived with the aid of calculated vectorial property moments, yielding the 2D topology. Thorough bioinformatics studies revealed that only a consensus approach, based on a conceptual combination of different methods employing a carefully selected set of parameters, gave reliable results, emphasizing the danger of fully automating a GPCR modelling procedure.
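As an illustration of what a vectorial property moment can look like, the sketch below computes a helical hydrophobic moment: each residue contributes a vector whose length is its hydropathy and whose direction advances by 100 degrees per residue around the helix axis. The abstract does not say which property scales or parameters were used, so the Kyte-Doolittle scale, the 100 degree step, and the example sequence are assumptions made only for this example.

```python
import math

# Kyte-Doolittle hydropathy scale (one common choice; the thesis does not name its scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(sequence, degrees_per_residue=100.0):
    """Return (magnitude, direction in degrees) of the helical hydrophobic moment."""
    x = y = 0.0
    for index, residue in enumerate(sequence):
        angle = math.radians(index * degrees_per_residue)
        x += KYTE_DOOLITTLE[residue] * math.cos(angle)
        y += KYTE_DOOLITTLE[residue] * math.sin(angle)
    return math.hypot(x, y), math.degrees(math.atan2(y, x)) % 360.0


# Hypothetical 18-residue stretch, used only to exercise the function.
magnitude, direction = hydrophobic_moment("LKELLKKLLEELLKKLLE")
print(f"moment = {magnitude:.2f}, pointing at {direction:.0f} degrees")
```

The direction of the moment suggests which face of a helix is the more hydrophobic one, which is the kind of information an in-plane projection of seven helices needs. A uniform sequence such as 18 leucines gives a moment close to zero, because the contributions cancel around the helix.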
Chapter 4 describes a procedure to further expand the 2D topological findings into 3D space, exemplified on the human CCK-B receptor protein. This particular GPCR was chosen as the receptor of interest, since an enormous experimentally derived and structurally relevant data-set was available. Within the computational refinement procedure of constructed GPCR models, major emphasis was laid on the explicit treatment of a non-isotropic solvent environment during molecular mechanics (i.e. energy minimization and molecular dynamics simulations) calculations. The majority of simulations was therefore carried out in a tri-phasic solvent box accounting for a central lipid environment, flanked by two aqueous compartments, mimicking the extracellular and cytoplasmic space.
Chapter 5 introduces the reference compound set, comprising low-molecular weight compounds modulating CCK receptors, that was used for validation purposes of the generated models of the receptor protein.
Chapter 6 describes how the generated model of the CCK-B receptor was subjected to intensive docking studies employing compound series introduced in chapter 5. It turned out that by applying the DRAGHOME methodology viable structural hypotheses on putative receptor-ligand complexes could be generated. Based on the methodology pursued in this thesis a detailed model of the receptor binding site could be devised that accounts for known structure-activity relationships as well as for results obtained by site-directed mutagenesis studies in a qualitative manner.
The overall study presented in this thesis aims primarily to deliver a feasibility study on generating model structures of GPCRs by a conceptual combination of tailor-made bioinformatics techniques with the toolbox of protein modelling, exemplified on the human CCK-B receptor.
The generated structures should be envisioned as models only, not necessarily providing a detailed image of reality. However, consistent models, when verified and refined against experimental data, deliver an extremely useful structural platform on which different scientific disciplines such as medicinal chemistry, molecular biology, and biophysics can effectively communicate.
Small molecule inhibitors of protein-protein interactions
The development of orally bioavailable small molecule drugs targeting protein-protein interactions (PPIs) has been challenging [1]. Unlike conventional targets, PPIs' extended, open surfaces make it difficult for small molecules to bind. To achieve strong binding, it is frequently necessary to use larger molecules, which is traditionally considered to disfavor druglikeness [2]. However, PPIs possess great therapeutic potential due to their abundance and regulatory roles in cells [3]. More extensive studies are needed to identify larger chemotypes that retain good druglike properties and therefore might have utility against PPI targets.
NF-κB Essential Modulator (NEMO), which interacts with IκB Kinase subunit β (IKKβ), is an important PPI target because of its regulatory role in NF-κB signaling [4]. Literature suggests that the N-terminal domain of NEMO is intrinsically disordered in the absence of bound ligand [5]. To test this hypothesis, I developed variants of the NEMO N-terminal domain and studied their secondary structure, stability, and affinity for IKKβ, showing that the N-terminal domain of NEMO is intrinsically structured (Chapter Two). I also characterized partially peptidic NEMO inhibitors from our collaborator, Carmot Therapeutics. We tested the binding of these compounds and their peptidic fragments to full-length NEMO using fluorescence anisotropy (FA) [6] and surface plasmon resonance (SPR). The results provided information about hit validity, binding affinity and kinetics (Chapter Three). Macrocycles are of interest for inhibiting PPIs partly because of their proposed good membrane permeability [7]. To evaluate this hypothesis, I implemented a membrane permeability assay, tested the permeability of a set of macrocyclic compounds, and used the results to develop a multiple linear regression model to predict permeability from macrocycles' physicochemical properties. The model suggests that hydrophobicity correlates positively with good permeability, while high polarity or high aromatic ring count renders macrocycles less permeable (Chapter Four). Finally, in a separate project, to elucidate the origins of protein-ligand binding energy between interleukin-2 (IL-2) and its known small molecule inhibitors [8], I developed an SPR-based binding assay and validated it by showing that the KD value of the known inhibitor Ro26-4550 [8] agrees with the literature value (Chapter Five). The assay will be useful in future studies of IL-2 inhibitors and their fragments.
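A multiple linear regression of the kind described for Chapter Four can be sketched as an ordinary least-squares fit. Everything numeric below is synthetic and made up for illustration: the descriptor columns (logP, scaled polar surface area, aromatic ring count), the coefficients, and the response values are assumptions, chosen only so that the signs mirror the trend reported in the abstract. None of it is data or a model from the thesis.

```python
import numpy as np

# Synthetic descriptor table (illustration only, not thesis data).
# Columns: logP (hydrophobicity), polar surface area / 100, aromatic ring count.
X = np.array([
    [2.0, 1.2, 1.0],
    [3.5, 0.9, 2.0],
    [4.1, 1.5, 3.0],
    [1.2, 2.0, 0.0],
    [5.0, 1.1, 1.0],
    [2.8, 1.8, 2.0],
])

# Hypothetical coefficients: positive for hydrophobicity,
# negative for polarity and aromatic ring count.
true_coef = np.array([0.8, -1.5, -0.4])
true_intercept = -5.0

# Noise-free synthetic response standing in for a log-permeability value.
y = X @ true_coef + true_intercept

# Append a column of ones so the intercept is fitted together with the slopes.
design = np.column_stack([X, np.ones(len(X))])
fitted, _, rank, _ = np.linalg.lstsq(design, y, rcond=None)

for name, value in zip(["logP", "PSA/100", "aromatic rings", "intercept"], fitted):
    print(f"{name:>15s}: {value:+.3f}")
```

With real assay data the fit would not be exact, and the signs and sizes of the fitted coefficients would be read alongside their uncertainties.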
Kinetic and Thermodynamic Characterization of the Bacterial Lectin FimH
One fundamental aim of drug discovery is the development of new molecular entities that have a considerable advantage over existing therapies. Urinary tract infections (UTIs) urgently require an alternative to conventional antibiotic therapy, as resistance rates for antibiotics are increasing. The development of an anti-adhesive UTI treatment strategy with the bacterial lectin FimH as target is a promising approach to counter this alarming trend. FimH is presented by uropathogenic E. coli (UPEC) strains on the tip of type 1 pili and mediates adhesion to mannosylated residues on the urothelium. This interaction prevents the clearance of UPECs during micturition and enables internalization of the pathogens by urothelial cells. Mannoside-derived FimH antagonists are under development and are considered a promising treatment option for UTIs. In contrast to antibiotics, FimH antagonists do not necessarily provoke resistance mechanisms, because they block the adhesion of bacteria to the urothelium without killing them or inhibiting their growth.
________
In the present thesis, FimH and its interaction with mannose-based antagonists were biophysically characterized. Additionally, new methodological approaches are introduced that are relevant not only for the strategic development of FimH antagonists but also for drugs in other therapeutic areas. The following aspects were investigated:
________
Publication 2: The publication “KinITC – One method supports both thermodynamic and kinetic SARs” (Chemistry, 2018, 24(49), 13049-13057) comments on kinITC-ETC, a new method based on ITC data to reveal the kinetic fingerprint of a drug–target interaction. In this study, kinITC-ETC was independently validated for the first time. Moreover, structural properties of FimH antagonists could be correlated with kinetic parameters of FimH–antagonist interactions.
________
Manuscript 1: The development of an off-rate screening approach is presented in the study “Off-rate screening by surface plasmon resonance – The search for promising lead structures targeting low-affinity FimH”. The method is subsequently applied to screen a mannose-based compound library against full-length FimH. The assay allows classification of structurally diverse FimH antagonists in order to spot chemical classes exhibiting long dissociative half-lives.
________
Publication 3: The lectin domain is conformationally rigid and needs the pilin domain for allosteric propagation. However, the crosstalk between allosteric sites within the lectin domain also takes place in the absence of the pilin domain, as demonstrated in the publication “Conformational switch of the bacterial adhesin FimH in the absence of the regulatory domain – Engineering a minimalistic allosteric system” (J. Biol. Chem., 2018, 293(5), 1835-1849). Mutants of the isolated lectin domain, FimHLD R60P and V27C/L34C, exhibited a low-affinity state and mimicked full-length FimH regarding its conformational transition upon mannoside binding.
________
Publication 4: The publication “Target-directed dynamic combinatorial chemistry: A study on potentials and pitfalls as exemplified on a bacterial target” (Chemistry, 2017, 23, 11570-11577) illustrates a target-directed dynamic combinatorial chemistry (tdDCC) approach employing reversible acylhydrazone formation with full-length FimH as target. Optimal sample preparation and data processing are discussed in detail. Finally, the results of the tdDCC assay were compared with the affinity of library constituents determined by SPR.
________
Publication 5: In the publication “Comparison of affinity ranking by target-directed dynamic combinatorial chemistry and surface plasmon resonance”, larger FimH antagonist libraries were screened using the tdDCC method established in publication 3. The comparison of amplification rates of library substituents with the respective binding affinities determined by SPR revealed a linear association. Furthermore, the hazardous acylhydrazone moiety could be replaced by various bioisosteres without changing the affinity of the parent compound.
________
Manuscript 2: The hydrogen bond network formed between mannose derivatives and the CRD of FimH is extensively elucidated in the manuscript “High-affinity carbohydrate–lectin interaction: How nature makes it possible”. Computational methods and structural prediction in combination with binding data revealed that the hydrogen bond network forms a unified whole. The removal of only a single hydroxyl group disrupts the cooperative interplay within the network and consequently results in a dramatic loss in binding affinity.
________
Manuscript 3: In the study “The tyrosine gate of the bacterial adhesion FimH – An evolutionary remnant paves the way for drug discovery”, ITC measurements demonstrated the influence of the tyrosine gate on binding affinity between FimH and natural ligands. While the tyrosine gate is exploited to form optimal hydrophobic interactions with aryl aglycones of synthetic FimH antagonists in order to increase their binding affinity, it has only a marginal impact on the KD of natural ligands. Compared with wild-type FimH, mutants that partially or completely lack the tyrosine gate exhibited a comparable binding affinity for dimannoside.
________
Publication 6: The publication “Improvement of aglycone π-stacking yields nanomolar to sub-nanomolar FimH antagonists” shows that fluorination of biphenyl mannosides further improved π-π stacking with the tyrosine gate, reaching nanomolar affinities with FimHFL and even picomolar affinities with FimHLD. It could also be shown that ligand binding to FimHFL occurs with a highly favorable enthalpic and a considerably unfavorable entropic contribution.
________
Publication 7: In the publication “Enhancing the enthalpic contribution of hydrogen bonds by solvent shielding”, microcalorimetric studies of FimH revealed that conformational adaptations of the binding site can establish a solvent-free cavity. Shielding from the solvent results in a lower dielectric environment, in which the formation of hydrogen bonds makes a considerable enthalpic contribution to the binding free energy: in the case of FimH, approximately -13 kJ mol-1 for mannoside binding.
Computational tools for the study of the structure-property relationship and design of new biologically active compounds
The aim of this PhD course was to provide a broad overview of the topic of the Structure-Property Relationship (SPR), with a strong emphasis on the practical aspects. Data in chemical research, and in particular in drug discovery, are varied and often very complex. In drug discovery one has to make sense of different types of data (structural, biological, physico-chemical, pharmacological, toxicological and so on), which ultimately have to be associated with a single molecular structure. In order to sort out these data and extract appropriate information, a number of computational tools have been devised in the form of different programs; the reader will find that many of these tools and methods were used during this PhD course. In more detail, in Chapter 1 the homology modeling of the adenosine receptors was explored and accompanied by the pharmacophoric analysis and synthesis of new compounds. In Chapter 2 the analysis of the MMP-inhibitor interaction led us to implement the Amber force field, and the subsequent docking analysis allowed the design of new selective inhibitors. The modeling of the activated form of the cannabinoid receptors (Chapter 3) was an attempt to move away from homology modeling procedures, together with the goal of obtaining a quantitative model from an automated docking study. In Chapter 4 the study of the ligand-estrogen receptor interaction was developed by exploring free energy calculations, while in the last Chapter the construction of the angiotensin receptor AT1 led us to propose a new binding orientation for the non-peptide antagonists, using the 3D-QSAR approach as a validation and predictive method.
Systematically Mapping the Epigenetic Context Dependence of Transcription Factor Binding
At the core of gene regulatory networks are transcription factors (TFs) that recognize specific DNA sequences and target distinct gene sets. Characterizing the DNA binding specificity of all TFs is a prerequisite for understanding global gene regulatory logic, which in recent years has resulted in the development of high-throughput methods that probe TF specificity in vitro and are now routinely used to inform or interpret in vivo studies. Despite the broad success of such methods, several challenges remain, two of which are addressed in this thesis.
Genomic DNA can harbor different epigenetic marks that have the potential to alter TF binding, the most prominent being CpG methylation. Given the vast number of modified CpGs in the human genome and an increasing body of literature suggesting a link between epigenetic changes and genome instability, or the onset of disease such as cancer, methods that can characterize the sensitivity of TFs to DNA methylation are needed to mechanistically interpret its impact on gene expression. We developed a high-throughput in vitro method (EpiSELEX-seq) that probes TF binding to unmodified and modified DNA sequences in competition, resulting in high-resolution maps of TF binding preferences. We found that methylation sensitivity can vary between TFs of the same structural family and is dependent on the position of the 5mCpG within the TF binding site. The importance of our in vitro profiling of methylation sensitivity is demonstrated by the preference of human p53 tetramers for 5mCpGs within the core of their binding site. This previously unknown, stabilizing effect is also detectable in p53 ChIP-seq data when comparing methylated and unmethylated sites genome-wide.
A second impediment to predicting TF binding is our limited understanding of i) how cooperative participation of a TF in different complexes can alter its binding preference, and ii) how the detailed shape of DNA aids in creating a substrate for adaptive multi-TF binding. To address these questions in detail, we studied the in vitro binding preferences of three D. melanogaster homeodomain TFs: Homothorax (Hth), Extradenticle (Exd) and one of the eight Hox proteins. In vivo, Hth occurs in two splice forms: with (HthFL) and without (HthHM) the DNA binding domain (DBD). HthHM-Exd itself is a Hox cofactor that has been shown to induce latent sequence specificity upon complex formation with Hox proteins. There are three possible complexes that can be formed, all potentially having specific target genes: HthHM-Exd-Hox, HthFL-Exd-Hox, and HthFL-Exd. We characterized the in vitro binding preferences of each of these by developing new computational approaches to analyze high-throughput SELEX-seq data. We found distinct orientation and spacing preferences for HthFL-Exd-Hox, alternative recognition modes that depend on the affinity class a sequence falls into, and a strong preference for a narrow DNA minor groove near Exd's N-terminal DBD. Strikingly, this shape readout is crucial to stabilize the HthHM-Exd-Hox complex in the absence of a Hth DBD and can thus be used to distinguish HthHM from HthFL isoform binding. Mutating the amino acids responsible for the shape readout by Exd and reinserting the engineered protein into the fly genome allowed us to classify in vivo binding sites based on ChIP-seq signal comparison between “shape-mutant” and wild-type Exd.
In summary, the research presented here has investigated TF binding preferences beyond sequence context by combining novel high-throughput experimental and computational methods. This interdisciplinary approach has enabled us to study binding preferences of TF complexes with respect to the epigenetic landscape of their cognate binding sites. Our novel mechanistic insights into DNA shape readout have provided a new avenue of exploiting guided protein engineering to probe how specific TFs interact with their co-factors in a cellular context, and how flanking genomic sequence helps determine which multi-TF complexes will form and which binding mode a complex adopts.
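A common first step in analysing SELEX-type sequencing data is to compare k-mer frequencies between the initial library and a selected round. The sketch below shows that generic idea only; it is not the computational approach developed in the thesis, and the reads, the value of k, and the function names are hypothetical. It also ignores reverse complements and multi-round modelling for brevity.

```python
from collections import Counter


def kmer_frequencies(reads, k):
    """Return the relative frequency of every k-mer seen in the reads."""
    counts = Counter()
    for read in reads:
        for start in range(len(read) - k + 1):
            counts[read[start:start + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def relative_enrichment(initial_reads, selected_reads, k):
    """Score k-mers by selected/initial frequency, scaled so the best k-mer is 1.

    Only k-mers observed in both libraries are scored; a real analysis would
    need a model of the initial library to handle unobserved k-mers.
    """
    before = kmer_frequencies(initial_reads, k)
    after = kmer_frequencies(selected_reads, k)
    ratios = {kmer: after[kmer] / before[kmer] for kmer in after if kmer in before}
    best = max(ratios.values())
    return {kmer: ratio / best for kmer, ratio in ratios.items()}


# Toy reads invented for this example.
initial = ["ACGTAC", "TTGACA", "GGCATG", "TGATTG"]
selected = ["TGATTG", "TGATAC", "ATGATT"]

scores = relative_enrichment(initial, selected, k=4)
for kmer, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(kmer, round(score, 3))
```

The scaled scores give a rough ranking of sequences by how strongly selection enriched them, which is the starting point for the kind of affinity-class analysis the abstract mentions.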