4 research outputs found

    Classes of fast and specific search mechanisms for proteins on DNA

    Problems of search and recognition appear over different scales in biological systems. In this review we focus on the challenges posed by interactions between proteins, in particular transcription factors, and DNA and possible mechanisms which allow for a fast and selective target location. Initially we argue that DNA-binding proteins can be classified, broadly, into three distinct classes which we illustrate using experimental data. Each class calls for a different search process and we discuss the possible application of different search mechanisms proposed over the years to each class. The main thrust of this review is a new mechanism which is based on barrier discrimination. We introduce the model and analyze in detail its consequences. It is shown that this mechanism applies to all classes of transcription factors and can lead to a fast and specific search. Moreover, it is shown that the mechanism has interesting transient features which allow for stability at the target despite rapid binding and unbinding of the transcription factor from the target. Comment: 65 pages, 23 figures

    Natural Selection Equally Supports the Human Tendencies in Subordination and Domination: A Genome-Wide Study With in silico Confirmation and in vivo Validation in Mice

    We proposed the following heuristic decision-making rule: “IF {an excess of a protein relating to the nervous system is an experimentally known physiological marker of low pain sensitivity, fast postinjury recovery, or aggressive, risk/novelty-seeking, anesthetic-like, or similar agonistic-intolerant behavior} AND IF {a single nucleotide polymorphism (SNP) causes overexpression of the gene encoding this protein} THEN {this SNP can be a SNP marker of the tendency in dominance} WHILE {underexpression corresponds to subordination} AND vice versa.” Using this decision-making rule, we analyzed 231 human genes of neuropeptidergic, non-neuropeptidergic, and neurotrophinergic systems that encode neurotrophic and growth factors, interleukins, neurotransmitters, receptors, transporters, and enzymes. These proteins are known as key factors of human social behavior. We analyzed all the 5,052 SNPs within the 70 bp promoter region upstream of the position where the protein-coding transcript starts, which were retrieved from databases Ensembl and dbSNP using our previously created public Web service SNP_TATA_Comparator (http://beehive.bionet.nsc.ru/cgi-bin/mgs/tatascan/start.pl). This definition of the promoter region includes all TATA-binding protein (TBP)-binding sites. A total of 556 and 552 candidate SNP markers contributing to the dominance and the subordination, respectively, were uncovered. On this basis, we determined that 231 human genes under study are subject to natural selection against underexpression (significance p < 0.0005), which equally supports the human tendencies in domination and subordination such as the norm of a reaction (plasticity) of the human social hierarchy. These findings explain vertical transmission of domination and subordination traits previously observed in rodent models. 
    Thus, the results of this study equally support both sides of the century-old unsettled scientific debate on whether both aggressiveness and the social hierarchy among humans are inherited (as suggested by Freud and Lorenz) or are due to non-genetic social education, when the children are influenced by older individuals across generations (as proposed by Berkowitz and Fromm).
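    The heuristic decision-making rule quoted in this abstract can be read as a small two-input classifier. The sketch below is only an illustration of that reading, not the authors' SNP_TATA_Comparator code: the function name `classify_snp`, its arguments, and the string labels are hypothetical, and the "vice versa" branch is interpreted as the mirror case in which an excess of the protein marks subordination-like traits.

```python
def classify_snp(excess_marks_dominance: bool, snp_effect: str) -> str:
    """Apply the abstract's IF/AND/THEN rule to one SNP (illustrative only).

    excess_marks_dominance: True if an excess of the encoded protein is a
        known physiological marker of dominance-like traits (hypothetical
        input; the abstract's "vice versa" case corresponds to False).
    snp_effect: "overexpression" or "underexpression" of the gene.
    """
    if snp_effect not in ("overexpression", "underexpression"):
        raise ValueError("snp_effect must be 'overexpression' or 'underexpression'")
    overexpressed = snp_effect == "overexpression"
    # Dominance when the SNP pushes expression in the direction that marks
    # dominance; subordination when it pushes the opposite way.
    return "dominance" if overexpressed == excess_marks_dominance else "subordination"


if __name__ == "__main__":
    for marks_dominance in (True, False):
        for effect in ("overexpression", "underexpression"):
            print(marks_dominance, effect, "->", classify_snp(marks_dominance, effect))
```

    Under this reading, every SNP with a predicted expression change falls into exactly one of the two candidate-marker groups reported in the abstract.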

    Computational Modelling of Human Transcriptional Regulation by an Information Theory-based Approach

    ChIP-seq experiments can identify the genome-wide binding site motifs of a transcription factor (TF) and determine its sequence specificity. Multiple algorithms were developed to derive TF binding site (TFBS) motifs from ChIP-seq data, including the entropy minimization-based Bipad that can derive both contiguous and bipartite motifs. Prior studies applying these algorithms to ChIP-seq data only analyzed a small number of top peaks with the highest signal strengths, biasing their resultant position weight matrices (PWMs) towards consensus-like, strong binding sites; nor did they derive bipartite motifs, disabling the accurate modelling of binding behavior of dimeric TFs. This thesis presents a novel motif discovery pipeline by adding the recursive masking and thresholding functionalities to Bipad to improve detection of primary binding motifs. Analyzing 765 ENCODE ChIP-seq datasets with this pipeline generated contiguous and bipartite information theory-based PWMs (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs and revealed six high-confidence novel motifs. The accuracy of these iPWMs was determined via four independent validation methods, including detection of experimentally proven TFBSs, explanation of effects of characterized SNPs, comparison with previously published motifs and statistical analyses. Novel cofactor motifs supported previously unreported TF coregulatory interactions. This thesis further presents a unified framework to identify variants in hereditary breast and ovarian cancer (HBOC), successfully applying these iPWMs to prioritize TFBS variants in 20 complete genes of HBOC patients. The spatial distribution and information composition of cis-regulatory modules (e.g. TFBS clusters) in promoters substantially determine gene expression patterns and TF target genes.
    Multiple algorithms were developed to detect TFBS clusters, including the information density-based clustering (IDBC) algorithm that simultaneously considers the spatial and information densities of TFBSs. Prior studies predicting tissue-specific gene expression levels and differentially expressed (DE) TF targets used log likelihood ratios to quantify TFBS strengths and merged adjacent TFBSs into clusters. This thesis presents a machine learning framework that uses the Bray-Curtis function to quantify the similarity between tissue-wide expression profiles of genes, and IDBC-identified clusters from iPWM-detected TFBSs to predict gene expression profiles and DE direct TF targets. Multiple clusters enable gene expression to be robust against TFBS mutations.
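    Two quantities named in this abstract have standard textbook definitions that can be sketched briefly: the Bray-Curtis similarity between two non-negative expression profiles, and the per-position information content (in bits) of a DNA motif column. The code below is a minimal sketch under those standard definitions; the function names are hypothetical, the information formula omits any small-sample correction, and nothing here is taken from the thesis's own pipeline, Bipad, or IDBC.

```python
import math
from typing import Sequence


def bray_curtis_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - Bray-Curtis dissimilarity for two non-negative profiles."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    if any(x < 0 for x in u) or any(x < 0 for x in v):
        raise ValueError("expression values must be non-negative")
    total = sum(u) + sum(v)
    if total == 0:
        return 1.0  # two all-zero profiles are treated as identical
    return 1.0 - sum(abs(a - b) for a, b in zip(u, v)) / total


def column_information(probs: Sequence[float]) -> float:
    """Information content in bits of one DNA motif position: 2 - H(p)."""
    if len(probs) != 4 or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("expected four base probabilities summing to 1")
    # Terms with p == 0 contribute nothing to the entropy.
    return 2.0 + sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Hypothetical expression values for two genes across three tissues.
    print(bray_curtis_similarity([5.0, 0.0, 2.0], [4.0, 1.0, 2.0]))
    # A fully conserved position versus a uniform (uninformative) one.
    print(column_information([1.0, 0.0, 0.0, 0.0]))
    print(column_information([0.25, 0.25, 0.25, 0.25]))
```

    A similarity of 1 means the two profiles are identical and 0 means they share no expression in any tissue; a motif column carries between 0 and 2 bits under this definition.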