15 research outputs found
Genome wide prediction of HNF4α functional binding sites by the use of local and global sequence context
An application of machine learning algorithms enables prediction of the functional context of transcription factor binding sites in the human genome.
Genetic networks of antibacterial responses of eukaryotic cells. Bioinformatics analysis and modeling
This work describes the development of new methods for constructing promoter models, a necessary step in the construction of regulatory networks. Identifying characteristic promoter features reveals the role of specific transcription factors (TFs) in triggering the response, which in turn sheds light on the signaling pathways that activate these TFs. Combining reported results of microarray analyses with other available information about the genes expressed in the cellular systems under consideration, we search for distinguishing features of the promoters of coexpressed genes. Applying such promoter models makes it possible to identify additional candidate genes belonging to the same regulatory network. Four novel approaches are presented in this work: (i) a subtractive approach to matrix generation; (ii) a distance distribution approach; (iii) a "seed" sets approach; (iv) a complementary pairs approach. These approaches help to solve serious problems in promoter model construction, such as the doubtful reliability of positive training sets ("seed" sets approach) and the lack of knowledge about the exact signaling pathways triggering gene expression (complementary pairs approach). The subtractive approach to matrix generation makes it possible to refine positional weight matrices (PWMs) for heterogeneous sets of binding sites and thus to improve the PWM search for single transcription factor binding sites (TFBSs). The specificity of promoter analysis was significantly improved by applying statistical methods that characterize TFBS combinations at over-represented distances rather than merely identifying single potential TFBSs (distance distribution approach). The newly developed methods were applied to describe four defensive eukaryotic systems in terms of transcription regulation.
The resulting models gave us better insight into the pathways of the corresponding signaling networks.
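The abstract above refers to searching promoters for single TFBSs with positional weight matrices. As a rough illustration only, the following minimal sketch builds a log-odds PWM from a handful of aligned sites and scans a sequence with it. The function names, the uniform background of 0.25, the pseudocount, the threshold, and the toy sites are all assumptions made for this example; none of them is taken from the thesis, and the thesis's subtractive and distance distribution approaches are not reproduced here.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned, equal-length binding sites.

    Assumes a uniform background frequency for A, C, G and T.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases stay finite.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {b: math.log2((counts[b] / total) / background) for b in "ACGT"}
        )
    return pwm


def scan(sequence, pwm, threshold):
    """Return (offset, score) for each window scoring at least threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical toy sites and sequence, invented for illustration.
toy_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTAA"]
pwm = build_pwm(toy_sites)
hits = scan("CCTGACTCAGG", pwm, threshold=5.0)
```

The offsets returned by such a scan are the kind of input from which distances between pairs of predicted sites could be tallied, which is the general idea behind looking at TFBS combinations rather than single hits.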
Prediction of synergistic transcription factors by function conservation
A new strategy is proposed for identifying synergistic transcription factors by function conservation, leading to the identification of 51 homotypic transcription-factor combinations.
Most transcription factor binding sites are in a few mosaic classes of the human genome
Background: Many algorithms for finding transcription factor binding sites have concentrated on characterising the binding site itself, and these algorithms produce a large number of false positive sites. The DNA sequence that does not bind has been modeled only to the extent necessary to complement this formulation.
Results: We find that the human genome may be described by 19 pairs of mosaic classes, each defined by its base frequencies (or, more precisely, by the frequencies of doublets), so that typically a run of 10 to 100 bases belongs to the same class. Most experimentally verified binding sites are in the same four pairs of classes. In our sample of seventeen transcription factors, taken from different families of transcription factors, the average proportion of sites in this subset of classes was 75%, with values for individual factors ranging from 48% to 98%. By contrast, these same classes contain only 26% of the bases of the genome and only 31% of occurrences of the motifs of these factors, that is, places where one might expect the factors to bind. These results are not a consequence of the class composition in promoter regions.
Conclusions: This method of analysis will help to find transcription factor binding sites and assist with the problem of false positives. These results also imply a profound difference between the mosaic classes.
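The abstract above says each mosaic class is defined by its doublet (dinucleotide) frequencies. As a small illustration of that quantity only, the sketch below counts overlapping doublets in a stretch of sequence and normalises them to frequencies. The function name and the example sequence are hypothetical, and the code does not attempt the paper's segmentation of the genome into classes.

```python
from collections import Counter
from itertools import product


def doublet_frequencies(sequence):
    """Relative frequency of each overlapping dinucleotide in a sequence.

    Doublets containing characters other than A, C, G, T are ignored.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))
    doublets = ["".join(pair) for pair in product("ACGT", repeat=2)]
    total = sum(counts[d] for d in doublets)
    if total == 0:
        return {d: 0.0 for d in doublets}
    return {d: counts[d] / total for d in doublets}


# Hypothetical 10-base stretch, invented for illustration.
freqs = doublet_frequencies("ACGCGCGTTA")
```

Comparing such frequency profiles between neighbouring windows is one simple way to see how a run of bases could be assigned to one class or another.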
Beyond protein factories : expanding the synthetic biology toolkit for engineering mammalian hosts
The incredible clinical and commercial successes of recombinant protein therapeutics cemented the use of mammalian cells as the premier production hosts for these products. However, we can further exploit these cells to address current and future medical needs through metabolic and advanced engineering. To do so, we need a deeper understanding of the intricate gene regulation network that governs these cells and the ability to attain precise control of gene expression levels. In addition, some of these applications, such as gene therapy and immunotherapy, could benefit greatly from avoiding viral-derived genetic elements. Therefore, this work seeks to establish additional transcriptional control elements to improve our ability to regulate expression with generalizable approaches and methods, facilitating the adaptation of these techniques to any mammalian cell type of interest. Here, we demonstrate that three key genetic elements can be used to tune gene expression in a rational manner. First, we conducted a genome-wide screen to survey genomic integration sites that support high transcriptional activity. We showed that CRISPR/Cas9-mediated de novo integration into one of these transcriptional hot-spots at the GRIK1 locus resulted in a 2.4-fold increase in heterologous gene expression over random integration. Subsequently, we laid the groundwork necessary to evaluate a cell line development strategy that aims to increase the frequency of successful de novo targeted integrations. Second, we used two approaches for rational promoter engineering. We established a transcriptomics-guided workflow for de novo synthetic promoter design based on the Design-Build-Test paradigm. Using this workflow, we generated two synthetic designs that were comparable to a strong viral promoter and a strong endogenous promoter.
We also employed an alternative approach by creating hybrid promoters, which yielded a hybrid promoter variant that was also comparable to the same viral and endogenous promoters. Third, we exploited the general mammalian terminator structure and created a synthetic terminator that was comparable to a strong viral terminator. We evaluated 12 endogenous and 30 synthetic terminators for heterologous gene expression and revealed interactions between several key components of the terminator. Critically, we showed that transgene expression was 1.9-fold higher with endogenous and synthetic elements than with strong viral-derived elements. Ultimately, we showed that transgene expression can be finely adjusted by the approaches and methods described in this dissertation, and that viral-derived elements can be readily substituted by our synthetic designs.
Chemical Engineerin