7 research outputs found

    Recognition of interferon-inducible sites, promoters, and enhancers

    Get PDF
    BACKGROUND: Computational analysis of gene regulatory regions is important for prediction of functions of many uncharacterized genes. With this in mind, search of the target genes for interferon (IFN) induction appears of interest. IFNs are multi-functional cytokines. Their effects are immunomodulatory, antiviral, antibacterial, and antitumor. The interaction of the IFNs with their cell surface receptors produces an activation of several transcription factors. Four regulatory factors, ISGF3, STAT1, IRF1, and NF-κB, are essential for the function of the IFN system. The aim of this work is the development of computational approaches for the recognition of DNA binding sites for these factors and computer programs for the prediction of the IFN-inducible regions. RESULTS: We developed computational approaches to the recognition of the binding sites for ISGF3, STAT1, IRF1, and NF-κB. Analysis of the distribution of these binding sites demonstrated that the regions -500 upstream of the transcription start site in IFN-inducible genes are enriched in putative binding sites for these transcription factors. Based on selected combinations of the sites whose frequencies were significantly higher than in the other functional gene groups, we developed methods for the prediction of the IFN-inducible promoters and enhancers. We analyzed 1004 sequences of the IFN-inducible genes compiled using microarray data analyses and also about 10,000 human gene sequences from the EPD and RefSeq databases; 74 of 1,664 human genes annotated in EPD were significantly IFN-inducible. CONCLUSION: Analyses of several control datasets demonstrated that the developed methods have a high accuracy of prediction of the IFN-inducible genes. Application of these methods to several datasets suggested that the number of the IFN-inducible genes is approximately 1500–2000 in the human genome

    Recognition of interferon-inducible sites, promoters, and enhancers

    No full text
    Abstract Background Computational analysis of gene regulatory regions is important for prediction of functions of many uncharacterized genes. With this in mind, search of the target genes for interferon (IFN) induction appears of interest. IFNs are multi-functional cytokines. Their effects are immunomodulatory, antiviral, antibacterial, and antitumor. The interaction of the IFNs with their cell surface receptors produces an activation of several transcription factors. Four regulatory factors, ISGF3, STAT1, IRF1, and NF-κB, are essential for the function of the IFN system. The aim of this work is the development of computational approaches for the recognition of DNA binding sites for these factors and computer programs for the prediction of the IFN-inducible regions. Results We developed computational approaches to the recognition of the binding sites for ISGF3, STAT1, IRF1, and NF-κB. Analysis of the distribution of these binding sites demonstrated that the regions -500 upstream of the transcription start site in IFN-inducible genes are enriched in putative binding sites for these transcription factors. Based on selected combinations of the sites whose frequencies were significantly higher than in the other functional gene groups, we developed methods for the prediction of the IFN-inducible promoters and enhancers. We analyzed 1004 sequences of the IFN-inducible genes compiled using microarray data analyses and also about 10,000 human gene sequences from the EPD and RefSeq databases; 74 of 1,664 human genes annotated in EPD were significantly IFN-inducible. Conclusion Analyses of several control datasets demonstrated that the developed methods have a high accuracy of prediction of the IFN-inducible genes. Application of these methods to several datasets suggested that the number of the IFN-inducible genes is approximately 1500–2000 in the human genome.</p

    Assessment of transcriptional importance of cell line-specific features based on GTRD and FANTOM5 data.

    No full text
    Creating a complete picture of the regulation of transcription seems to be an urgent task of modern biology. Regulation of transcription is a complex process carried out by transcription factors (TFs) and auxiliary proteins. Over the past decade, ChIP-Seq has become the most common experimental technology studying genome-wide interactions between TFs and DNA. We assessed the transcriptional significance of cell line-specific features using regression analysis of ChIP-Seq datasets from the GTRD database and transcriptional start site (TSS) activities from the FANTOM5 expression atlas. For this purpose, we initially generated a large number of features that were defined as the presence or absence of TFs in different promoter regions around TSSs. Using feature selection and regression analysis, we identified sets of the most important TFs that affect expression activity of TSSs in human cell lines such as HepG2, K562 and HEK293. We demonstrated that some TFs can be classified as repressors and activators depending on their location relative to TSS

    Population size estimation for quality control of ChIP-Seq datasets.

    No full text
    Chromatin immunoprecipitation followed by sequencing, i.e. ChIP-Seq, is a widely used experimental technology for the identification of functional protein-DNA interactions. Nowadays, such databases as ENCODE, GTRD, ChIP-Atlas and ReMap systematically collect and annotate a large number of ChIP-Seq datasets. Comprehensive control of dataset quality is currently indispensable to select the most reliable data for further analysis. In addition to existing quality control metrics, we have developed two novel metrics that allow to control false positives and false negatives in ChIP-Seq datasets. For this purpose, we have adapted well-known population size estimate for determination of unknown number of genuine transcription factor binding regions. Determination of the proposed metrics was based on overlapping distinct binding sites derived from processing one ChIP-Seq experiment by different peak callers. Moreover, the metrics also can be useful for assessing quality of datasets obtained from processing distinct ChIP-Seq experiments by a given peak caller. We also have shown that these metrics appear to be useful not only for dataset selection but also for comparison of peak callers and identification of site motifs based on ChIP-Seq datasets. The developed algorithm for determination of the false positive control metric and false negative control metric for ChIP-Seq datasets was implemented as a plugin for a BioUML platform: https://ict.biouml.org/bioumlweb/chipseq_analysis.html
    corecore