9,497 research outputs found

    A data mining method to predict transcriptional regulatory sites based on differentially expressed genes in human genome

    Very large-scale gene expression resources, i.e., UniGene and dbEST, make it possible to find genes with significantly differential expression in specific tissues. The differentially expressed genes in a specific tissue are potentially regulated concurrently by a combination of transcription factors. This study attempts to mine putative binding sites by examining how combinations of known regulatory site homologs and over-represented repetitive elements are distributed in the promoter regions of groups of differentially expressed genes. We propose a data mining approach to statistically discover significantly tissue-specific combinations of known site homologs and over-represented repetitive sequences distributed in the promoter regions of differentially expressed gene groups. The mined association rules would facilitate the prediction of putative regulatory elements and the identification of genes potentially co-regulated by them.

    The Repetitive Sequence Database and Mining Putative Regulatory Elements in Gene Promoter Regions

    At least 43% of the human genome is occupied by repetitive elements. Moreover, around 51% of the rice genome is occupied by repetitive elements. The analysis of repetitive elements reveals that repetitive elements in our genome may have been very important in evolutionary genomics. The first part of this study describes a database of repetitive elements, RSDB. The RSDB database contains repetitive elements, which are classified into the following categories: exact, tandem, and similar. Interfaces are provided to query and show the results and statistical data, such as the relationship between repetitive elements and genes and cross-references of repetitive elements among different organisms. The second part of this study attempts to mine putative binding sites by examining how combinations of the known regulatory sites and overrepresented repetitive elements in RSDB are distributed in the promoter regions of groups of functionally related genes. The overrepresented repetitive elements appearing in the associations are possible transcription factor binding sites. Our proposed approach is applied to Saccharomyces cerevisiae and the promoter regions of yeast ORFs. The complete contents of RSDB and partial putative binding sites are available to the public at www.rsdb.csie.ncu.edu.tw. Readers may download partial query results.

    Impact of Environmental Factors on Bacteriocin Promoter Activity in Gut-Derived Lactobacillus salivarius

    Peer-reviewed. Bacteriocin production is regarded as a desirable probiotic trait that aids in colonization and persistence in the gastrointestinal tract (GIT). Strains of Lactobacillus salivarius, a species associated with the GIT, are regarded as promising probiotic candidates and have a number of associated bacteriocins documented to date. These include multiple class IIb bacteriocins (salivaricin T, salivaricin P, and ABP-118) and the class IId bacteriocin bactofencin A, which show activity against medically important pathogens. However, the production of a bacteriocin in laboratory media does not ensure production under stressful environmental conditions, such as those encountered within the GIT. To allow this issue to be addressed, the promoter regions located upstream of the structural genes encoding the L. salivarius bacteriocins mentioned above were fused to a number of reporter proteins (green fluorescent protein [GFP], red fluorescent protein [RFP], and luciferase [Lux]). Of these, only transcriptional fusions to GFP generated signals of sufficient strength to enable the study of promoter activity in L. salivarius. While analysis of the class IIb bacteriocin promoter regions indicated relatively weak GFP expression, assessment of the promoter of the antistaphylococcal bacteriocin bactofencin A revealed a strong promoter that is most active in the absence of the antimicrobial peptide and is positively induced in the presence of mild environmental stresses, including simulated gastric fluid. Taken together, these data provide information on factors that influence bacteriocin production, which will assist in the development of strategies to optimize in vivo and in vitro production of these antimicrobials. This work was funded by a SFI PI award “Obesibiotics” (11/PI/1137) to PD

    Predicting Combinatorial Binding of Transcription Factors to Regulatory Elements in the Human Genome by Association Rule Mining

    Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some cis-regulatory elements have been well studied, the combinations of transcription factors that regulate normal expression levels for the vast majority of the 20,000 genes in the human genome are unknown. We hypothesized that it should be possible to discover transcription factor combinations that regulate gene expression in concert by identifying over-represented combinations of sequence motifs that occur together in the genome. In order to detect combinations of transcription factor binding motifs, we developed a data mining approach based on the use of association rules, which are typically used in market basket analysis. We scored each segment of the genome for the presence or absence of each of 83 transcription factor binding motifs, then used association rule mining algorithms to mine this dataset, thus identifying frequently occurring pairs of distinct motifs within a segment. Results: Support for most pairs of transcription factor binding motifs was highly correlated across different chromosomes, although pair significance varied. Known true positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature.
Conclusion: Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription factor. NIAAA Alcohol Training Grant; National Science Foundation; Cellular and Molecular Biolog
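
The pairwise association-rule idea described in this abstract can be illustrated with a minimal sketch. It assumes each genome segment has already been reduced to the set of motifs present in it; the function name `pair_rules`, the toy motif labels, and the `min_support` threshold are hypothetical and not taken from the paper, and lift stands in here for whatever significance measure the authors used.

```python
from itertools import combinations


def pair_rules(segments, min_support=0.3):
    """Mine motif-pair rules from a list of per-segment motif sets.

    Returns one dict per directed rule (x -> y) with its support,
    confidence, and lift. All names here are illustrative assumptions.
    """
    n = len(segments)
    item_count = {}
    pair_count = {}
    for motifs in segments:
        for m in motifs:
            item_count[m] = item_count.get(m, 0) + 1
        # Sort so that each unordered pair has one canonical key.
        for a, b in combinations(sorted(motifs), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    rules = []
    for (a, b), both in pair_count.items():
        support = both / n  # fraction of segments containing both motifs
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = both / item_count[x]        # P(y present | x present)
            lift = confidence / (item_count[y] / n)  # > 1 suggests co-occurrence
            rules.append({"if": x, "then": y, "support": support,
                          "confidence": confidence, "lift": lift})
    return rules


# Toy presence/absence data: four segments, three hypothetical motif labels.
segments = [
    {"SP1", "NFKB"},
    {"SP1", "NFKB", "AP1"},
    {"SP1"},
    {"AP1"},
]
rules = pair_rules(segments, min_support=0.5)
for r in rules:
    print(r)
```

On real data the segment sets would come from a motif scanner, and a statistical test would normally replace the raw lift value.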

    Deep-coverage whole genome sequences and blood lipids among 16,324 individuals.

    Large-scale deep-coverage whole-genome sequencing (WGS) is now feasible and offers potential advantages for locus discovery. We perform WGS in 16,324 participants from four ancestries at mean depth >29X and analyze genotypes with four quantitative traits: plasma total cholesterol, low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol, and triglycerides. Common variant association yields known loci except for a few variants previously poorly imputed. Rare coding variant association yields known Mendelian dyslipidemia genes, but rare non-coding variant association detects no signals. A high 2M-SNP LDL-C polygenic score (top 5th percentile) confers similar effect size to a monogenic mutation (~30 mg/dl higher for each); however, among those with severe hypercholesterolemia, 23% have a high polygenic score and only 2% carry a monogenic mutation. At these sample sizes and for these phenotypes, the incremental value of WGS for discovery is limited, but WGS permits simultaneous assessment of monogenic and polygenic models of severe hypercholesterolemia.
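
As a rough illustration of what a "high polygenic score" means in this abstract, the sketch below computes a score as a weighted sum of allele dosages and flags the top fraction of individuals. The SNP identifiers, weights, and individuals are made up for the example, and treating a missing genotype as dosage 0 is a simplifying assumption, not the study's method.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2) over the scored SNPs.

    SNPs missing from `dosages` count as 0, a simplification for this sketch.
    """
    return sum(w * dosages.get(snp, 0) for snp, w in weights.items())


def top_fraction(scores, fraction=0.05):
    """Return the ids whose score falls in the top `fraction` (at least one id)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return set(ranked[:k])


# Hypothetical per-SNP effect weights and genotypes.
weights = {"rs_a": 0.5, "rs_b": -0.2}
genotypes = {
    "p1": {"rs_a": 2, "rs_b": 0},
    "p2": {"rs_a": 1, "rs_b": 1},
    "p3": {"rs_a": 0, "rs_b": 2},
}
scores = {pid: polygenic_score(g, weights) for pid, g in genotypes.items()}
high = top_fraction(scores, fraction=0.05)
print(scores, high)
```

With only three individuals the 5% cutoff rounds up to a single person; on a cohort of thousands the same function picks the top 5th percentile.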

    Yin Yang 1 is associated with cancer stem cell transcription factors (SOX2, OCT4, BMI1) and clinical implication.

    The transcription factor Yin Yang 1 (YY1) is frequently overexpressed in cancerous tissues compared to normal tissues and has regulatory roles in cell proliferation, cell viability, epithelial-mesenchymal transition, metastasis and drug/immune resistance. YY1 shares many properties with cancer stem cells (CSCs) that drive tumorigenesis, metastasis and drug resistance and are regulated by overexpression of certain transcription factors, including SOX2, OCT4 (POU5F1), BMI1 and NANOG. Based on these similarities, it was expected that YY1 expression would be associated with SOX2, OCT4, BMI1, and NANOG's expressions and activities. Data mining from the proteomic tissue-based datasets from the Human Protein Atlas were used for protein expression patterns of YY1 and the four CSC markers in 17 types of cancer, including both solid and hematological malignancies. A close association was revealed between the frequency of expressions of YY1 and SOX2 as well as SOX2 and OCT4 in all cancers analyzed. Two types of dynamics were identified based on the nature of their association, namely, inverse or direct, between YY1 and SOX2. These two dynamics define distinctive patterns of BMI1 and OCT4 expressions. The relationship between YY1 and SOX2 expressions as well as the expressions of BMI1 and OCT4 resulted in the classification of four groups of cancers with distinct molecular signatures: (1) Prostate, lung, cervical, endometrial, ovarian and glioma cancers (YY1(lo)SOX2(hi)BMI1(hi)OCT4(hi)) (2) Skin, testis and breast cancers (YY1(hi)SOX2(lo)BMI1(hi)OCT4(hi)) (3) Liver, stomach, renal, pancreatic and urothelial cancers (YY1(lo)SOX2(lo)BMI1(hi)OCT4(hi)) and (4) Colorectal cancer, lymphoma and melanoma (YY1(hi)SOX2(hi)BMI1(lo)OCT4(hi)). A regulatory loop is proposed consisting of the cross-talk between the NF-kB/PI3K/AKT pathways and the downstream inter-regulation of target gene products YY1, OCT4, SOX2 and BMI1

    The CBRB regulon: Promoter dissection reveals novel insights into the CbrAB expression network in Pseudomonas putida

    CbrAB is a high-ranking global regulatory system exclusive to the pseudomonads that responds to carbon-limiting conditions. It has become necessary to define the particular regulon of CbrB and discriminate it from the downstream cascades acting through other regulatory components. We have performed in vivo binding analysis of CbrB in P. putida and determined that it directly controls the expression of at least 61 genes, 20% of which are involved in regulatory functions, including the previously identified CrcZ and CrcY small regulatory RNAs. The remainder are porins or transporters (20%), metabolic enzymes (16%), activities related to protein translation (5%) and ORFs of uncharacterised function (38%). Among the latter, we selected the operon PP2810-13 for an exhaustive analysis of the CbrB binding sequences, together with those of crcZ and crcY. We describe the involvement of three independent non-palindromic subsites with variable spacing in CbrAB activation of three different targets: CrcZ, CrcY and operon PP2810-13. CbrB is a rather peculiar σN-dependent activator, since it is barely dependent on phosphorylation for transcriptional activation. With the depiction of the precise contacts of CbrB with the DNA, and the analysis of its multimerisation status and its dependence on other factors such as RpoN or IHF, we propose a model of transcriptional activation. Ministerio de Economía y Competitividad BIO2014-57545-

    TF2Network : predicting transcription factor regulators and gene regulatory networks in Arabidopsis using publicly available binding site information

    A gene regulatory network (GRN) is a collection of regulatory interactions between transcription factors (TFs) and their target genes. GRNs control different biological processes and have been instrumental in understanding the organization and complexity of gene regulation. Although various experimental methods have been used to map GRNs in Arabidopsis thaliana, their limited throughput combined with the large number of TFs means that for many genes our knowledge about regulating TFs is incomplete. We introduce TF2Network, a tool that exploits the vast amount of TF binding site information and enables the delineation of GRNs by detecting potential regulators for a set of co-expressed or functionally related genes. Validation using two experimental benchmarks reveals that TF2Network predicts the correct regulator in 75-92% of the test sets. Furthermore, our tool is robust to noise in the input gene sets, has a low false discovery rate, and performs better at recovering correct regulators than other plant tools. TF2Network is accessible through a web interface where GRNs are interactively visualized and annotated with various types of experimental functional information. TF2Network was used to perform systematic functional and regulatory gene annotations, identifying new TFs involved in circadian rhythm and stress response.

    Computational identification of transcription factor binding sites by functional analysis of sets of genes sharing overrepresented upstream motifs

    BACKGROUND: Transcriptional regulation is a key mechanism in the functioning of the cell, and is mostly effected through transcription factors binding to specific recognition motifs located upstream of the coding region of the regulated gene. The computational identification of such motifs is made easier by the fact that they often appear several times in the upstream region of the regulated genes, so that the number of occurrences of relevant motifs is often significantly larger than expected by pure chance. RESULTS: To exploit this fact, we construct sets of genes characterized by the statistical overrepresentation of a certain motif in their upstream regions. Then we study the functional characterization of these sets by analyzing their annotation to Gene Ontology terms. For the sets showing a statistically significant specific functional characterization, we conjecture that the upstream motif characterizing the set is a binding site for a transcription factor involved in the regulation of the genes in the set. CONCLUSIONS: The method we propose is able to identify many known binding sites in S. cerevisiae and new candidate targets of regulation by known transcription factors. Its application to less well studied organisms is likely to be valuable in the exploration of their regulatory interaction network. Comment: 19 pages, 1 figure. Published version with several improvements. Supplementary material available from the author
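
The functional-characterization step in this abstract asks whether a motif-defined gene set is annotated to a Gene Ontology term more often than chance would predict. One common way to score this is a one-sided hypergeometric test, sketched below. The abstract does not name the test the authors used, so the choice of test, the function name `hypergeom_tail`, and the toy numbers are all assumptions.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the genome, K: genes annotated to the term,
    n: genes in the motif-defined set, k: annotated genes in that set.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy numbers: 10 genes, 4 annotated to the term, and a set of 3 genes
# that are all annotated.
p = hypergeom_tail(k=3, K=4, n=3, N=10)
print(p)
```

A small tail probability suggests the set is enriched for the term; in a genome-wide scan over many motifs and terms, a multiple-testing correction would also be needed.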

    Genetic Variation and Antioxidant Response Gene Expression in the Bronchial Airway Epithelium of Smokers at Risk for Lung Cancer

    Prior microarray studies of smokers at high risk for lung cancer have demonstrated that heterogeneity in bronchial airway epithelial cell gene expression response to smoking can serve as an early diagnostic biomarker for lung cancer. As a first step in applying functional genomic analysis to population studies, we have examined the relationship between gene expression variation and genetic variation in a central molecular pathway (NRF2-mediated antioxidant response) associated with smoking exposure and lung cancer. We assessed global gene expression in histologically normal airway epithelial cells obtained at bronchoscopy from smokers who developed lung cancer (SC, n=20), smokers without lung cancer (SNC, n=24), and never smokers (NS, n=8). Functional enrichment analysis showed that expression of the NRF2-mediated, antioxidant response element (ARE)-regulated genes was significantly lower in SC than in SNC. Importantly, we found that the expression of MAFG (a binding partner of NRF2) was correlated with the expression of ARE genes, suggesting MAFG levels may limit target gene induction. Bioinformatically, we identified single nucleotide polymorphisms (SNPs) in putative ARE genes, and to test the impact of genetic variation we genotyped these putative regulatory SNPs and other tag SNPs in selected NRF2 pathway genes. Sequencing the MAFG locus, we identified 30 novel SNPs, two of which were associated with either gene expression or lung cancer status among smokers. This work demonstrates an analysis approach that integrates bioinformatics pathway and transcription factor binding site analysis with genotype, gene expression and disease status to identify SNPs that may be associated with individual differences in gene expression and/or cancer status in smokers.
These polymorphisms might ultimately contribute to lung cancer risk via their effect on the airway gene expression response to tobacco-smoke exposure. Intramural Research Program of the National Institute of Environmental Health Sciences; National Institutes of Health (Z01 ES100475, U01ES016035, R01CA124640