
    Predicting Combinatorial Binding of Transcription Factors to Regulatory Elements in the Human Genome by Association Rule Mining

    Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some cis-regulatory elements have been well studied, the combinations of transcription factors that regulate normal expression levels for the vast majority of the approximately 20,000 genes in the human genome are unknown. We hypothesized that it should be possible to discover transcription factor combinations that regulate gene expression in concert by identifying over-represented combinations of sequence motifs that occur together in the genome. To detect combinations of transcription factor binding motifs, we developed a data mining approach based on association rules, which are typically used in market basket analysis. We scored each segment of the genome for the presence or absence of each of 83 transcription factor binding motifs, then used association rule mining algorithms to mine this dataset, identifying frequently occurring pairs of distinct motifs within a segment. Results: Support for most pairs of transcription factor binding motifs was highly correlated across different chromosomes, although pair significance varied. Known true-positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature.
Conclusion: Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription factor.
NIAAA Alcohol Training Grant; National Science Foundation; Cellular and Molecular Biology
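A minimal sketch of the pair-scoring step, assuming each genome segment has already been reduced to the set of motifs detected in it (presence/absence, as in the abstract). Support is the fraction of segments containing both motifs; confidence of A -> B is the fraction of segments containing A that also contain B; lift divides confidence by B's background frequency. The motif names and segments are hypothetical, and lift is used here only as a stand-in for the paper's significance measure, which the abstract does not specify.

```python
from collections import Counter
from itertools import combinations


def mine_motif_pairs(segments, min_support=0.0, min_confidence=0.0):
    """Score ordered motif pairs (A -> B) by support, confidence and lift.

    segments: list of sets, one per genome segment, holding the motifs present.
    Returns tuples (A, B, support, confidence, lift), best confidence first.
    """
    n = len(segments)
    single = Counter()  # number of segments containing each motif
    pair = Counter()    # number of segments containing each unordered pair
    for motifs in segments:
        single.update(motifs)
        # Presence/absence only: each pair is counted at most once per segment.
        pair.update(frozenset(p) for p in combinations(sorted(motifs), 2))

    rules = []
    for ab, count in pair.items():
        support = count / n
        if support < min_support:
            continue
        for a in ab:  # emit both directions of the unordered pair
            (b,) = ab - {a}
            confidence = count / single[a]
            lift = confidence / (single[b] / n)  # assumption: lift as the "significance" proxy
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence, lift))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical example: four segments scored for three motifs.
segments = [{"SP1", "NFY"}, {"SP1", "NFY", "E2F"}, {"SP1"}, {"E2F", "NFY"}]
for rule in mine_motif_pairs(segments, min_support=0.5):
    print(rule)
```

With `min_support=0.5` the E2F/SP1 pair (seen in one of four segments) is dropped, and E2F -> NFY ranks first because every segment with E2F also contains NFY.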

    ArrayPlex: distributed, interactive and programmatic access to genome sequence, annotation, ontology, and analytical toolsets

    ArrayPlex is a software package that centrally provides a large number of flexible toolsets useful for functional genomics.

    Chd1 co-localizes with early transcription elongation factors independently of H3K36 methylation and releases stalled RNA polymerase II at introns

    BACKGROUND: Chromatin consists of ordered nucleosomal arrays that are controlled by highly conserved adenosine triphosphate (ATP)-dependent chromatin remodeling complexes. One such remodeler, chromodomain helicase DNA binding protein 1 (Chd1), is believed to play an integral role in nucleosomal organization, as the loss of Chd1 is known to disrupt chromatin. However, the specificity and basis of the functional and physical localization of Chd1 on chromatin remain largely unknown. RESULTS: Using genome-wide approaches, we found that the loss of Chd1 significantly disrupted nucleosome arrays within the gene bodies of highly transcribed genes. We also found that Chd1 is physically recruited to gene bodies, and that its occupancy specifically corresponds to that of the early elongating form of RNA polymerase, RNAPII Ser 5-P. Conversely, RNAPII Ser 5-P occupancy was affected by the loss of Chd1, suggesting that Chd1 is associated with early transcription elongation. Surprisingly, the occupancy of RNAPII Ser 5-P was affected by the loss of Chd1 specifically at intron-containing genes. Nucleosome turnover was also affected at these sites in the absence of Chd1. We also found that deletion of the histone methyltransferase for H3K36 (SET2) affected neither Chd1 occupancy nor nucleosome organization genome-wide. CONCLUSIONS: Chd1 is specifically recruited onto the gene bodies of highly transcribed genes in an elongation-dependent but H3K36me3-independent manner. Chd1 co-localizes with the early elongating form of RNA polymerase and affects the occupancy of RNAPII only at genes containing introns, suggesting a role in relieving splicing-related pausing of RNAPII. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1756-8935-7-32) contains supplementary material, which is available to authorized users.

    The Longhorn Array Database (LAD): An Open-Source, MIAME compliant implementation of the Stanford Microarray Database (SMD)

    BACKGROUND: The power of microarray analysis can be realized only if data are systematically archived and linked to biological annotations as well as analysis algorithms. DESCRIPTION: The Longhorn Array Database (LAD) is a MIAME-compliant microarray database that operates on PostgreSQL and Linux. It is a fully open-source version of the Stanford Microarray Database (SMD), one of the largest microarray databases. LAD is available at CONCLUSIONS: Our development of LAD provides a simple, free, open, reliable and proven solution for storage and analysis of two-color microarray data.

    Quantitative gene expression assessment identifies appropriate cell line models for individual cervical cancer pathways

    Background: Cell lines have been used to study cancer for decades, but truly quantitative assessment of their performance as models is often lacking. We used gene expression profiling to quantitatively assess the gene expression of nine cell line models of cervical cancer. Results: We find wide variation in the extent to which different cell culture models mimic late-stage invasive cervical cancer biopsies. The lowest agreement was from monolayer HeLa cells, a common cervical cancer model; the highest agreement was from primary epithelial cells and the C4-I and C4-II cell lines. In addition, HeLa and SiHa cells cultured in an organotypic environment showed significantly increased correlation with cervical cancer. We also find wide variation in agreement when we consider how well individual biological pathways model cervical cancer. Cell lines that are anti-correlated with cervical cancer were also identified and should be avoided. Conclusion: Using gene expression profiling and quantitative analysis, we have characterized nine cell lines with respect to how well they serve as models of cervical cancer. Applying this method to individual pathways, we determined which cell lines are appropriate for studying specific pathways in cervical cancer. This study will allow researchers to choose the cell line with the highest correlation to cervical cancer at the pathway level. The method is applicable to other cancers and could be used to identify the appropriate cell line and growth condition to employ when studying them.
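One way to make the pathway-level comparison concrete, assuming each profile is a per-gene expression change (for example a log ratio against normal tissue) and that agreement is measured as the Pearson correlation over the genes of one pathway. The abstract does not state the exact statistic; the gene names, values and the three-gene minimum below are hypothetical. A negative score corresponds to the anti-correlated cell lines the authors recommend avoiding.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def pathway_agreement(cell_line, tumor, pathways, min_genes=3):
    """Correlate a cell line with tumor biopsies, one score per pathway.

    cell_line, tumor: dict gene -> expression change.
    pathways: dict pathway name -> list of member genes.
    Pathways with fewer than min_genes measured genes are skipped.
    """
    scores = {}
    for name, genes in pathways.items():
        shared = [g for g in genes if g in cell_line and g in tumor]
        if len(shared) < min_genes:
            continue
        scores[name] = pearson([cell_line[g] for g in shared],
                               [tumor[g] for g in shared])
    return scores


# Hypothetical profiles: one cell line tracks the tumor, the other is inverted.
tumor = {"A": 1.0, "B": 2.0, "C": 3.0, "D": -1.0}
good_line = dict(tumor)
anti_line = {g: -v for g, v in tumor.items()}
pathways = {"p1": ["A", "B", "C"], "p2": ["A", "D"]}
print(pathway_agreement(good_line, tumor, pathways))
print(pathway_agreement(anti_line, tumor, pathways))
```

Scoring per pathway rather than genome-wide is what lets a cell line be a poor overall model yet still a reasonable one for a particular pathway.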

    Wide-ranging functions of E2F4 in transcriptional activation and repression revealed by genome-wide analysis

    The E2F family of transcription factors has important roles in cell cycle progression. E2F4 is an E2F family member that has been proposed to be primarily a repressor of transcription, but the scope of its binding activity and its functions in transcriptional regulation are not fully known. We used ChIP sequencing (ChIP-seq) to identify around 16 000 E2F4 binding sites, which potentially regulate 7346 downstream target genes with wide-ranging functions in DNA repair, cell cycle regulation, apoptosis, and other processes. While more than half of all E2F4 binding sites (56%) occurred near transcription start sites (TSSs), ∼20% of sites occurred more than 20 kb away from any annotated TSS. These distal sites showed histone modifications suggesting that E2F4 may function as a long-range regulator, which we confirmed by functional experimental assays on a subset. Overexpression of E2F4, its transcriptional cofactors of the retinoblastoma (Rb) family, and its binding partner DP-1 revealed that E2F4 acts as an activator as well as a repressor. E2F4 binding sites also occurred near regulatory elements for miRNAs such as let-7a and mir-17, suggesting that E2F4 regulates miRNAs. Taken together, our genome-wide analysis provides evidence of versatile roles for E2F4 and insights into its functions.
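The proximal/distal split can be sketched as a nearest-TSS lookup, using binary search over per-chromosome TSS coordinates sorted in ascending order. Only the 20 kb distal cutoff comes from the abstract; the 1 kb proximal cutoff, the coordinates and the function names are hypothetical.

```python
from bisect import bisect_left


def nearest_tss_distance(pos, tss_sorted):
    """Distance in bp from pos to the closest TSS in a non-empty sorted list."""
    i = bisect_left(tss_sorted, pos)
    candidates = []
    if i < len(tss_sorted):
        candidates.append(tss_sorted[i] - pos)      # nearest TSS at or downstream of pos
    if i > 0:
        candidates.append(pos - tss_sorted[i - 1])  # nearest TSS upstream of pos
    return min(candidates)


def classify_sites(sites, tss_by_chrom, proximal_bp=1_000, distal_bp=20_000):
    """Count binding sites by distance to the nearest annotated TSS.

    sites: list of (chrom, position) pairs, e.g. ChIP-seq peak summits.
    tss_by_chrom: dict chrom -> sorted list of TSS positions.
    proximal_bp is a hypothetical cutoff; distal_bp follows the 20 kb in the text.
    """
    counts = {"proximal": 0, "intermediate": 0, "distal": 0, "no_tss": 0}
    for chrom, pos in sites:
        tss = tss_by_chrom.get(chrom)
        if not tss:
            counts["no_tss"] += 1
            continue
        d = nearest_tss_distance(pos, tss)
        if d <= proximal_bp:
            counts["proximal"] += 1
        elif d > distal_bp:
            counts["distal"] += 1
        else:
            counts["intermediate"] += 1
    return counts


# Hypothetical coordinates on two chromosomes.
tss_by_chrom = {"chr1": [10_000, 100_000]}
sites = [("chr1", 10_500), ("chr1", 95_000), ("chr1", 50_000),
         ("chr1", 130_000), ("chr2", 5)]
print(classify_sites(sites, tss_by_chrom))
```

Binary search keeps each lookup logarithmic in the number of TSSs, which matters when tens of thousands of peaks are scored against a genome-wide annotation.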

    Mechanisms of Cell Cycle Control Revealed by a Systematic and Quantitative Overexpression Screen in S. cerevisiae

    Regulation of cell cycle progression is fundamental to cell health and reproduction, and failures in this process are associated with many human diseases. Much of our knowledge of cell cycle regulators derives from loss-of-function studies. To reveal new cell cycle regulatory genes that are difficult to identify in loss-of-function studies, we performed a near-genome-wide flow cytometry assay of yeast gene overexpression-induced cell cycle delay phenotypes. We identified 108 genes whose overexpression significantly delayed the progression of the yeast cell cycle at a specific stage. Many of these genes are newly implicated in cell cycle progression, for example SKO1, RFA1, and YPR015C. Overexpression of RFA1 or YPR015C delayed the cell cycle at G2/M by disrupting spindle attachment to chromosomes and by activating the DNA damage checkpoint, respectively. In contrast, overexpression of the transcription factor SKO1 arrests cells at G1 phase by activating the pheromone response pathway, revealing new cross-talk between osmotic sensing and mating. More generally, 92%–94% of the genes exhibit distinct phenotypes when overexpressed compared with their corresponding deletion mutants, supporting the notion that many genes may gain functions upon overexpression. This work thus implicates new genes in cell cycle progression, complements previous screens, and lays the foundation for future experiments to define more precisely the roles of these genes in cell cycle progression.

    miR-503 represses human cell proliferation and directly targets the oncogene DDHD2 by non-canonical target pairing

    The pathways regulating the transition of mammalian cells from quiescence to proliferation are mediated by multiple miRNAs. Despite significant improvements in our understanding of miRNA targeting, most miRNA regulatory networks remain unknown and require experimental validation. Results: Here we identified miR-503, miR-103, and miR-494 as negative regulators of proliferation in primary human cells. We experimentally determined their genome-wide target profiles using RNA-induced silencing complex (RISC) immunoprecipitations and gene expression profiling. Analysis of the genome-wide target profiles revealed evidence of extensive regulation of gene expression through non-canonical target pairing by miR-503. We identified the proto-oncogene DDHD2 as a target of miR-503 that requires pairing outside the canonical 5' seed region of miR-503, representing a novel mode of miRNA-target pairing. Further bioinformatics analysis implicated miR-503 and DDHD2 in breast cancer tumorigenesis. Conclusions: Our results provide an extensive genome-wide set of targets for miR-503, miR-103, and miR-494, and suggest that miR-503 may act as a tumor suppressor in breast cancer by directly targeting DDHD2 through non-canonical pairing.
National Institutes of Health CA130075; Cancer Prevention and Research Institute of Texas RP120194; Cellular and Molecular Biology
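For contrast with the non-canonical DDHD2 site, here is a minimal scan for canonical seed matches in a target sequence. A 6mer site is the reverse complement of miRNA nucleotides 2-7, and a 7mer-m8 site additionally pairs with nucleotide 8. The sequences are made up and are not the real miR-503 or DDHD2 sequences; 7mer-A1 and 8mer site types are left out for brevity. A site that depends on pairing outside the seed, like the one reported for DDHD2, would not be found by this scan.

```python
# RNA base-pairing complement: A<->U, C<->G.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp_rna(seq):
    """Reverse complement of an RNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_seed_sites(mirna, target):
    """Find canonical seed matches of mirna in target (both RNA, 5'->3').

    Returns (start index in target, site type) for every 6mer match;
    a match is upgraded to "7mer-m8" when the base just 5' of it pairs
    with miRNA nucleotide 8.
    """
    site6 = revcomp_rna(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    m8 = revcomp_rna(mirna[7])       # target base that pairs with nucleotide 8
    sites = []
    start = target.find(site6)
    while start != -1:
        if start > 0 and target[start - 1] == m8:
            kind = "7mer-m8"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = target.find(site6, start + 1)
    return sites


# Hypothetical sequences, chosen only to exercise both site types.
mirna = "AUCGGAUCCAUGGAAACUUGGA"
target = "AAGAUCCGAUUUAUCCGACC"
print(canonical_seed_sites(mirna, target))
```

Transcripts that are enriched in RISC immunoprecipitations yet return no hits from a scan like this are natural candidates for non-canonical pairing.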

    Detection and benchmarking of somatic mutations in cancer genomes using RNA-seq data

    To detect functional somatic mutations in tumor samples, whole-exome sequencing (WES) is often used for its reliability and relatively low cost. RNA-seq, while generally used to measure gene expression, can potentially also be used to identify somatic mutations. However, there has been little systematic evaluation of the utility of RNA-seq for identifying somatic mutations. Here, we develop and evaluate a pipeline for processing RNA-seq data from glioblastoma multiforme (GBM) tumors to identify somatic mutations. The pipeline uses the STAR aligner 2-pass procedure together with MuTect2 from the Genome Analysis Toolkit (GATK) to detect somatic variants. Variants identified from RNA-seq data were evaluated against the COSMIC and dbSNP databases and were also compared with somatic variants identified by exome sequencing. We also estimated the putative functional impact of coding variants in the most frequently mutated genes in GBM. Interestingly, variants identified by RNA-seq alone showed better representation of GBM-related mutations cataloged by COSMIC. RNA-seq-only calls substantially outperformed WES at revealing potentially new somatic mutations in known GBM-related pathways, and allowed us to build a high-quality set of somatic mutations common to exome and RNA-seq calls. Using RNA-seq data in parallel with WES data to detect somatic mutations in cancer genomes can thus broaden the scope of discoveries and lend additional support to somatic variants identified by exome sequencing alone.
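The comparison of call sets can be sketched with plain set operations, assuming variants have been normalized and keyed by exact (chromosome, position, ref, alt) tuples, and that a COSMIC-like catalog is available as a set of such tuples. The variants below are hypothetical; in practice they would be parsed from the VCFs produced by MuTect2 for each data type.

```python
def compare_call_sets(rna_calls, wes_calls, catalog):
    """Partition RNA-seq and WES somatic calls and measure catalog overlap.

    rna_calls, wes_calls: iterables of (chrom, pos, ref, alt) tuples.
    catalog: set of (chrom, pos, ref, alt) tuples from a mutation database.
    """
    rna, wes = set(rna_calls), set(wes_calls)
    shared = rna & wes
    rna_only = rna - wes
    wes_only = wes - rna

    def catalog_fraction(calls):
        # Fraction of a call set already present in the catalog (0.0 if empty).
        return len(calls & catalog) / len(calls) if calls else 0.0

    return {
        "shared": shared,
        "rna_only": rna_only,
        "wes_only": wes_only,
        "rna_only_in_catalog": catalog_fraction(rna_only),
        "wes_only_in_catalog": catalog_fraction(wes_only),
    }


# Hypothetical calls and catalog entries.
rna_calls = [("chr7", 1, "C", "T"), ("chr17", 2, "G", "A"), ("chr1", 3, "A", "G")]
wes_calls = [("chr7", 1, "C", "T"), ("chr10", 4, "T", "C")]
catalog = {("chr17", 2, "G", "A"), ("chr10", 4, "T", "C")}
summary = compare_call_sets(rna_calls, wes_calls, catalog)
print(sorted(summary["shared"]), summary["rna_only_in_catalog"])
```

Exact tuple matching is the simplest design; it assumes indels and multi-allelic sites have been left-aligned and split identically in both call sets, otherwise true overlaps are undercounted.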