1,163 research outputs found

    Genomic Methods for Studying the Post-Translational Regulation of Transcription Factors

    The spatiotemporal coordination of gene expression is a fundamental process in cellular biology. Gene expression is regulated, in large part, by sequence-specific transcription factors that bind to DNA regions in the proximity of each target gene. Transcription factor activity and specificity are, in turn, regulated post-translationally by protein-modifying enzymes. High-throughput methods exist to probe specific steps of this process, such as protein-protein and protein-DNA interactions, but few computational tools exist to integrate this information in a principled, model-oriented manner. In this work, I develop several computational tools for studying the functional implications of transcription factor modification. I establish the first publicly accessible database for known and predicted regulatory circuits that encompass modifying enzymes, transcription factors, and transcriptional targets. I also develop a model-based method for integrating heterogeneous genomic and proteomic data for the inference of modification-dependent transcriptional regulatory networks. The model-based method is thoroughly validated as a reliable and accurate computational genomic tool. Additionally, I propose and demonstrate fundamental improvements to computational proteomic methods for identifying modified protein forms. In summary, this work contributes critical methodological advances to the field of regulatory network inference

    PPARα siRNA–Treated Expression Profiles Uncover the Causal Sufficiency Network for Compound-Induced Liver Hypertrophy

    Uncovering pathways underlying drug-induced toxicity is a fundamental objective in the field of toxicogenomics. Developing mechanism-based toxicity biomarkers requires the identification of such novel pathways and the order of their sufficiency in causing a phenotypic response. Genome-wide RNA interference (RNAi) phenotypic screening has emerged as an effective tool in unveiling the genes essential for specific cellular functions and biological activities. However, eliciting the relative contribution of and sufficiency relationships among the genes identified remains challenging. In the rodent, the most widely used animal model in preclinical studies, it is unrealistic to exhaustively examine all potential interactions by RNAi screening. Application of existing computational approaches to infer regulatory networks with biological outcomes in the rodent is limited by the requirements for a large number of targeted permutations. Therefore, we developed a two-step relay method that requires only one targeted perturbation for genome-wide de novo pathway discovery. Using expression profiles in response to small interfering RNAs (siRNAs) against the gene for peroxisome proliferator-activated receptor α (Ppara), our method unveiled the potential causal sufficiency order network for liver hypertrophy in the rodent. The validity of the inferred 16 causal transcripts or 15 known genes for PPARα-induced liver hypertrophy is supported by their ability to predict non-PPARα–induced liver hypertrophy with 84% sensitivity and 76% specificity. Simulation shows that the probability of achieving such predictive accuracy without the inferred causal relationship is exceedingly small (p < 0.005). Five of the most sufficient causal genes have been previously disrupted in mouse models; the resulting phenotypic changes in the liver support the inferred causal roles in liver hypertrophy. 
Our results demonstrate the feasibility of defining pathways that mediate drug-induced toxicity from siRNA-treated expression profiles. When combined with phenotypic evaluation, our approach should help to unleash the full potential of siRNAs in systematically unveiling the molecular mechanisms of biological events.
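
The sensitivity and specificity figures quoted in the abstract above follow the standard definitions for a binary predictor. The sketch below only illustrates those definitions; the function names and the counts in the usage note are hypothetical and are not taken from the study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly positive cases (e.g. hypertrophy) that are predicted positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of truly negative cases that are predicted negative."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the arithmetic.
    print(sensitivity(true_pos=8, false_neg=2))
    print(specificity(true_neg=3, false_pos=1))
```

With the made-up counts above, 8 of 10 positive cases and 3 of 4 negative cases are classified correctly.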

    Identification of Colorectal Cancer Related Genes with mRMR and Shortest Path in Protein-Protein Interaction Network

    One of the most important and challenging problems in biomedicine and genomics is how to identify disease genes. In this study, we developed a computational method to identify colorectal cancer-related genes based on (i) the gene expression profiles, and (ii) the shortest path analysis of functional protein association networks. The former has been used to select differentially expressed genes as disease genes for quite a long time, while the latter has been widely used to study the mechanism of diseases. With the existing protein-protein interaction data from STRING (Search Tool for the Retrieval of Interacting Genes), a weighted functional protein association network was constructed. By means of the mRMR (Maximum Relevance Minimum Redundancy) approach, six genes were identified that can distinguish colorectal tumors from normal adjacent colonic tissues based on their gene expression profiles. Meanwhile, according to the shortest path approach, we further found an additional 35 genes, of which some have been reported to be relevant to colorectal cancer and some are very likely to be relevant to it. Interestingly, the genes we identified from both the gene expression profiles and the functional protein association network include more cancer genes than the genes identified from the gene expression profiles alone. In addition, these genes also had greater functional similarity with the reported colorectal cancer genes than the genes identified from the gene expression profiles alone. Together, these results indicate that our method as presented in this paper is quite promising. The method may become a useful tool, or at least play a complementary role to existing methods, for identifying colorectal cancer genes. It has not escaped our notice that the method can be applied to identify the genes of other diseases as well.
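
The abstract above does not spell out how the shortest path step selects additional genes. One plausible reading is that genes lying on shortest paths between the seed genes in the weighted network are collected as candidates. The sketch below illustrates that reading only. The gene labels, the toy edge list, and the conversion of an association score into a distance (1 minus the score) are all assumptions made for illustration, not details from the paper or from STRING.

```python
import heapq
from itertools import combinations


def build_graph(scored_edges):
    """Build an undirected weighted graph; the distance 1 - score is an assumed conversion."""
    graph = {}
    for u, v, score in scored_edges:
        weight = 1.0 - score
        graph.setdefault(u, {})[v] = weight
        graph.setdefault(v, {})[u] = weight
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list from source to target, or [] if unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if target not in dist:
        return []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def candidates_on_shortest_paths(graph, seeds):
    """Collect non-seed genes that sit on a shortest path between any pair of seed genes."""
    found = set()
    for a, b in combinations(sorted(seeds), 2):
        found.update(shortest_path(graph, a, b)[1:-1])
    return found - set(seeds)


# Hypothetical toy network: S1..S3 stand in for expression-selected seed genes.
TOY_EDGES = [
    ("S1", "X", 0.9), ("X", "S2", 0.9),
    ("S1", "Y", 0.4), ("Y", "S2", 0.4),
    ("S2", "Z", 0.8), ("Z", "S3", 0.7),
    ("S1", "S3", 0.2),
]

if __name__ == "__main__":
    toy_graph = build_graph(TOY_EDGES)
    print(sorted(candidates_on_shortest_paths(toy_graph, {"S1", "S2", "S3"})))
```

In this toy network the high-confidence intermediates X and Z are picked up, while Y is skipped because the route through it is longer than the route through X.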

    Identification of Long-Range Regulatory Elements in the Human Genome

    Genome-wide association studies have shown that the majority of disease-associated genetic variants lie within non-coding regions of the human genome. Subsequently, a challenge following these discoveries is to identify how these variants modulate the risk of disease. Enhancers are non-coding regulatory elements that can be bound by proteins to activate the expression of a gene that may be linearly distant. Experimentally probing all possible enhancer–target gene pairs can be laborious. Hi-C, a technique developed by Job Dekker’s group in 2009, combines high-throughput sequencing with chromosome conformation capture to detect DNA interactions genome-wide and thereby reveals the three-dimensional architecture of chromatin in the nucleus. However, the utility of the datasets produced by this technique for discovering long-range regulatory interactions is largely unexplored. In this thesis, we develop novel approaches to identify DNA-interacting units and their interactions in Hi-C datasets with the goal of uncovering all enhancer–target gene interactions. We began by identifying significantly interacting regions in these datasets, subsequently focusing on candidate enhancer–gene pairs. We found that the identified putative enhancers are enriched for p300 binding activity, while their target promoters are likely to be cell-type-specific. Furthermore, we revealed that enhancers and target genes often interact in many-to-many relationships and the majority of enhancer–target gene interactions are intra-chromosomal and within 1 Mb of each other. Next, we refined our analytical approach to identify physically-interacting DNA regions at ~1 kb resolution and better define the boundaries of likely enhancer elements. By searching for over-represented sequences (motifs) in these putative promoter-interacting enhancers, we were then able to identify bound transcription factors. 
This newer approach provides the potential to identify protein complexes involved in enhancer–promoter interactions, which can be verified in future experiments. We implemented a high-throughput identification pipeline for promoter-interacting enhancer elements (HIPPIE) using both of the above described approaches. HIPPIE can be run efficiently on typical Linux servers and grid computing environments and is available as open-source software. In summary, our findings demonstrate the potential utility of Hi-C technologies for elucidating the mechanisms by which long-range enhancers regulate gene expression and ultimately result in human disease phenotypes

    The Evolution and Mechanics of Translational Control in Plants

    The expression of numerous plant mRNAs is attenuated by RNA sequence elements located in the 5' and 3' untranslated regions (UTRs). For example, in plants and many higher eukaryotes, roughly 35% of genes encode mRNAs that contain one or more upstream open reading frames (uORFs) in the 5' UTR. For this dissertation I have analyzed the pattern of conservation of such mRNA sequence elements. In the first set of studies, I have taken a comparative transcriptomics approach to address which RNA sequence elements are conserved between various families of angiosperm plants. Such conservation indicates an element's fundamental importance to plant biology, points to pathways for which it is most vital, and suggests the mechanism by which it acts. Conserved motifs were detected in 3% of genes. These include di-purine repeat motifs, uORF-associated motifs, putative binding sites for PUMILIO-like RNA binding proteins, small RNA targets, and a wide range of other sequence motifs. Due to the scanning process that precedes translation initiation, uORFs are often translated, thereby repressing initiation at the mRNA's main ORF. As one might predict, I found a clear bias against the AUG start codon within the 5' untranslated region (5' UTR) among all plants examined. Further supporting this finding, comparative analysis indicates that, for ~42% of genes, AUGs and their resultant uORFs reduce carrier fitness. Interestingly, for at least 5% of genes, uORFs are not only tolerated, but enriched. The remaining uORFs appear to be neutral. Because of their tangible impact on plant biology, it is critical to differentiate how uORFs affect translation and how, in many cases, their inhibitory effects are neutralized. In pursuit of this aim, I developed a computational model of the initiation process that uses five parameters to account for uORF presence. 
In vivo translation efficiency data from uORF-containing reporter constructs were used to estimate the model's parameters in wild-type Arabidopsis. In addition, the model was applied to identify salient defects associated with a mutation in subunit h of eukaryotic initiation factor 3 (eIF3h). The model indicates that eIF3h, by supporting re-initiation during uORF elongation, facilitates uORF tolerance.

    Understanding the Metabolic and Genetic Regulation of Breast Cancer Recurrence Using Magnetic Resonance-Based Integrative Metabolomics

    Breast cancer is the most commonly diagnosed malignancy in women and is the leading cause of cancer-related death in the female population worldwide. In these women, breast cancer recurrence--local, regional, or distant--represents the principal cause of death from this disease. The mechanisms underlying tumor recurrence remain largely unknown. To dissect those mechanisms, our laboratory has developed inducible transgenic mouse models that accurately recapitulate key features of the natural history of human breast cancer progression: primary tumor development, tumor dormancy and recurrence. Dysregulated metabolism has long been known to be a key feature in tumorigenesis. Yet, very little is known about the connection, if any, between cellular metabolic changes and breast cancer recurrence. In this work, I design and implement a systems engineering-based approach, magnetic resonance-based integrative metabolomics, to better understand the metabolic and genetic regulation of breast cancer recurrence. Through a combination of 1H and 13C magnetic resonance spectroscopy (MRS), mass spectrometry (MS) as well as gene expression profiling and functional metabolic and genetic studies, I aim to identify the metabolic profile of mammary tumors during breast cancer progression, identify the molecular basis and role of differential glutamine uptake and metabolism in breast cancer recurrence and finally, investigate the molecular basis and role of differential lactate production in breast cancer recurrence. The findings suggest an evolving metabolic phenotype of tumors during breast cancer progression as well as metabolic dysregulation in some of the key regulatory nodes that control that evolution. Identifying the metabolic changes associated with tumor recurrence can pave the way for identifying novel diagnostic strategies and therapeutic targets that can contribute to improved clinical management and outcome for breast cancer patients

    Gene regulation and epigenotype in Friedreich's ataxia

    Friedreich's ataxia (FRDA) is known to be caused by an abnormal GAA-repeat expansion located in the first intron of the FXN gene. As a result of the GAA expansion, patients exhibit low levels of FXN mRNA, leading to FRDA. Here, via chromatin immunoprecipitation (ChIP), the presence of an RNA pol II transcriptional pausing site at exon 1 of the FXN gene was demonstrated. At this site, FRDA EBV cell lines exhibited elevated levels of the negative elongation factor NELF-E, depending on the presence of a GAA repeat expansion, compared to controls. This site may represent a rate-limiting step for FXN transcription and consequently provide a means to modify transcription levels in FRDA. Moreover, RNA pol II pausing site binding factors, such as NELF-E, were influenced by treatment with nicotinamide, an HDAC class III inhibitor. Therefore, factors sensitive to chromatin changes may influence the regulation of RNA pol II pausing and also counterbalance otherwise positive chromatin changes. This new finding could explain the relatively minor effects of different drug approaches to up-regulate this gene. Furthermore, CTCF and the histone demethylase LSD1 were also found to be located at the FXN pausing site. Results suggest a function for LSD1 in demethylating H3K4me2 at the pausing site and potentially also in demethylating H3K9me3 in the case of frequently transcribed expanded GAA repeats. Therefore, LSD1 might play a crucial role in preventing heterochromatinisation of a euchromatic gene. Using primary transcript RNA-FISH, a delay in RNA pol II release from the pausing site and, furthermore, a dramatic loss of RNA pol II elongation in the presence of expanded GAA repeats were seen. The identified and characterised transcriptional pausing site at FXN is likely to play a repressive role and to participate in the pathogenesis of FRDA.

    Multicofactor proteins: structure, prediction, function

    EThOS - Electronic Theses Online Service, GB, United Kingdom

    A new model construction based on the knowledge graph for mining elite polyphenotype genes in crops

    Identifying polyphenotype genes that simultaneously regulate important agronomic traits (e.g., plant height, yield, and disease resistance) is critical for developing novel high-quality crop varieties. Predicting the associations between genes and traits requires the organization and analysis of multi-dimensional scientific data. The existing methods for establishing the relationships between genomic data and phenotypic data can only elucidate the associations between genes and individual traits. However, there are relatively few methods for detecting elite polyphenotype genes. In this study, a knowledge graph for traits regulating-genes was constructed by collecting data from the PubMed database and eight other databases related to the staple food crops rice, maize, and wheat as well as the model plant Arabidopsis thaliana. On the basis of the knowledge graph, a model for predicting traits regulating-genes was constructed by combining the data attributes of the gene nodes and the topological relationship attributes of the gene nodes. Additionally, a scoring method for predicting the genes regulating specific traits was developed to screen for elite polyphenotype genes. A total of 125,591 nodes and 547,224 semantic relationships were included in the knowledge graph. The accuracy of the knowledge graph-based model for predicting traits regulating-genes was 0.89, the precision rate was 0.91, the recall rate was 0.96, and the F1 value was 0.94. Moreover, 4,447 polyphenotype genes for 31 trait combinations were identified, among which the rice polyphenotype gene IPA1 and the A. thaliana polyphenotype gene CUC2 were verified via a literature search. Furthermore, the wheat gene TraesCS5A02G275900 was revealed as a potential polyphenotype gene that will need to be further characterized. 
Meanwhile, Venn diagram analysis comparing the polyphenotype gene datasets (genes predicted by our model) with the transcriptome gene datasets (genes differentially expressed in response to disease, drought, or salt) showed that approximately 70% and 54% of the polyphenotype genes were also present in the transcriptome datasets of Arabidopsis and rice, respectively. The application of the knowledge graph-driven model for predicting traits regulating-genes represents a novel method for detecting elite polyphenotype genes.
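
The F1 value reported in the abstract above is, by its standard definition, the harmonic mean of precision and recall. The short sketch below shows that relationship; the function name is hypothetical and the code is not taken from the study. Plugging in the rounded precision (0.91) and recall (0.96) from the abstract gives roughly 0.934, which matches the reported 0.94 only if the unrounded precision and recall were slightly higher than the quoted two-decimal values.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Rounded values as quoted in the abstract; the exact inputs are not available here.
    print(round(f1_from_precision_recall(0.91, 0.96), 3))
```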