535 research outputs found

    Discovering sequences with potential regulatory characteristics

    We developed a computational model to explore the hypothesis that regulatory instructions are context dependent and conveyed through specific ‘codes’ in human genomic DNA. We provide examples in which computational predictions correlate with previously mapped DNase I hypersensitive segments in the HOXA locus on human chromosome 7. The examples show that statistically significant 9-mers from promoter regions may occur in sequences near and upstream of transcription initiation sites, in intronic regions, and within intergenic regions. Additionally, a subset of 9-mers from coding sequences appears frequently, as clusters, in regulatory regions dispersed in noncoding regions in genomic DNA. The results suggest that the computational model has the potential to decode regulatory instructions in order to discover candidate transcription factor binding sites and candidate epigenetic signals that appear in both coding and regulatory regions of genes.

    Computational identification of transcriptional regulatory elements in DNA sequence

    Identification and annotation of all the functional elements in the genome, including genes and regulatory sequences, is a fundamental challenge in genomics and computational biology. Because regulatory elements are frequently short and variable, identifying and discovering them with computational algorithms is difficult. However, significant advances have been made in the computational methods for modeling and detection of DNA regulatory elements. The availability of complete genome sequences from multiple organisms, as well as mRNA profiling and high-throughput experimental methods for mapping protein-binding sites in DNA, has contributed to the development of methods that use these auxiliary data to inform the detection of transcriptional regulatory elements. Progress is also being made in the identification of cis-regulatory modules and higher-order structures of the regulatory sequences, which is essential to understanding transcription regulation in metazoan genomes. This article reviews the computational approaches for modeling and identification of genomic regulatory elements, with an emphasis on recent developments and current challenges.

    Transcription factor binding specificity and occupancy: elucidation, modelling and evaluation

    The major contributions of this thesis are addressing the need for an objective quality evaluation of a transcription factor (TF) binding model, demonstrating the value of the tools developed to this end, and elucidating how in vitro and in vivo information can be used to improve TF binding specificity models. Accurate elucidation of TF binding specificity remains an ongoing challenge in gene regulatory research. Several in vitro and in vivo experimental techniques have been developed, followed by a proliferation of algorithms and, ultimately, binding models. This increase led to a choice problem for end users: which tools to use, and which is the most accurate model for a given TF? Therefore, the first section of this thesis investigates the motif assessment problem: how scoring functions, the choice and processing of benchmark data, and the statistics used in evaluation affect motif ranking. This analysis revealed that TF motif quality assessment requires a systematic comparative analysis, and that the scoring functions used have a TF-specific effect on motif ranking. These results informed the design of the Motif Assessment and Ranking Suite (MARS), supported by PBM and ChIP-seq benchmark data and an extensive collection of PWM motifs. MARS implements consistency, enrichment, and scoring and classification-based motif evaluation algorithms. Transcription factor binding is also influenced and determined by contextual factors: chromatin accessibility, competition or cooperation with other TFs, cell line or condition specificity, binding locality (e.g. proximity to transcription start sites) and the shape of the binding site (DNA-shape). In vitro techniques do not capture such context; therefore, this thesis also combines PBM and DNase-seq data using a comparative k-mer enrichment approach that compares open chromatin with genome-wide prevalence, achieving a modest performance improvement when benchmarked on ChIP-seq data.
    Finally, since statistical and probabilistic methods cannot capture all the information that determines binding, a machine learning approach (XGBoost) was implemented to investigate how the features contribute to TF specificity and occupancy. This combinatorial approach improves the predictive ability of TF specificity models, with the most predictive feature being chromatin accessibility, while DNA-shape and conservation information both significantly improve on the baseline model of k-mer and DNase data. The results and the tools introduced in this thesis are useful for systematic comparative analysis (via MARS) and a combinatorial approach to modelling TF binding specificity, including appropriate feature engineering practices for machine learning modelling.
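
The comparative k-mer enrichment idea mentioned in this abstract (k-mer prevalence in open chromatin versus genome-wide prevalence) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `count_kmers` and `kmer_enrichment`, the choice of a log2 frequency ratio, and the pseudocount smoothing are all assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def count_kmers(seqs, k):
    """Count every overlapping k-mer made only of A/C/G/T in the sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    return counts


def kmer_enrichment(foreground, background, k=6, pseudocount=1.0):
    """Return {k-mer: log2(foreground frequency / background frequency)}.

    Hypothetical scoring choice: `foreground` stands in for open-chromatin
    sequences and `background` for a genome-wide sample. The pseudocount
    avoids division by zero for k-mers seen in only one of the two sets.
    """
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    vocab = set(fg) | set(bg)
    fg_denom = sum(fg.values()) + pseudocount * len(vocab)
    bg_denom = sum(bg.values()) + pseudocount * len(vocab)
    return {
        kmer: log2(((fg[kmer] + pseudocount) / fg_denom)
                   / ((bg[kmer] + pseudocount) / bg_denom))
        for kmer in vocab
    }


if __name__ == "__main__":
    # Toy sequences, invented purely to exercise the functions.
    open_chromatin = ["ACGTACGT", "TTACGTAA"]
    genome_sample = ["AAAAAAAA", "CCCCACGT"]
    scores = kmer_enrichment(open_chromatin, genome_sample, k=4)
    for kmer, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
        print(kmer, round(score, 3))
```

A positive score marks a k-mer that is relatively more frequent in the foreground set; how such scores would be combined with PBM data is not specified in the abstract and is left out of the sketch.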

    Identification of Long-Range Regulatory Elements in the Human Genome

    Genome-wide association studies have shown that the majority of disease-associated genetic variants lie within non-coding regions of the human genome. A challenge following these discoveries is to identify how these variants modulate the risk of disease. Enhancers are non-coding regulatory elements that can be bound by proteins to activate the expression of a gene that may be linearly distant. Experimentally probing all possible enhancer–target gene pairs can be laborious. Hi-C, a technique developed by Job Dekker’s group in 2009, combines high-throughput sequencing with chromosome conformation capture to detect DNA interactions genome-wide and thereby reveals the three-dimensional architecture of chromatin in the nucleus. However, the utility of the datasets produced by this technique for discovering long-range regulatory interactions is largely unexplored. In this thesis, we develop novel approaches to identify DNA-interacting units and their interactions in Hi-C datasets with the goal of uncovering all enhancer–target gene interactions. We began by identifying significantly interacting regions in these datasets, subsequently focusing on candidate enhancer–gene pairs. We found that the identified putative enhancers are enriched for p300 binding activity, while their target promoters are likely to be cell-type-specific. Furthermore, we revealed that enhancers and target genes often interact in many-to-many relationships and that the majority of enhancer–target gene interactions are intra-chromosomal and within 1 Mb of each other. Next, we refined our analytical approach to identify physically interacting DNA regions at ~1 kb resolution and better define the boundaries of likely enhancer elements. By searching for over-represented sequences (motifs) in these putative promoter-interacting enhancers, we were then able to identify bound transcription factors.
    This newer approach provides the potential to identify protein complexes involved in enhancer–promoter interactions, which can be verified in future experiments. We implemented a high-throughput identification pipeline for promoter-interacting enhancer elements (HIPPIE) using both of the above-described approaches. HIPPIE can be run efficiently on typical Linux servers and grid computing environments and is available as open-source software. In summary, our findings demonstrate the potential utility of Hi-C technologies for elucidating the mechanisms by which long-range enhancers regulate gene expression and ultimately result in human disease phenotypes.

    Genomic features defining exonic variants that modulate splicing

    A comparative analysis of SNPs and their exonic and intronic environments identifies the features predictive of splice-affecting variants.

    Deciphering the transcriptional regulation and response of barley to obligate fungal biotroph invasion

    Obligate fungal biotrophs have co-evolved with their plant hosts, a direct result of an intimate interaction that protects the integrity of the plant during pathogenesis, allowing the pathogen to obtain essential nutrients. To restrict the establishment of pathogen colonization, plants have evolved complex regulatory mechanisms to control the defense response, the most extreme of which involves Resistance (R) gene-mediated programmed cell death. While it is known that de novo gene expression and subsequent protein synthesis are required for several cell death programs, the primary transcriptional targets of R gene-mediated responses are unknown. Two alternative approaches were used to identify these transcriptional targets. The first approach uses a time-course microarray experiment that contrasts wild-type and loss-of-function mutant alleles of the Mla (powdery mildew) R gene to identify transcripts that distinguish incompatibility from compatibility. Earlier expression and stronger transcriptional responses were observed in compatible plants at 20 hours after inoculation, though this reaction diminished at later time points. In contrast, incompatible interactions exhibited a time-dependent strengthening of the transcriptional response, with increases in both fold change and total number of genes differentially expressed. These results implicate MLA as a repressor of the early gene expression response and provide further evidence for a link between basal and R gene-mediated resistance. The second approach uses natural variation present in a doubled-haploid population to identify the regulatory hierarchy of gene expression during the interaction of barley and stem rust. A trans-eQTL hotspot is not associated with the R gene Rpg-TTKSK, but instead an inoculation-dependent expression polymorphism in Adf3 implicates it as a candidate susceptibility gene.
    In contrast, co-localization of a trans-eQTL hotspot with an enhancer of R gene-mediated resistance to stem rust associates the suppression of gene expression with enhanced resistance. Lastly, Blufensin1 (Bln1) is used as a case study for functional analysis using gene expression, structural features, and phenotype. Although greater expression of Bln1 was previously associated with incompatibility, virus-induced gene silencing and transient overexpression indicate that Bln1 negatively impacts defense. Collectively, these studies suggest that the relationship between gene expression and its phenotypic consequences is more complex than previously thought.

    Statistical Methods in Integrative Genomics

    Statistical methods in integrative genomics aim to answer important biological questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions.

    When needles look like hay: How to find tissue-specific enhancers in model organism genomes

    A major prerequisite for the investigation of tissue-specific processes is the identification of cis-regulatory elements. No generally applicable technique is available to distinguish them from any other type of genomic non-coding sequence. Therefore, researchers often have to identify these elements by elaborate in vivo screens, testing individual regions until the right one is found. Here, based on many examples from the literature, we summarize how functional enhancers have been isolated from other elements in the genome and how they have been characterized in transgenic animals. Covering computational and experimental studies, we provide an overview of the global properties of cis-regulatory elements, such as their specific interactions with promoters and their distances to target genes. We describe conserved non-coding elements (CNEs) and their internal structure, nucleotide composition, binding site clustering and overlap, with a special focus on developmental enhancers. Conflicting data and unresolved questions on the nature of these elements are highlighted. Our comprehensive overview of the experimental shortcuts that have been found in the different model organism communities and the new field of high-throughput assays should help during the preparation phase of a screen for enhancers. The review is accompanied by a list of general guidelines for such a project.

    Studying the regulatory landscape of flowering plants
