17 research outputs found

Identification of Long-Range Regulatory Elements in the Human Genome

Author: Hwang Yih-Chii
Publication venue: ScholarlyCommons
Publication date: 01/01/2015

Genome-wide association studies have shown that the majority of disease-associated genetic variants lie within non-coding regions of the human genome. Subsequently, a challenge following these discoveries is to identify how these variants modulate the risk of disease. Enhancers are non-coding regulatory elements that can be bound by proteins to activate the expression of a gene that may be linearly distant. Experimentally probing all possible enhancer–target gene pairs can be laborious. Hi-C, a technique developed by Job Dekker’s group in 2009, combines high-throughput sequencing with chromosome conformation capture to detect DNA interactions genome-wide and thereby reveals the three-dimensional architecture of chromatin in the nucleus. However, the utility of the datasets produced by this technique for discovering long-range regulatory interactions is largely unexplored. In this thesis, we develop novel approaches to identify DNA-interacting units and their interactions in Hi-C datasets with the goal of uncovering all enhancer–target gene interactions. We began by identifying significantly interacting regions in these datasets, subsequently focusing on candidate enhancer–gene pairs. We found that the identified putative enhancers are enriched for p300 binding activity, while their target promoters are likely to be cell-type-specific. Furthermore, we revealed that enhancers and target genes often interact in many-to-many relationships and the majority of enhancer–target gene interactions are intra-chromosomal and within 1 Mb of each other. Next, we refined our analytical approach to identify physically-interacting DNA regions at ~1 kb resolution and better define the boundaries of likely enhancer elements. By searching for over-represented sequences (motifs) in these putative promoter-interacting enhancers, we were then able to identify bound transcription factors. 
This newer approach provides the potential to identify protein complexes involved in enhancer–promoter interactions, which can be verified in future experiments. We implemented a high-throughput identification pipeline for promoter-interacting enhancer elements (HIPPIE) using both of the above-described approaches. HIPPIE can be run efficiently on typical Linux servers and grid computing environments and is available as open-source software. In summary, our findings demonstrate the potential utility of Hi-C technologies for elucidating the mechanisms by which long-range enhancers regulate gene expression and ultimately result in human disease phenotypes.

ScholarlyCommons@Penn

High-Throughput Identification of Long-Range Regulatory Elements and Their Target Promoters in the Human Genome

Author: Gregory Brian D
Hwang Yih-Chii
Wang Li-San
Zheng Qi
Publication venue: ScholarlyCommons
Publication date: 21/03/2013

Enhancer elements are essential for tissue-specific gene regulation during mammalian development. Although these regulatory elements are often distant from their target genes, they affect gene expression by recruiting transcription factors to specific promoter regions. Because of this long-range action, the annotation of enhancer element–target promoter pairs remains elusive. Here, we developed a novel analysis methodology that takes advantage of Hi-C data to comprehensively identify these interactions throughout the human genome. To do this, we used a geometric distribution-based model to identify DNA–DNA interaction hotspots that contact gene promoters with high confidence. We observed that these promoter-interacting hotspots significantly overlap with known enhancer-associated histone modifications and DNase I hypersensitive sites. Thus, we defined thousands of candidate enhancer elements by incorporating these features, and found that they have a significant propensity to be bound by p300, an enhancer binding transcription factor. Furthermore, we revealed that their target genes are significantly bound by RNA Polymerase II and demonstrate tissue-specific expression. Finally, we uncovered that these elements are generally found within 1 Mb of their targets, and often regulate multiple genes. In total, our study presents a novel high-throughput workflow for confident, genome-wide discovery of enhancer–target promoter pairs, which will significantly improve our understanding of these regulatory interactions.

PubMed Central

ScholarlyCommons@Penn

Predicting essential genes based on network and sequence analysis

Author: Hwang Yih-Chii
Publication venue: 'Royal Society of Chemistry (RSC)'
Publication date: 07/09/2010

National Taiwan University Repository

McEnhancer: predicting gene expression via semi-supervised assignment of enhancers to target genes

Author: Aslihan Karabacak
Dina Hafez
Li-San Wang
Robert P. Zinzen
Sabrina Krueger
Uwe Ohler
Yih-Chii Hwang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2017

Transcriptional enhancers regulate spatio-temporal gene expression. While genomic assays can identify putative enhancers en masse, assigning target genes is a complex challenge. We devised a machine learning approach, McEnhancer, which links target genes to putative enhancers via a semi-supervised learning algorithm that predicts gene expression patterns based on enriched sequence features. Predicted expression patterns were 73–98% accurate, predicted assignments showed strong Hi-C interaction enrichment, enhancer-associated histone modifications were evident, and known functional motifs were recovered. Our model provides a general framework to link globally identified enhancers to targets and contributes to deciphering the regulatory genome.

Directory of Open Access Journals

MDC Repository

Contingency table for probeset gij at age a = δj where j≤m samples (m = 1, 2, …n-1).

Author: F. Brad Johnson (13677)
Kajia Cao (240822)
Li-San Wang (10922)
Paul Ryvkin (241021)
Yih-Chii Hwang (466439)

Contingency table for probeset gij at age a = δj where j≤m samples (m = 1, 2, …n-1).

FigShare

Functional annotation analysis summary.

Author: F. Brad Johnson (13677)
Kajia Cao (240822)
Li-San Wang (10922)
Paul Ryvkin (241021)
Yih-Chii Hwang (466439)

Functional annotation analysis summary.

FigShare

Histograms of genes with age-regulated transition points within each decade between 25 and 95 years (p-value≤0.005) in three brain regions.

Author: F. Brad Johnson (13677)
Kajia Cao (240822)
Li-San Wang (10922)
Paul Ryvkin (241021)
Yih-Chii Hwang (466439)

The distribution of age-regulated genes is very different in BA10 compared to BA9 and BA47.

FigShare

Performance of age estimation using the proposed naïve Bayes method and other methods.

Author: F. Brad Johnson (13677)
Kajia Cao (240822)
Li-San Wang (10922)
Paul Ryvkin (241021)
Yih-Chii Hwang (466439)

The five-fold cross-validation results showed age estimation errors of the naïve Bayes model to be 0.14 to 4.38 years smaller than in other tested models, thus reducing error by 34%.

FigShare

Characteristics Path Length in-between Groups of age-correlated genes.

Author: F. Brad Johnson (13677)
Kajia Cao (240822)
Li-San Wang (10922)
Paul Ryvkin (241021)
Yih-Chii Hwang (466439)

The size of each region shows the number of genes that overlap between the protein-protein interaction network and our age-regulated gene study. The sizes for Y (young) vs. M (middle-aged), M vs. O (old), and O vs. Y show the number of genes in the union of each pair of groups in each brain region.

FigShare

Best number of genes (N) used in age estimation and the difference of median of age is the absolute difference between the median of estimated age and the median of chronological age.

Author: F. Brad Johnson (13677)
Kajia Cao (240822)
Li-San Wang (10922)
Paul Ryvkin (241021)
Yih-Chii Hwang (466439)

The significance of the error was determined by obtaining 1,000 randomized cross-validation errors with age information randomly shuffled; the significance of the prediction error is the fraction of the 1,000 randomized errors lower than the actual cross-validation error. The best N varies across regions.
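
The shuffling procedure described in this entry can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `permutation_significance` and `make_toy_cv_error`, the `seed` parameter, and the toy nearest-neighbour estimator are all assumptions made for the example. Only the 1,000 shuffles and the "fraction of randomized errors lower than the actual error" rule come from the description.

```python
import random


def permutation_significance(ages, cv_error_fn, n_permutations=1000, seed=0):
    """Fraction of shuffled-label errors that are lower than the actual error.

    cv_error_fn is a hypothetical callable that takes a list of ages and
    returns a cross-validation error for the (fixed) expression data.
    """
    rng = random.Random(seed)
    actual_error = cv_error_fn(ages)
    lower = 0
    for _ in range(n_permutations):
        shuffled = list(ages)
        rng.shuffle(shuffled)  # break the link between samples and ages
        if cv_error_fn(shuffled) < actual_error:
            lower += 1
    return lower / n_permutations


def make_toy_cv_error(features):
    """Hypothetical stand-in for a real estimator: leave-one-out
    1-nearest-neighbour age prediction on a single numeric feature."""
    def cv_error(ages):
        total = 0.0
        for i, x in enumerate(features):
            # Nearest other sample by feature value predicts the age of sample i.
            j = min((k for k in range(len(features)) if k != i),
                    key=lambda k: abs(features[k] - x))
            total += abs(ages[j] - ages[i])
        return total / len(features)
    return cv_error


if __name__ == "__main__":
    features = list(range(10))
    ages = [20 + 10 * i for i in range(10)]
    print(permutation_significance(ages, make_toy_cv_error(features)))
```

A small value means that few shuffled-age runs beat the real cross-validation error. In practice the toy estimator would be replaced by whatever cross-validated model is being assessed.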

FigShare
