Determining Physical Constraints in Transcriptional Initiation Complexes Using DNA Sequence Analysis

Author: Chiang Derek Y.
Eisen Michael B.
Moses Alan M.
Shultzaberger Ryan K.
Publication venue: Public Library of Science
Publication date: 01/01/2007

Eukaryotic gene expression is often under the control of cooperatively acting transcription factors whose binding is limited by structural constraints. By determining these structural constraints, we can understand the “rules” that define functional cooperativity. Conversely, by understanding the rules of binding, we can infer structural characteristics. We have developed an information-theory-based method for approximating the physical limitations of cooperative interactions by comparing sequence analysis to microarray expression data. Applying it to the coordinated binding of the sulfur amino acid regulatory protein Met4 by Cbf1 and Met31, we created a combinatorial model that correctly identifies Met4-regulated genes. Interestingly, we found that the major determinant of Met4 regulation was the sum of the strengths of the Cbf1 and Met31 binding sites and that the energetic costs associated with spacing appeared to be minimal.
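The "sum of binding-site strengths" idea can be illustrated with a minimal sketch. It assumes site strength is measured as individual information in bits from a weight matrix with a uniform background; the function names, the pseudocount, and the toy sites are hypothetical and are not taken from the paper.

```python
import math

BASES = "ACGT"

def weight_matrix(sites, pseudocount=1.0):
    """Per-position weights 2 + log2(f(b, l)) in bits (uniform background assumed)."""
    matrix = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: 2.0 + math.log2(counts[b] / total) for b in BASES})
    return matrix

def site_information(matrix, seq):
    """Strength of one candidate site: the sum of its per-position weights."""
    return sum(col[base] for col, base in zip(matrix, seq))

def combined_strength(matrix_a, site_a, matrix_b, site_b):
    """Sum of the strengths of two sites, with no spacing penalty applied."""
    return site_information(matrix_a, site_a) + site_information(matrix_b, site_b)

# Toy aligned sites, for illustration only (hypothetical data).
toy_sites_a = ["CACGTG", "CACGTG", "CACGTT", "CACATG"]
toy_sites_b = ["AAACTG", "AAACTG", "AAACTC", "TAACTG"]
matrix_a = weight_matrix(toy_sites_a)
matrix_b = weight_matrix(toy_sites_b)
print(combined_strength(matrix_a, "CACGTG", matrix_b, "AAACTG"))
```

A promoter could then be ranked by this combined score; any cutoff separating regulated from unregulated genes would have to be chosen against expression data.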

CiteSeerX

Springer - Publisher Connector

UNT Digital Library

Position specific variation in the rate of evolution in transcription factor binding sites

Author: Chiang Derek Y
Eisen Michael B
Kellis Manolis
Lander Eric S
Moses Alan M
Publication venue: BioMed Central
Publication date: 01/01/2003

BACKGROUND: The binding sites of sequence-specific transcription factors are an important and relatively well-understood class of functional non-coding DNAs. Although a wide variety of experimental and computational methods have been developed to characterize transcription factor binding sites, they remain difficult to identify. Comparison of non-coding DNA from related species has shown considerable promise in identifying these functional non-coding sequences, even though relatively little is known about their evolution. RESULTS: Here we analyse the genome sequences of the budding yeasts Saccharomyces cerevisiae, S. bayanus, S. paradoxus and S. mikatae to study the evolution of transcription factor binding sites. As expected, we find that both experimentally characterized and computationally predicted binding sites evolve more slowly than surrounding sequence, consistent with the hypothesis that they are under purifying selection. We also observe position-specific variation in the rate of evolution within binding sites. We find that the position-specific rate of evolution is positively correlated with degeneracy among binding sites within S. cerevisiae. We test theoretical predictions for the rate of evolution at positions where the base frequencies deviate from background due to purifying selection and find reasonable agreement with the observed rates of evolution. Finally, we show how the evolutionary characteristics of real binding motifs can be used to distinguish them from artefacts of computational motif-finding algorithms. CONCLUSION: As has been observed for protein sequences, the rate of evolution in transcription factor binding sites varies with position, suggesting that some regions are under stronger functional constraint than others. This variation likely reflects the varying importance of different positions in the formation of the protein-DNA complex. The characterization of the pattern of evolution in known binding sites will likely contribute to the effective use of comparative sequence data in the identification of transcription factor binding sites and is an important step toward understanding the evolution of functional non-coding DNA.
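The two per-position quantities compared in this abstract, degeneracy within a species and rate of change between species, can be sketched as follows. This is only an illustration: it assumes degeneracy is summarized as information content (2 minus Shannon entropy, in bits) and that the evolutionary rate is approximated by the raw fraction of differing aligned bases, which is cruder than a model-based rate estimate. All names and inputs are hypothetical.

```python
import math
from collections import Counter

def column_information(sites):
    """Information content (bits) per motif position: 2 - Shannon entropy.

    Low information means a degenerate position.
    """
    info = []
    for column in zip(*sites):
        n = len(column)
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(2.0 - entropy)
    return info

def substitution_fraction(aligned_pairs):
    """Fraction of aligned orthologous site pairs that differ at each position."""
    length = len(aligned_pairs[0][0])
    diffs = [0] * length
    for seq_a, seq_b in aligned_pairs:
        for i in range(length):
            if seq_a[i] != seq_b[i]:
                diffs[i] += 1
    return [d / len(aligned_pairs) for d in diffs]
```

Plotting one list against the other, position by position, is the kind of comparison the abstract describes: degenerate (low-information) positions would be expected to show higher substitution fractions.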

DSpace@MIT

Springer - Publisher Connector

UNT Digital Library

MONKEY: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model

Author: Chiang Derek Y
Eisen Michael B
Iyer Venky N
Moses Alan M
Pollard Daniel A
Publication venue: BioMed Central
Publication date: 28/10/2004

We introduce a method (MONKEY) to identify conserved transcription-factor binding sites in multispecies alignments. MONKEY employs probabilistic models of factor specificity and binding-site evolution, on which basis we compute the likelihood that putative sites are conserved and assign statistical significance to each hit. Using genomes from the genus Saccharomyces, we illustrate how the significance of real sites increases with evolutionary distance and explore the relationship between conservation and function.
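As a rough illustration of likelihood-based scoring across an alignment, the sketch below sums a motif-versus-background log-likelihood ratio over the aligned sequences. It is not MONKEY's algorithm: it assumes the species are independent, whereas MONKEY uses a binding-site-specific evolutionary model, so this simplification overcounts evidence from closely related species. The function names and the toy motif are hypothetical.

```python
import math

def log_likelihood_ratio(site, motif, background):
    """log P(site | motif) - log P(site | background) for one ungapped sequence."""
    return sum(math.log(motif[i][base] / background[base])
               for i, base in enumerate(site))

def conserved_site_score(aligned_sites, motif, background):
    """Sum of per-species scores; treats species as independent (an assumption)."""
    return sum(log_likelihood_ratio(s, motif, background) for s in aligned_sites)

# Toy two-position motif and uniform background (hypothetical values).
toy_motif = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(conserved_site_score(["AG", "AG", "AG"], toy_motif, uniform))
```

A site that matches the motif in every aligned species scores higher than one that matches in only some of them, which is the qualitative behaviour a conservation-aware score should have.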

Harvard University - DASH

UNT Digital Library

Flexible Promoter Architecture Requirements for Coactivator Recruitment

Author: Chiang Derek Y-h
Eisen Michael B
Gasch Audrey P
Nix David A
Shultzaberger Ryan K
Publication venue: Springer Science and Business Media LLC
Publication date: 18/04/2011

Background: The spatial organization of transcription factor binding sites in regulatory DNA, and the composition of intersite sequences, influence the assembly of the multiprotein complexes that regulate RNA polymerase recruitment and thereby affect transcription. We have developed a genetic approach to investigate how reporter gene transcription is affected by varying the spacing between transcription factor binding sites. We characterized the components of promoter architecture that govern the yeast transcription factors Cbf1 and Met31/32, which bind independently but collaboratively recruit the coactivator Met4. Results: A Cbf1 binding site was required upstream of a Met31/32 binding site for full reporter gene expression. Distance constraints on coactivator recruitment were more flexible than those for cooperatively binding transcription factors. Distances from 18 to 50 bp between binding sites supported efficient recruitment of Met4, with only slight modulation by helical phasing. Intriguingly, we found that certain sequences located between the binding sites abolished gene expression. Conclusion: These results yield insight into the influence of both binding site architecture and local DNA flexibility on gene expression, and can be used to refine computational predictions of gene expression from promoter sequences. In addition, our approach can be applied to survey promoter architecture requirements for arbitrary combinations of transcription factor binding sites.

Conservation and Evolution of Cis-Regulatory Systems in Ascomycete Fungi

Author: Berardini Mark
Chiang Derek Y
Eisen Michael B
Fraser Hunter B
Gasch Audrey P
Moses Alan M
Publication venue: Public Library of Science
Publication date: 01/01/2004

Relatively little is known about the mechanisms through which gene expression regulation evolves. To investigate this, we systematically explored the conservation of regulatory networks in fungi by examining the cis-regulatory elements that govern the expression of coregulated genes. We first identified groups of coregulated Saccharomyces cerevisiae genes enriched for genes with known upstream or downstream cis-regulatory sequences. Reasoning that many of these gene groups are coregulated in related species as well, we performed similar analyses on orthologs of coregulated S. cerevisiae genes in 13 other ascomycete species. We find that many species-specific gene groups are enriched for the same flanking regulatory sequences as those found in the orthologous gene groups from S. cerevisiae, indicating that those regulatory systems have been conserved in multiple ascomycete species. In addition to these clear cases of regulatory conservation, we find examples of cis-element evolution that suggest multiple modes of regulatory diversification, including alterations in transcription factor-binding specificity, incorporation of new gene targets into an existing regulatory system, and co-option of regulatory systems to control a different set of genes. We investigated one example in greater detail by measuring the in vitro activity of the S. cerevisiae transcription factor Rpn4p and its orthologs from Candida albicans and Neurospora crassa. Our results suggest that the DNA binding specificity of these proteins has coevolved with the sequences found upstream of the Rpn4p target genes and suggest that Rpn4p has a different function in N. crassa.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Elsevier - Publisher Connector

FigShare

Cancer gene discovery in hepatocellular carcinoma

Author: Chiang Derek Y.
Llovet Josep M.
Sia Daniela
Tovar Victoria
Villanueva Augusto
Zender Lars
Publication date: 01/01/2010

Hepatocellular carcinoma (HCC) is a deadly cancer whose incidence is increasing worldwide. Although the main risk factors for HCC development, such as hepatitis B and C virus infection and alcohol abuse, have been clearly identified, understanding of the key drivers of this malignancy is still preliminary. Recent data suggest that genomic analysis of cirrhotic tissue (the pre-neoplastic carcinogenic field) may provide a read-out to identify at-risk populations for cancer development. Given this contextual complexity, it is of utmost importance to characterize the molecular pathogenesis of this disease and to pinpoint the dominant pathways and drivers by integrative oncogenomic approaches and/or sophisticated experimental models. Identification of the dominant proliferative signals and key aberrations will allow for more personalized therapy.

Springer - Publisher Connector

Lack of benefits for prevention of cardiovascular disease with aspirin therapy in type 2 diabetic patients - a longitudinal observational study

Author: Chan Francis K
Chan Juliana C
Chiang Sau-chu
Ko Gary T
Kong Alice P
Leung Wilson Y
Lui Augustine
Ma Ronald C
So Wing-yee
Stewart Derek
Tong Peter C
Yang Xilin
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The risk-benefit ratio of aspirin therapy in the prevention of cardiovascular disease (CVD) remains contentious, especially in type 2 diabetes. This study examined the benefit and harm of low-dose aspirin (daily dose < 300 mg) in patients with type 2 diabetes. Methods: This is a longitudinal observational study with primary and secondary prevention cohorts based on history of CVD at enrolment. We compared the occurrence of the primary composite endpoint (non-fatal myocardial infarction or stroke and vascular death) and secondary endpoints (upper GI bleeding and haemorrhagic stroke) between aspirin users and non-users between January 1995 and July 2005. Results: Of the 6,454 patients (follow-up, median [IQR]: 4.7 [4.4] years), usage of aspirin was 18% (n = 1,034) in the primary prevention cohort (n = 5,731) and 81% (n = 585) in the secondary prevention cohort (n = 723). After adjustment for covariates, in the primary prevention cohort, aspirin use was associated with a hazard ratio of 2.07 (95% CI: 1.66, 2.59, p < 0.001) for the primary endpoint. There was no difference in CVD event rate in the secondary prevention cohort. Overall, aspirin use was associated with a hazard ratio of 2.2 (1.53, 3.15, p < 0.001) for GI bleeding and 1.71 (1.00, 2.95, p = 0.051) for haemorrhagic stroke. The absolute risk of aspirin-related GI bleeding was 10.7 events per 1,000 person-years of treatment. Conclusion: In Chinese type 2 diabetic patients, low-dose aspirin was associated with a paradoxical increase in CVD risk in primary prevention and did not confer benefits in secondary prevention. In addition, the risk of GI bleeding in aspirin users was rather high.

Crossref

A Robust Method for Transcript Quantification with RNA-Seq Data

Author: Chiang Derek Y.
Hu Yin
Huang Yan
Jones Corbin D.
Liu Jinze
Liu Yufeng
MacLeod James N.
Prins Jan F.
Publication date: 01/01/2013

The advent of high-throughput RNA-seq technology allows deep sampling of the transcriptome, making it possible to characterize both the diversity and the abundance of transcript isoforms. Accurate abundance estimation or transcript quantification of isoforms is critical for downstream differential analysis (e.g., healthy vs. diseased cells) but remains a challenging problem for several reasons. First, while various types of algorithms have been developed for abundance estimation, short reads often do not uniquely identify the transcript isoforms from which they were sampled. As a result, the quantification problem may not be identifiable, i.e., it may lack a unique transcript solution even if the read maps uniquely to the reference genome. In this article, we develop a general linear model for transcript quantification that leverages reads spanning multiple splice junctions to improve identifiability. Second, RNA-seq reads sampled from the transcriptome exhibit unknown position-specific and sequence-specific biases. We extend our method to simultaneously learn bias parameters during transcript quantification to improve accuracy. Third, transcript quantification is often provided with a candidate set of isoforms, not all of which are likely to be significantly expressed in a given tissue type or condition. By resolving the linear system with LASSO, our approach can infer an accurate set of dominantly expressed transcripts, while existing methods tend to assign positive expression to every candidate isoform. Using simulated RNA-seq datasets, our method demonstrated better quantification accuracy and better inference of the dominant set of transcripts than existing methods. The application of our method to real data experimentally demonstrated that transcript quantification is effective for differential analysis of transcriptomes.
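The role of LASSO in selecting a sparse set of expressed isoforms can be sketched with a small non-negative LASSO solved by coordinate descent. This is a minimal illustration under stated assumptions, not the authors' implementation: the design matrix simply marks which exons and junctions each candidate isoform covers, no bias parameters are learned, and the function name, penalty value, and toy data are hypothetical.

```python
def nonneg_lasso(A, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - A x||^2 + lam * sum(x) subject to x >= 0.

    A[i][j] is 1 if candidate isoform j covers feature i (exon or junction).
    """
    n_feat, n_iso = len(A), len(A[0])
    x = [0.0] * n_iso
    residual = list(y)  # equals y - A x while x is all zeros
    col_sq = [sum(A[i][j] ** 2 for i in range(n_feat)) for j in range(n_iso)]
    for _ in range(n_iter):
        for j in range(n_iso):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the residual that excludes isoform j.
            rho = sum(A[i][j] * residual[i] for i in range(n_feat)) + col_sq[j] * x[j]
            new_xj = max(0.0, (rho - lam) / col_sq[j])  # soft-threshold, clipped at 0
            delta = new_xj - x[j]
            if delta:
                for i in range(n_feat):
                    residual[i] -= A[i][j] * delta
                x[j] = new_xj
    return x

# Features: exon1, exon2, exon3, junction 1-2, junction 2-3, junction 1-3.
# Candidate isoforms (columns): exons 1-2-3, exons 1-3, exons 1-2.
toy_A = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
toy_y = [15, 10, 15, 10, 10, 5]  # generated from abundances 10, 5, 0
print(nonneg_lasso(toy_A, toy_y, lam=0.1))
```

In this toy case the junction features make the system identifiable, and the L1 penalty drives the unexpressed third isoform to zero while shrinking the other two abundances only slightly.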

MapSplice: Accurate Mapping of RNA-Seq Reads for Splice Junction Discovery

Author: Chiang Derek Y.
Coleman Stephen J
Grimm Sara A.
He Xiaping
Huang Yan
Liu Jinze
Macleod James N
Mieczkowski Piotr
Perou Charles M.
Prins Jan F.
Savich Gleb L.
Singh Darshan
Wang Kai
Zeng Zheng
Publication venue: UKnowledge
Publication date: 01/01/2010

The accurate mapping of reads that span splice junctions is a critical component of all analytic techniques that work with RNA-seq data. We introduce a second-generation splice detection algorithm, MapSplice, whose focus is high sensitivity and specificity in the detection of splices as well as CPU and memory efficiency. MapSplice can be applied to both short (<75 bp) and long (≥75 bp) reads. MapSplice is not dependent on splice site features or intron length; consequently, it can detect novel canonical as well as non-canonical splices. MapSplice leverages the quality and diversity of read alignments of a given splice to increase accuracy. We demonstrate that MapSplice achieves higher sensitivity and specificity than TopHat and SpliceMap on a set of simulated RNA-seq data. Experimental studies also support the accuracy of the algorithm. Splice junctions derived from eight breast cancer RNA-seq datasets recapitulated the extensiveness of alternative splicing on a global level as well as the differences between molecular subtypes of breast cancer. These combined results indicate that MapSplice is a highly accurate algorithm for the alignment of RNA-seq reads to splice junctions. Software download URL: http://www.netlab.uky.edu/p/bioinfo/MapSplice

University of Kentucky