222 research outputs found

    Conserved and non-conserved enhancers direct tissue specific transcription in ancient germ layer specific developmental control genes

    Background: Identifying DNA sequences (enhancers) that direct the precise spatial and temporal expression of developmental control genes remains a significant challenge in the annotation of vertebrate genomes. Locating these sequences, which in many cases lie at a great distance from the transcription start site, has been a major obstacle in deciphering gene regulation. Coupling comparative genomics with functional validation has been a successful method for locating many such regulatory elements. However, most of these studies looked either at a single gene or at the whole genome without focusing on any particular process. The pressing need is to integrate the tools of comparative genomics with knowledge of developmental biology to validate enhancers for developmental transcription factors in greater detail. Results: Our results show that near four different genes (nkx3.2, pax9, otx1b and foxa2) in zebrafish, only 20-30% of highly conserved DNA sequences can act as developmental enhancers, irrespective of the tissue in which the gene is expressed. We find that some genes also have multiple conserved enhancers that drive expression in the same tissue at the same or different time points in development. We also located non-conserved enhancers for two of the genes (pax9 and otx1b). Our modified bacterial artificial chromosome (BAC) studies for these four genes revealed that many of these enhancers work synergistically, an effect that individual DNA constructs cannot capture, and are not conserved at the sequence level. Our detailed biochemical and transgenic analysis revealed that Foxa1 binds to the otx1b non-conserved enhancer to direct its activity in the forebrain and otic vesicle of zebrafish at 24 hpf. Conclusion: Our results clearly indicate that a high level of functional conservation of genes is not necessarily associated with sequence conservation of their regulatory elements. Moreover, certain non-conserved DNA elements might have a role in gene regulation. The need is to bring multiple approaches to bear upon individual genes to decipher all of their regulatory elements.

    Success in the DREAM3 Signaling Response Challenge Using Simple Weighted-Average Imputation: Lessons for Community-Wide Experiments in Systems Biology

    Our group produced the best predictions overall in the DREAM3 signaling response challenge, ranking first by a substantial margin in the cytokine sub-challenge and nearly tying for best in the phosphoprotein sub-challenge. We achieved this success using a simple interpolation strategy. For each combination of a stimulus and an inhibitor for which predictions were required, we noted that there were six other datasets using the same stimulus (but different inhibitor treatments) and six other datasets using the same inhibitor (but different stimuli). Therefore, for each treatment combination for which values were to be predicted, we calculated rank correlations for the data that were in common between that treatment combination and each of the 12 related combinations. The data from the 12 related combinations were then used to calculate the missing values, weighting the contribution from each experiment by its rank correlation coefficient. The success of this simple method suggests that the missing data were largely over-determined by similarities in the treatments. We offer some thoughts on the current state and future development of DREAM, based on our success in this challenge, our success in the earlier DREAM2 transcription factor target challenge, and our experience as the data provider for the gene expression challenge in DREAM3.
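
    The weighting scheme described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the details here are assumptions: the function names are hypothetical, Spearman rank correlation is computed over the entries the target has observed, negative correlations are clipped to zero, and the prediction is a correlation-weighted average of the related experiments.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def impute(target, related):
    """Fill None entries of `target` from complete `related` experiments.

    Assumption (not from the abstract): weights are rank correlations
    clipped at zero, and we fall back to a plain mean if all weights vanish.
    """
    observed = [i for i, v in enumerate(target) if v is not None]
    weights = []
    for r in related:
        rho = spearman([target[i] for i in observed], [r[i] for i in observed])
        weights.append(max(rho, 0.0))
    total = sum(weights)
    filled = list(target)
    for i, v in enumerate(target):
        if v is None:
            if total > 0:
                filled[i] = sum(w * r[i] for w, r in zip(weights, related)) / total
            else:
                filled[i] = sum(r[i] for r in related) / len(related)
    return filled


# Toy data: the first related experiment tracks the target, the second opposes it.
target = [1.0, 2.0, 3.0, None]
related = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 10.0]]
filled = impute(target, related)
```

    In the challenge setting, `related` would hold the 12 experiments that share a stimulus or an inhibitor with the target combination.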

    Constrained Dynamic Programming and Supervised Penalty Learning Algorithms for Peak Detection in Genomic Data

    Peak detection in genomic data involves segmenting counts of DNA sequence reads aligned to different locations of a chromosome. The goal is to detect peaks with higher counts, and filter out background noise with lower counts. Most existing algorithms for this problem are unsupervised heuristics tailored to patterns in specific data types. We propose a supervised framework for this problem, using optimal changepoint detection models with learned penalty functions. We propose the first dynamic programming algorithm that is guaranteed to compute the optimal solution to changepoint detection problems with constraints between adjacent segment mean parameters. Implementing this algorithm requires choosing a penalty parameter that determines the number of segments that are estimated. We show how the supervised learning ideas of Rigaill et al. (2013) can be used to choose this penalty. We compare the resulting implementation of our algorithm to several baselines in a benchmark of labeled ChIP-seq data sets with two different patterns (broad H3K36me3 data and sharp H3K4me3 data). Whereas baseline unsupervised methods only provide accurate peak detection for a single pattern, our supervised method achieves state-of-the-art accuracy in all data sets. The log-linear timings of our proposed dynamic programming algorithm make it scalable to the large genomic data sets that are now common. Our implementation is available in the PeakSegOptimal R package on CRAN.
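
    To illustrate how a penalty parameter controls the number of estimated segments, here is a minimal sketch of classic penalized optimal partitioning. It is not the constrained algorithm of the paper: it imposes no up/down constraint between adjacent segment means, it uses a squared-error loss (an assumption made for simplicity), and it runs in quadratic rather than log-linear time. The function name is hypothetical.

```python
def optimal_partitioning(data, penalty):
    """Return sorted changepoint indices minimizing
    (sum of squared errors within segments) + penalty * (number of changes).

    A changepoint index c means a new segment starts at data[c].
    """
    n = len(data)
    # Prefix sums give the squared-error cost of any segment in O(1).
    s, s2 = [0.0], [0.0]
    for x in data:
        s.append(s[-1] + x)
        s2.append(s2[-1] + x * x)

    def seg_cost(i, j):
        """Squared error of data[i:j] around its mean (requires i < j)."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] * (n + 1)  # best[j]: optimal cost of data[:j]
    best[0] = -penalty      # so the first segment pays no penalty
    last = [0] * (n + 1)    # last[j]: start of the final segment of data[:j]
    for j in range(1, n + 1):
        best_val = None
        for i in range(j):
            c = best[i] + penalty + seg_cost(i, j)
            if best_val is None or c < best_val:
                best_val, last[j] = c, i
        best[j] = best_val

    # Backtrack through the stored segment starts.
    changepoints = []
    j = n
    while j > 0:
        i = last[j]
        if i > 0:
            changepoints.append(i)
        j = i
    return sorted(changepoints)


# Toy signal: background, one peak, background.
signal = [0, 0, 0, 10, 10, 10, 0, 0, 0]
small_penalty_changes = optimal_partitioning(signal, penalty=1.0)
large_penalty_changes = optimal_partitioning(signal, penalty=1000.0)
```

    A small penalty recovers both edges of the peak, while a very large penalty collapses the model to a single segment, which is why the choice of penalty matters.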

    Reconstructing the Genomic Architecture of Mammalian Ancestors Using Multispecies Comparative Maps

    Rapidly developing comparative gene maps in selected mammal species are providing an opportunity to reconstruct the genomic architecture of mammalian ancestors and to study the rearrangements that transformed this ancestral genome into existing mammalian genomes. Here, the recently developed Multiple Genome Rearrangement (MGR) algorithm is applied to human, mouse, cat and cattle comparative maps (with 311-470 shared markers) to impute the ancestral mammalian genome. Reconstructed ancestors consist of 70-100 conserved segments shared across the genomes that have been exchanged by rearrangement events along the ordinal lineages leading to modern species genomes. Genomic distances between species, dominated by inversions (reversals) and translocations, are presented in a first multispecies attempt using ordered mapping data to reconstruct the evolutionary exchanges that preceded modern placental mammal genomes.

    Generalized Functional Pruning Optimal Partitioning (GFPOP) for Constrained Changepoint Detection in Genomic Data

    We describe a new algorithm and R package for peak detection in genomic data sets using constrained changepoint algorithms. These detect changes from background to peak regions by imposing the constraint that the mean should alternately increase then decrease. An existing algorithm for this problem gives state-of-the-art accuracy results, but it is computationally expensive when the number of changes is large. We propose the GFPOP algorithm, which jointly estimates the number of peaks and their locations by minimizing a cost function that consists of a data fitting term and a penalty for each changepoint. Empirically this algorithm has a cost that is $O(N \log N)$ for analysing data of length $N$. We also propose a sequential search algorithm that finds the best solution with $K$ segments in $O(\log(K) N \log N)$ time, which is much faster than the previous $O(K N \log N)$ algorithm. We show that our disk-based implementation in the PeakSegDisk R package can be used to quickly compute constrained optimal models with many changepoints, which are needed to analyze typical genomic data sets that have tens of millions of observations.
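
    The cost function described in this abstract (a data fitting term plus a penalty for each changepoint, with changes constrained to alternate) can be written schematically as follows. The notation is an assumption introduced for illustration: $z_i$ are the observed counts, $m_i$ the segment means, $\ell$ an unspecified data-fitting loss, and $\lambda \ge 0$ the penalty.

```latex
\min_{m_1,\dots,m_N}\;
  \sum_{i=1}^{N} \ell(m_i, z_i)
  \;+\; \lambda \sum_{i=1}^{N-1} I[\, m_i \neq m_{i+1} \,]
\quad \text{subject to the changes in } m \text{ alternating: up, down, up, down, \dots}
```

    Under this reading, a larger $\lambda$ yields fewer changepoints and therefore fewer peaks, so a search for the model with $K$ segments amounts to a search over values of $\lambda$.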