    jMOSAiCS: joint analysis of multiple ChIP-seq datasets

    The ChIP-seq technique enables genome-wide mapping of in vivo protein-DNA interactions and chromatin states. Current analytical approaches for ChIP-seq analysis are largely geared towards single-sample investigations, and have limited applicability in comparative settings that aim to identify combinatorial patterns of enrichment across multiple datasets. We describe a novel probabilistic method, jMOSAiCS, for jointly analyzing multiple ChIP-seq datasets. We demonstrate its usefulness with a wide range of data-driven computational experiments and with a case study of histone modifications on GATA1-occupied segments during erythroid differentiation. jMOSAiCS is open source software and can be downloaded from Bioconductor [1].

    A Statistical Framework for the Analysis of ChIP-Seq Data

    Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has revolutionized experiments for genome-wide profiling of DNA-binding proteins, histone modifications, and nucleosome occupancy. As the cost of sequencing is decreasing, many researchers are switching from microarray-based technologies (ChIP-chip) to ChIP-Seq for genome-wide study of transcriptional regulation. Despite its increasing and well-deserved popularity, there is little work that investigates and accounts for sources of bias in the ChIP-Seq technology. These biases typically arise from both the standard pre-processing protocol and the underlying DNA sequence of the generated data.

    Regulatory architecture of the RCA gene cluster captures an intragenic TAD boundary, CTCF-Mediated chromatin looping and a long-range intergenic enhancer

    The Regulators of Complement Activation (RCA) gene cluster comprises several tandemly arranged genes with shared functions within the immune system. RCA members, such as complement receptor 2 (CR2), are well-established susceptibility genes in complex autoimmune diseases. Altered expression of RCA genes has been demonstrated at both the functional and genetic level, but the mechanisms underlying their regulation are not fully characterised. We aimed to investigate the structural organisation of the RCA gene cluster to identify key regulatory elements that influence the expression of CR2 and other genes in this immunomodulatory region. Using 4C, we captured extensive CTCF-mediated chromatin looping across the RCA gene cluster in B cells and showed these were organised into two topologically associated domains (TADs). Interestingly, an inter-TAD boundary was located within the CR1 gene at a well-characterised segmental duplication. Additionally, we mapped numerous gene-gene and gene-enhancer interactions across the region, revealing extensive co-regulation. Importantly, we identified an intergenic enhancer and functionally demonstrated this element upregulates two RCA members (CR2 and CD55) in B cells. We have uncovered novel, long-range mechanisms whereby autoimmune disease susceptibility may be influenced by genetic variants, thus highlighting the important contribution of chromatin topology to gene regulation and complex genetic disease. This work was supported by the National Institutes of Health [R01 AI24717 to JH], the Australian Government Research Training Program Scholarship at the University of Western Australia [to JC and JSC], the Spanish Government [BFU2016-74961-P to JG-S] and an institutional grant Unidad de Excelencia María de Maeztu [MDM-206-0687 to the Department of Gene Regulation and Morphogenesis, Centro Andaluz de Biología del Desarrol]

    INFIMA leverages multi-omics model organism data to identify effector genes of human GWAS variants.

    Genome-wide association studies reveal many non-coding variants associated with complex traits. However, model organism studies largely remain an untapped resource for unveiling the effector genes of non-coding variants. We develop INFIMA, Integrative Fine-Mapping, to pinpoint causal SNPs for diversity outbred (DO) mice eQTL by integrating founder mice multi-omics data including ATAC-seq, RNA-seq, footprinting, and in silico mutation analysis. We demonstrate INFIMA's superior performance compared to alternatives with human and mouse chromatin conformation capture datasets. We apply INFIMA to identify novel effector genes for GWAS variants associated with diabetes. The results of the application are available at http://www.statlab.wisc.edu/shiny/INFIMA/

    A Phylogenetic Mixture Model for the Evolution of Gene Expression

    Microarray platforms are used increasingly to make comparative inferences through genome-wide surveys of gene expression. Although recent studies focus on describing the evidence for natural selection using estimates of the within- and between-taxa mutational variances, these methods do not explicitly or flexibly account for predicted nonindependence due to phylogenetic associations between measurements. In the interest of parsing the effects of selection, we introduce a mixture model for the comparative analysis of variation in gene expression across multiple taxa. This class of models isolates the phylogenetic signal from the nonphylogenetic and the heritable signal from the nonheritable while measuring the proper amount of correction. As a result, the mixture model resolves outstanding differences between existing models, relates different ways to estimate the across-taxa variance, and induces a likelihood ratio test for selection. We investigate by simulation and application the feasibility and utility of estimation of the required parameters and the power of the proposed test. We illustrate analysis under this mixture model with a gene duplication family data set.

    CSI-Tree: a regression tree approach for modeling binding properties of DNA-binding molecules based on cognate site identification (CSI) data

    The identification and characterization of binding sites of DNA-binding molecules, including transcription factors (TFs), is a critical problem at the interface of chemistry, biology and molecular medicine. The Cognate Site Identification (CSI) array is a high-throughput microarray platform for measuring comprehensive recognition profiles of DNA-binding molecules. This technique produces datasets that are useful not only for identifying binding sites of previously uncharacterized TFs but also for elucidating dependencies, both local and nonlocal, between the nucleotides at different positions of the recognition sites. We have developed a regression tree technique, CSI-Tree, for exploring the spectrum of binding sites of DNA-binding molecules. Our approach constructs regression trees utilizing the CSI data of unaligned sequences. The resulting model partitions the binding spectrum into homogeneous regions of position specific nucleotide effects. Each homogeneous partition is then summarized by a position weight matrix (PWM). Hence, the final outcome is a binding intensity rank-ordered collection of PWMs each of which spans a different region in the binding spectrum. Nodes of the regression tree depict the critical position/nucleotide combinations. We analyze the CSI data of the eukaryotic TF Nkx-2.5 and two engineered small molecule DNA ligands and obtain unique insights into their binding properties. The CSI tree for Nkx-2.5 reveals an interaction between two positions of the binding profile and elucidates how different nucleotide combinations at these two positions lead to different binding affinities. The CSI trees for the engineered DNA ligands exhibit a common preference for the dinucleotide AA in the first two positions, which is consistent with preference for a narrow and relatively flat minor groove. We carry out a reanalysis of these data with a mixture of PWMs approach. 
This approach is an advancement over the simple PWM model and accommodates position dependencies based on only sequence data. Our analysis indicates that the dependencies revealed by the CSI-Tree are challenging to discover without the actual binding intensities. Moreover, such a mixture model is highly sensitive to the number and length of the sequences analyzed. In contrast, CSI-Tree provides interpretable and concise summaries of the complete recognition profiles of DNA-binding molecules by utilizing binding affinities.

    Discovering Transcription Factor Binding Sites in Highly Repetitive Regions of Genomes with Multi-Read Analysis of ChIP-Seq Data

    Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
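
The idea of allocating multi-reads as fractional counts can be illustrated with a minimal Python sketch. This is not the authors' weighted alignment scheme: the function name `allocate_reads`, the input layout, and the rule of splitting each read in proportion to a positive per-alignment weight are all assumptions made for illustration.

```python
from collections import defaultdict

def allocate_reads(alignments):
    """Spread each read over its candidate locations as fractional counts.

    alignments maps a read id to a list of (location, weight) pairs with
    weight > 0. A uni-read has one pair and contributes a full count;
    a multi-read contributes weight / sum(weights) to each location.
    (Hypothetical weighting rule, assumed for this sketch.)
    """
    coverage = defaultdict(float)
    for hits in alignments.values():
        total = sum(weight for _, weight in hits)
        for location, weight in hits:
            coverage[location] += weight / total
    return dict(coverage)

# Toy input: one uni-read and one multi-read with two candidate locations.
alignments = {
    "read1": [("chr1:100", 1.0)],
    "read2": [("chr1:100", 3.0), ("chr5:200", 1.0)],
}
counts = allocate_reads(alignments)
print(counts)
```

Each read still contributes exactly one count in total, so the overall depth equals the number of alignable reads, while locations in repetitive regions receive partial support instead of none.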

    Diophantine equations (Diophant denklemleri)

    Opened to full-text access pursuant to the "Law on Amendments to the Higher Education Law and Certain Laws and Decree-Laws," published in the Official Gazette No. 30352 dated 06.03.2018, and the "Directive on the Electronic Collection, Organization, and Opening to Access of Graduate Theses" dated 18.06.2018. Key words: Sum of squares, Pythagorean triples, Unique factorization domain. In this thesis, some Diophantine equations are investigated. The first chapter shows that primes of the form 4n + 1 are sums of two squares, and that every natural number is a sum of four squares. The second chapter is devoted to Pythagorean triples and shows that certain Diophantine equations have no solutions. The third chapter considers the equation x^3 + y^3 = z^3 and shows that it has no solution in positive integers. Finally, the fourth chapter uses unique factorization domains to investigate the solutions of some Diophantine equations.
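
The statement that primes of the form 4n + 1 are sums of two squares can be checked numerically for small cases. The Python sketch below is a brute-force search, not the proof given in the thesis; the helper names `is_prime` and `two_squares` are introduced here for illustration only.

```python
from math import isqrt

def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))

def two_squares(n):
    """Return (a, b) with a <= b and a*a + b*b == n, or None if none exists."""
    for a in range(isqrt(n) + 1):
        b = isqrt(n - a * a)
        if a * a + b * b == n:
            return a, b
    return None

# List a representation for every prime p < 60 with p = 4n + 1.
for p in range(2, 60):
    if is_prime(p) and p % 4 == 1:
        a, b = two_squares(p)
        print(f"{p} = {a}^2 + {b}^2")
```

Primes of the form 4n + 3 never have such a representation, so `two_squares(7)` returns `None`.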

    Annotation Regression for Genome-Wide Association Studies with an Application to Psychiatric Genomic Consortium Data

    Although genome-wide association studies (GWAS) have been successful at finding thousands of disease-associated genetic variants (GVs), identifying causal variants and elucidating the mechanisms by which genotypes influence phenotypes are critical open questions. A key challenge is that a large percentage of disease-associated GVs are potential regulatory variants located in noncoding regions, making them difficult to interpret. Recent research efforts focus on going beyond annotating GVs by integrating functional annotation data with GWAS to prioritize GVs. However, applicability of these approaches is challenged by high dimensionality and heterogeneity of functional annotation data. Furthermore, existing methods often assume global associations of GVs with annotation data. This strong assumption is susceptible to violations for GVs involved in many complex diseases. To address these issues, we develop a general regression framework, named Annotation Regression for GWAS (ARoG). ARoG is based on a finite mixture of linear regressions model where GWAS association measures are viewed as responses and functional annotations as predictors. This mixture framework addresses heterogeneity of effects of GVs by grouping them into clusters and high dimensionality of the functional annotations by enabling annotation selection within each cluster. ARoG further employs permutation testing to evaluate the significance of selected annotations. Computational experiments indicate that ARoG can discover distinct associations between disease risk and functional annotations. Application of ARoG to autism and schizophrenia data from Psychiatric Genomics Consortium led to identification of GVs that significantly affect interactions of several transcription factors with DNA as potential mechanisms contributing to these disorders.

    atSNP: transcription factor binding affinity testing for regulatory SNP detection

    Motivation: Genome-wide association studies revealed that most disease-associated single nucleotide polymorphisms (SNPs) are located in regulatory regions within introns or in regions between genes. Regulatory SNPs (rSNPs) are such SNPs that affect gene regulation by changing transcription factor (TF) binding affinities to genomic sequences. Identifying potential rSNPs is crucial for understanding disease mechanisms. In silico methods that evaluate the impact of SNPs on TF binding affinities are not scalable for large-scale analysis. Results: We describe affinity testing for regulatory SNPs (atSNP), a computationally efficient R package for identifying rSNPs in silico. atSNP implements an importance sampling algorithm coupled with a first-order Markov model for the background nucleotide sequences to test the significance of affinity scores and SNP-driven changes in these scores. Application of atSNP with >20 K SNPs indicates that atSNP is the only available tool for such a large-scale task. atSNP provides user-friendly output in the form of both tables and composite logo plots for visualizing SNP-motif interactions. Evaluations of atSNP with known rSNP-TF interactions indicate that atSNP is able to prioritize motifs for a given set of SNPs with high accuracy.
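
To make the notion of a SNP-driven change in an affinity score concrete, here is a minimal Python sketch. It is not the atSNP API and omits the importance sampling and the Markov background model entirely; the motif matrix, the sequences, and the function `best_log_likelihood` are hypothetical. The affinity score is assumed here to be the best log-likelihood of any sequence window under a position weight matrix, scanned on the forward strand only.

```python
import math

BASES = "ACGT"

def best_log_likelihood(seq, pwm):
    """Highest log-likelihood of any window of seq under the PWM.

    pwm is a list of per-position probability rows ordered A, C, G, T.
    Only the forward strand is scanned (a simplification for this sketch).
    """
    width = len(pwm)
    scores = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        scores.append(
            sum(math.log(pwm[i][BASES.index(base)]) for i, base in enumerate(window))
        )
    return max(scores)

# Hypothetical 3-position motif that prefers "GAT".
pwm = [
    [0.1, 0.1, 0.7, 0.1],  # position 1: G favored
    [0.7, 0.1, 0.1, 0.1],  # position 2: A favored
    [0.1, 0.1, 0.1, 0.7],  # position 3: T favored
]

ref = "CGATC"  # reference allele A at index 2
alt = "CGCTC"  # alternate allele C at index 2

ref_score = best_log_likelihood(ref, pwm)
alt_score = best_log_likelihood(alt, pwm)
change = ref_score - alt_score
print(ref_score, alt_score, change)
```

A positive `change` means the alternate allele weakens the best motif match. Judging whether such a change is statistically significant is the part that requires a null model for the background sequence, which this sketch does not attempt.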