84 research outputs found

    Wavelet Screening identifies regions highly enriched for differentially methylated loci for orofacial clefts

    DNA methylation is the most widely studied epigenetic mark in humans and plays an essential role in normal biological processes as well as in disease development. More focus has recently been placed on understanding functional aspects of methylation, prompting the development of methods to investigate the relationship between heterogeneity in methylation patterns and disease risk. However, most of these methods are limited in that they use simplified models that may rely on arbitrarily chosen parameters, they can only detect differentially methylated regions (DMRs) one at a time, or they are computationally intensive. To address these shortcomings, we present a wavelet-based method called ‘Wavelet Screening’ (WS) that can perform an epigenome-wide association study (EWAS) of thousands of individuals on a single CPU in only a matter of hours. By detecting multiple DMRs located near each other, WS identifies more complex patterns that can differentiate between different methylation profiles. We performed an extensive set of simulations to demonstrate the robustness and high power of WS, before applying it to a previously published EWAS dataset of orofacial clefts (OFCs). WS identified 82 associated regions containing several known genes and loci for OFCs, while other findings are novel and warrant replication in other OFC cohorts.

    A fast wavelet-based functional association analysis replicates several susceptibility loci for birth weight in a Norwegian population

    Background: Birth weight (BW) is one of the most widely studied anthropometric traits in humans because of its role in various adult-onset diseases. The number of loci associated with BW has increased dramatically since the advent of whole-genome screening approaches such as genome-wide association studies (GWASes) and meta-analyses of GWASes (GWAMAs). To further contribute to elucidating the genetic architecture of BW, we analyzed a genotyped Norwegian dataset with information on child’s BW (N=9,063) using a slightly modified version of a wavelet-based method by Shim and Stephens (2015) called WaveQTL. Results: WaveQTL uses wavelet regression for regional testing and offers a more flexible functional modeling framework compared to conventional GWAS methods. To further improve WaveQTL, we added a novel feature termed “zooming strategy” to enhance the detection of associations in typically small regions. The modified WaveQTL replicated five out of the 133 loci previously identified by the largest GWAMA of BW to date by Warrington et al. (2019), even though our sample size was 26 times smaller than that study and 18 times smaller than the second largest GWAMA of BW by Horikoshi et al. (2016). In addition, the modified WaveQTL performed better in regions of high LD between SNPs. Conclusions: This study is the first adaptation of the original WaveQTL method to the analysis of genome-wide genotypic data. Our results highlight the utility of the modified WaveQTL as a complementary tool for identifying loci that might escape detection by conventional genome-wide screening methods due to power issues. An attractive application of the modified WaveQTL would be to select traits from various public GWAS repositories to investigate whether they might benefit from a second analysis.

    Organization and evolution of information within eukaryotic genomes.


    Functional Regression

    Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article will focus on functional regression, the area of FDA that has received the most attention in applications and methodological development. First will be an introduction to basis functions, key building blocks for regularization in functional regression methods, followed by an overview of functional regression methods, split into three types: [1] functional predictor regression (scalar-on-function), [2] functional response regression (function-on-scalar) and [3] function-on-function regression. For each, the role of replication and regularization will be discussed and the methodological development described in a roughly chronological manner, at times deviating from the historical timeline to group together similar methods. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. At the end is a brief discussion describing potential areas of future development in this field.

    Network and multi-scale signal analysis for the integration of large omic datasets: applications in Populus trichocarpa

    Poplar species are promising sources of cellulosic biomass for biofuels because of their fast growth rate, high cellulose content and moderate lignin content. There is an increasing movement toward integrating multiple layers of ’omics data in a systems biology approach to understand gene-phenotype relationships and assist in plant breeding programs. This dissertation involves the use of network and signal processing techniques for the combined analysis of these various data types, for the goals of (1) increasing fundamental knowledge of P. trichocarpa and (2) facilitating the generation of hypotheses about target genes and phenotypes of interest. A data integration “Lines of Evidence” method is presented for the identification and prioritization of target genes involved in functions of interest. A new post-GWAS method, Pleiotropy Decomposition, is presented, which extracts pleiotropic relationships between genes and phenotypes from GWAS results, allowing for identification of genes with signatures favorable to genome editing. Continuous wavelet transform signal processing analysis is applied in the characterization of genome distributions of various features (including variant density, gene density, and methylation profiles) in order to identify chromosome structures such as the centromere. This resulted in the approximate centromere locations on all P. trichocarpa chromosomes, which had previously not been adequately reported in the scientific literature. Discrete wavelet transform signal processing followed by correlation analysis was applied to genomic features from various data types including transposable element density, methylation density, SNP density, gene density, centromere position and putative ancestral centromere position. Subsequent correlation analysis of the resulting wavelet coefficients identified scale-specific relationships between these genomic features, and provided insights into the evolution of the genome structure of P. trichocarpa.
These methods have provided strategies both to increase fundamental knowledge about the P. trichocarpa system and to identify new target genes related to biofuels targets. We intend that these approaches will ultimately be used to design better plants for more efficient and sustainable production of bioenergy.
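
The scale-specific correlation analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes two genomic feature tracks binned onto the same grid with a power-of-two length, and it substitutes a plain Haar discrete wavelet transform and Pearson correlation for whatever wavelet family and statistic the dissertation actually used. The function names and the `min_coeffs` cutoff are hypothetical.

```python
import math

def haar_details(signal):
    """Return Haar detail coefficients per level, finest scale first.

    Assumption: len(signal) is a power of two.
    """
    approx = list(signal)
    levels = []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        # Detail = scaled difference of neighbours; approximation = scaled sum.
        levels.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return levels

def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def scale_correlations(x, y, min_coeffs=3):
    """Correlate the detail coefficients of two tracks level by level.

    Levels with fewer than min_coeffs coefficients are skipped (hypothetical
    cutoff) because a correlation over so few points is not informative.
    """
    result = {}
    pairs = zip(haar_details(x), haar_details(y))
    for level, (dx, dy) in enumerate(pairs, start=1):
        if len(dx) >= min_coeffs:
            result[level] = pearson(dx, dy)
    return result
```

Calling `scale_correlations(gene_density, snp_density)` on two equally binned tracks returns a dictionary keyed by decomposition level, where level 1 is the finest scale. A feature pair that correlates at coarse levels but not at fine ones would be one example of a scale-specific relationship.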

    Network-guided data integration and gene prioritization


    Genome-Wide Analysis of Histone Modification Enrichments Induced by Marek's Disease Virus in Inbred Chicken Lines

    Covalent histone modifications constitute a complex network of transcriptional regulation involved in diverse biological processes ranging from stem cell differentiation to immune response. The advent of modern sequencing technologies enables one to query the locations of histone modifications across the genome in an efficient manner. However, inherent biases in the technology and diverse enrichment patterns complicate data analysis. Marek's disease (MD) is an acute, lymphoma-inducing disease of chickens with disease outcomes affected by multiple host and environmental factors. Inbred chicken lines 63 and 72 share the same major histocompatibility complex haplotype, but have contrasting responses to MD. This dissertation presents novel methods for analysis of genome-wide histone modification data and application of new and existing methods to the investigation of epigenetic effects of MD on these lines. First, we present WaveSeq, a novel algorithm for detection of significant enrichments in ChIP-Seq data. WaveSeq implements a distribution-free approach by combining the continuous wavelet transform with Monte Carlo sampling techniques for effective peak detection. WaveSeq outperformed existing tools particularly for diffuse histone modification peaks demonstrating that restrictive distributional assumptions are not necessary for accurate ChIP-Seq peak detection. Second, we investigated latent MD in thymus tissues by profiling H3K4me3 and H3K27me3 in infected and control birds from lines 63 and 72. Several genes associated with MD, e.g. MX1 and CTLA–4, along with those linked with human cancers, showed line-specific and condition-specific enrichments. One of the first studies of histone modifications in chickens, our work demonstrated that MD induced widespread epigenetic variations. Finally, we analyzed the temporal evolution of histone modifications at distinct phases of MD progression in the bursa of Fabricius. 
Genes involved in several important pathways, e.g. apoptosis and MAPK signaling, and various immune-related miRNAs showed differential histone modifications in the promoter region. Our results indicated heightened inflammation in the susceptible line during early cytolytic MD, while resistant birds showed recuperative symptoms during early MD and epigenetic silencing during latent infection. Thus, although further elucidation of underlying mechanisms is necessary, this work provided the first definitive evidence of the epigenetic effects of MD.

    Mathematical Modelling of Spatially Coherent Transcription

    Genetics and epigenetics are widely expected to revolutionise our understanding of health and disease. However, any attempt to extract relevant information from noisy data requires a combination of modelling and statistical techniques. Given the number of genes and the complexity of the genome, sophisticated methods will be needed to properly capture the information it contains. Many mechanisms and variables can affect and control the expression of a gene. This thesis specifically investigates spatially coherent variations in transcription. Several different areas were examined, producing a broad set of results. Important findings include the demonstration of spatial coherence as the result of epigenetic effects, the creation and validation of a technique to detect spatial coherence, and the extension of spatial modelling to epigenetic data. Other important results include the detection of spatial coherence variation due to confounding variables (PMI and neuronal concentration) and the development of new spatial modelling techniques. The results indicate that spatial modelling provides a useful approach to investigating unusual and unknown aspects of epigenetic and transcriptional regulation.

    Role of CBP and p300 in the establishment and maintenance of transcriptional programs in adult excitatory neurons

    The paralogous lysine acetyltransferases (KAT) CREB binding protein (CBP) and E1A binding protein (p300) are both essential for the normal development of the nervous system, but their specific functions in post-mitotic neurons remain unclear. To investigate these functions, we produced inducible forebrain-specific knockout mice for either one or both proteins. When both KATs were knocked out simultaneously in the adult brain, but not after individual ablation, mice showed a rapid deterioration, severe neurological phenotypes and premature death. These phenotypes were associated with the reduction of once-acquired dendritic complexity and electrical activity in excitatory neurons, which correlates with the transcriptional shutdown of neuronal genes and a dramatic loss of H3K27 acetylation and occupancy by pro-neural transcription factors at neuronal enhancers. Targeted lysine acetylation using the CRISPR/dCas9 system restored neuronal-specific gene expression. These experiments demonstrate that KAT3 proteins are necessary for maintaining neuronal identity and function in the adult brain by preserving correct chromatin acetylation levels. Further insight into the phenotype of the single-KAT3 inducible forebrain knockouts showed that homozygous loss of CBP caused a highly specific phenotype in cognition, transcription and histone acetylation. Meanwhile, the modest changes in histone acetylation caused by homozygous loss of p300 did not correlate with any changes in behavior or gene expression. Interestingly, the difference between CBP and p300 was highlighted when mice were exposed to a neuroadaptive paradigm such as environmental enrichment or pro-epileptic drug sensitization. Whereas the p300 knockouts again did not show any difference from the control littermates, the CBP knockout mice were unable to adapt to the environmental change.
This effect was paralleled by a failure to induce the specific gene expression programs that the challenge induced in control mice. Therefore, CBP and p300 jointly maintain neuro-specific transcriptional programs in adult excitatory neurons, and CBP seems to be vital for shifting these programs in response to experiences or environmental changes.

    Modern Computing Techniques for Solving Genomic Problems

    With the advent of high-throughput genomics, biological big data brings challenges to scientists in handling, analyzing, processing and mining this massive data. In this new interdisciplinary field, diverse theories, methods, tools and knowledge are utilized to solve a wide variety of problems. As an exploration, this dissertation project is designed to combine concepts and principles in multiple areas, including signal processing, information-coding theory, artificial intelligence and cloud computing, in order to solve the following problems in computational biology: (1) comparative gene structure detection, (2) DNA sequence annotation, (3) investigation of CpG islands (CGIs) for epigenetic studies. Briefly, in problem #1, sequences are transformed into signal series or binary codes. As in speech/voice recognition, similarity is calculated between two signal series and subsequently signals are stitched/matched into a temporal sequence. Because the operations are binary, all calculations/steps can be performed in an efficient and accurate way. Improving performance in terms of accuracy and specificity is the key for a comparative method. In problem #2, DNA sequences are encoded and transformed into numeric representations for deep learning methods. Encoding schemes greatly influence the performance of deep learning algorithms, so finding the best encoding scheme for a particular application of deep learning is significant. Three applications (detection of protein-coding splicing sites, detection of lincRNA splicing sites and improvement of comparative gene structure identification) are used to show the computing power of deep neural networks. In problem #3, CpG sites are assigned a certain energy and a Gaussian filter is applied to detect CpG islands. By using the CpG box and Markov model, we investigate the properties of CGIs and redefine the CGIs using the emerging epigenetic data.
In summary, these three problems and their solutions are not isolated; they are linked to modern techniques in such diverse areas as signal processing, information-coding theory, artificial intelligence and cloud computing. These novel methods are expected to improve the efficiency and accuracy of computational tools and bridge the gap between biology and scientific computing.
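
The Gaussian-filter step described for problem #3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the dissertation's method: each CpG dinucleotide is given a unit "energy", the energy track is smoothed with a truncated Gaussian kernel, and contiguous positions above a threshold are reported as candidate regions. The function names and the `sigma`, `radius` and `threshold` values are hypothetical.

```python
import math

def cpg_indicator(seq):
    """Unit energy at each position where a CG dinucleotide starts."""
    seq = seq.upper()
    return [1.0 if seq[i:i + 2] == "CG" else 0.0 for i in range(len(seq))]

def gaussian_kernel(sigma, radius):
    """Truncated Gaussian weights over [-radius, radius], normalised to sum to 1."""
    weights = [math.exp(-(k * k) / (2 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(values, kernel):
    """Convolve values with kernel, treating positions outside the sequence as zero."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(values)):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + j - radius
            if 0 <= k < len(values):
                acc += w * values[k]
        out.append(acc)
    return out

def candidate_regions(seq, sigma=5.0, radius=15, threshold=0.2):
    """Return half-open (start, end) intervals where smoothed CpG energy >= threshold."""
    density = smooth(cpg_indicator(seq), gaussian_kernel(sigma, radius))
    regions, start = [], None
    for i, d in enumerate(density):
        if d >= threshold and start is None:
            start = i
        elif d < threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(seq)))
    return regions
```

For example, `candidate_regions("AT" * 50 + "CG" * 30 + "AT" * 50)` reports a single interval around the CG-rich block. A larger `sigma` merges nearby CpG clusters into one region, while a smaller one separates them, so the kernel width acts as the resolution parameter of the detection.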
