13 research outputs found

    LuxRep: a technical replicate-aware method for bisulfite sequencing data analysis

    Background: DNA methylation is commonly measured using bisulfite sequencing (BS-seq). The quality of a BS-seq library is measured by its bisulfite conversion efficiency. Libraries with low conversion rates are typically excluded from analysis, resulting in reduced coverage and increased costs. Results: We have developed a probabilistic method and software, LuxRep, that implements a general linear model and simultaneously accounts for technical replicates (libraries from the same biological sample) from different bisulfite-converted DNA libraries. Using simulations and actual DNA methylation data, we show that including technical replicates with low bisulfite conversion rates generates more accurate estimates of methylation levels and differentially methylated sites. Moreover, using variational inference speeds up the computation time necessary for whole genome analysis. Conclusions: In this work we show that taking into account technical replicates (i.e. libraries) of BS-seq data of varying bisulfite conversion rates, with their corresponding experimental parameters, improves methylation level estimation and differential methylation detection.
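    The effect of conversion efficiency on methylation estimates can be illustrated with a minimal sketch. This is not the LuxRep model: it assumes a deliberately simplified read model in which a methylated cytosine always reads as C and an unmethylated one reads as C only when conversion fails, ignoring sequencing errors and over-conversion. The function names, the grid search, and the read counts are hypothetical.

```python
import math

def p_read_c(theta, conv_eff):
    """Probability that a read shows 'C' at a cytosine with methylation level theta.

    Simplifying assumption: methylated cytosines always read as C, and
    unmethylated ones read as C only when conversion fails (1 - conv_eff).
    """
    return theta + (1.0 - theta) * (1.0 - conv_eff)

def log_likelihood(theta, libraries):
    """Binomial log-likelihood summed over technical replicates (libraries)."""
    total = 0.0
    for n_c, n_total, conv_eff in libraries:
        p = min(max(p_read_c(theta, conv_eff), 1e-12), 1.0 - 1e-12)
        total += n_c * math.log(p) + (n_total - n_c) * math.log(1.0 - p)
    return total

def estimate_theta(libraries, grid_size=1001):
    """Maximum-likelihood methylation level found by a simple grid search."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda t: log_likelihood(t, libraries))

# Hypothetical replicates: (C reads, total reads, conversion efficiency).
libraries = [(30, 100, 0.99), (44, 100, 0.80)]

# Naive pooling ignores conversion efficiency and overestimates methylation.
naive = sum(lib[0] for lib in libraries) / sum(lib[1] for lib in libraries)
theta_hat = estimate_theta(libraries)
print(f"naive={naive:.3f} corrected={theta_hat:.3f}")
```

    In this toy setting the poorly converted library still contributes information once its conversion efficiency enters the likelihood, which is the intuition behind keeping such replicates rather than discarding them.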

    Permutation-based significance analysis reduces the type 1 error rate in bisulfite sequencing data analysis of human umbilical cord blood samples

    DNA methylation patterns are largely established in utero and might mediate the impacts of in utero conditions on later health outcomes. Associations between perinatal DNA methylation marks and pregnancy-related variables, such as maternal age and gestational weight gain, have previously been studied with methylation microarrays, which typically cover less than 2% of human CpG sites. To detect such associations outside these regions, we chose the bisulphite sequencing approach. We collected and curated clinical data on 200 newborn infants, whose umbilical cord blood samples were analysed with the reduced representation bisulphite sequencing (RRBS) method. A generalized linear mixed-effects model was fit for each high-coverage CpG site, followed by spatial and multiple testing adjustment of P values to identify differentially methylated cytosines (DMCs) and regions (DMRs) associated with clinical variables, such as maternal age, mode of delivery, and birth weight. The type 1 error rate was then evaluated with a permutation analysis. We discovered a strong inflation of spatially adjusted P values through the permutation analysis, which we then applied for empirical type 1 error control. The inflation of P values was caused by a common method for spatial adjustment and DMR detection, implemented in the tools comb-p and RADMeth. Based on empirically estimated significance thresholds, very little differential methylation was associated with any of the studied clinical variables, other than sex. With this analysis workflow, the sex-associated differentially methylated regions were highly reproducible across studies, technologies, and statistical models.
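    The idea of an empirically estimated significance threshold can be sketched as follows. This is an illustration only, not the workflow of the paper: it swaps in a simple max-statistic permutation scheme on a two-group mean difference instead of mixed-effects models and spatially adjusted P values. All names and the synthetic data are hypothetical.

```python
import random

def max_abs_mean_diff(values_by_site, labels):
    """Largest |mean(group 1) - mean(group 0)| over all sites."""
    best = 0.0
    for values in values_by_site:
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        best = max(best, abs(sum(g1) / len(g1) - sum(g0) / len(g0)))
    return best

def empirical_threshold(values_by_site, labels, n_perm=200, alpha=0.05, seed=0):
    """Threshold from the permutation null distribution of the maximum statistic."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any true link between covariate and methylation
        null_max.append(max_abs_mean_diff(values_by_site, shuffled))
    null_max.sort()
    index = min(len(null_max) - 1, int((1 - alpha) * len(null_max)))
    return null_max[index]

# Synthetic data: 50 null sites plus one site with a real group difference.
rng = random.Random(1)
labels = [0] * 10 + [1] * 10
sites = [[rng.random() for _ in labels] for _ in range(50)]
sites.append([0.1] * 10 + [0.9] * 10)

observed = max_abs_mean_diff(sites, labels)
threshold = empirical_threshold(sites, labels)
print(observed > threshold)
```

    Because the threshold is derived from the data under permuted labels, it reflects whatever inflation the test statistic carries, which a nominal cutoff would miss.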

    Umbilical cord blood DNA methylation in children who later develop type 1 diabetes

    Aims/hypothesis: Distinct DNA methylation patterns have recently been observed to precede type 1 diabetes in whole blood collected from young children. Our aim was to determine whether perinatal DNA methylation is associated with later progression to type 1 diabetes. Methods: Reduced representation bisulphite sequencing (RRBS) analysis was performed on umbilical cord blood samples collected within the Finnish Type 1 Diabetes Prediction and Prevention (DIPP) Study. Children later diagnosed with type 1 diabetes and/or who tested positive for multiple islet autoantibodies (n = 43) were compared with control individuals (n = 79) who remained autoantibody-negative throughout the DIPP follow-up until 15 years of age. Potential confounding factors related to the pregnancy and the mother were included in the analysis. Results: No differences in the umbilical cord blood methylation patterns were observed between the cases and controls at a false discovery rate […]. Conclusions/interpretation: Based on our results, differences between children who progress to type 1 diabetes and those who remain healthy throughout childhood are not yet present in the perinatal DNA methylome. However, we cannot exclude the possibility that such differences would be found in a larger dataset.

    DNA-metylaatiosekvensointidatan probabilistinen mallintaminen (Probabilistic modeling of DNA methylation sequencing data)

    DNA methylation is an epigenetic modification in which methyl groups bind to the DNA molecule. It regulates gene expression and enables the normal function of cells. Conversely, aberrant DNA methylation patterns have been associated with diseases such as cancer. Uncovering the mechanisms of gene regulation and utilizing DNA methylation biomarkers in, e.g., cancer screening require advanced analysis methods for high-throughput sequencing data. The aim of this thesis is to improve the analysis of DNA methylation data with a probabilistic modeling approach. First, two methods for differential DNA methylation analysis designed for bisulfite sequencing data are proposed. In both methods, the spatial correlation of the methylation states is utilized in a binomial generalized linear mixed model to improve the accuracy of detecting differential methylation. The first method assumes that the methylation states of all cytosines in a genomic window share the same correlation characteristics and tests for differential methylation by computing one Bayes factor for each genomic window. In the other approach, a sparsifying prior is used in the correlation structure to allow individual cytosines to deviate from the general correlation pattern. In the third publication, an analysis workflow for reduced representation bisulfite sequencing data is proposed. The workflow was applied to a cord blood data set, and differential DNA methylation analysis was performed to detect possible pregnancy- or delivery-related changes in cord blood DNA methylation. In the fourth publication, methods for cell-free DNA-based cancer classification were developed and compared. To demonstrate the feasibility of liquid biopsies in clinical use, lower sequencing depth was simulated by subsampling the cell-free methylated DNA immunoprecipitation sequencing data set. Different generalized linear model classifiers and feature extraction and selection methods were then applied, and the resulting classification performance was evaluated. The results presented in this thesis show that probabilistic modeling and Bayesian methods perform well and can improve the accuracy of the analysis of DNA methylation sequencing data. Taking spatial correlation into account increased the accuracy of differential DNA methylation analysis. Allowing deviations from the correlation pattern made the analysis more flexible. Most of the differentially methylated cytosines and regions found in the cord blood data set were sex-associated, and only a few were associated with the other clinical covariates. Additionally, the cord blood data analysis revealed the problem of inflated p-values, and a permutation-based method for solving the issue was proposed. Finally, methods that improved cell-free DNA methylation-based cancer classification included a logistic regression classifier, and iterative supervised principal component analysis and Fisher's exact test for feature selection.

    Probabilistinen menetelmä kromatiini-interaktioiden määrittämiseen (A probabilistic method for determining chromatin interactions)

    Chromatin interactions play an important role in transcription regulation and can therefore affect the function of the whole cell and the organism. To study chromatin interactions for a better understanding of gene regulation, a method called Chromatin Interaction Analysis using Paired End Tags (ChIA-PET) has been developed. ChIA-PET is a high-resolution next-generation sequencing method for finding chromatin interactions that involve a protein of interest. A ChIA-PET experiment yields a list of putative interactions between pairs of chromatin sites. Several experimental laboratory steps in the ChIA-PET protocol introduce a high level of background noise. The aim of this thesis is to construct a statistical model for identifying the true interactions from ChIA-PET interaction count data. First, the current methods for solving this task are reviewed. Then a new method combining a Bayesian mixture model with bias removal by Poisson regression is proposed. The model parameters are estimated using Markov chain Monte Carlo methods. The new model is implemented in MATLAB and tested on real ChIA-PET data sets. The results suggest that the proposed mixture model can quantify chromatin interactions and make good use of the incorporated bias correction. Comparison with two other methods, ChIA-PET Tool and Mango, shows that the mixture model results partially agree with those of the other two methods, but there are also some interactions found only by the mixture model. Annotation analysis revealed that the mixture model results are in line with earlier research results.
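    The separation of true interactions from background noise in count data can be illustrated with a toy two-component Poisson mixture. This sketch is not the thesis model: it is fitted with expectation-maximization instead of Markov chain Monte Carlo, it omits the Poisson-regression bias removal, and the counts and all names are hypothetical.

```python
import math

def poisson_logpmf(k, lam):
    """Log probability of count k under a Poisson distribution with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def fit_poisson_mixture(counts, n_iter=200):
    """EM for a noise/signal Poisson mixture.

    Returns (lam_noise, lam_signal, weight_signal, responsibilities), where
    responsibilities[i] is the estimated probability that counts[i] is signal.
    """
    lam_noise = 1.0
    lam_signal = float(max(max(counts), 2))
    w = 0.5
    resp = []
    for _ in range(n_iter):
        # E-step: probability of the signal component for each count.
        resp = []
        for k in counts:
            log_s = math.log(w) + poisson_logpmf(k, lam_signal)
            log_n = math.log(1.0 - w) + poisson_logpmf(k, lam_noise)
            m = max(log_s, log_n)
            s, n = math.exp(log_s - m), math.exp(log_n - m)
            resp.append(s / (s + n))
        # M-step: responsibility-weighted means, clamped to stay valid.
        total = sum(resp)
        w = min(max(total / len(counts), 1e-6), 1.0 - 1e-6)
        lam_signal = max(sum(r * k for r, k in zip(resp, counts)) / max(total, 1e-12), 1e-6)
        lam_noise = max(
            sum((1.0 - r) * k for r, k in zip(resp, counts)) / max(len(counts) - total, 1e-12),
            1e-6,
        )
    return lam_noise, lam_signal, w, resp

# Hypothetical interaction counts: mostly low background, a few strong pairs.
counts = [1, 2, 1, 1, 3, 2, 1, 2, 1, 1, 12, 15, 11, 14]
lam_noise, lam_signal, weight_signal, resp = fit_poisson_mixture(counts)
likely_true = [k for k, r in zip(counts, resp) if r > 0.5]
print(likely_true)
```

    A fully Bayesian treatment would place priors on the rates and the mixing weight and sample their posterior, but the classification logic per interaction is the same posterior component probability.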

    Probabilistic modeling methods for cell-free DNA methylation based cancer classification

    Background: cfMeDIP-seq is a low-cost method for determining the DNA methylation status of cell-free DNA, and it has been successfully combined with statistical methods for accurate cancer diagnostics. We investigate the diagnostic classification aspect by applying statistical tests and dimension reduction techniques for feature selection and probabilistic modeling for cancer type classification, and we also study the effect of sequencing depth. Methods: We experiment with a variety of statistical methods that use different feature selection and feature extraction methods as well as probabilistic classifiers for diagnostic decision making. We test (moderated) t-tests and Fisher’s exact test for feature selection, principal component analysis (PCA) as well as iterative supervised PCA (ISPCA) for feature generation, and GLMnet and logistic regression methods with sparsity-promoting priors for classification. The probabilistic programming language Stan is used to implement Bayesian inference for the probabilistic models. Results and conclusions: We compare overlaps of differentially methylated genomic regions as chosen by different feature selection methods, and evaluate probabilistic classifiers using area under the receiver operating characteristic curve scores on discovery and validation cohorts. While we observe that many methods perform as well as, and occasionally considerably better than, GLMnet, which was originally proposed for cfMeDIP-seq based cancer classification, we also observe that the performance of different methods varies across sequencing depths, cancer types and study cohorts. Overall, methods that seem robust and promising include Fisher’s exact test and ISPCA for feature selection as well as a simple logistic regression model with the number of hyper- and hypomethylated regions as features.
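    Feature selection with Fisher's exact test can be sketched in a few lines. This is a minimal illustration under assumed inputs, not the pipeline of the paper: each region is summarized by a hypothetical 2x2 table (cancer versus control samples, methylation signal present versus absent), and the regions with the smallest two-sided p-values are kept. The region names and counts are made up.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given the margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum over all tables at least as extreme (no more probable) than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

def select_regions(tables, top_k):
    """Return the top_k region names ranked by ascending p-value."""
    ranked = sorted(tables, key=lambda name: fisher_exact_two_sided(*tables[name]))
    return ranked[:top_k]

# Hypothetical tables: (cancer with signal, cancer without, control with, control without).
tables = {
    "region_A": (9, 1, 1, 9),
    "region_B": (5, 5, 5, 5),
    "region_C": (8, 2, 3, 7),
}
p_values = {name: fisher_exact_two_sided(*t) for name, t in tables.items()}
selected = select_regions(tables, top_k=2)
print(selected)
```

    The selected regions would then feed a classifier; the abstract above mentions logistic regression on counts of hyper- and hypomethylated regions as one simple option.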

    LuxRep

    Funding Information: We acknowledge the computational resources provided by the Aalto Science-IT project and the Finnish Functional Genomics Centre and Biocenter Finland. This work was supported by the Academy of Finland (292660, 311584, 335436). The funding body played no role in the design of the study, the collection, analysis, interpretation of data, or in writing the manuscript. Publisher Copyright: © 2022, The Author(s).