19 research outputs found

Statistical post-processing of hydrological forecasts using Bayesian model averaging

Author: Ayari Mehrez El
Baran Sándor
Hemri Stephan
Publication venue: 'American Geophysical Union (AGU)'
Publication date: 29/08/2018

Accurate and reliable probabilistic forecasts of hydrological quantities like runoff or water level are beneficial to various areas of society. State-of-the-art probabilistic hydrological ensemble prediction models are usually driven with meteorological ensemble forecasts. Hence, biases and dispersion errors of the meteorological forecasts cascade down to the hydrological predictions and add to the errors of the hydrological models. The systematic parts of these errors can be reduced by applying statistical post-processing. For a sound estimation of predictive uncertainty and an optimal correction of systematic errors, statistical post-processing methods should be tailored to the particular forecast variable at hand. Previous studies have shown that it can make sense to treat hydrological quantities as bounded variables. In this paper, a doubly truncated Bayesian model averaging (BMA) method, which allows for flexible post-processing of (multi-model) ensemble forecasts of water level, is introduced. A case study based on water level for a gauge of the river Rhine reveals a good predictive skill of doubly truncated BMA compared both to the raw ensemble and to the reference ensemble model output statistics approach. Comment: 19 pages, 6 figures
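As a rough illustration of what a doubly truncated BMA predictive density can look like, the sketch below mixes normal kernels truncated to a fixed interval. The kernel family, the shared standard deviation, and every name and number here (`truncnorm_pdf`, `bma_pdf`, the weights and bounds) are assumptions made for illustration; the abstract does not specify them, and the paper's actual model may differ.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncnorm_pdf(x, mu, sigma, lower, upper):
    """Density of N(mu, sigma^2) truncated to [lower, upper]."""
    if x < lower or x > upper:
        return 0.0
    # Probability mass the untruncated normal places on [lower, upper].
    mass = norm_cdf((upper - mu) / sigma) - norm_cdf((lower - mu) / sigma)
    return norm_pdf((x - mu) / sigma) / (sigma * mass)


def bma_pdf(x, weights, means, sigma, lower, upper):
    """Weighted mixture of truncated normal kernels, one per ensemble member."""
    return sum(
        w * truncnorm_pdf(x, m, sigma, lower, upper)
        for w, m in zip(weights, means)
    )


# Hypothetical three-member ensemble of water-level forecasts (arbitrary units).
weights = [0.5, 0.3, 0.2]   # assumed BMA weights, summing to 1
means = [2.1, 2.4, 1.8]     # assumed bias-corrected member forecasts
density_at_2 = bma_pdf(2.0, weights, means, 0.3, 0.0, 5.0)
```

Because each kernel is renormalised on the interval, the mixture integrates to one over the bounds whenever the weights sum to one.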

arXiv.org e-Print Archive

University of Debrecen Electronic Archive

Data augmentation for models based on rejection sampling

Author: Dunson David
Lin Lizhen
Rao Vinayak
Publication date: 03/08/2015

We present a data augmentation scheme to perform Markov chain Monte Carlo inference for models where data generation involves a rejection sampling algorithm. Our idea, which seems to be missing in the literature, is a simple scheme to instantiate the rejected proposals preceding each data point. The resulting joint probability over observed and rejected variables can be much simpler than the marginal distribution over the observed variables, which often involves intractable integrals. We consider three problems, the first being the modeling of flow-cytometry measurements subject to truncation. The second is a Bayesian analysis of the matrix Langevin distribution on the Stiefel manifold, and the third is Bayesian inference for a nonparametric Gaussian process density model. The latter two are instances of problems where Markov chain Monte Carlo inference is doubly-intractable. Our experiments demonstrate superior performance over state-of-the-art sampling algorithms for such problems. Comment: 6 figures. arXiv admin note: text overlap with arXiv:1311.090
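The idea of instantiating rejected proposals can be illustrated with a toy truncated normal, where a value is proposed from a normal distribution and accepted only if it falls inside an interval. This is a minimal sketch under assumed names and settings (`sample_with_rejections`, `joint_logpdf`, the interval and seed), not the authors' algorithm: it only shows that the joint density over the accepted and rejected draws is a plain product of proposal densities, while the marginal density of the accepted draw needs a normalising constant.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_with_rejections(rng, mu, sigma, lower, upper):
    """Rejection-sample one point in [lower, upper], keeping the rejected proposals."""
    rejected = []
    while True:
        proposal = rng.gauss(mu, sigma)
        if lower <= proposal <= upper:
            return proposal, rejected
        rejected.append(proposal)


def joint_logpdf(accepted, rejected, mu, sigma):
    """Joint log density of accepted and rejected draws: no normalising constant."""
    return sum(normal_logpdf(v, mu, sigma) for v in [accepted] + rejected)


def marginal_logpdf(accepted, mu, sigma, lower, upper):
    """Marginal log density of the accepted draw: needs the acceptance mass."""
    mass = norm_cdf((upper - mu) / sigma) - norm_cdf((lower - mu) / sigma)
    return normal_logpdf(accepted, mu, sigma) - math.log(mass)


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
accepted, rejected = sample_with_rejections(rng, 0.0, 1.0, 1.0, 2.0)
joint = joint_logpdf(accepted, rejected, 0.0, 1.0)
```

In this toy case the acceptance mass is available in closed form; the abstract's point is that in harder models the corresponding integral is intractable, which is what makes the augmented joint attractive.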

arXiv.org e-Print Archive

CiteSeerX

Evaluation of microarray-based DNA methylation measurement using technical replicates: the Atherosclerosis Risk In Communities (ARIC) Study

Author: Boerwinkle Eric
Bose Maitreyee
Bressler Jan
Demerath Ellen W.
Fornage Myriam
Grove Megan L.
Guan Weihua
Hicks Chindo
Kao Wen Hong
Mosley Thomas H.
North Kari
Pankow James S.
Wu Chong
Zhang Yu
Publication venue: DigitalCommons@CSB/SJU
Publication date: 19/09/2014

Background: DNA methylation is a widely studied epigenetic phenomenon; alterations in methylation patterns influence human phenotypes and risk of disease. As part of the Atherosclerosis Risk in Communities (ARIC) study, the Illumina Infinium HumanMethylation450 (HM450) BeadChip was used to measure DNA methylation in peripheral blood obtained from ~3000 African American study participants. Over 480,000 cytosine-guanine (CpG) dinucleotide sites were surveyed on the HM450 BeadChip. To evaluate the impact of technical variation, 265 technical replicates from 130 participants were included in the study. Results: For each CpG site, we calculated the intraclass correlation coefficient (ICC), which ranges between 0 and 1, to compare variation of methylation levels within and between replicate pairs. We modeled the distribution of ICC as a mixture of censored or truncated normal and normal distributions using an EM algorithm. The CpG sites were clustered into low- and high-reliability groups according to the calculated posterior probabilities. We also demonstrated the performance of this clustering when applied to a study of association between methylation levels and smoking status of individuals. Of the CpG sites showing genome-wide significant association with smoking status, most (~96%) were in the high-reliability cluster. Conclusions: We suggest that CpG sites with low ICC may be excluded from subsequent association analyses, or that extra caution be taken for associations at such sites.
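For readers unfamiliar with the ICC, the sketch below computes one common estimator, the one-way random-effects ICC, from replicate pairs at a single site. The abstract does not say which ICC estimator the study used, so the choice of estimator, the function name `icc_one_way`, and the example values are all assumptions for illustration.

```python
def icc_one_way(groups):
    """One-way random-effects ICC for equally sized groups of replicate measurements.

    `groups` is a list of lists; each inner list holds the replicate
    measurements for one participant at one site.
    """
    n = len(groups)
    k = len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need at least 2 groups, each with the same size >= 2")

    group_means = [sum(g) / k for g in groups]
    grand_mean = sum(group_means) / n

    # Between-group and within-group mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0.0:
        raise ValueError("no variation in the data; ICC is undefined")
    return (ms_between - ms_within) / denominator


# Hypothetical methylation beta values for three participants, two replicates each.
noisy_pairs = [[0.1, 0.2], [0.5, 0.4], [0.9, 0.8]]
icc_noisy = icc_one_way(noisy_pairs)
```

This estimator can fall below zero when within-pair variation exceeds between-pair variation, so a study reporting values in [0, 1] may use a different estimator or bound the result; the sketch does not attempt to reproduce that.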

College of Saint Benedict and Saint John’s University: DigitalCommons@CSB/SJU