    Applications of Bayesian computational statistics and modeling to large-scale geoscientific problems

    Climate change is one of the most important, pressing, and far-reaching global challenges that humanity faces in the 21st century. Already directly affecting the daily lives of many people, and indirectly those of everyone, changes in climate are projected to have many catastrophic consequences. Research on climate and climate change is therefore essential. Studying complex geoscientific phenomena such as climate change involves a patchwork of challenging mathematical, statistical, and computational problems. To solve these problems, local and global process models and statistical models are combined both with small in situ data sets containing only a few observations and with enormous global remote sensing data products containing hundreds of millions of data points. This integration of models and data can be done in a Bayesian inverse modeling setting, provided that the algorithms and computational methods are chosen and implemented carefully. The methods used in the four publications on which this thesis is based range from high-dimensional Bayesian spatial statistical models and Markov chain Monte Carlo methods to time series modeling and point estimation via optimization. The particular geoscientific problems considered are: finding the spatio-temporal distribution of atmospheric carbon dioxide from sparse remote sensing data, quantifying uncertainties in modeling methane emissions from boreal wetlands, analyzing and quantifying the effect of climate change on the growing season in the boreal region, and using statistical methods to calibrate a terrestrial ecosystem model.
In addition to analyzing these problems, the research and its results help to understand model performance and how uncertainties in very large computational modeling problems can be approached, and they provide algorithm implementations on which future efforts can build.

Climate change is one of the greatest global challenges that humanity must grapple with in the twenty-first century. Its effects already reach people's everyday lives all over the globe. Studying climate change, like other complex Earth-system phenomena, consists of systematically combining and analyzing mathematical and computer models together with systemic knowledge. In this work, Bayesian statistical methods are used to combine satellite data sets of hundreds of millions of data points with statistical models, and smaller local time series with global climate models. The methods used are numerous, ranging from hierarchical models to Markov chain Monte Carlo algorithms and Gaussian processes. The problems addressed in the work are: determining the global, time-dependent distribution of atmospheric carbon dioxide from remote sensing observations; quantifying the uncertainty of methane emissions from northern peatlands; analyzing the earlier onset of the growing season caused by climate change; and partially calibrating the land component of an ecosystem model. In addition to studying these problems, the work helps to clarify how uncertainty analysis in computationally heavy models can be carried out algorithmically, and how enormous amounts of data can be used for uncertainty assessment. The algorithm development presented in the work also naturally allows the methods to be refined in the future, both for other purposes and into new, more advanced methods.
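As a generic illustration of the Bayesian inverse modeling setting (an assumed textbook form, not necessarily the exact formulation used in the thesis), unknown model parameters θ are inferred from observations y through a process model f, here assuming additive Gaussian observation error with covariance Σ:

```latex
p(\theta \mid y) \;\propto\; p(y \mid \theta)\, p(\theta),
\qquad
p(y \mid \theta) \;\propto\; \exp\!\Big(-\tfrac{1}{2}\,\big(y - f(\theta)\big)^{\top} \Sigma^{-1} \big(y - f(\theta)\big)\Big).
```

Markov chain Monte Carlo methods sample from p(θ | y) using only the unnormalised product on the right-hand side, so the normalising constant never has to be computed; the cost lies in the repeated evaluations of f(θ).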

    Evaluating the accuracy of Gaussian approximations in VSWIR imaging spectroscopy retrievals

    The joint retrieval of surface reflectances and atmospheric parameters in visible to shortwave infrared (VSWIR) imaging spectroscopy is a computationally challenging, high-dimensional problem. In the motivating context of NASA's Surface Biology and Geology mission, quantifying the uncertainty of the retrievals is crucial for further use of the retrieved results in environmental applications. Although Markov chain Monte Carlo (MCMC) is a Bayesian method well suited to uncertainty quantification, a full-dimensional implementation of MCMC for the retrieval is computationally intractable. In this work, we developed a block Metropolis MCMC algorithm for the high-dimensional VSWIR surface reflectance retrieval that leverages the structure of the forward radiative transfer model to make fully Bayesian computation tractable. We use the posterior distribution from this MCMC algorithm to assess the limitations of optimal estimation, the state-of-the-art Bayesian algorithm for operational retrievals, which is more computationally efficient but uses a Gaussian approximation to characterize the posterior. Comparing the posteriors computed by the two methods shows that the MCMC algorithm gives more physically sensible results and reveals the non-Gaussian structure of the posterior, specifically in the atmospheric aerosol optical depth parameter and the short-wavelength surface reflectances.
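The block idea can be illustrated with a minimal Metropolis-within-blocks sampler. This is a generic sketch, not the authors' algorithm: it does not exploit the structure of the radiative transfer model, and the function name `block_metropolis`, the toy Gaussian target, and the block partition are hypothetical. Each sweep proposes a random-walk move for one block of the state at a time while the other blocks stay fixed, so every accept/reject decision concerns a lower-dimensional move.

```python
import numpy as np


def block_metropolis(log_post, x0, blocks, step_sizes, n_iter, rng=None):
    """Metropolis-within-blocks sampler (a generic sketch).

    log_post   : callable returning the unnormalised log posterior of the full state
    x0         : initial state vector
    blocks     : list of integer index arrays; indices in one block are updated jointly
    step_sizes : random-walk standard deviation used for each block
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    lp = log_post(x)
    chain = np.empty((n_iter, x.size))
    n_accept = np.zeros(len(blocks))
    for t in range(n_iter):
        for b, (idx, step) in enumerate(zip(blocks, step_sizes)):
            prop = x.copy()
            # Gaussian random-walk move on this block only; all other blocks stay fixed
            prop[idx] += step * rng.standard_normal(len(idx))
            lp_prop = log_post(prop)
            # Symmetric proposal, so the acceptance ratio is just the posterior ratio
            if np.log(rng.uniform()) < lp_prop - lp:
                x, lp = prop, lp_prop
                n_accept[b] += 1
        chain[t] = x
    return chain, n_accept / n_iter


# Usage on a toy correlated 2-D Gaussian "posterior", one coordinate per block.
# In a retrieval, one block might hold atmospheric parameters and another the
# surface reflectances (hypothetical partition, for illustration only).
target_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
precision = np.linalg.inv(target_cov)
chain, acc = block_metropolis(
    lambda x: -0.5 * x @ precision @ x,
    x0=np.zeros(2),
    blocks=[np.array([0]), np.array([1])],
    step_sizes=[1.0, 1.0],
    n_iter=40_000,
    rng=np.random.default_rng(0),
)
samples = chain[4_000:]  # discard burn-in
```

Comparing such MCMC samples with a Gaussian summary of the same posterior (for example, their mean and covariance) is one way to expose non-Gaussian features such as skewness or mass piled up near a physical bound.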

    Calibrating the sqHIMMELI v1.0 wetland methane emission model with hierarchical modeling and adaptive MCMC

    Estimating methane (CH4) emissions from natural wetlands is complex, and the estimates contain large uncertainties. The models used for the task are typically heavily parameterized, and the parameter values are not well known. In this study, we perform a Bayesian model calibration for a new wetland CH4 emission model to improve the quality of the predictions and to understand the limitations of such models. The detailed process model that we analyze describes CH4 production from anaerobic respiration, CH4 oxidation, and gas transport by diffusion, ebullition, and the aerenchyma cells of vascular plants. The processes are controlled by several tunable parameters. We use a hierarchical statistical model to describe the parameters, and we obtain the posterior distributions of the parameters and the uncertainties in the processes with adaptive Markov chain Monte Carlo (MCMC), importance resampling, and time series analysis techniques. The estimation uses measurement data from the Siikaneva flux measurement site in southern Finland. The uncertainties related to the parameters and the modeled processes are described quantitatively. At the process level, the flux measurement data are able to constrain the CH4 production processes, methane oxidation, and the different gas transport processes. The posterior covariance structures explain how the parameters and the processes are related. Additionally, the uncertainties of the flux and its components are analyzed at both the annual and daily levels. The parameter posterior densities obtained provide information regarding the importance of the different processes, which is also useful for developing wetland methane emission models other than the square root HelsinkI Model of MEthane buiLd-up and emIssion for peatlands (sqHIMMELI). The hierarchical modeling allows us to assess the effects of some of the parameters on an annual basis.
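The abstract does not spell out the hierarchical model. A generic two-level structure that would allow parameters to be assessed on an annual basis (an assumption for illustration, not the paper's specification) lets year-specific parameters θ_k, k = 1, ..., K, share a common population distribution with hyperparameters μ and Σ_θ, while φ collects the parameters shared across years and y_k denotes the flux data of year k:

```latex
\begin{aligned}
y_k \mid \theta_k, \varphi &\sim p\big(y_k \mid \theta_k, \varphi\big), \qquad k = 1,\dots,K,\\
\theta_k \mid \mu, \Sigma_\theta &\sim \mathcal{N}(\mu, \Sigma_\theta), \qquad k = 1,\dots,K,\\
p(\theta_{1:K}, \mu, \Sigma_\theta, \varphi \mid y_{1:K}) &\propto p(\mu, \Sigma_\theta, \varphi)\prod_{k=1}^{K} p(y_k \mid \theta_k, \varphi)\,\mathcal{N}(\theta_k \mid \mu, \Sigma_\theta).
\end{aligned}
```

In such a structure the year-to-year spread Σ_θ is itself estimated, so each θ_k is informed both by its own year's data and by the other years.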
The results of the calibration and the cross validation suggest that early-spring net primary production could be used to predict parameters affecting the annual methane production. Even though the calibration is specific to the Siikaneva site, the hierarchical modeling approach is well suited for larger-scale studies, and the results of the estimation pave the way for a regional- or global-scale Bayesian calibration of wetland emission models.
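The abstract names importance resampling without detailing how it is applied. As a generic illustration (an assumption, not the paper's procedure), sampling-importance-resampling (SIR) turns weighted draws from a proposal distribution into approximately unweighted draws from a target; the helper `importance_resample` and the toy Gaussian densities below are hypothetical.

```python
import numpy as np


def importance_resample(samples, log_weights, n_out, rng):
    """Sampling-importance-resampling (SIR).

    Draws n_out values from `samples` with probability proportional to
    exp(log_weights) and also returns the effective sample size of the weighted set.
    """
    w = np.exp(log_weights - np.max(log_weights))  # shift by the max for numerical stability
    w /= w.sum()
    ess = 1.0 / np.sum(w**2)
    idx = rng.choice(len(samples), size=n_out, replace=True, p=w)
    return samples[idx], ess


rng = np.random.default_rng(1)
# Toy example: draws from a broad proposal N(0, 2^2) are reweighted towards a
# target N(1, 0.5^2). Normalising constants cancel once the weights are normalised.
draws = rng.normal(0.0, 2.0, size=50_000)
log_target = -0.5 * ((draws - 1.0) / 0.5) ** 2
log_proposal = -0.5 * (draws / 2.0) ** 2
resampled, ess = importance_resample(draws, log_target - log_proposal, 10_000, rng)
```

The effective sample size shows how many of the weighted draws effectively contribute; a value far below the number of draws signals that the proposal matches the target poorly.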

    Constraining ecosystem model with adaptive Metropolis algorithm using boreal forest site eddy covariance measurements

    We examined parameter optimisation in the JSBACH (Kaminski et al., 2013; Knorr and Kattge, 2005; Reick et al., 2013) ecosystem model, applied to two boreal forest sites (Hyytiälä and Sodankylä) in Finland. We identified and tested key parameters in soil hydrology and in the formulations related to forest water and carbon exchange, and optimised them using the adaptive Metropolis (AM) algorithm for Hyytiälä with a 5-year calibration period (2000-2004) followed by a 4-year validation period (2005-2008). Sodankylä acted as an independent validation site, where optimisations were not made. The tuning provided estimates of the full distribution of possible parameter values, along with information about correlation, sensitivity and identifiability. Some parameters were correlated with each other due to a phenomenological connection between carbon uptake and water stress, or due to other connections arising from the set-up of the model formulations. The latter holds especially for vegetation phenology parameters. The least identifiable parameters include phenology parameters, parameters connecting relative humidity and soil dryness, and the field capacity of the skin reservoir. These soil parameters were masked by the large contribution from vegetation transpiration. In addition to leaf area index and the maximum carboxylation rate, the most effective parameters adjusting the gross primary production (GPP) and evapotranspiration (ET) fluxes in seasonal tuning were related to soil wilting point, drainage and moisture stress imposed on vegetation. For daily and half-hourly tunings, the most important parameters were the ratio of leaf internal CO2 concentration to external CO2 and the parameter connecting relative humidity and soil dryness. Effectively, the seasonal tuning transferred water from soil moisture into ET, and the daily and half-hourly tunings reversed this process.
The seasonal tuning improved the month-to-month development of GPP and ET, and produced the most stable estimates of water use efficiency. Compared with the seasonal tuning, the daily tuning performs worse on the seasonal scale. However, the daily parametrisation reproduced the observed average diurnal cycle best, except for GPP in the Sodankylä validation period, where the half-hourly tuned parameters were better. In general, the daily tuning provided the largest reduction in model-data mismatch. The model's response to drought was unaffected by our parametrisations, and further studies are needed to enhance the dry response in JSBACH.
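The adaptive Metropolis algorithm (Haario, Saksman and Tamminen, 2001) tunes a Gaussian random-walk proposal with the empirical covariance of the chain history, which helps it cope with correlated parameters like those described above. The sketch below follows that scheme in simplified form; the function name, the toy target, and the defaults (`t0`, `eps`, the initial covariance) are illustrative assumptions, not the settings used with JSBACH.

```python
import numpy as np


def adaptive_metropolis(log_post, x0, n_iter, t0=500, init_cov=None, eps=1e-6, rng=None):
    """Adaptive Metropolis (AM) sketch in the spirit of Haario et al. (2001).

    A Gaussian random-walk proposal is used; after an initial period of t0 steps its
    covariance is s_d * (C_t + eps * I), where C_t is the empirical covariance of the
    chain so far and s_d = 2.4**2 / d.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    s_d = 2.4**2 / d
    cov0 = 0.1 * np.eye(d) if init_cov is None else np.asarray(init_cov, dtype=float)
    lp = log_post(x)
    mean, cov = x.copy(), np.zeros((d, d))  # running mean and covariance of the chain
    chain = np.empty((n_iter, d))
    n_accept = 0
    for t in range(n_iter):
        prop_cov = cov0 if t < t0 else s_d * (cov + eps * np.eye(d))
        # Draw x' = x + L z, with L the Cholesky factor of the proposal covariance
        prop = x + np.linalg.cholesky(prop_cov) @ rng.standard_normal(d)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
            n_accept += 1
        chain[t] = x
        # Recursive update of the sample mean and covariance over n states (incl. x0)
        n = t + 2
        delta = x - mean
        mean = mean + delta / n
        cov = (n - 2) / (n - 1) * cov + np.outer(delta, delta) / n
    return chain, n_accept / n_iter


# Toy target: a strongly correlated 2-D Gaussian with unequal scales (illustrative only)
target_cov = np.array([[4.0, 1.8], [1.8, 1.0]])
precision = np.linalg.inv(target_cov)
chain, acc_rate = adaptive_metropolis(
    lambda x: -0.5 * x @ precision @ x, np.zeros(2), n_iter=30_000,
    rng=np.random.default_rng(2),
)
samples = chain[5_000:]  # discard the early, still-adapting part of the chain
```

Because the proposal learns the posterior correlation, moves along the long, narrow direction of the target are proposed directly rather than found by many small steps.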

    Early snowmelt significantly enhances boreal springtime carbon uptake

    We determine the annual timing of spring recovery from space-borne microwave radiometer observations across Northern Hemisphere boreal evergreen forests for 1979-2014. We find a trend of advanced spring recovery of carbon uptake over this period, with a total average shift of 8.1 days (2.3 days per decade). We use this trend to estimate the corresponding changes in gross primary production (GPP) by applying in situ carbon flux observations. Micrometeorological CO2 measurements at four sites in northern Europe and North America indicate that such an advance in spring recovery would have increased the January-June GPP sum by 29 g C m⁻² (8.4 g C m⁻², or 3.7%, per decade). We find this sensitivity of the measured springtime GPP to the spring recovery to be in accordance with the corresponding sensitivity derived from simulations with a land ecosystem model coupled to a global circulation model. The model-predicted increase in springtime cumulative GPP was 0.035 Pg per decade (15.5 g C m⁻², or 6.8%, per decade) for Eurasian forests and 0.017 Pg per decade (9.8 g C m⁻², or 4.4%, per decade) for forests in North America. This change in the springtime sum of GPP related to the timing of spring snowmelt is quantified here for boreal evergreen forests.
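A quick consistency check of the reported per-decade rates against the totals (plain arithmetic on the numbers above): the period 1979-2014 spans 35 years, i.e. 3.5 decades, so

```latex
2.3~\text{d decade}^{-1} \times 3.5~\text{decades} = 8.05~\text{d} \approx 8.1~\text{d},
\qquad
8.4~\text{g C m}^{-2}\,\text{decade}^{-1} \times 3.5~\text{decades} = 29.4~\text{g C m}^{-2} \approx 29~\text{g C m}^{-2}.
```

The small differences from the stated totals are consistent with rounding of the per-decade rates.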

    Parameter calibration and stomatal conductance formulation comparison for boreal forests with adaptive population importance sampler in the land surface model JSBACH

    We calibrated the JSBACH model with six different stomatal conductance formulations using measurements from 10 FLUXNET coniferous evergreen sites in the boreal zone. The parameter posterior distributions were generated by the adaptive population importance sampler (APIS); the optimal values were then estimated by a simple stochastic optimisation algorithm. The model was constrained with in situ observations of evapotranspiration (ET) and gross primary production (GPP). We identified the key parameters in the calibration process; these control the soil moisture stress function and the overall rate of carbon fixation. The JSBACH model was also modified to use a delayed effect of temperature on photosynthetic activity in spring. This modification enabled the model to correctly reproduce the springtime increase in GPP at all conifer sites used in this study. Overall, the calibration and model modifications improved the coefficient of determination and the model bias for GPP with all stomatal conductance formulations. For ET, however, only the coefficient of determination was clearly improved. The best performance after optimisation was obtained with the Bethy, Ball-Berry, and Friend and Kiang stomatal conductance models. We also optimised the model for a drought event at a Finnish Scots pine forest site. This optimisation improved the model behaviour but resulted in significant changes to the parameter values for all formulations except the unified stomatal optimisation model (USO). Interestingly, the USO demonstrated the best performance during this event.
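A simplified sketch of the adaptive population importance sampling idea, loosely following the published APIS scheme: a population of Gaussian proposals is sampled in parallel, global importance weights use the mixture of all proposals, and each proposal mean is periodically moved to a local importance-sampling estimate. The function names, the fixed shared scale `sigma`, one draw per proposal per iteration, and the toy target are assumptions made for brevity; this is not the configuration used with JSBACH, and the subsequent stochastic optimisation step is not shown.

```python
import numpy as np


def log_gauss(x, mu, sigma):
    """Log density of the isotropic Gaussian N(mu, sigma^2 I) along the last axis."""
    d = x.shape[-1]
    return (-0.5 * np.sum((x - mu) ** 2, axis=-1) / sigma**2
            - d * np.log(sigma) - 0.5 * d * np.log(2.0 * np.pi))


def apis(log_target, init_means, sigma, n_epochs, t_a, rng):
    """Simplified adaptive population importance sampler (sketch).

    N isotropic Gaussian proposals with a fixed, shared scale `sigma` each draw one
    sample per iteration. Global weights use the deterministic mixture of all N
    proposals; every t_a iterations each proposal mean is moved to the importance
    sampling estimate of the target mean computed from its own samples.
    """
    means = np.array(init_means, dtype=float)
    n_prop, d = means.shape
    all_x, all_logw = [], []
    for _ in range(n_epochs):
        ep_x, ep_logw = [], []
        for _ in range(t_a):
            x = means + sigma * rng.standard_normal((n_prop, d))  # one draw per proposal
            log_q = log_gauss(x[:, None, :], means[None, :, :], sigma)  # log q_j(x_i), (N, N)
            log_mix = np.logaddexp.reduce(log_q, axis=1) - np.log(n_prop)
            lt = log_target(x)
            all_x.append(x)
            all_logw.append(lt - log_mix)        # deterministic-mixture weights
            ep_x.append(x)
            ep_logw.append(lt - np.diag(log_q))  # weights w.r.t. each draw's own proposal
        # Adaptation: new mean of proposal i = local weighted mean of its epoch samples
        lx = np.stack(ep_x, axis=1)             # (N, t_a, d)
        lw = np.stack(ep_logw, axis=1)          # (N, t_a)
        w = np.exp(lw - lw.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        means = np.einsum("nt,ntd->nd", w, lx)
    x = np.concatenate(all_x)
    logw = np.concatenate(all_logw)
    w = np.exp(logw - logw.max())
    return x, w / w.sum(), means


# Toy unnormalised Gaussian target (illustrative only)
rng = np.random.default_rng(3)
true_mean, true_sd = np.array([1.0, -2.0]), np.array([1.0, 0.5])
log_target = lambda x: -0.5 * np.sum(((x - true_mean) / true_sd) ** 2, axis=-1)
x, w, final_means = apis(log_target, rng.uniform(-4, 4, size=(10, 2)), sigma=2.0,
                         n_epochs=20, t_a=50, rng=rng)
posterior_mean = w @ x  # self-normalised importance sampling estimate
```

Sampling from many adapted proposals at once keeps the weighted sample set robust even when some proposals start far from the high-posterior region.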

    HIMMELI v1.0: HelsinkI Model of MEthane buiLd-up and emIssion for peatlands

    Wetlands are one of the most significant natural sources of methane (CH4) to the atmosphere. They emit CH4 because decomposition of soil organic matter in waterlogged, anoxic conditions produces CH4 in addition to carbon dioxide (CO2). The production of CH4, and how much of it escapes to the atmosphere, depend on a multitude of environmental drivers. Models simulating the processes leading to CH4 emissions are therefore needed for upscaling observations to estimate present CH4 emissions and for producing scenarios of future atmospheric CH4 concentrations. Aiming at a CH4 model that can be added to models describing peatland carbon cycling, we composed a model called HIMMELI that describes CH4 build-up in and emissions from peatland soils. It is not a full peatland carbon cycle model; instead, it requires the rate of anoxic soil respiration as input. Driven by soil temperature, the leaf area index (LAI) of aerenchymatous peatland vegetation, and water table depth (WTD), it simulates the concentrations and transport of CH4, CO2, and oxygen (O2) in a layered one-dimensional peat column. Here, we present the HIMMELI model structure and the results of tests on the model's sensitivity to the input data and to the description of the peat column (peat depth and layer thickness), and we demonstrate that HIMMELI outputs realistic fluxes by comparing modeled and measured fluxes at two peatland sites. Because HIMMELI describes only the CH4-related processes, not the full carbon cycle, our analysis revealed mechanisms and dependencies that may remain hidden when testing CH4 models connected to complete peatland carbon models, which is usually the case.
Our results indicated that (1) the model is flexible and robust and thus suitable for different environments; (2) the simulated CH4 emissions largely depend on the prescribed rate of anoxic respiration; (3) the sensitivity of the total CH4 emission to other input variables is mainly mediated via the concentrations of dissolved gases, in particular the O2 concentrations that affect the CH4 production and oxidation rates; and (4) for a given input respiration, the peat column description does not significantly affect the simulated CH4 emissions in this model version.
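The layered one-dimensional column can be illustrated with a toy explicit finite-volume scheme for a single dissolved gas with prescribed production and diffusion only. This is a hypothetical sketch, not HIMMELI's equations: oxidation, ebullition, plant-mediated transport, and the coupling between CH4, CO2 and O2 are all omitted, and the function name and layer values are illustrative.

```python
import numpy as np


def simulate_column(prod, diff, dz, dt, n_steps, c_top=0.0):
    """Toy explicit finite-volume model of one dissolved gas in a layered 1-D column.

    prod  : production rate in each layer (mol m-3 s-1), top layer first
    diff  : diffusion coefficient in each layer (m2 s-1)
    dz    : layer thickness (m); dt : time step (s)
    c_top : fixed concentration at the surface (mol m-3); the bottom is a no-flux boundary
    Returns the final concentrations and the upward surface flux (mol m-2 s-1).
    """
    prod = np.asarray(prod, dtype=float)
    diff = np.asarray(diff, dtype=float)
    n = prod.size
    # Conservative explicit stability limit (the surface sits half a layer above layer 0)
    assert dt <= dz**2 / (3.0 * diff.max()), "time step too large for the explicit scheme"
    d_int = 2.0 * diff[:-1] * diff[1:] / (diff[:-1] + diff[1:])  # harmonic mean at interfaces
    c = np.zeros(n)
    flux = np.zeros(n + 1)  # flux[i]: upward flux through the top of layer i; flux[n] = 0
    for _ in range(n_steps):
        flux[0] = diff[0] * (c[0] - c_top) / (dz / 2.0)
        flux[1:n] = d_int * (c[1:] - c[:-1]) / dz
        # Layer budget: production + inflow from below - outflow through the top
        c = c + dt * (prod + (flux[1:] - flux[:-1]) / dz)
    return c, flux[0]


# Illustrative values only (not HIMMELI parameters): ten 5 cm layers, production
# only in the six deepest (water-saturated) layers, uniform diffusivity.
dz = 0.05
prod = np.r_[np.zeros(4), np.full(6, 1e-8)]
diff = np.full(10, 2e-9)
conc, surface_flux = simulate_column(prod, diff, dz, dt=dz**2 / (4 * 2e-9), n_steps=5_000)
column_production = prod.sum() * dz
```

Because this toy has no sink, its steady-state surface flux equals the column-integrated production, which is a simple mass-balance check for schemes of this kind.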