81 research outputs found

    Speeding up the inference in Gaussian process models

    In this dissertation, Gaussian processes are used to define prior distributions over latent functions in hierarchical Bayesian models. A Gaussian process is a non-parametric model with which one does not need to fix the functional form of the latent function; instead, its properties are defined implicitly. These implicit statements are encoded in the mean and covariance functions, which determine, for example, the smoothness and variability of the function. This non-parametric nature of the Gaussian process gives rise to a flexible and diverse class of probabilistic models. There are two main challenges in using Gaussian processes. The first is the computational time, which increases rapidly as a function of the number of data points. The other is the analytically intractable inference, which exacerbates the slow computation. This dissertation considers methods to alleviate these problems. The inference problem is attacked with approximate methods. The Laplace approximation and the expectation propagation algorithm are used to give a Gaussian approximation to the conditional posterior distribution of the latent function given the hyperparameters. The integration over hyperparameters is performed using Monte Carlo, grid-based, or central composite design integration. Markov chain Monte Carlo methods over all unknown parameters are used as a gold standard against which the other methods are compared. The rapidly increasing computational time is addressed with sparse approximations to the Gaussian process and with compactly supported covariance functions. Both are analyzed in detail and tested in experiments. Practical details of their implementation with the approximate inference techniques are discussed. The techniques for speeding up the inference are tested in three modeling problems: disease mapping, regression and classification. The disease mapping and regression problems are tackled with standard and robust observation models. The results show that the techniques presented speed up the inference considerably without severely compromising accuracy.
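To illustrate why compactly supported covariance functions reduce computation, the following minimal sketch builds a covariance matrix from the piecewise-polynomial function k(r) = (1 - r)_+^j with j = floor(D/2) + 1 and counts its non-zero entries. The function names, the input grid and the length scale are assumptions chosen for illustration; this is not the dissertation's implementation.

```python
def pp0_cov(x1, x2, lengthscale, dim=1):
    """Compactly supported covariance k(r) = (1 - r)_+^j, j = floor(dim/2) + 1.

    r is the distance scaled by the length scale; the covariance is exactly
    zero whenever the two inputs are at least one length scale apart.
    """
    r = abs(x1 - x2) / lengthscale
    j = dim // 2 + 1
    return max(0.0, 1.0 - r) ** j


def cov_matrix(xs, lengthscale):
    """Dense list-of-lists covariance matrix for one-dimensional inputs."""
    return [[pp0_cov(a, b, lengthscale) for b in xs] for a in xs]


# Hypothetical inputs: 50 equally spaced points and a length scale of 5.
xs = [float(i) for i in range(50)]
K = cov_matrix(xs, lengthscale=5.0)

# Only entries within one length scale of the diagonal are non-zero,
# so the matrix is banded and sparse matrix routines can be used on it.
nonzero = sum(1 for row in K for v in row if v != 0.0)
density = nonzero / len(xs) ** 2
print(f"non-zero entries: {nonzero} of {len(xs) ** 2} ({density:.1%})")
```

With a globally supported covariance such as the squared exponential, every entry would be non-zero, and the cost of factorizing the matrix would grow cubically in the number of data points.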

    Laplace approximation and natural gradient for Gaussian process regression with heteroscedastic Student-t model

    We propose the Laplace method to derive approximate inference for Gaussian process (GP) regression in the location and scale parameters of the Student-t probabilistic model. This allows both the mean and the variance of the data to vary as a function of covariates, with the attractive feature that the Student-t model is widely used for robustifying data analysis. The challenge in approximate inference for the model lies in the analytical intractability of the posterior distribution and the lack of concavity of the log-likelihood function. We present a natural gradient adaptation for the estimation process, which relies primarily on the property that the Student-t model naturally has an orthogonal parametrization. Due to this property, the Laplace approximation becomes significantly more robust than the traditional approach using Newton’s method. We also introduce an alternative Laplace approximation that uses the model’s Fisher information matrix. According to experiments, this alternative approximation provides posterior approximations and predictive performance very similar to those of the traditional Laplace approximation with the model’s Hessian matrix. However, the proposed Laplace–Fisher approximation is faster and more stable to calculate than the traditional Laplace approximation. We also compare both Laplace approximations with the Markov chain Monte Carlo (MCMC) method. We discuss how our approach can, in general, improve the inference algorithm in cases where the probabilistic model assumed for the data is not log-concave. Peer reviewed
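The lack of log-concavity mentioned above can be seen in the location parameter alone. The sketch below, with hypothetical function names and a fixed scale, compares the negative second derivative of the Student-t log density with respect to the location against the corresponding Fisher information. It covers only the location parameter and is not the paper's full location-and-scale algorithm.

```python
def neg_hessian_location(y, f, sigma, nu):
    """Negative second derivative of log Student-t(y | f, sigma, nu) w.r.t. f.

    Equals (nu + 1) * (nu*sigma^2 - r^2) / (nu*sigma^2 + r^2)^2 with r = y - f,
    so it turns negative once |r| exceeds sigma * sqrt(nu).
    """
    r2 = (y - f) ** 2
    a = nu * sigma ** 2
    return (nu + 1.0) * (a - r2) / (a + r2) ** 2


def fisher_information_location(sigma, nu):
    """Expected negative second derivative w.r.t. f; positive for every f."""
    return (nu + 1.0) / ((nu + 3.0) * sigma ** 2)


sigma, nu = 1.0, 4.0  # hypothetical scale and degrees of freedom
for residual in (0.0, 1.0, 5.0):
    w = neg_hessian_location(residual, 0.0, sigma, nu)
    print(f"residual {residual:3.1f}: negative Hessian {w:+.4f}")
print(f"Fisher information: {fisher_information_location(sigma, nu):.4f}")
```

An observation far from the current latent value contributes negative curvature, which can make Newton steps unstable. Replacing the Hessian by the Fisher information keeps the curvature term positive, which is consistent with the stability the abstract reports for the Laplace–Fisher variant.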

    A hierarchical Bayesian model reveals changes in the distributions of Arctic marine mammals

    We developed a method for using various open data sources, such as published articles and databases, to analyse the distributions of Arctic marine mammals. With the method we assessed the effect of the environment on species' distributions and possible changes in the distributions following observed environmental changes. The study was conducted in the Kara Sea, one of the Arctic marginal seas. We searched open data sources for observations of polar bears (Ursus maritimus), walruses (Odobenus rosmarus rosmarus) and ringed seals (Phoca hispida). We georeferenced the observations and analysed the species' densities with a Poisson point process model. The data were of varying quality and contained no information on the observation process of the source studies, so we could model only the density of each species relative to an unknown observation intensity. We explained the relative densities with environmental covariates and random effects, the latter reflecting random variation of the density in time and space and relative to the random observation intensity. We also explained polar bear density with the predicted density of ringed seals at the polar bear observation locations. Sea ice concentration and the distance of the observations from the coast were the most important explanatory environmental covariates for every species. Seal density was the most important covariate for explaining polar bear density. The decline in sea ice concentration over the 17-year study period affected the species' densities such that the densities of walruses and polar bears remained stable or declined slightly, whereas the density of ringed seals decreased in the eastern and increased in the western Kara Sea. The point process model is a robust method for estimating species' distributions from observations of varying quality. The method provides reliable information about ecosystems regardless of location and offers tools for conservation in the Arctic. Our results showed that in a simple food web the distribution of a predator species is better explained by the distributions of its prey species than by environmental covariates. Declining sea ice is an evident cause of the changes in marine mammal distributions in the Arctic region.
    Aim: Our aim was to develop a method to analyse spatiotemporal distributions of Arctic marine mammals (AMMs) using heterogeneous open source data, such as scientific papers and open repositories. Another aim was to quantitatively estimate the effects of environmental covariates on AMMs’ distributions and to analyse whether their distributions have shifted along with environmental changes. Location: Arctic shelf area, the Kara Sea. Methods: Our literature search focused on survey data regarding polar bears (Ursus maritimus), Atlantic walruses (Odobenus rosmarus rosmarus) and ringed seals (Phoca hispida). We mapped the data on a grid and built a hierarchical Poisson point process model to analyse species’ densities. The heterogeneous data lacked information on survey intensity, so we could model only the relative density of each species. We explained relative densities with environmental covariates and random effects reflecting excess spatiotemporal variation and the unknown, varying sampling effort. The relative density of polar bears was also explained by the relative density of seals. Results: The most important covariates explaining AMMs’ relative densities were ice concentration and distance to the coast and, for polar bears, also the relative density of seals. The results suggest that, due to the decrease in the average ice concentration, the relative densities of polar bears and walruses slightly decreased or stayed constant during the 17‐year‐long study period, whereas seals shifted their distribution from the eastern to the western Kara Sea. Main conclusions: Point process modelling is a robust methodology for estimating distributions from heterogeneous observations. It provides spatially explicit information about ecosystems and thus supports conservation efforts in the Arctic. In a simple trophic system, a distribution model of a top predator benefits from utilizing prey species’ distributions compared to a solely environmental model. The decreasing ice cover seems to have led to changes in AMMs’ distributions in the marginal Arctic region. Peer reviewed

    Bayesian model based spatiotemporal survey designs and partially observed log Gaussian Cox process

    In geostatistics, the spatiotemporal design for data collection is central to accurate prediction and parameter inference. An important class of geostatistical models is the log-Gaussian Cox process (LGCP), but there are no formal analyses of spatial or spatiotemporal survey designs for them. In this work, we study traditional balanced and uniform random designs in situations where the analyst has prior information on the intensity function of the LGCP, and show that these designs are not efficient in such situations. We also propose a new design sampling method, a rejection sampling design, which extends the traditional balanced and random designs by directing survey sites to locations that are a priori expected to provide the most information. We compare our proposal to the traditional balanced and uniform random designs using the expected average predictive variance (APV) loss and the expected Kullback-Leibler (KL) divergence between the prior and the posterior for the LGCP intensity function, in simulation experiments and in a real-world case study. The APV indicates the expected accuracy of a survey design in point-wise predictions, and the KL divergence measures the expected gain in information about the joint distribution of the intensity field. The case study concerns planning a survey design for analyzing larval areas of two commercially important fish stocks in the Finnish coastal region. Our experiments show that the designs generated by the proposed rejection sampling method clearly outperform the traditional balanced and uniform random survey designs. Moreover, the method is easily applicable to other models. © 2019 The Author(s). Published by Elsevier B.V. Peer reviewed
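The idea of a rejection sampling design can be sketched as follows: candidate sites are proposed uniformly and accepted with a probability proportional to a prior weight. The abstract does not state the acceptance rule, so the weight function, the unit-square domain and all names below are assumptions for illustration only.

```python
import random


def rejection_sampling_design(weight, weight_max, n_sites, rng):
    """Draw survey sites in the unit square with density proportional to weight.

    weight(site) must lie in [0, weight_max]. Candidates are proposed
    uniformly and kept with probability weight(site) / weight_max.
    """
    sites = []
    while len(sites) < n_sites:
        candidate = (rng.random(), rng.random())
        if rng.random() * weight_max <= weight(candidate):
            sites.append(candidate)
    return sites


def prior_weight(site):
    """Hypothetical prior weight: sites further east are expected to be
    more informative. A real design would derive this from the LGCP prior."""
    x, _ = site
    return x


rng = random.Random(0)  # fixed seed so the sketch is reproducible
design = rejection_sampling_design(prior_weight, 1.0, 500, rng)
mean_x = sum(x for x, _ in design) / len(design)
print(f"{len(design)} sites, mean x-coordinate {mean_x:.2f}")
```

A uniform random design corresponds to a constant weight, in which case every candidate is accepted. The non-constant weight concentrates sites where the prior suggests they are most useful.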

    Diatoms, filamentous algae and macrovegetation distribution modeling in Gulf of Bothnia

    We have evaluated the distribution and extent of sea-bottom vegetation, divided into three groups (diatoms, macrovegetation and filamentous algae), in the Gulf of Bothnia, the northernmost area of the Baltic Sea, and relate the increase in the distribution of filamentous algae to the growing problem of eutrophication. The distribution of these species groups was modelled by combining species abundance data (distribution data) with GIS raster variables based on environmental information in a binomial model, predicting the spatial probability of each group using MATLAB and the GPstuff toolbox. Of all the variables used, the most important for explaining the predicted distribution of the species groups were the bottom type and variables related to the exposure of an area (weighted fetch, number of islands and distance to shallow waters). The main group of species in the Gulf of Bothnia is shown to be the filamentous algae, with an elevated predicted probability in almost all of the Gulf. Filamentous algae prefer hard bottoms such as rock or stones and exposed areas; their abundance is increasing every year, pushing macrovegetation populations into more protected areas. The amounts of nutrients and filamentous algae have increased in recent decades. We discuss the relation between the development of eutrophication and the increase of filamentous algae, which follow the same south-to-north and west-to-east gradients, with the south and west being more eutrophied. This work aims to be a tool for environmental protection and coastal management of eutrophication, by predicting the probability of presence of the different vegetation groups and analysing the relation of these groups to eutrophication.

    Sparse log Gaussian process in spatial epidemiology

    This master's thesis presents a hierarchical Bayesian model to aid disease mapping. Disease mapping is a subfield of spatial epidemiology whose aim is to study the geographical variation of health risk. The goal is to describe the distribution of a disease on a map and to highlight areas where the risk of disease or death is elevated. In this work, a model with three hierarchical levels is used to study regional variations in mortality risk from mortality data. Mortality in a given area is modelled with a Poisson process whose expected value is the product of the standardized mortality risk and the relative risk. The mortality risk is standardized using the age, sex and education-level distribution of the background population. The logarithm of the relative risk is given a Gaussian process prior, which smooths the risk surface and adds the correlations between areas to the model. The problem with the Gaussian process is the time required to invert the covariance matrix, which is reduced by making a sparse approximation to the Gaussian process. In spatial epidemiology it is important to be able to determine the statistical significance of regional variation in disease risk. To obtain the best possible uncertainty estimates for the model, the integration over the model parameters is performed using Markov chain Monte Carlo methods. The sampling of the latent variables of the Gaussian process is sped up with a transformation that exploits an approximation of the posterior covariance. The Markov chain sampling is performed with the hybrid Monte Carlo method, an essential part of which is the computation of the gradients of the log marginal likelihood. In the case of the sparse approximation, the gradients are computed without explicitly forming the full covariance matrix. The work presents implementations of the latent variable transformation and the gradient computation. Models using the full and the sparse Gaussian process are tested on two cause-of-death data sets with four different covariance functions, and the models are compared with each other using the DIC information criterion. The results of the cause-of-death data analysis are presented as mortality risk maps.
    This thesis presents a hierarchical Bayesian model for disease mapping. Disease mapping studies comprise spatial epidemiological methods for summarizing the spatial variation in the incidence rate of diseases. The aim is to describe the overall disease distribution on a map and to highlight areas of elevated or lowered mortality or morbidity risk. In this work, a three-level hierarchical model is built to study the spatial variation in the relative mortality risk in areally referenced health-care data. The mortality in an area is modeled as a Poisson process whose mean intensity surface is the product of a standardized expected number of deaths and a relative risk. The expected number of deaths is evaluated using standardization by age, gender and level of education. The logarithm of the relative risk is given a Gaussian process prior, which smooths the risk surface and includes the spatial correlation between areas in the model. A problem with Gaussian processes is the computational burden of the required covariance matrix inversion. To overcome this problem, a fully independent conditional sparse approximation is used. In spatial epidemiology it is very important to have good estimates of whether the spatial variation is significant. To set a gold standard for the uncertainty estimates, both the hyperparameters and the latent values of the Gaussian process are marginalized out using Markov chain Monte Carlo methods. The sampling of the latent values is sped up with transformations that take into account the approximate conditional posterior covariance. The sampling is conducted using hybrid Monte Carlo methods, which require the gradients of the logarithm of the marginal likelihood. The gradients of the sparse approximation are evaluated without forming the full covariance matrix. The work presents an implementation of the gradients and of the transformation of latent values for the sparse approximation. The full and sparse Gaussian process models, with four different covariance functions, are applied to two mortality data sets. The models are compared with each other using the deviance information criterion, and the results of the analysis are presented as maps revealing the relative risk.
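The observation model described above, in which the Poisson mean in each area is the standardized expected number of deaths times the relative risk, can be written out in a few lines. The sketch below is only the likelihood layer with hypothetical counts; the Gaussian process prior, the sparse approximation and the sampler are omitted.

```python
import math


def poisson_loglik(y, expected, f):
    """Sum over areas of log Poisson(y_i | E_i * exp(f_i)).

    y: observed deaths, expected: standardized expected deaths E_i,
    f: latent log relative risks, one per area.
    """
    total = 0.0
    for yi, ei, fi in zip(y, expected, f):
        mu = ei * math.exp(fi)
        total += yi * math.log(mu) - mu - math.lgamma(yi + 1)
    return total


def poisson_loglik_grad(y, expected, f):
    """Gradient of the log-likelihood with respect to each latent value f_i."""
    return [yi - ei * math.exp(fi) for yi, ei, fi in zip(y, expected, f)]


# Hypothetical data for three areas.
y = [12, 3, 7]
expected = [10.0, 4.5, 7.0]

# Without a prior, the likelihood alone is maximized at f_i = log(y_i / E_i),
# the log of the standardized mortality ratio.
f_hat = [math.log(yi / ei) for yi, ei in zip(y, expected)]
print("log relative risks:", [round(v, 3) for v in f_hat])
print("gradient at maximum:", poisson_loglik_grad(y, expected, f_hat))
```

These unsmoothed ratios are noisy for areas with small expected counts, which is the motivation for placing a Gaussian process prior on the log relative risk so that neighbouring areas borrow strength from each other.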

    Experiences in Bayesian Inference in Baltic Salmon Management

    We review a success story regarding Bayesian inference in fisheries management in the Baltic Sea. The management of salmon fisheries is currently based on the results of a complex Bayesian population dynamic model, and managers and stakeholders use the probabilities in their discussions. We also discuss the technical and human challenges in using Bayesian modeling to give practical advice to the public and to government officials, and suggest future areas in which it can be applied. In particular, large databases in fisheries science offer flexible ways to use hierarchical models to learn the population dynamics parameters for those by-catch species that lack the large stock-specific data sets that exist for many target species. This information is required if we are to understand the future ecosystem risks of fisheries. Comment: Published at http://dx.doi.org/10.1214/13-STS431 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)