26 research outputs found

    Space-time statistical analysis of malaria morbidity incidence cases in Ghana: A geostatistical modelling approach

    Malaria is one of the most prevalent and devastating health problems worldwide. It is a highly endemic disease in Ghana, where it poses a major challenge to both public health and the socio-economic development of the country. Major factors accounting for this situation include variability in environmental conditions and a lack of prevention services, coupled with a host of other socio-economic factors. Ghana's National Malaria Control Programme (NMCP) risk assessment measures have been largely based on household surveys, which provide inadequate data for accurate prediction of new incidence cases and are compounded by frequently incomplete monthly case reports. These shortcomings raise concerns about annual estimates of the disease burden and pose serious threats to efficient public health planning, including the country's quest to reduce malaria morbidity and mortality cases by 75% by 2015. In this thesis, both geostatistical space-time models and time series seasonal autoregressive integrated moving average (SARIMA) predictive models have been studied and applied to the monthly malaria morbidity cases from both district and regional health facilities in Ghana. The study sought to explore the spatio-temporal distributions of the malaria morbidity incidence and to account for the potential influence of climate variability, with a particular focus on producing monthly spatial maps delimiting areas with a high risk of morbidity. This was achieved by modelling the morbidity cases as incidence rates, being the number of new reported cases per 100,000 residents, which together with the climatic covariates were considered as realisations of random processes occurring in space and/or time.
The SARIMA models indicated an upward trend of morbidity incidence in the regions, with strong seasonal variation that can be explained primarily by the effects of rainfall, temperature and relative humidity in the month preceding incidence of the disease, as well as by the morbidity incidence in the previous months. The various space-time ordinary kriging (STOK) models showed varied spatial and temporal distributions of the morbidity incidence rates, which have increased and expanded across the country over the years. The space-time semivariogram models characterising the spatio-temporal continuity of the incidence rates indicated that the occurrence of malaria morbidity was spatially and temporally correlated, with spatial and temporal ranges varying between 30 and 250 km and between 6 and 100 months, respectively. The predicted incidence rates were found to be heterogeneous, with highly elevated risk at locations near the borders with neighbouring countries in the north and west, as well as in the central parts towards the east. The spatial maps showed a transition of high-risk areas from the north-west to the north-east, with climatic variables contributing to the variations in the number of morbidity cases across the country. The morbidity incidence estimates were found to be higher during the wet season, when temperatures were relatively low, whilst low incidence rates were observed in the warm weather of the dry seasons. In conclusion, the study quantified the malaria morbidity burden in Ghana to produce evidence-based monthly morbidity maps illustrating the risk patterns of the disease. Areas of increased morbidity risk, including the highest-risk zones, were also delimited. This statistical modelling approach is important as it allows short-term prediction of malaria morbidity incidence in specific regions and districts and also helps support efficient public health planning in the country.
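The abstract models morbidity as new reported cases per 100,000 residents, and the "I" in SARIMA corresponds to (seasonal) differencing that removes the 12-month cycle before autoregressive terms are fitted. A minimal numpy sketch of both ideas, using hypothetical case and population arrays rather than data from the thesis:

```python
import numpy as np

def incidence_rate(cases, population):
    # incidence rate as defined in the abstract:
    # new reported cases per 100,000 residents
    return 100_000.0 * np.asarray(cases, float) / np.asarray(population, float)

def seasonal_difference(series, lag=12):
    # seasonal differencing at lag 12 -- the seasonal "I" step of a
    # SARIMA(p,d,q)(P,D,Q)12 model, removing the annual cycle
    s = np.asarray(series, float)
    return s[lag:] - s[:-lag]
```

A full SARIMA fit would then estimate the autoregressive and moving-average terms on the differenced series; libraries such as statsmodels provide this.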

    Comparison and application of three decorrelation methods PCA, MAF and ACDC

    Geostatistics is a branch of applied mathematics that deals with spatially correlated data. Analysing and modelling spatially correlated data can be difficult and time consuming, especially for a multivariate data set. One of the techniques used to make analysis and modelling easier involves decorrelation, whereby a linear transformation on the sample variables is used to associate the spatially correlated variables with a set of decorrelated factors which are statistically and spatially independent. PCA was one of the first multivariate techniques and is mostly used as a data reduction technique. A popular alternative decorrelation technique often used in the mining industry is MAF. A study conducted by Bandarian (2008) found a relatively new decorrelation technique known as ACDC to be the method which produced the best spatial decorrelation for a multivariate moderately correlated data set consisting of four variables. In this thesis the PCA, MAF and ACDC methods are described and then applied to a multivariate data set supplied by Rio Tinto's Iron Ore Operations. Secondly, we explore whether it is preferable for the data set to be standardised or transformed via Gaussian anamorphosis to normal scores before being decorrelated. The data set consists of ten variables; however, the three decorrelation methods were only applied to a subset of five variables (Fe, Al2O3, SiO2, LOI and TiO2) which have the greatest similarity from a statistical and spatial point of view. The three methods were applied to both standardised and normalised data. For ACDC, additional inputs such as weights, number of iterations, tolerance and an initial guess for the diagonalising matrix were explored and investigated in order to get the best spatial decorrelation results possible.
The overall best spatial decorrelation was achieved by performing ACDC on the standardised variables, using the matrix of eigenvectors of the correlation matrix as the initial guess for the diagonalising matrix, together with the first four experimental semivariogram matrices in the decorrelation. Transforming the variables to normal scores before decorrelation was found to be of no benefit, as the factors derived from the normalised variables, with the exception of one, were not normally distributed following the decorrelation.
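Of the three methods compared, PCA is the simplest to sketch: standardise the variables, then rotate them onto the eigenvectors of their correlation matrix. A minimal numpy illustration on synthetic data (not the Rio Tinto data set):

```python
import numpy as np

def pca_decorrelate(X):
    # standardise each variable, then rotate onto the eigenvectors of the
    # correlation matrix; the resulting factors are uncorrelated at lag 0
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    return Z @ eigvecs
```

PCA only guarantees decorrelation at lag 0; MAF additionally diagonalises a lagged covariance matrix, and ACDC approximately diagonalises several experimental semivariogram matrices at once, which is why spatial decorrelation can differ between the methods.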

    Geostatistical Models for Exposure Estimation in Environmental Epidemiology.

    Studies investigating associations between health outcomes and exposure to environmental pollutants benefit from measures of exposure made at the individual level. In this thesis we consider geostatistical modelling strategies aimed at providing such individual-level estimates. We present three papers showing how to adapt the standard univariate stationary Gaussian geostatistical model according to the nature of the exposure under consideration. In the first paper, we show how informative spatio-temporal covariates can be used to simplify the correlation structure of the assumed Gaussian process. We apply the method to data from a historical cohort study in Newcastle-upon-Tyne, designed to investigate links between adverse birth outcomes and maternal exposure to black smoke, measured by a fixed network of monitoring stations throughout a 32-year period. In the second paper, we show how predictions in the stationary Gaussian model change when the data and prediction locations cannot be measured precisely, and are therefore subject to positional error. We demonstrate that ignoring positional error results in biased predictions with misleading prediction errors. In the third paper, we consider models for multivariate exposures, concentrating on the bivariate case. We review and compare existing modelling strategies for bivariate geostatistical data and fit a common component model to a data set of radon measurements from a case-control study designed to investigate associations with lung cancer in Winnipeg, Canada.
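The stationary Gaussian model underlying all three papers leads to kriging-type predictors. A minimal simple-kriging sketch under an assumed exponential covariance model (all parameter values hypothetical, not taken from the thesis):

```python
import numpy as np

def exp_cov(d, sill=1.0, rng=10.0):
    # exponential covariance model: C(d) = sill * exp(-d / range)
    return sill * np.exp(-np.asarray(d, float) / rng)

def simple_krige(xy, z, x0, sill=1.0, rng=10.0, mean=0.0):
    # simple kriging with known mean under a stationary Gaussian model:
    # solve C w = c0, then predict mean + w'(z - mean)
    xy, z, x0 = np.asarray(xy, float), np.asarray(z, float), np.asarray(x0, float)
    C = exp_cov(np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1), sill, rng)
    c0 = exp_cov(np.linalg.norm(xy - x0, axis=1), sill, rng)
    w = np.linalg.solve(C, c0)
    return mean + w @ (z - mean)
```

With no nugget effect the predictor interpolates the data exactly at observed locations and reverts to the mean far from all data, which is why positional error in the locations feeds directly into prediction bias.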

    Geostatistical spatiotemporal modelling with application to the western king prawn of the Shark Bay managed prawn fishery

    Geostatistical methodology has been employed in the modelling of spatiotemporal data from various scientific fields by viewing the data as realisations of space-time random functions. Traditional geostatistics aims to model the spatial variability of a process so, in order to incorporate a time dimension into a geostatistical model, the fundamental differences between the space and time dimensions must be acknowledged and addressed. The main conceptual viewpoint of geostatistical spatiotemporal modelling identified within the literature views the process as a single random function model utilising a joint space-time covariance function to model the spatiotemporal continuity. Geostatistical space-time modelling has been primarily data driven, resulting in models that are suited to the data under investigation, usually survey data involving fixed locations. Space-time geostatistical modelling of fish stocks within the fishing season is limited as the collection of fishery-independent survey data for the spatiotemporal sampling design is often costly or impractical. However, fishery-dependent commercial catch and effort data, throughout each season, are available for many fisheries as part of the ongoing monitoring program to support their stock assessment and fishery management. An example of such data is prawn catch and effort data from the Shark Bay managed prawn fishery in Western Australia. The data are densely informed in both the spatial and temporal dimensions and cover a range of locations at each time instant. Both catch and effort variables display an obvious spatiotemporal continuity across the fishing region and throughout the season. There is detailed spatial and temporal resolution as skippers record their daily fishing shots with associated latitudinal and longitudinal positions. In order to facilitate the ongoing management of the fishery, an understanding of the spatiotemporal dynamics of various prawn species within season is necessary. 
A suitable spatiotemporal model is required in order to effectively capture the joint space-time dependence of the prawn data. An exhaustive literature search suggests that this is the first application of geostatistical space-time modelling to commercial fishery data, with the development and evaluation of an integrated space-time geostatistical model that caters for the commercial logbook prawn catch and effort data for the Shark Bay fishery. The model developed in this study utilises the global temporal trend observed in the data to standardise the catch rates. Geostatistical spatiotemporal variogram modelling was shown to accurately represent the spatiotemporal continuity of the catch data, and was used to predict and simulate catch rates at unsampled locations and future time instants in a season. In addition, fishery-independent survey data were used to help improve the performance of catch rate estimates.
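The joint space-time dependence described above is typically summarised by an empirical space-time semivariogram before a covariance model is fitted. A minimal sketch, assuming hypothetical coordinate, time and catch-rate arrays (not the Shark Bay logbook data):

```python
import numpy as np

def st_semivariogram(xy, t, z, h_bins, u_bins):
    # empirical space-time semivariogram gamma(h, u): half the mean squared
    # difference over all pairs whose spatial lag falls in bin h and whose
    # temporal lag falls in bin u
    xy, t, z = np.asarray(xy, float), np.asarray(t, float), np.asarray(z, float)
    i, j = np.triu_indices(len(z), k=1)
    h = np.linalg.norm(xy[i] - xy[j], axis=1)     # spatial separation
    u = np.abs(t[i] - t[j])                       # temporal separation
    sq = 0.5 * (z[i] - z[j]) ** 2
    gamma = np.full((len(h_bins) - 1, len(u_bins) - 1), np.nan)
    for a in range(len(h_bins) - 1):
        for b in range(len(u_bins) - 1):
            m = ((h >= h_bins[a]) & (h < h_bins[a + 1])
                 & (u >= u_bins[b]) & (u < u_bins[b + 1]))
            if m.any():
                gamma[a, b] = sq[m].mean()
    return gamma
```

A joint space-time covariance function would then be fitted to this surface and used for prediction and simulation at unsampled locations and future time instants.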

    On the spatial modelling of mixed and constrained geospatial data

    Spatial uncertainty modelling and prediction of a set of regionalized dependent variables from various sample spaces (e.g. continuous and categorical) is a common challenge for geoscience modellers and for many applications such as the evaluation of mineral resources, the characterization of oil reservoirs or groundwater hydrology. To capture the complex statistical and spatial relationships, categorical data such as rock types, soil types, alteration units, and continental crustal blocks should be modelled jointly with continuous attributes (e.g. porosity, permeability, seismic velocity, mineral and geochemical compositions or pollutant concentrations), and these relationships should be honoured in the predicted models. Continuous variables in the form of percentages, proportions, frequencies, and concentrations are compositional, meaning they are non-negative values representing parts of a whole. Such data carry only relative information, and the constant sum constraint forces at least one covariance to be negative, inducing spurious statistical and spatial correlations. As a result, classical (geo)statistical techniques should not be applied to the original compositional data. Several geostatistical techniques have been developed recently for the spatial modelling of compositional data. However, few of these consider the joint statistical and/or spatial relationships of regionalized compositional data with other dependent categorical information. This PhD thesis explores and introduces approaches to spatial modelling of regionalized compositional and categorical data. The first proposed approach is in the multiple-point geostatistics framework, where the direct sampling algorithm is developed for joint simulation of compositional and categorical data.
The second proposed method is based on two-point geostatistics and is useful for situations where a large and representative training image is not available or is difficult to build. Approaches to geostatistical simulation of regionalized compositions consisting of several populations are explored and investigated. The multi-population characteristic is usually related to a dependent categorical variable (e.g. rock type, soil type, and land use). Finally, a hybrid predictive model based on advanced geostatistical simulation techniques for compositional data and machine learning is introduced. Such a hybrid model has the ability to rank and select features internally, which is useful for geoscience process-discovery analysis. The proposed techniques were evaluated via several case studies and the results supported their usefulness and applicability.
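The constant-sum effect described above can be seen directly: closing each sample to a constant total forces every row of the covariance matrix to sum to zero, so at least one covariance per row must be negative. A small numpy illustration on synthetic data:

```python
import numpy as np

def closure(X, total=100.0):
    # closure operator: rescale each sample (row) so its parts sum to the
    # constant total (e.g. weight percent) -- this is what makes the data
    # compositional and induces spurious negative correlations
    X = np.asarray(X, float)
    return total * X / X.sum(axis=1, keepdims=True)
```

Because each closed row sums to a constant, cov(X_i, sum_j X_j) = 0 for every part i, which is the algebraic source of the spurious correlations that motivate logratio methods.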

    Compositions, logratios and geostatistics: An application to iron ore

    Common implementations of geostatistical methods, kriging and simulation, ignore the fact that geochemical data are usually reported in weight percent, sum to a constant, and are thus compositional in nature. The constant sum implies that rescaling has occurred and this can be shown to produce spurious correlations. Compositional geostatistics is an approach developed to ensure that the constant sum constraint is respected in estimation while removing dependencies on the spurious correlations. This study tests the applicability of this method against the commonly implemented ordinary cokriging method. The sample data are production blast cuttings analyses drawn from a producing iron ore mine in Western Australia. Previous studies using the high spatial density blast hole data and compositional geostatistical approach returned encouraging results, results other practitioners suggested were due to the high spatial density. This assertion is tested through sub-sampling of the initial data to create four subsets of successively lower spatial densities representing densities, spacings, and orientations typical of the different stages of mine development. The same compositional geostatistical approach was then applied to the subsets using jack-knifing to produce estimates at the removed data locations. Although other compositional geostatistical solutions are available, the additive logratio (alr) approach used in this study is the simplest to implement using commercially available software. The advantages of the logratio methodology are the removal of the constant sum constraint, allowing the resulting quantities to range freely within the real space and, importantly, the use of many proven statistical and geostatistical methods. 
The back-transformation of linear combinations of these quantities and the associated estimation variances to the constrained sample space is known to be biased; this study used numerical integration by Gauss-Hermite quadrature to overcome this drawback. The Aitchison and Euclidean distances were used to quantify both the univariate and compositional errors between the estimates and the original sample values for each estimation method. The errors of each method are analysed using common descriptive and graphical criteria, including the standardised residual sum of squares and an assessment of accuracy and precision. The highest spatial density dataset is equally well reproduced by either method. The compositional method is generally more accurate and precise than the conventional method. In general, the compositional error analyses favour the compositional techniques, which produce more geologically plausible results that sum to the required value. The results support the application of the logratio compositional methodology to low spatial density data over the commonly implemented ordinary cokriging.
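The alr transform named above, and its naive inverse, can be sketched as follows; note that the thesis replaces this simple inverse with Gauss-Hermite quadrature when back-transforming kriged values, precisely because the naive inverse of an estimate is biased:

```python
import numpy as np

def alr(X):
    # additive logratio: log of each part relative to the final part; maps
    # D-part compositions to unconstrained (D-1)-dimensional real space
    X = np.asarray(X, float)
    return np.log(X[:, :-1] / X[:, -1:])

def alr_inv(Y, total=100.0):
    # naive back-transform to the simplex: exponentiate, append the
    # reference part, and close to the constant total; exact for data
    # values but biased when applied to kriged estimates
    E = np.hstack([np.exp(np.asarray(Y, float)), np.ones((len(Y), 1))])
    return total * E / E.sum(axis=1, keepdims=True)
```

After the transform, the logratio coordinates range freely over real space, so standard (geo)statistical methods such as kriging can be applied to them.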

    Spatial verification and validation of datasets in fluid dynamics

    There is a present need for post-processing tools capable of synthesising and interpreting the numerous spatial data that are typically generated in modern investigations of fluid mechanics. Recent advances have provided both the analyst and the experimentalist with powerful tools for resolving complete flow-field information, using Computational Fluid Dynamics to simulate the flow, or non-invasive flow metrology such as Particle Image Velocimetry and Laser Doppler Anemometry. A great deal of nodal data is generated by these techniques, a quantity that can be expected to increase in the future. These data carry uncertainties from both numerical modelling and experimental measurement, which traditionally have been quantified using classical approaches in Verification and Validation. However, those techniques were designed with summary scalar values in mind and generally overlook or underestimate the importance of a suitable spatial and topological description of the flow-field. The author uses established techniques in geostatistics to address the fluids data assimilation problem, and cross-correlates spatial field variables collected over an experimental domain with field variables calculated by a numerical model that simulates this domain. Spatial statistics are generated on the inter-related nodal data and are used to inform a stationary covariance model describing the datasets as a particular realisation of a random process. This model is used to provide statistics quantifying the correlation of complete experimental and numerical flow-fields, and to make better estimates of local field values that take into account all the data available to the practitioner. Special consideration is given to the application of the random function model to a calculated flow-field, in which errors are not aleatoric but epistemic, comprising unknown chaotic processes and higher-order error terms.
The kriging estimator was useful for the characterisation of the spatial datasets considered, and may be expected to extend quite generally to other fluids problems. In particular, meaningful blending of experimental and numerical data was achieved by cokriging, which is demonstrated in situations where experimental data are missing or sparse but may be inferred from the secondary numerical data with which they are well correlated. A statistic describing whole-field correlation on the basis of functional covariance was also proposed for fluids problems, with reference to which it is demonstrated that traditional pointwise measures of disparity are inadequate for spatial problems.
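The core idea of inferring missing primary (experimental) values from a well-correlated secondary (numerical) field can be illustrated with a least-squares fill-in; this is a much-simplified stand-in for the cokriging actually used in the thesis, with hypothetical arrays:

```python
import numpy as np

def fill_from_secondary(primary, secondary):
    # where the primary (experimental) field is missing (NaN), infer it from
    # the well-correlated secondary (numerical) field via least squares --
    # a much-simplified surrogate for cokriging, which would additionally
    # use the spatial cross-covariance between the two fields
    primary = np.asarray(primary, float)
    secondary = np.asarray(secondary, float)
    obs = ~np.isnan(primary)
    slope, intercept = np.polyfit(secondary[obs], primary[obs], 1)
    filled = primary.copy()
    filled[~obs] = slope * secondary[~obs] + intercept
    return filled
```

Unlike this regression, cokriging also weights nearby primary observations, so its estimates honour the spatial continuity of the experimental field as well as the cross-correlation.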

    Non-linear univariate and multivariate spatial modelling and optimal design

    This thesis developed a novel adaptive methodology for the optimal design of additional sampling based on a geostatistical model that can preserve both the multivariate non-linearity and the spatial non-linearity present in spatial variables. This methodology can be applied in mining or any other field that deals with spatial data. The results from case studies in different environments demonstrated the potential of the proposed design methodology.

    A geostatistical simulation algorithm for the homogenisation of climatic time series: a contribution to the homogenisation of monthly precipitation series

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Geographic Information Systems.
    As defined by the Intergovernmental Panel on Climate Change (IPCC), climate change refers to a change in the state of the climate that can be identified by changes in the statistical characteristics of its properties and that persists for an extended period, typically decades or longer. In order to assess climate change and to develop impact studies, it is imperative that climate signals are free of any external factors. However, non-natural irregularities are an inevitable part of long climate records; they are introduced during the process of measuring and collecting data from weather stations. Accordingly, it is essential to detect and correct those irregularities a priori, through a process called homogenisation. This process has become a prominent research topic in recent decades, and many researchers have focused on developing efficient methods. Still, some climatic variables lack homogenisation procedures due to their high variability and temporal resolution (e.g., monthly precipitation). We propose the gsimcli (Geostatistical SIMulation for the homogenisation of CLImate data) homogenisation method, which is based on a geostatistical simulation method, namely direct sequential simulation. The proposed approach considers simulated values of the candidate station's neighbouring area, defined by the local radius parameter, aiming to account for local characteristics of its climatic zone. gsimcli has other modelling parameters, such as the order of the candidate stations in the homogenisation process, the detection parameter, and the correction parameter (also used to fill in missing data). A semi-automatic version of gsimcli is also proposed, where the homogenisation adjustments can be estimated from a comparison series.
The efficiency of the gsimcli method is evaluated in the homogenisation of precipitation data. Several homogenisation exercises are presented in a sensitivity analysis of the parameters for two different data sets: real and artificial precipitation data. The assessment of the detection part of gsimcli is based on a comparison with other detection techniques using real data, and extends a previous study for the south of Portugal. Artificial monthly and annual data from a benchmark data set of the HOME project (ACTION COST-ES0601) are used to assess the performance of gsimcli. These results allow the comparison between gsimcli and state-of-the-art methods through the calculation of performance metrics. This research identified the gsimcli parameters that most influence the homogenisation results: the correction parameter, the grid cell size and the local radius parameter. The set of parameters providing the best values of the performance metrics is recommended as the most suitable set of homogenisation parameters for monthly precipitation data. The results show gsimcli to be a favourable homogenisation method for monthly precipitation data that outperformed several well-established procedures. The filling in of missing data is an advantage when compared to other methods. Taking advantage of its capability to filter irregularities and provide comparison series, gsimcli can also be used as a pre-homogenisation tool followed by a traditional homogenisation method (semi-automatic approach).
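The detection step, flagging observations that disagree with an ensemble of locally simulated reference values, can be sketched as follows. This is a hypothetical illustration of the idea only, not the gsimcli implementation, and the parameter names are invented:

```python
import numpy as np

def detect_irregularities(candidate, reference, p=0.95):
    # flag time steps where the candidate series falls outside the central
    # p-probability interval of an ensemble of reference values
    # (reference rows = realisations, e.g. simulated values; columns = time)
    candidate = np.asarray(candidate, float)
    reference = np.asarray(reference, float)
    lo = np.percentile(reference, 100 * (1 - p) / 2, axis=0)
    hi = np.percentile(reference, 100 * (1 + p) / 2, axis=0)
    return (candidate < lo) | (candidate > hi)
```

In a homogenisation workflow, flagged values would then be corrected (or missing values filled) using estimates derived from the same reference ensemble.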
As future work, it is recommended to assess the performance of the gsimcli method with denser monitoring networks and to include a multivariate geostatistical simulation algorithm in the homogenisation procedure.