Penalized composite link mixed models for two-dimensional count data
Mortality data provide valuable information for the study of the spatial distribution of mortality risk in disciplines such as spatial epidemiology, medical demography, and public health. However, they are often available in aggregated form over irregular geographical units, which hinders the visualization of the underlying mortality risk and the detection of meaningful patterns. It may also be of interest to obtain mortality risk estimates at a finer spatial resolution, so that they can be linked, in a subsequent correlation analysis, with potential risk factors that are usually measured at a different spatial resolution from the mortality data. In this paper, we propose the use of the penalized composite link model and its representation as a mixed model to deal with these issues. This model takes into account the nature of mortality rates by incorporating the population size at the finest resolution, and allows the creation of mortality maps at a desired scale, reducing the visual bias that results from spatial aggregation within the original units. We illustrate our proposal with the analysis of several datasets related to deaths from respiratory diseases, cardiovascular diseases, and lung cancer.
MTM2011-28285-C02-02, MTM2014-52184-
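The core of the penalized composite link model (Eilers, 2007) can be sketched in one dimension. Observed counts y for coarse units are treated as Poisson with mean μ = Cγ, where C is a 0/1 composition matrix that maps fine units to the coarse unit containing them, and γ = e·exp(θ) are the expected fine-scale counts given the fine-scale population e. A roughness penalty λ‖Dθ‖² on the fine-scale log rates makes the problem well posed, and the penalized likelihood is maximized by scoring iterations. This is a simplified sketch under stated assumptions, not the authors' implementation: it uses one dimension instead of a two-dimensional spatial surface, an identity basis instead of P-splines, a fixed λ instead of mixed-model smoothing-parameter estimation, and the function name `pclm_fit` and the toy numbers are hypothetical.

```python
import numpy as np

def pclm_fit(y, C, exposure, lam=10.0, order=2, max_iter=100, tol=1e-8):
    """Hypothetical 1D penalized composite link fit (identity basis, fixed lambda).

    y        : observed counts for the coarse units, shape (I,)
    C        : 0/1 composition matrix, shape (I, J), C[i, j] = 1 if fine unit j lies in i
    exposure : population (exposure) of each fine unit, shape (J,)
    Returns the fine-scale rates exp(theta) and expected fine-scale counts gamma.
    """
    y = np.asarray(y, dtype=float)
    C = np.asarray(C, dtype=float)
    e = np.asarray(exposure, dtype=float)
    J = C.shape[1]
    D = np.diff(np.eye(J), n=order, axis=0)      # difference matrix of the given order
    P = lam * D.T @ D                            # roughness penalty matrix
    theta = np.full(J, np.log(y.sum() / e.sum()))  # start from a constant overall rate
    for _ in range(max_iter):
        gamma = e * np.exp(theta)                # expected fine-scale counts
        mu = C @ gamma                           # expected coarse-scale counts
        X = C * gamma                            # d mu / d theta = C diag(gamma)
        XtW = X.T / mu                           # X' W with Poisson weights W = diag(1/mu)
        lhs = XtW @ X + P
        rhs = XtW @ (y - mu) + XtW @ X @ theta   # penalized scoring update
        theta_new = np.linalg.solve(lhs, rhs)
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return np.exp(theta), e * np.exp(theta)

# Synthetic toy input: 20 fine units grouped into 4 coarse units of 5 each.
C = np.kron(np.eye(4), np.ones((1, 5)))
exposure = np.linspace(1000.0, 3000.0, 20)
y = np.array([20.0, 45.0, 60.0, 30.0])
rate, gamma = pclm_fit(y, C, exposure)
```

At convergence the fitted coarse counts `C @ gamma` sum to the observed total, because a second-order difference penalty leaves a constant shift of θ unpenalized; the fitted fine-scale rates themselves are what a map at the finer resolution would display.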
Modeling latent spatio-temporal disease incidence using penalized composite link models
Epidemiological data are frequently recorded at coarse spatio-temporal resolutions to protect confidential information or to summarize it in a compact manner. However, the detailed patterns of the source data, which may be of interest to researchers and public health officials, are then lost. We propose to use the penalized composite link model (Eilers, 2007), combined with spatio-temporal P-spline methodology (Lee and Durban, 2011), to estimate the underlying trend in data that have been aggregated not only in space but also in time. Model estimation is carried out within a generalized linear mixed model framework, and sophisticated algorithms are used to speed up computations that would otherwise be infeasible. The model is then used to analyze data obtained during the largest outbreak of Q fever in the Netherlands.
Grant No. MTM2014-52184-P awarded to MD and DA, and by Agencia Estatal de Investigació
Penalized composite link models for aggregated spatial count data: a mixed model approach
Mortality data provide valuable information for the study of the spatial distribution of mortality risk in disciplines such as spatial epidemiology and public health. However, they are frequently available in aggregated form over irregular geographical units, which hinders the visualization of the underlying mortality risk. It can also be of interest to obtain mortality risk estimates at a finer spatial resolution, so that they can be linked to potential risk factors that are usually measured at a different spatial resolution. In this paper, we propose the use of the penalized composite link model and its mixed model representation. This model accounts for the nature of mortality rates by incorporating the population size at the finest resolution, and allows the creation of mortality maps at a finer scale, thus reducing the visual bias that results from spatial aggregation within the original units. We also extend the model with individual random effects at the aggregated scale to account for overdispersion. We illustrate our novel proposal using two datasets: female deaths from lung cancer in Indiana, USA, and male lip cancer incidence in Scottish counties. We also compare the performance of our proposal with the area-to-point Poisson kriging approach.
Modelling latent trends from spatio-temporally grouped data using composite link mixed models
Epidemiological data are frequently recorded at coarse spatio-temporal resolutions. The aggregation is done for several reasons: to protect patients' confidential information, to allow comparison with other datasets available at a coarser resolution than the original, or to summarize data in a compact manner. However, aggregation discards detailed patterns in the original data that can be of interest to researchers and public health officials. In this paper we propose the use of the penalized composite link model (Eilers, 2007), together with its mixed model representation, to estimate the underlying trend behind grouped data at a finer spatio-temporal resolution. The model also allows fine-scale population to be incorporated into the estimation procedure. We assume that the underlying trend is smooth across space and time. The mixed model representation enables the use of sophisticated algorithms such as the SAP algorithm of Rodríguez-Álvarez et al. (2015) for fast estimation of the amount of smoothing. We illustrate our proposal with the analysis of data obtained during the largest outbreak of Q fever in the Netherlands.
MTM2011-28285-C02-02, MTM2014-52184-
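In the composite link formulation μ = Cγ, grouping in both space and time is encoded entirely in the composition matrix C, which maps fine-scale expected counts γ to the coarse-scale means μ. The sketch below assumes that the fine units form a regular grid of spatial cells by time points and that the spatial grouping is the same at every time point. This separable aggregation is our illustrative assumption, not a statement from the paper, and under it C is a Kronecker product of a temporal and a spatial aggregation matrix. The helper `aggregation_matrix` and the toy sizes are hypothetical.

```python
import numpy as np

def aggregation_matrix(labels, n_groups):
    """Hypothetical helper: 0/1 matrix mapping each fine unit to its coarse group."""
    labels = np.asarray(labels)
    C = np.zeros((n_groups, labels.size))
    C[labels, np.arange(labels.size)] = 1.0   # one nonzero per column
    return C

# Toy sizes: 6 fine spatial cells grouped into 2 regions,
# and 8 weekly time points grouped into 2 four-week periods.
C_space = aggregation_matrix([0, 0, 0, 1, 1, 1], 2)
C_time = aggregation_matrix([0, 0, 0, 0, 1, 1, 1, 1], 2)

# Fine-scale values are stored time-major: index = t * n_cells + s.
# With that ordering, the space-time composition matrix is a Kronecker product.
C = np.kron(C_time, C_space)        # shape (2 * 2, 8 * 6)

fine_counts = np.arange(48, dtype=float)
coarse_counts = C @ fine_counts     # index = period * n_regions + region
```

Keeping the two factors separate, rather than forming the full C, is what allows array-based computations to scale to large spatio-temporal grids; the explicit `np.kron` here is only for readability on a toy example.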
Classification algorithms for big data analysis: a MapReduce approach
For many years, the scientific community has been concerned with increasing the accuracy of classification methods, and major advances have been made. Beyond this issue, the growing amount of data generated every day by remote sensors raises further challenges. In this work, we present a tool within the scope of the InterIMAGE Cloud Platform (ICP), an open-source distributed framework for automatic image interpretation. The tool, named ICP: Data Mining Package, performs supervised classification on huge amounts of data, usually referred to as big data, on a distributed infrastructure using Hadoop MapReduce. It implements four classification algorithms taken from WEKA's machine learning library: Decision Trees, Naïve Bayes, Random Forest, and Support Vector Machines (SVM). The results of an experimental analysis using an SVM classifier on datasets of different sizes and different cluster configurations demonstrate the potential of the tool and highlight aspects that affect its performance.
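The MapReduce pattern behind distributed training can be illustrated with Naïve Bayes, one of the four algorithms listed, because its training reduces to counting. This is a pure-Python simulation of the map, shuffle, and reduce phases for categorical features; it is not ICP or WEKA code and does not use the Hadoop API. The function names, the Laplace smoothing choice, and the toy records are all hypothetical.

```python
from collections import defaultdict
from itertools import chain
import math

def map_record(record):
    """Map step: emit (key, 1) pairs for one (features, label) training record."""
    features, label = record
    yield ("class", label), 1
    for i, value in enumerate(features):
        yield ("feature", label, i, value), 1

def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return totals

def train(partitions):
    # In Hadoop each partition would be mapped on a separate node; here the
    # shuffle is simulated by chaining all mapper outputs into one reducer.
    mapped = chain.from_iterable(map_record(r) for part in partitions for r in part)
    return reduce_counts(mapped)

def predict(counts, features, n_values):
    """Return the class with the highest Laplace-smoothed log posterior."""
    classes = [key[1] for key in counts if key[0] == "class"]
    n_total = sum(counts[("class", c)] for c in classes)
    best, best_score = None, -math.inf
    for c in classes:
        n_c = counts[("class", c)]
        score = math.log(n_c / n_total)
        for i, v in enumerate(features):
            n_cv = counts.get(("feature", c, i, v), 0)
            score += math.log((n_cv + 1) / (n_c + n_values[i]))
        if score > best_score:
            best, best_score = c, score
    return best

# Toy data split across two hypothetical partitions (data blocks).
partitions = [
    [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")],
    [(("sunny", "mild"), "yes"), (("rainy", "hot"), "no")],
]
counts = train(partitions)
label = predict(counts, ("sunny", "hot"), n_values=[2, 2])
```

Because the reduce step only sums counts, partial results from different nodes can be combined in any order, which is why count-based learners map naturally onto MapReduce; iterative learners such as SVMs need more elaborate schemes.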