36 research outputs found

    Modelling mortality dynamics in heterogeneous human populations

    The mortality patterns of human populations reflect several inherent biological attributes and other external factors including social, medical and environmental conditions. Mathematical modelling, in addition to experiments and simulations, is an important tool for the analysis of those patterns. One of the main observed characteristics of mortality patterns in human populations is the age-specific increase in mortality rate after sexual maturity. This increase is predominantly exponential and satisfies the well-known Gompertz law of mortality. Although the exponential growth in mortality rate is observed over a wide range of ages, it excludes early- and late-life intervals. The heterogeneity of human populations is a common consideration in describing and validating their various age-related features. In this study we develop a mathematical model that combines (i) the assumption of heterogeneity within each human population, where different subpopulations are distinguished by their particular mortality dynamics, and (ii) the assumption that the mortality of each constituent subpopulation increases exponentially with age (in the same way as described by the Gompertz law). The proposed model is used to fit available observational data in order to analyse the dynamics of mortality across the lifespan and the evolution of mortality patterns over time. We first explore the effects of varying the model parameters on the dynamics of mortality and use the model to fit actual age-specific mortality data. We show that the model successfully reproduces the entire age-dependent mortality pattern, explaining the peculiarities of mortality at young and very old ages. In particular, we show that the mortality data on Swedish populations can be reproduced fairly well by a model comprising four subpopulations. 
Besides the confirmation that heterogeneity can explain the irregularities of mortality patterns at young ages and the deceleration of mortality at extremely old ages, we analyse the influence of stochastic effects on mortality and we conclude that evident effects due to stochasticity are manifested at the age intervals (early and late life ages) where only few individuals contribute to mortality. We then analyse the evolution of mortality patterns over time by fitting the proposed model to (Swedish) mortality data of consecutive periods across the 20th century. The evolution of mortality is described in terms of the changes of model parameters estimated by fitting the model to data from different time periods. We show that the evolution of model parameters confirms the applicability of the compensation law of mortality to each constituent subpopulation separately. The compensation law states an inverse relationship between the scale and the shape parameter of Gompertz law. Our analysis also indicates a change in the structure of this population over time in a way that the population tends to become more homogeneous by the end of the 20th century. This change in structure is reflected in changes to the initial proportions of the constituent subpopulations. These two observations, namely the validity of the compensation effect and the homogenisation of the population, imply that the alteration of model parameters (which reflect demographic terms) can explain the decrease of the overall mortality over time. It is shown that the decrease in mortality across the 20th century is mainly due to changes in the structure of the population, and to a lesser extent, to a reduction in mortality for each of the subpopulations. The outcomes of our research show that the consideration of heterogeneity is efficient for the description of various features of a population’s mortality. 
The idea of “pure” subpopulations, in each of which the exponential law holds at all ages, has been used as a convenient mathematical constraint which allows very accurate reproduction of the entire mortality patterns. This provides a justification for the deviation of mortality from its exponential increase at young and very old ages and for the decrease of mortality over time. In the last part of this thesis we argue that the proposed heterogeneity is not only a convenient tool for fitting mortality data but indeed reflects the true heterogeneous structure of the population. In particular, we demonstrate that the model of a heterogeneous population fits mortality data better than most of the other commonly used models if the data are taken for the entire lifespan, and better than all other models if we consider only old ages. We also show that the model can reproduce seemingly contradictory observations in late-life mortality dynamics such as deceleration, levelling-off and mortality decline. Finally, assuming that the differences between subpopulations reflect genetic variations within the population and using the Swedish mortality data for the 20th century, we show that evolutionary processes resulting in changes of allele frequencies can explain the homogenisation of the population as predicted by the model.
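
The mixture idea described in this abstract can be illustrated with a minimal sketch. It assumes each subpopulation follows a Gompertz hazard m0 * exp(b * age) and that the population-level mortality is the survival-weighted average of the subpopulation hazards. The two subpopulations and their parameter values below are hypothetical and are not the fitted values from the thesis (which reports four subpopulations for the Swedish data).

```python
import math

def gompertz_hazard(age, m0, b):
    """Gompertz mortality rate at a given age: m0 * exp(b * age)."""
    return m0 * math.exp(b * age)

def gompertz_survival(age, m0, b):
    """Probability of surviving from age 0 to `age` under a Gompertz hazard."""
    return math.exp(-(m0 / b) * (math.exp(b * age) - 1.0))

def population_hazard(age, subpops):
    """Mortality rate of a mixed population.

    subpops is a list of (initial_proportion, m0, b) tuples. Each subpopulation
    is weighted by the share of its members still alive at `age`.
    """
    numerator = 0.0
    denominator = 0.0
    for proportion, m0, b in subpops:
        alive = proportion * gompertz_survival(age, m0, b)
        numerator += alive * gompertz_hazard(age, m0, b)
        denominator += alive
    return numerator / denominator

# Hypothetical example: a small frail group and a large robust group.
SUBPOPS = [(0.02, 0.05, 0.05), (0.98, 0.00005, 0.1)]

if __name__ == "__main__":
    for age in (0, 10, 30, 60, 90):
        print(age, population_hazard(age, SUBPOPS))
```

Because the frailer subpopulation dies out first, the weights shift with age, which is how a mixture of purely exponential components can deviate from a single exponential curve.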

    A deterministic approach for protecting privacy in sensitive personal data

    Background: Data privacy is one of the biggest challenges for any organisation which processes personal data, especially in the area of medical research where data include sensitive information about patients and study participants. Sharing of data is therefore problematic, which is at odds with the principle of open data that is so important to the advancement of society and science. Several statistical methods and computational tools have been developed to help data custodians and analysts overcome this challenge. Methods: In this paper, we propose a new deterministic approach for anonymising personal data. The method stratifies the underlying data by the categorical variables and re-distributes the continuous variables through a k nearest neighbours based algorithm. Results: We demonstrate the use of the deterministic anonymisation on real data, including data from a sample of Titanic passengers, and data from participants in the 1958 Birth Cohort. Conclusions: The proposed procedure makes data re-identification difficult while minimising the loss of utility (by preserving the spatial properties of the underlying data); the latter means that informative statistical analysis can still be conducted.
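
The abstract does not spell out the algorithm, so the following is only one plausible reading, not the authors' method: records are grouped by their categorical variables, and each record's continuous values are replaced by the mean over its k nearest neighbours (itself included) within the same group. The function name `knn_anonymise`, the field names and the toy records are all hypothetical.

```python
import math
from collections import defaultdict

def knn_anonymise(records, categorical_keys, continuous_keys, k=3):
    """Return a copy of `records` with continuous values averaged over the
    k nearest neighbours inside each categorical stratum (assumed scheme)."""
    # Group record indices by their combination of categorical values.
    strata = defaultdict(list)
    for idx, rec in enumerate(records):
        strata[tuple(rec[c] for c in categorical_keys)].append(idx)

    def coords(i):
        return [records[i][c] for c in continuous_keys]

    anonymised = [dict(rec) for rec in records]
    for indices in strata.values():
        for i in indices:
            # Deterministic: sorted() is stable, so ties keep index order.
            nearest = sorted(indices, key=lambda j: math.dist(coords(i), coords(j)))[:k]
            for c in continuous_keys:
                anonymised[i][c] = sum(records[j][c] for j in nearest) / len(nearest)
    return anonymised

# Hypothetical toy data, not taken from the Titanic or 1958 Birth Cohort sets.
PASSENGERS = [
    {"sex": "F", "age": 30.0, "fare": 10.0},
    {"sex": "F", "age": 32.0, "fare": 12.0},
    {"sex": "F", "age": 40.0, "fare": 20.0},
    {"sex": "M", "age": 50.0, "fare": 8.0},
]

if __name__ == "__main__":
    for row in knn_anonymise(PASSENGERS, ["sex"], ["age", "fare"], k=2):
        print(row)
```

In this sketch a stratum with a single record is left unchanged, so a practical procedure would need a rule for very small strata; the abstract does not say how the published method handles them.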

    Privacy protected text analysis in DataSHIELD

    Objectives: DataSHIELD (www.datashield.ac.uk) was born of the requirement in the biomedical and social sciences to co-analyse individual patient data (microdata) from different sources, without disclosing identity or sensitive information. Under DataSHIELD, raw data never leave the data provider and no microdata or disclosive information can be seen by the researcher. The analysis is taken to the data, not the data to the analysis. Text data can be very disclosive in the biomedical domain (patient records, GP letters etc.). Similar, but different, issues are present in other domains: text could be copyrighted, or have a large IP value, making sharing impractical. Approach: By treating text in an analogous way to individual patient data, we assessed whether DataSHIELD could be adapted and implemented for text analysis, and circumvent the key obstacles that currently prevent it. Results: Using open digitised text data held by the British Library, a DataSHIELD proof-of-concept infrastructure and prototype DataSHIELD functions for free text analysis were developed. Conclusions: Whilst it is possible to analyse free text within a DataSHIELD infrastructure, the challenge is creating generalised and resilient anti-disclosure methods for free text analysis. There is a range of biomedical and health sciences applications for DataSHIELD methods of privacy protected analysis of free text, including analysis of electronic health records and analysis of qualitative data, e.g. from social media.

    Universal statistics of epithelial tissue topology

    Cells forming various epithelial tissues have a strikingly universal distribution for the number of their edges. It is generally assumed that this topological feature is predefined by the statistics of individual cell divisions in growing tissue but existing theoretical models are unable to predict the observed distribution. Here we show experimentally, as well as in simulations, that the probability of cellular division increases exponentially with the number of edges of the dividing cell and show analytically that this is responsible for the observed shape of cell-edge distribution

    Orchestrating privacy-protected big data analyses of data from different resources with R and DataSHIELD.

    Combined analysis of multiple, large datasets is a common objective in the health- and biosciences. Existing methods tend to require researchers to physically bring data together in one place or follow an analysis plan and share results. Developed over the last 10 years, the DataSHIELD platform is a collection of R packages that reduce the challenges of these methods. These include ethico-legal constraints which limit researchers' ability to physically bring data together and the analytical inflexibility associated with conventional approaches to sharing results. The key feature of DataSHIELD is that data from research studies stay on a server at each of the institutions that are responsible for the data. Each institution has control over who can access their data. The platform allows an analyst to pass commands to each server and the analyst receives results that do not disclose the individual-level data of any study participants. DataSHIELD uses Opal which is a data integration system used by epidemiological studies and developed by the OBiBa open source project in the domain of bioinformatics. However, until now the analysis of big data with DataSHIELD has been limited by the storage formats available in Opal and the analysis capabilities available in the DataSHIELD R packages. We present a new architecture ("resources") for DataSHIELD and Opal to allow large, complex datasets to be used at their original location, in their original format and with external computing facilities. We provide some real big data analysis examples in genomics and geospatial projects. For genomic data analyses, we also illustrate how to extend the resources concept to address specific big data infrastructures such as GA4GH or EGA, and make use of shell commands. Our new infrastructure will help researchers to perform data analyses in a privacy-protected way from existing data sharing initiatives or projects. 
To help researchers use this framework, we describe selected packages and present an online book (https://isglobal-brge.github.io/resource_bookdown)
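
The principle that each server returns only non-disclosive results can be sketched as follows. This is an illustration of the idea in Python, not the DataSHIELD R API: the function names, the `MIN_COUNT` threshold and the example values are assumptions made for this sketch.

```python
MIN_COUNT = 5  # hypothetical minimum number of records before a summary is released

def site_summary(values, min_count=MIN_COUNT):
    """Runs at the data provider: returns aggregates only, never the rows."""
    n = len(values)
    if n < min_count:
        raise ValueError("too few records to release a summary")
    return {"n": n, "sum": sum(values), "sum_sq": sum(v * v for v in values)}

def pooled_mean_and_variance(summaries):
    """Runs at the analyst: combines per-site aggregates into pooled statistics."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    total_sq = sum(s["sum_sq"] for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean ** 2) / (n - 1)  # sample variance
    return mean, variance

if __name__ == "__main__":
    # Hypothetical values held at two separate institutions.
    site_a = [1.0, 2.0, 3.0, 4.0, 5.0]
    site_b = [6.0, 7.0, 8.0, 9.0, 10.0]
    print(pooled_mean_and_variance([site_summary(site_a), site_summary(site_b)]))
```

The analyst obtains the same mean and variance as if the data had been pooled, while only counts and sums cross the institutional boundary.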

    Sleep duration in preschool age and later behavioral and cognitive outcomes: an individual participant data meta-analysis in five European cohorts

    Electronic publication date: 07-02-2023. Short sleep duration has been linked to adverse behavioral and cognitive outcomes in schoolchildren, but few studies examined this relation in preschoolers. We aimed to investigate the association between parent-reported sleep duration at 3.5 years and behavioral and cognitive outcomes at 5 years in European children. We used harmonized data from five cohorts of the European Union Child Cohort Network: ALSPAC, SWS (UK); EDEN, ELFE (France); INMA (Spain). Associations were estimated through DataSHIELD using adjusted generalized linear regression models fitted separately for each cohort and pooled with random-effects meta-analysis. Behavior was measured with the Strengths and Difficulties Questionnaire. Language and non-verbal intelligence were assessed by the Wechsler Preschool and Primary Scale of Intelligence or the McCarthy Scales of Children's Abilities. Behavioral and cognitive analyses included 11,920 and 2981 children, respectively (34.0%/13.4% of the original sample). In meta-analysis, longer mean sleep duration per day at 3.5 years was associated with lower mean internalizing and externalizing behavior percentile scores at 5 years (adjusted mean difference: -1.27, 95% CI [-2.22, -0.32] / -2.39, 95% CI [-3.04, -1.75]). Sleep duration and language or non-verbal intelligence showed trends of inverse associations, however, with imprecise estimates (adjusted mean difference: -0.28, 95% CI [-0.83, 0.27] / -0.42, 95% CI [-0.99, 0.15]). This individual participant data meta-analysis suggests that longer sleep duration in preschool age may be important for children's later behavior and highlights the need for larger samples for robust analyses of cognitive outcomes. Findings could be influenced by confounding or reverse causality and require replication. Open Access funding enabled and organized by Projekt DEAL. 
This research (LifeCycle Project ID: ECCNLC201914) was funded by the European Union’s Horizon 2020 research and innovation programme under Grant Agreement N: 733206, LifeCycle project. Kathrin Guerlich was granted a LifeCycle Fellowship (Grant Agreement N: 733206, LifeCycle project). Berthold Koletzko is the Else Kröner Seniorprofessor of Paediatrics at LMU – University of Munich, financially supported by Else Kröner-Fresenius-Foundation, LMU Medical Faculty and LMU University Hospital. Deborah A Lawlor and Ahmed Elhakeem work in a Unit that receives support from the University of Bristol and UK Medical Research Council (MC_UU_00011/6). Deborah A Lawlor is a British Heart Foundation Chair (CH/F/20/90003) and a National Institute of Health Research Senior Investigator (NF-0616–10102). Mònica Guxens is funded by a Miguel Servet II fellowship (CPII18/00018) awarded by the Spanish Institute of Health Carlos III. Jordi Julvez holds a Miguel Servet-II contract (CPII19/00015) awarded by the Instituto de Salud Carlos III (co-funded by European Social Fund "Investing in your future"). Tim Cadman was funded by a Marie Sklodowska-Curie Individual Fellowship. Funding details for each cohort are provided in Online Resource 1. No funder had any influence on the study design, data collection, statistical analyses or interpretation of findings. The views expressed in this paper are those of the authors and not necessarily of any funders.
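
The pooling step (cohort-specific estimates combined with random-effects meta-analysis) can be sketched with the DerSimonian-Laird estimator. The abstract does not state which estimator was used, so that choice is an assumption here, and the per-cohort numbers in the example are hypothetical rather than the study's results.

```python
import math

def random_effects_pool(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling of per-cohort estimates.

    Returns (pooled_estimate, pooled_standard_error, tau_squared).
    """
    variances = [se ** 2 for se in std_errors]
    weights = [1.0 / v for v in variances]

    # Fixed-effect estimate and Cochran's Q heterogeneity statistic.
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))

    # Between-cohort variance, truncated at zero.
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)

    # Re-weight each cohort by its within- plus between-cohort variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    pooled_se = math.sqrt(1.0 / sum(re_weights))
    return pooled, pooled_se, tau2

if __name__ == "__main__":
    # Hypothetical adjusted mean differences and standard errors for five cohorts.
    est = [-1.5, -0.8, -2.1, -1.0, -1.3]
    ses = [0.6, 0.5, 0.9, 0.7, 0.4]
    pooled, se, tau2 = random_effects_pool(est, ses)
    print(pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2)
```

Only each cohort's estimate and standard error are needed, which is why this design works when individual-level data must stay with each cohort.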

    DataSHIELD – new directions and dimensions

    In disciplines such as biomedicine and social sciences, sharing and combining sensitive individual-level data is often prohibited by ethical-legal or governance constraints and other barriers such as the control of intellectual property or the huge sample sizes. DataSHIELD (Data Aggregation Through Anonymous Summary-statistics from Harmonised Individual-levEL Databases) is a distributed approach that allows the analysis of sensitive individual-level data from one study, and the co-analysis of such data from several studies simultaneously without physically pooling them or disclosing any data. Following initial proof of principle, a stable DataSHIELD platform has now been implemented in a number of epidemiological consortia. This paper reports three new applications of DataSHIELD including application to post-publication sensitive data analysis, text data analysis and privacy protected data visualisation. Expansion of DataSHIELD analytic functionality and application to additional data types demonstrate the broad applications of the software beyond biomedical sciences

    Exposure to natural environments during pregnancy and birth outcomes in 11 European birth cohorts

    Research suggests that maternal exposure to natural environments (i.e., green and blue spaces) promotes healthy fetal growth. However, the available evidence is heterogeneous across regions, with very few studies on the effects of blue spaces. This study evaluated associations between maternal exposure to natural environments and birth outcomes in 11 birth cohorts across nine European countries. This study, part of the LifeCycle project, was based on a total sample size of 69,683 newborns with harmonised data. For each participant, we calculated seven indicators of residential exposure to natural environments: surrounding greenspace in 100m, 300m, and 500m using Normalised Difference Vegetation Index (NDVI) buffers, distance to the nearest green space, accessibility to green space, distance to the nearest blue space, and accessibility to blue space. Measures of birth weight and small for gestational age (SGA) were extracted from hospital records. We used pooled linear and logistic regression models to estimate associations between exposure to the natural environment and birth outcomes, controlling for the relevant covariates. We evaluated the potential effect modification by socioeconomic status (SES) and region of Europe and the influence of ambient air pollution on the associations. In the pooled analyses, residential surrounding greenspace in 100m, 300m, and 500m buffer was associated with increased birth weight and lower odds for SGA. Higher residential distance to green space was associated with lower birth weight and higher odds for SGA. We observed close to null associations for accessibility to green space and exposure to blue space. We found stronger estimated magnitudes for those participants with lower educational levels, from more deprived areas, and living in the northern European region. Our associations did not change notably after adjustment for air pollution. 
These findings may support implementing policies to promote natural environments in our cities, starting in more deprived areas

    Synthetic ALSPAC longitudinal datasets for the Big Data VR project.

    Three synthetic datasets, with 15,000, 155,000 and 1,555,000 participants respectively, were created by simulating eleven cardiac and anthropometric variables from nine collection ages of the ALSPAC birth cohort study. The synthetic datasets retain similar data properties to the ALSPAC study data they are simulated from (covariance matrices, as well as the mean and variance values of the variables) without including the original data itself or disclosing participant information. In this instance, the three synthetic datasets have been utilised in an academia-industry collaboration to build a prototype virtual reality data analysis software, but they could have a broader use in method and software development projects where sensitive data cannot be freely shared.
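
One simple way to produce synthetic rows that keep the means, variances and covariances of a source dataset is to draw from a multivariate normal distribution fitted to it. The entry does not describe the simulation procedure actually used, so this is an assumed approach shown as a sketch; the two-variable source data are hypothetical.

```python
import math
import random

def mean_and_cov(rows):
    """Column means and sample covariance matrix of a list of numeric rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mean, cov

def cholesky(cov):
    """Lower-triangular factor L with L * L^T = cov (cov must be positive definite)."""
    d = len(cov)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(cov[i][i] - s)
            else:
                lower[i][j] = (cov[i][j] - s) / lower[j][j]
    return lower

def simulate(rows, n_synthetic, seed=0):
    """Draw synthetic rows from a normal distribution matching the source moments."""
    mean, cov = mean_and_cov(rows)
    lower = cholesky(cov)
    rng = random.Random(seed)
    d = len(mean)
    synthetic = []
    for _ in range(n_synthetic):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        synthetic.append([mean[i] + sum(lower[i][k] * z[k] for k in range(i + 1))
                          for i in range(d)])
    return synthetic

# Hypothetical source data: two made-up measurements per participant.
SOURCE = [[120.0, 60.0], [130.0, 72.0], [110.0, 55.0], [125.0, 70.0], [135.0, 68.0]]

if __name__ == "__main__":
    print(mean_and_cov(simulate(SOURCE, 20000, seed=1)))
```

A normal model reproduces first and second moments only; skewed or categorical variables would need a different generator.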