4 research outputs found

    Harmonisation of variables names prior to conducting statistical analyses with multiple datasets: an automated approach

    ABSTRACT: BACKGROUND: The data that governments, donors and the international community require to measure health and development achievements have increased in the last decade. Datasets produced in surveys conducted in several countries and years are often combined to analyse time trends and geographical patterns of demographic and health-related indicators. However, since not all datasets share the same structure, variable definitions and codes, they have to be harmonised before being submitted to statistical analysis. Manually searching, renaming and recoding variables is extremely tedious and error-prone, especially when the number of datasets and variables is large. This article presents an automated approach to harmonising variable names across several datasets, which optimises the search for variables, minimises manual input and reduces the risk of error. RESULTS: Three consecutive algorithms are applied iteratively to search all datasets for each variable of interest. The first search (A) captures particular cases that could not be solved in an automated way in the search iterations; the second search (B) is run if search A produces no hits and identifies variables whose labels contain certain key terms defined by the user. If this search produces no hits, a third one (C) is run to retrieve variables which have been identified in other surveys, as an illustration. For each variable of interest, the output of these searches can be (O1) a single best-matching variable is found, (O2) more than one matching variable is found or (O3) no matching variables are found. Output O2 is resolved by user judgement. Examples using four variables are presented, showing that the searches reach 100% sensitivity and specificity after a second iteration. CONCLUSION: Efficient and tested automated algorithms should be used to support the harmonisation process needed to analyse multiple datasets. 
This is especially relevant when the number of datasets or variables to be included is large.
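
The cascade of searches A, B and C and the three outputs O1 to O3 can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: the function name `find_variable`, the dictionary layouts, the rule that a label must contain all key terms, and the example variable names and labels are all hypothetical.

```python
from typing import Dict, List, Tuple


def find_variable(
    target: str,
    labels: Dict[str, str],            # variable name -> label, for one dataset
    special_cases: Dict[str, str],     # target -> hand-picked variable name (search A)
    key_terms: Dict[str, List[str]],   # target -> user-defined key terms (search B)
    known_names: Dict[str, List[str]], # target -> names found in other surveys (search C)
) -> Tuple[str, List[str]]:
    """Return (outcome, candidates), where outcome is 'O1', 'O2' or 'O3'."""
    hits: List[str] = []

    # Search A: particular cases resolved by hand.
    override = special_cases.get(target)
    if override is not None and override in labels:
        hits = [override]

    # Search B: only if A found nothing; match key terms against labels.
    # Assumption: every key term must appear in the label (case-insensitive).
    if not hits:
        terms = [t.lower() for t in key_terms.get(target, [])]
        if terms:
            hits = [
                name
                for name, label in labels.items()
                if all(t in label.lower() for t in terms)
            ]

    # Search C: only if B found nothing; reuse names identified in other surveys.
    if not hits:
        hits = [n for n in known_names.get(target, []) if n in labels]

    if len(hits) == 1:
        outcome = "O1"  # a single best-matching variable
    elif len(hits) > 1:
        outcome = "O2"  # several candidates: left to user judgement
    else:
        outcome = "O3"  # no matching variable
    return outcome, hits


# Hypothetical dataset and search settings, invented for illustration.
labels = {
    "a1": "Age of respondent in years",
    "e1": "Highest education level attended",
    "e2": "Education in single years",
    "s1": "Gender",
    "w1": "Wealth index quintile",
}
special_cases = {"wealth": "w1"}
key_terms = {"age": ["age"], "education": ["education"],
             "sex": ["sex"], "weight": ["weight"]}
known_names = {"sex": ["sx", "s1"], "weight": ["wt"]}

results = {
    target: find_variable(target, labels, special_cases, key_terms, known_names)
    for target in ["wealth", "age", "education", "sex", "weight"]
}
```

In this toy run, `education` yields O2 and `weight` yields O3. A second iteration, as the abstract describes, would plausibly consist of the user refining the key terms or adding a special case and rerunning the search.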

    Poverty and childhood undernutrition in developing countries: a multi-national cohort study

    The importance of reducing childhood undernutrition has been enshrined in the United Nations’ Millennium Development Goals. This study explores the relationship between alternative indicators of poverty and childhood undernutrition in developing countries within the context of a multi-national cohort study (Young Lives). Approximately 2000 children in each of four countries (Ethiopia, India (Andhra Pradesh), Peru and Vietnam) had their heights measured and were weighed when they were aged between 6 and 17 months (survey one) and again between 4.5 and 5.5 years (survey two). The anthropometric outcomes of stunted, underweight and wasted were calculated using World Health Organization 2006 reference standards. Maximum-likelihood probit estimation was employed to model, within each country and survey, the relationship between alternative measures of living standards (principally a wealth index developed using principal components analysis) and each anthropometric outcome. An extensive set of covariates was incorporated into the models to remove as much individual heterogeneity as possible. The fully adjusted models revealed a negative and statistically significant coefficient on wealth for all outcomes in all countries, with the exception of the outcome of wasted in India (Andhra Pradesh) and Vietnam (survey one) and the outcome of underweight in Vietnam (surveys one and two). In survey one, each unit (10%) increase in wealth reduced the probabilities of stunting, being underweight and wasting by between 1.4 and 5.1 percentage points, 1.0 and 6.4 percentage points, and 0.3 and 4.5 percentage points, respectively. The partial effects of wealth on the probabilities of anthropometric outcomes were larger in the survey two models. 
In both surveys, children residing in households in the lowest wealth quintile had significantly increased probabilities of being stunted in all four study countries, and of being underweight in Ethiopia, India (Andhra Pradesh) and Peru, compared with children residing in households in the highest wealth quintile. Random effects probit models confirmed the statistical significance of increased wealth in reducing the probability of being stunted and underweight across all four study countries. We conclude that, although multi-faceted, childhood undernutrition in developing countries is strongly rooted in poverty.
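
For readers unfamiliar with the terminology, the link between a probit coefficient and a partial effect can be written in the standard textbook form below. The notation is assumed here and does not come from the study: $y_i$ is a binary anthropometric outcome, $w_i$ the wealth index, $\mathbf{x}_i$ the remaining covariates, and $\Phi$ and $\phi$ the standard normal distribution and density functions.

```latex
\Pr(y_i = 1 \mid w_i, \mathbf{x}_i)
  = \Phi\!\left(\beta_w w_i + \mathbf{x}_i^{\top}\boldsymbol{\gamma}\right),
\qquad
\frac{\partial \Pr(y_i = 1 \mid w_i, \mathbf{x}_i)}{\partial w_i}
  = \phi\!\left(\beta_w w_i + \mathbf{x}_i^{\top}\boldsymbol{\gamma}\right)\beta_w .
```

Because $\phi$ is strictly positive, the partial effect has the same sign as $\beta_w$, so a negative coefficient on wealth corresponds to a lower probability of the outcome as wealth rises.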