17 research outputs found

    From online banking to biobanking: designing and implementing a data delivery platform for researchers of the China Kadoorie Biobank

    Introduction The China Kadoorie Biobank is a large prospective cohort study of 512,891 participants, for which we routinely integrate data via linkage to population health outcomes to deliver up-to-date research datasets. Building on experience from the private sector, we have implemented a platform that supports the delivery and audit of secure datasets to internal and external researchers. Objectives and Approach We aimed to create a platform that delivers secure research datasets with dynamic censor dates for preliminary analyses and fieldwork. It must also provide multiple static versions of an analysis-ready database with fixed censor dates. Individual participant outcome data from population health sources (death and disease registries and health insurance agencies) can be integrated and linked regularly to other health-related data, e.g., genetic, bioinformatics, and medical device data. Drawing on data management strategies used in major financial institutions, we have produced systems that implement data warehousing, multiple concurrent environments, and secure dataset access and delivery. Results As of the end of 2017, we had over 300 registered and approved researchers eligible to request datasets. Using our platform, we have recorded over 150 requests and successfully delivered over 100 de-personalised and encrypted datasets to external researchers around the world. In addition, we have supplied secure datasets to over 20 Global Health MSc and DPhil students studying at The University. The platform currently hosts 4 completed analysis-ready databases with censor dates ranging from 31st December 2013 to 11 years of follow-up as of 31st December 2016. A 5th analysis-ready database (with the most recent outcome data on participants' deaths, diseases and hospitalisations) is already under development. During 2018 we plan to make more data sources and outcome data available to our external researchers.
Conclusion/Implications We have developed a versatile platform that delivers secure datasets for researchers from a selection of analysis-ready databases, each with differing censor dates and available data sources. This platform is scalable and can accommodate regular integration of known follow-up data sources along with new and emerging data sources.
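
To make the idea of a fixed censor date concrete, here is a minimal sketch of how one participant's follow-up might look under two different censor dates. The record layout, the field names and the `apply_censor_date` function are hypothetical and are not taken from the platform described above; the sketch also assumes every participant entered the study before the censor date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class FollowUp:
    participant_id: str          # hypothetical identifier
    entry_date: date             # date the participant joined the study
    event_date: Optional[date]   # date of the outcome event, None if no event


def apply_censor_date(record: FollowUp, censor_date: date) -> Tuple[bool, int]:
    """Return (event_observed, follow_up_days) as seen at a fixed censor date."""
    if record.event_date is not None and record.event_date <= censor_date:
        # The event happened on or before the censor date, so it is observed.
        return True, (record.event_date - record.entry_date).days
    # Otherwise follow-up is cut off (censored) at the censor date.
    return False, (censor_date - record.entry_date).days


# The same participant viewed through two static database versions.
rec = FollowUp("P001", date(2006, 1, 1), date(2015, 6, 30))
early = apply_censor_date(rec, date(2013, 12, 31))
late = apply_censor_date(rec, date(2016, 12, 31))
```

The event in this example is invisible in the version censored at the end of 2013 and observed in the version censored at the end of 2016, which is why analyses need to state which static version they used.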

    Automatic coding of nearly 2 million hospitalisation events to ICD-10 in the China Kadoorie Biobank

    Introduction Using linkage to the Chinese National Health Insurance (HI) system, we identified disease outcomes for a prospective cohort study of 512,000 middle-aged Chinese adults. Mandarin free-text diagnosis data were supplied by over 30 different agencies across 10 areas, often without an accompanying International Classification of Diseases 10th revision (ICD-10) code. Objectives and Approach To facilitate a genome-wide association study (GWAS) of all our genotyped participants, we needed to code as many of our 2.02 million hospitalisation events as possible. We developed software to assign ICD-10 codes to unique disease descriptions and stored the coded diagnoses in an internal corpus. The software provided an interface that allowed clinicians to select and code disease descriptions individually, or collectively using Chinese keywords. All coded disease descriptions were subsequently validated by an independent Mandarin-speaking clinician. All new events whose descriptions exactly matched those already in the corpus were automatically coded to ICD-10. Results By the end of 2016, there were 2,021,352 hospitalisation events to be coded to ICD-10. Of these, 436,702 (21.6%) were automatically assigned codes where disease descriptions corresponded to those in the Chinese version of the ICD-10 codebook. A further 1,084,197 (53.6%) were coded by a clinician using our standardisation software; all disease descriptions linked to 200 or more events were included. Finally, a further 454,237 (22.5%) events were given the ICD-10 codes supplied by the health insurance agency (after cleaning). In total, 97.7% of all health insurance events were coded to ICD-10. Overall, over 17,000 unique disease descriptions have been clinically classified. Conclusion/Implications Automatic coding of hospitalisation events to ICD-10 has enabled our study to investigate a greater range of diseases and use GWAS to detect novel genetic variants.
We are now well positioned to test semantic matching and machine learning strategies for coding of the remaining 46,216 (2.3%) uncoded events.
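
The exact-match step described above can be sketched as a dictionary lookup against the corpus of already-coded descriptions. This is only an illustration: the function name `code_events` and the three example descriptions are assumptions, and the study's actual software and corpus are not shown in the abstract.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def code_events(
    descriptions: Iterable[str], corpus: Dict[str, str]
) -> List[Tuple[str, Optional[str]]]:
    """Pair each description with the ICD-10 code of an exact corpus match, else None."""
    return [(text, corpus.get(text)) for text in descriptions]


# Illustrative corpus entries (cerebral infarction, type 2 diabetes mellitus);
# these are examples only, not entries from the study's corpus.
corpus = {"脑梗死": "I63", "2型糖尿病": "E11"}

# The third description is a variant wording of cerebral infarction.
events = ["脑梗死", "2型糖尿病", "脑梗塞"]

coded = code_events(events, corpus)
uncoded = [text for text, code in coded if code is None]
```

The variant wording stays uncoded because the lookup requires an exact match. Residual events of this kind are the ones that semantic matching or machine learning would aim to resolve.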

    Vitamin D and cause-specific vascular disease and mortality: a Mendelian randomisation study involving 99,012 Chinese and 106,911 European adults

    Validation and Vindication: Enhancing the quality of Electronic Health Record (EHR) outcomes in a large prospective study biobank

    ABSTRACT Objectives To date we have derived and linked over 1.5 million electronic health records (EHR) to over 280,000 of our biobank participants (across 10 diverse urban and rural geographical regions of China) through established morbidity and mortality registries, and by linkage to the Chinese National Health Insurance (HI) system to identify disease outcomes. To ensure the accuracy, completeness and consistency of these outcomes, we have developed data handling, processing, and validation procedures. Approach On a disease-by-disease basis we have developed multiple strategies, varying from simple standardisation of disease descriptions to ICD-10, to a more comprehensive system of retrieval and examination of medical notes. For those diseases chosen for more precise phenotyping, such as stroke, we send trained research technicians to hospitals where they retrieve clinical documents that correspond to the disease outcomes as reported by registries and HI agencies. A Portable Validation Device (PVD) tablet has been developed to guide the medical notes gathering process and 'validate' chosen disease events, by photographing key documents and ascertaining disease subtypes. For diseases selected for further adjudication, medical specialists in China are given access to an internet-based Case Adjudication System for clinical Events (i-CASE). The information captured by the PVD is presented in i-CASE according to pre-specified clinical and procedural criteria. At least 5% of all data undergoes quality control assessments at both validation and adjudication stages in the UK. Results Diseases causing the greatest burden in China (cancer, diabetes, COPD, stroke and IHD) have all been standardised and subsequently validated using the PVD where appropriate. To date, we have retrieved over 45,000 hospital records through the PVD and, more recently, over 32,000 IHD and stroke cases have been adjudicated through the i-CASE system.
Outcomes are categorised as 'confirmed', 're-classified', 'sub-typed' or 'refuted' and are integrated back into the study dataset for future analysis, along with additional data gathered from the notes. Conclusion The study provides a uniquely rich and powerful resource for investigating environmental and genetic determinants of chronic disease in the Chinese population. By treating disease types on a case-by-case basis we can both confirm their accuracy and link more precise clinical phenotypes to emerging data types, such as genetic and omics data. These innovations in disease validation and adjudication will enable the study to reach new insights into the aetiology of chronic disease to improve the health of the population in China and elsewhere.

    Genetic and healthy lifestyle factors in relation to the incidence and prognosis of severe liver disease in the Chinese population

    Abstract. Background: Severe liver disease (SLD), including cirrhosis and liver cancer, constitutes a major disease burden in China. We aimed to examine the association of genetic and healthy lifestyle factors with the incidence and prognosis of SLD. Methods: The study population included 504,009 participants from the prospective China Kadoorie Biobank aged 30-79 years. The individuals were from 10 diverse areas in China and had no history of cancer or liver disease at baseline. Cox regression was used to estimate adjusted hazard ratios (HRs) for incident SLD and death after SLD diagnosis associated with healthy lifestyle factors (smoking, alcohol, physical activity, and central adiposity). The contribution of genetic risk for hepatitis B virus (HBV, assessed by genetic variants in major histocompatibility complex, class II, DP/DQ [HLA-DP/DQ] genes) was also estimated. Results: Compared with those with 0-1 healthy lifestyle factor, participants with 2, 3, and 4 factors had 12% (HR 0.88 [95% confidence interval (CI): 0.85, 0.92]), 26% (HR 0.74 [95% CI: 0.69, 0.79]), and 44% (HR 0.56 [95% CI: 0.48, 0.65]) lower risks of SLD, respectively. Inverse associations were observed among participants with both low and high genetic risks (HR per 1-point increase 0.83 [95% CI: 0.74, 0.94] and 0.91 [95% CI: 0.82, 1.02], respectively; P for interaction = 0.51), although the trend was non-significant among those with a high genetic risk. Inverse associations were also observed between healthy lifestyle factors and liver biomarkers regardless of the genetic risk. Despite the limited power, healthy lifestyle factors were associated with a lower risk of death after incident SLD among participants with a low genetic risk (HR 0.59 [95% CI: 0.37, 0.96]). Conclusions: Lifestyle modification may be beneficial in terms of lowering the risk of SLD regardless of the genetic risk.
Moreover, it is also important for improving the prognosis of SLD in individuals with a low genetic risk. Future studies are warranted to examine the impact of healthy lifestyles on SLD prognosis, particularly among individuals with a high genetic risk.
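
The "12%, 26% and 44% lower risk" figures quoted in the Results follow directly from the hazard ratios, since a hazard ratio below 1 corresponds to a relative reduction of (1 - HR) × 100%. A small sketch of that arithmetic, with a helper name (`percent_lower_risk`) that is ours rather than the authors':

```python
def percent_lower_risk(hazard_ratio: float) -> int:
    """Convert a hazard ratio below 1 into a rounded percentage reduction in hazard."""
    return round((1.0 - hazard_ratio) * 100)


# Hazard ratios for 2, 3 and 4 healthy lifestyle factors, as reported in the abstract.
reported_hrs = {2: 0.88, 3: 0.74, 4: 0.56}
reductions = {factors: percent_lower_risk(hr) for factors, hr in reported_hrs.items()}
```

Here `reductions` reproduces the percentages stated in the abstract. The conversion describes relative hazard only and says nothing about absolute risk.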

    Dairy consumption and risks of total and site-specific cancers in Chinese adults: an 11-year prospective study of 0.5 million people

    Background Previous studies of primarily Western populations have reported contrasting associations of dairy consumption with certain cancers, including a positive association with prostate cancer and inverse associations with colorectal and premenopausal breast cancers. However, there are limited data from China, where cancer rates and levels of dairy consumption differ importantly from those in Western populations. Methods The prospective China Kadoorie Biobank study recruited ~0.5 million adults from ten diverse (five urban, five rural) areas across China during 2004–2008. Consumption frequency of major food groups, including dairy products, was collected at baseline and subsequent resurveys, using a validated interviewer-administered laptop-based food frequency questionnaire. To quantify the linear association of dairy intake with cancer risk and to account for regression dilution bias, the mean usual consumption amount for each baseline group was estimated by combining the consumption levels at baseline and the second resurvey. During a mean follow-up of 10.8 (SD 2.0) years, 29,277 incident cancer cases were recorded among the 510,146 participants who were free of cancer at baseline. Cox regression analyses for incident cancers associated with usual dairy intake were stratified by age-at-risk, sex and region and adjusted for cancer family history, education, income, alcohol intake, smoking, physical activity, soy and fresh fruit intake, and body mass index. Results Overall, 20.4% of participants reported consuming dairy products (mainly milk) regularly (i.e. ≥1 day/week), with an estimated mean consumption of 80.8 g/day among regular consumers and 37.9 g/day among all participants.
There were significant positive associations of dairy consumption with risks of total and certain site-specific cancers, with adjusted HRs per 50 g/day usual consumption being 1.07 (95% CI 1.04–1.10), 1.12 (1.02–1.22), 1.19 (1.01–1.41) and 1.17 (1.07–1.29) for total cancer, liver cancer (n = 3191), female breast cancer (n = 2582) and lymphoma (n = 915), respectively. However, the association with lymphoma was not statistically significant after correcting for multiple testing. No significant associations were observed for colorectal cancer (n = 3350, 1.08 [1.00–1.17]) or other site-specific cancers. Conclusion Among Chinese adults who had relatively lower dairy consumption than Western populations, higher dairy intake was associated with higher risks of liver cancer, female breast cancer and, possibly, lymphoma.
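
Because the hazard ratios are reported per 50 g/day of usual consumption, readers sometimes want them on another scale. The sketch below assumes a log-linear dose-response, which is what a per-increment HR from a Cox model implies, and uses a hypothetical helper `rescale_hazard_ratio`. It is not part of the study's analysis, and rescaling beyond the intake range actually observed in the cohort would be an extrapolation.

```python
def rescale_hazard_ratio(hr: float, from_increment: float, to_increment: float) -> float:
    """Rescale a per-increment hazard ratio, assuming a log-linear dose-response."""
    # log(HR) scales linearly with the increment, so the HR scales as a power.
    return hr ** (to_increment / from_increment)


# Total cancer: HR 1.07 per 50 g/day, re-expressed per 100 g/day (illustrative only).
hr_total_cancer_per_100g = rescale_hazard_ratio(1.07, 50.0, 100.0)
```

Under this assumption the per-100 g/day hazard ratio is the square of the per-50 g/day value, not double the excess risk.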

    Association between fish consumption and risk of chronic obstructive pulmonary disease among Chinese men and women: an 11-year population-based cohort study

    Background Epidemiological evidence on the relationship between fish consumption and chronic obstructive pulmonary disease (COPD) is limited, especially in Chinese populations. Objectives To explore the prospective association between fish consumption and COPD in a large population-based Chinese cohort. Methods The China Kadoorie Biobank (CKB) recruited over 0.5 million participants from ten geographically diverse regions across China from 2004 to 2008. Consumption frequency of fish at baseline was assessed by a validated food frequency questionnaire. A total of 169,188 men and 252,238 women who had no prior COPD or other major chronic diseases at baseline were included in our analyses. Cox proportional hazard models were employed to estimate the hazard ratio (HR) and 95% confidence interval (CI) for fish consumption categories in relation to incident COPD. Results During a median follow-up of 11.1 years, 5542 incident COPD cases were documented. Fish consumption was inversely associated with COPD risk among women, with a 17% reduction in risk for participants who consumed fish ≥4 days/week compared with non-consumers (HR: 0.83; 95% CI: 0.70, 0.99; p for trend = 0.017), whereas we did not observe such a dose-response relationship among men (HR: 0.89; 95% CI: 0.76, 1.05; p for trend = 0.373). The joint analysis showed that COPD risk was 38% and 48% lower in men and women, respectively, who consumed fish ≥4 days/week and had a healthy lifestyle (having ≥4 of the following healthy lifestyle factors: not smoking currently, never or rarely drinking alcohol, adequate physical activity, BMI 18.5-23.9 kg/m², normal waist circumference, reasonable diet), compared with participants with fish consumption <4 days/week and an unhealthy lifestyle (≤1 factor). Conclusion Higher fish consumption was associated with lower COPD risk among Chinese women but not men. This association was independent of lifestyle factors. Eating adequate fish as part of an overall healthy lifestyle might help to lower the risk of COPD.

    The Prospective Associations of Lipid Metabolism-Related Dietary Patterns with the Risk of Diabetes in Chinese Adults

    Background: This study aimed to identify lipid metabolism-related dietary patterns with reduced rank regression (RRR) among Chinese adults and examine their associations with incident diabetes. Methods: We derived lipid metabolism-related dietary patterns using RRR with 21 food groups as predictors and total cholesterol, low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides, body mass index (BMI), and waist circumference as response variables, based on data from 17,318 participants in the second resurvey of the China Kadoorie Biobank (CKB). The dietary scores were calculated for the entire cohort. We followed up 479,207 participants for diabetes incidence from baseline and used multivariable Cox regression models to estimate the hazard ratios (HRs) and 95% confidence intervals (CIs). Results: Two lipid metabolism-related dietary patterns were extracted. The dietary pattern characterized by high intakes of fish, poultry, and other staples as well as fresh fruit and vegetables was correlated with a higher BMI, waist circumference, and LDL cholesterol. Participants in the highest quintile (Q5) had a 44% increased risk of diabetes incidence when compared with those in the lowest quintile (Q1) (HR = 1.44; 95% CI: 1.31–1.59). Conclusions: A dietary pattern characterized by high intakes of both animal and plant foods was related to obesity and dyslipidemia and could increase the risk of diabetes incidence. This research was supported by the National Natural Science Foundation of China (81973125) and the National Key Research and Development Program of China (2016YFC0900500, 2016YFC0900501, 2016YFC0900504). The CKB baseline survey and the first resurvey were supported by a grant from the Kadoorie Charitable Foundation in Hong Kong.
The long-term follow-up was supported by a grant (2016YFC1303904) from the National Key R&D Program of China, the National Natural Science Foundation of China (81390540, 81390541, 81390544), and the Chinese Ministry of Science and Technology (2011BAI09B01). The funders had no role in the study design, data collection, data analysis and interpretation, writing of the report, or the decision to submit the article for publication