
    Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks

    Standard reference terminology for diagnoses and risk factors is crucial for billing, epidemiological studies, and inter/intranational comparisons of diseases. The International Classification of Diseases (ICD) is a standardized and widely used method, but manual classification is an enormously time-consuming endeavor. Natural language processing together with machine learning allows automated structuring of diagnoses using ICD-10 codes, but the limited performance of machine learning models, the necessity of gigantic datasets, and the poor reliability of the terminal parts of these codes have restricted clinical usability. We aimed to create a high-performing pipeline for automated classification of reliable ICD-10 codes in free medical text in cardiology. We focused on frequently used and well-defined three- and four-digit ICD-10 codes that still have enough granularity to be clinically relevant, such as atrial fibrillation (I48), acute myocardial infarction (I21), or dilated cardiomyopathy (I42.0). Our pipeline uses a deep neural network known as a Bidirectional Gated Recurrent Unit Neural Network and was trained and tested with 5548 discharge letters and validated in 5089 discharge and procedural letters. As discharge letters in clinical practice may be labeled with more than one code, we assessed the single- and multilabel performance for main diagnoses and cardiovascular risk factors. We compared using the entire body of the text with using only the summary paragraph, in both cases supplemented by age and sex. Given the privacy-sensitive information included in discharge letters, we added a de-identification step. The performance was high, with F1 scores of 0.76–0.99 for three-character and 0.87–0.98 for four-character ICD-10 codes, and was best when using complete discharge letters. Adding the variables age and sex did not affect the results. For model interpretability, word coefficients were provided and a qualitative assessment of the classification was performed manually. Because of its high performance, this pipeline can help decrease the administrative burden of classifying discharge diagnoses and may serve as a scaffold for reimbursement and research applications.
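
    The classifier family described in this abstract can be illustrated with a minimal sketch of a bidirectional GRU for multilabel text classification. The architecture type matches the description above, but the framework (Keras), vocabulary size, sequence length, embedding and hidden dimensions, and number of target codes are illustrative assumptions, not the authors' settings.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    VOCAB_SIZE = 20_000  # assumed tokenizer vocabulary size
    MAX_LEN = 1_000      # assumed maximum letter length in tokens
    N_CODES = 10         # assumed number of ICD-10 target codes

    # Embedding -> bidirectional GRU -> one sigmoid output per ICD-10 code, so a
    # single letter can receive several codes at once (multilabel classification).
    model = models.Sequential([
        layers.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, 128, mask_zero=True),
        layers.Bidirectional(layers.GRU(64)),
        layers.Dense(N_CODES, activation="sigmoid"),
    ])

    # Per-label binary cross-entropy is the usual loss for multilabel targets.
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(multi_label=True, num_labels=N_CODES)])
    model.summary()

    With sigmoid outputs, each code is predicted independently, which is what allows a single discharge letter to be assigned several diagnoses and risk factors at once.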

    An accurate test for homogeneity of odds ratios based on Cochran's Q-statistic

    Background: A frequently used statistic for testing homogeneity in a meta-analysis of K independent studies is Cochran's Q. For a standard test of homogeneity, the Q statistic is referred to a chi-square distribution with K - 1 degrees of freedom. For the situation in which the effects of the studies are logarithms of odds ratios, the chi-square distribution is much too conservative for moderate-size studies, although it may be asymptotically correct as the individual studies become large. Methods: Using a mixture of theoretical results and simulations, we provide formulas to estimate the shape and scale parameters of a gamma distribution to fit the distribution of Q. Results: Simulation studies show that the gamma distribution is a good approximation to the distribution of Q. Conclusions: Use of the gamma distribution instead of the chi-square distribution for Q should eliminate inaccurate inferences in assessing homogeneity in a meta-analysis. (A computer program for implementing this test is provided.) This hypothesis test is competitive with the Breslow-Day test both in accuracy of level and in power.
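
    For context, here is a short sketch of the quantity under discussion: Cochran's Q computed from per-study log odds ratios and referred, in the standard test, to a chi-square distribution with K - 1 degrees of freedom. The 2x2 tables below are invented, and the paper's formulas for the gamma shape and scale parameters are not reproduced here, so the proposed gamma step is only indicated schematically.

    import numpy as np
    from scipy import stats

    # Illustrative 2x2 tables (a, b, c, d) for K = 3 studies; counts are made up.
    tables = np.array([
        [12, 88, 10, 90],
        [20, 80, 15, 85],
        [ 8, 92, 11, 89],
    ], dtype=float)
    a, b, c, d = tables.T

    log_or = np.log((a * d) / (b * c))          # per-study log odds ratio
    w = 1 / (1/a + 1/b + 1/c + 1/d)             # inverse Woolf-variance weights
    theta_bar = np.sum(w * log_or) / np.sum(w)  # pooled (fixed-effect) estimate
    Q = np.sum(w * (log_or - theta_bar) ** 2)   # Cochran's Q

    K = len(tables)
    p_chi2 = stats.chi2.sf(Q, df=K - 1)         # conventional chi-square reference
    print(f"Q = {Q:.3f}, chi-square p-value = {p_chi2:.3f}")

    # The proposed test would instead evaluate stats.gamma.sf(Q, a=shape, scale=scale),
    # with shape and scale estimated from the study sizes via the paper's formulas.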

    The global burden of cancer attributable to risk factors, 2010-19: a systematic analysis for the Global Burden of Disease Study 2019


    Population-level risks of alcohol consumption by amount, geography, age, sex, and year: a systematic analysis for the Global Burden of Disease Study 2020

    Background: The health risks associated with moderate alcohol consumption continue to be debated. Small amounts of alcohol might lower the risk of some health outcomes but increase the risk of others, suggesting that the overall risk depends, in part, on background disease rates, which vary by region, age, sex, and year. Methods: For this analysis, we constructed burden-weighted dose–response relative risk curves across 22 health outcomes to estimate the theoretical minimum risk exposure level (TMREL) and non-drinker equivalence (NDE), the consumption level at which the health risk is equivalent to that of a non-drinker, using disease rates from the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2020 for 21 regions, including 204 countries and territories, by 5-year age group, sex, and year for individuals aged 15–95 years and older from 1990 to 2020. Based on the NDE, we quantified the population consuming harmful amounts of alcohol. Findings: The burden-weighted relative risk curves for alcohol use varied by region and age. Among individuals aged 15–39 years in 2020, the TMREL varied between 0 (95% uncertainty interval 0–0) and 0·603 (0·400–1·00) standard drinks per day, and the NDE varied between 0·002 (0–0) and 1·75 (0·698–4·30) standard drinks per day. Among individuals aged 40 years and older, the burden-weighted relative risk curve was J-shaped for all regions, with a 2020 TMREL that ranged from 0·114 (0–0·403) to 1·87 (0·500–3·30) standard drinks per day and an NDE that ranged between 0·193 (0–0·900) and 6·94 (3·40–8·30) standard drinks per day. Among individuals consuming harmful amounts of alcohol in 2020, 59·1% (54·3–65·4) were aged 15–39 years and 76·9% (73·0–81·3) were male. Interpretation: There is strong evidence to support recommendations on alcohol consumption varying by age and location. Stronger interventions, particularly those tailored towards younger individuals, are needed to reduce the substantial global health loss attributable to alcohol. Funding: Bill & Melinda Gates Foundation.
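
    The burden-weighting idea behind the TMREL and NDE can be shown with a toy calculation: outcome-specific relative-risk curves are averaged using each outcome's share of the local disease burden as a weight, the TMREL is the dose that minimises the weighted curve, and the NDE is the non-zero dose at which the weighted risk returns to the non-drinker level. The curves, weights, and numbers below are invented for illustration and are not GBD 2020 estimates.

    import numpy as np

    dose = np.linspace(0, 8, 801)  # standard drinks per day

    # Hypothetical outcome-specific relative-risk curves (invented functional forms).
    rr_ihd = 1 - 0.15 * np.exp(-dose) * dose + 0.02 * dose**2  # J-shaped
    rr_cancer = 1 + 0.05 * dose                                # monotonic increase
    rr_injury = 1 + 0.10 * dose                                # monotonic increase

    # Burden weights: each outcome's share of local alcohol-related DALYs (illustrative).
    curves = np.vstack([rr_ihd, rr_cancer, rr_injury])
    weights = np.array([0.5, 0.3, 0.2])
    weighted_rr = weights @ curves

    tmrel = dose[np.argmin(weighted_rr)]
    # NDE: first dose beyond the TMREL where weighted risk is back at the zero-dose level.
    above = np.where((dose > tmrel) & (weighted_rr >= weighted_rr[0]))[0]
    nde = dose[above[0]] if above.size else np.nan

    print(f"TMREL ~ {tmrel:.2f} drinks/day, NDE ~ {nde:.2f} drinks/day")

    Because the weights depend on local disease rates, the same set of relative-risk curves yields different TMREL and NDE values in different regions and age groups, which is the point the abstract makes.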

    UNRAVEL: big data analytics research data platform to improve care of patients with cardiomyopathies using routine electronic health records and standardised biobanking

    Introduction: Despite major advances in our understanding of genetic cardiomyopathies, they remain the leading cause of premature sudden cardiac death and end-stage heart failure in persons under the age of 60 years. Integrated research databases based on a large number of patients may provide a scaffold for future research. Using routine electronic health records and standardised biobanking, big data analysis on a larger number of patients and investigations is possible. In this article, we describe the UNRAVEL research data platform, embedded in routine practice, to facilitate research in genetic cardiomyopathies. Design: Eligible participants with proven or suspected cardiac disease and their relatives are asked for permission to use their data and to draw blood for biobanking. Routinely collected clinical data are included in a research database by weekly extraction. A text-mining tool has been developed to enrich UNRAVEL with unstructured data from clinical notes. Preliminary results: Thus far, 828 individuals with a median age of 57 years have been included, 58% of whom are male. All data are captured in a temporal sequence, amounting to a total of 18,565 electrocardiograms, 3619 echocardiograms, data from over 20,000 radiological examinations, and 650,000 individual laboratory measurements. Conclusion: Integration of routine electronic health care in a research data platform allows efficient data collection, including all investigations in chronological sequence. Trials embedded in the electronic health record are now possible, providing cost-effective ways to answer clinical questions. We explicitly welcome national and international collaboration and have provided our protocols and other materials on www.unravelrdp.nl.
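
    As a purely hypothetical illustration of how free-text clinical notes can be turned into structured research data, the sketch below maps a few Dutch phrases to concepts with simple pattern matching; it is not the UNRAVEL text-mining tool, and the phrase list and example note are invented.

    import re

    # Hypothetical mapping from Dutch note phrases to structured concepts.
    CONCEPTS = {
        "atriumfibrilleren": "atrial_fibrillation",
        "gedilateerde cardiomyopathie": "dilated_cardiomyopathy",
        "hartfalen": "heart_failure",
    }

    def extract_concepts(note):
        """Return the structured concepts whose trigger phrases occur in the note."""
        text = note.lower()
        return {
            concept
            for phrase, concept in CONCEPTS.items()
            if re.search(rf"\b{re.escape(phrase)}\b", text)
        }

    note = "Patiënt bekend met gedilateerde cardiomyopathie; nu opname wegens hartfalen."
    print(extract_concepts(note))  # {'dilated_cardiomyopathy', 'heart_failure'}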