426 research outputs found

    Automatic classification of registered clinical trials towards the Global Burden of Diseases taxonomy of diseases and injuries

    Includes details on the implementation of MetaMap and IntraMap, prioritization rules, the test set of clinical trials and the classification of the external test set according to the 171 GBD categories. Dataset S1: Expert-based enrichment database for the classification according to the 28 GBD categories. Manual classification of 503 UMLS concepts that could not be mapped to any of the 28 GBD categories. Dataset S2: Expert-based enrichment database for the classification according to the 171 GBD categories. Manual classification of 655 UMLS concepts that could not be mapped to any of the 171 GBD categories, among which 108 could be projected to candidate GBD categories. Table S1: Excluded residual GBD categories for the grouping of the GBD cause list in 171 GBD categories. A grouping of 193 GBD categories was defined during the GBD 2010 study to inform policy makers about the main health problems per country. From these 193 GBD categories, we excluded the 22 residual categories listed in the Table. We developed a classifier for the remaining 171 GBD categories. Among these residual categories, the unique excluded categories in the grouping of 28 GBD categories were “Other infectious diseases” and “Other endocrine, nutritional, blood, and immune disorders”. Table S2: Per-category evaluation of performance of the classifier for the 171 GBD categories plus the “No GBD” category. Number of trials per GBD category from the test set of 2,763 clinical trials. Sensitivities, specificities (in %) and likelihood ratios for each of the 171 GBD categories plus the “No GBD” category for the classifier using the Word Sense Disambiguation server, the expert-based enrichment database and the priority to the health condition field. Table S3: Performance of the 8 versions of the classifier for the 171 GBD categories. Exact-matching and weighted averaged sensitivities and specificities for 8 versions of the classifier for the 171 GBD categories. 
Exact-matching corresponds to the proportion (in %) of trials for which the automatic GBD classification is correct. Exact-matching was estimated over all trials (N = 2,763), trials concerning a unique GBD category (N = 2,092), trials concerning 2 or more GBD categories (N = 187), and trials not relevant for the GBD (N = 484). The weighted averaged sensitivity and specificity correspond to the weighted average across GBD categories of the sensitivities and specificities for each GBD category plus the “No GBD” category (in %). The 8 versions correspond to the combinations of using or not using the Word Sense Disambiguation server during the text annotation, the expert-based enrichment database, and the priority to the health condition field as a prioritization rule. Table S4: Per-category evaluation of the performance of the baseline for the 28 GBD categories plus the “No GBD” category. Number of trials per GBD category from the test set of 2,763 clinical trials. Sensitivities and specificities (in %) of the 28 GBD categories plus the “No GBD” category for the classification of clinical trial records towards GBD categories without using the UMLS knowledge source, based only on the recognition in free text of the disease names defining each GBD category. For the baseline, a clinical trial record was classified with a GBD category if at least one of the 291 disease names from the GBD cause list defining that GBD category appeared verbatim in the condition field, the public or scientific titles, separately, or in at least one of these three text fields. (DOCX 84 kb)
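The baseline and the exact-matching measure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three categories and disease names are a hypothetical excerpt standing in for the 291 names of the GBD cause list, and the case-insensitive, whole-word matching is an assumption about what "verbatim" means here.

```python
import re
from typing import Dict, List, Sequence, Set

# Hypothetical excerpt: GBD category -> disease names defining it.
# The real baseline uses the 291 disease names of the GBD cause list.
GBD_DISEASE_NAMES: Dict[str, List[str]] = {
    "Tuberculosis": ["tuberculosis"],
    "Malaria": ["malaria"],
    "Diabetes mellitus": ["diabetes mellitus"],
}


def classify_baseline(
    text_fields: Sequence[str],
    names_by_category: Dict[str, List[str]] = GBD_DISEASE_NAMES,
) -> Set[str]:
    """Return every category with at least one disease name found in any field.

    An empty set plays the role of the "No GBD" category.
    Assumption: matching is case-insensitive and on whole words.
    """
    categories: Set[str] = set()
    for category, names in names_by_category.items():
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            if any(re.search(pattern, field, flags=re.IGNORECASE)
                   for field in text_fields):
                categories.add(category)
                break  # one matching name is enough for this category
    return categories


def exact_match_rate(predicted: Sequence[Set[str]],
                     expected: Sequence[Set[str]]) -> float:
    """Proportion (in %) of trials whose predicted category set is fully correct."""
    hits = sum(1 for p, e in zip(predicted, expected) if p == e)
    return 100.0 * hits / len(expected)
```

Here `text_fields` would hold the condition field, the public title and the scientific title of one trial record; passing a single field reproduces the "separately" variant of the baseline.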

    PIKS: A Technique to Identify Actionable Trends for Policy-Makers Through Open Healthcare Data

    With calls for increasing transparency, governments are releasing greater amounts of data in multiple domains including finance, education and healthcare. The efficient exploratory analysis of healthcare data constitutes a significant challenge. Key concerns in public health include the quick identification and analysis of trends, and the detection of outliers. This allows policies to be rapidly adapted to changing circumstances. We present an efficient outlier detection technique, termed PIKS (Pruned iterative-k means searchlight), which combines an iterative k-means algorithm with a pruned searchlight based scan. We apply this technique to identify outliers in two publicly available healthcare datasets from the New York Statewide Planning and Research Cooperative System, and California's Office of Statewide Health Planning and Development. We provide a comparison of our technique with three other existing outlier detection techniques, consisting of auto-encoders, isolation forests and feature bagging. We identified outliers in conditions including suicide rates, immunity disorders, social admissions, cardiomyopathies, and pregnancy in the third trimester. We demonstrate that the PIKS technique produces results consistent with other techniques such as the auto-encoder. However, the auto-encoder needs to be trained, which requires several parameters to be tuned. In comparison, the PIKS technique has far fewer parameters to tune. This makes it advantageous for fast, "out-of-the-box" data exploration. The PIKS technique is scalable and can readily ingest new datasets. Hence, it can provide valuable, up-to-date insights to citizens, patients and policy-makers. We have made our code open source, and with the availability of open data, other researchers can easily reproduce and extend our work. This will help promote a deeper understanding of healthcare policies and public health issues

    Exploiting electronic health records for research on atrial fibrillation: risk factors, subtypes, and outcomes

    BACKGROUND: Electronic health records (EHRs), collected on large populations in routine clinical care, may hold novel insights into the heart rhythm disorder atrial fibrillation (AF). AIM: To exploit EHRs to investigate, validate and extend evidence for AF risk factors, subtypes, and outcomes. METHODS: The CALIBER dataset (1997–2010) linking primary care, secondary care, and mortality records for a representative subset of the UK population was used (i) to model associations between cardiovascular disease (CVD) risk factors and incident AF, including AF with (AF+) and AF without (AF–) intercurrent CVD, (ii) to create EHR definitions for eight AF subtypes (structural, focal, polygenic, postoperative, valvular, monogenic, respiratory and AF in athletes) and (iii) to investigate stroke outcomes by CHA2DS2-VASc, sex, and warfarin use. RESULTS: Among 1,949,052 individuals, 50,097 developed incident AF: 12,652 (25.3%) with AF+ and 37,445 (74.7%) with AF–. Smoking (HR [95%CI] for AF+ vs. AF–: 1.66 [1.56,1.77] vs. 1.21 [1.16,1.25]), hypertension (2.19 [2.11,2.27] vs. 1.65 [1.62,1.69]), and diabetes (2.03 [1.94,2.12] vs. 1.45 [1.41,1.49]) showed consistent direct associations with AF+ and AF–, while heavy drinking (1.17 [0.81,1.67] vs. 1.99 [1.68,2.34]) and total cholesterol levels (0.99 [0.96,1.02] vs. 0.85 [0.84,0.87]) showed inconsistent associations with AF+ and AF–. EHR definitions for AF subtypes were created by combining 2813 diagnosis, medication, and procedure codes. There were 12,751 individuals with AF and valvular heart disease. Prosthetic replacements, mitral stenosis and aortic stenosis showed higher HR [95%CI] for stroke, thromboembolism and mortality (1.13 [1.02,1.24], 1.20 [1.05,1.36], and 1.27 [1.19,1.37] respectively). The net-clinical benefit (NCB [95%CI] per 100 person-years) of warfarin was shown from CHA2DS2-VASc≥2 in men (0.5 [0.1,0.9]) and CHA2DS2-VASc≥3 in women (1.5 [1.1,1.9]). 
CONCLUSION: AF is a heterogeneous condition associated with diverse disease mechanisms. EHRs can help refine understanding of risk factors, subtypes, and outcomes with relevance for clinical practice

    Medical Informatics

    Information technology has been revolutionizing the everyday life of the common man, while medical science has been making rapid strides in understanding disease mechanisms, developing diagnostic techniques and effecting successful treatment regimens, even for those cases which would have been classified as a poor prognosis a decade earlier. The confluence of information technology and biomedicine has brought into its ambit additional dimensions of computerized databases for patient conditions, revolutionizing the way health care and patient information is recorded, processed, interpreted and utilized for improving the quality of life. This book consists of seven chapters dealing with three primary issues: medical information acquisition from a patient's and health care professional's perspective, translational approaches from a researcher's point of view, and finally the application potential as required by clinicians and physicians. The book covers modern issues in Information Technology, Bioinformatics Methods and Clinical Applications. The chapters describe the basic process of acquisition of information in a health system, recent technological developments in biomedicine and the realistic evaluation of medical informatics.

    Investigating penetrance of rare genetic variants using population cohorts

    The same genetic variant found in different individuals can cause a spectrum of phenotypes, with some individuals showing no signs of any clinical illness, and some displaying severe illness. Variants that cause this can be said to show incomplete penetrance, where the related genotype either causes clinical disease or not, or they can be said to display variable expressivity, in which the clinical symptoms can vary across a spectrum. Incomplete penetrance and variable expressivity are both thought to be influenced by a large number of factors, including genetic modifiers, epigenetics, and environmental factors. Many thousands of genetic variants have been identified as causal of monogenic disorders, mostly determined through small clinical studies, and thus the penetrance and expressivity of these variants may be overestimated when compared to their effect in the general population. With the wealth of population cohort data currently available, the penetrance and expressivity of such genetic variants can be investigated across a much wider contingent, potentially helping to reclassify variants that were previously thought to be completely penetrant. This thesis aims to investigate the penetrance and expressivity of rare genetic variants in large population cohorts, and to potentially identify any genetic modifiers that could also affect the phenotypic effect of these variants, including the presence of other rare variants, and the aggregation of small effect common variants. We show that putatively damaging variants in a large number of genes are present at a higher rate than previously expected in healthy population cohorts. Furthermore, we show that as an aggregate, individuals who carry one of these variants have sub-clinical phenotypes related to the traits seen in clinical disease cases with variants in similar genes. 
We also show that the penetrance and expressivity of these rare variants can be modified by the presence of other rare variants in similar genes, and by common genetic variants, aggregated as polygenic scores. We then investigate methods of identifying rare non-coding variants that could be potential genetic modifiers.
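As a rough illustration of what aggregating small-effect common variants into a polygenic score means, the usual additive form is a weighted sum of allele dosages. The sketch below is a generic example, not the thesis's pipeline: the variant identifiers and effect sizes are hypothetical, and treating an ungenotyped variant as dosage 0 is an assumption.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float],
                    effect_sizes: Dict[str, float]) -> float:
    """Additive polygenic score: sum of effect size x allele dosage.

    dosages: variant id -> number of effect alleles carried (0, 1 or 2).
    effect_sizes: variant id -> per-allele effect estimate.
    Assumption: a variant missing from `dosages` contributes 0.
    """
    return sum(beta * dosages.get(variant, 0.0)
               for variant, beta in effect_sizes.items())


# Hypothetical effect sizes and one individual's genotypes.
effect_sizes = {"variant_1": 0.10, "variant_2": -0.05, "variant_3": 0.20}
individual = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
score = polygenic_score(individual, effect_sizes)
```

Comparing such scores between carriers of the same rare variant is one way to ask whether common-variant background shifts the phenotype.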

    Predicting a diagnosis of ankylosing spondylitis using primary care health records–A machine learning approach

    Ankylosing spondylitis (AS) is the second most common cause of inflammatory arthritis. However, a successful diagnosis can take a decade to confirm from symptom onset (via x-rays). The aim of this study was to use machine learning methods to develop a profile of the characteristics of people who are likely to be given a diagnosis of AS in future. The Secure Anonymised Information Linkage databank was used. Patients with ankylosing spondylitis were identified using their routine data and matched with controls who had no record of a diagnosis of ankylosing spondylitis or axial spondyloarthritis. Data were analysed separately for men and women. The model was developed using feature/variable selection and principal component analysis to develop decision trees. The decision tree with the highest average F value was selected and validated with a test dataset. The model for men indicated that lower back pain, uveitis, and NSAID use under age 20 are associated with AS development. The model for women showed an older age of symptom presentation compared to men with back pain and multiple pain relief medications. The models showed good prediction (positive predictive value 70%-80%) in test data but in the general population where prevalence is very low (0.09% of the population in this dataset) the positive predictive value would be very low (0.33%-0.25%). Machine learning can be used to help profile and understand the characteristics of people who will develop AS, and in test datasets with artificially high prevalence, will perform well. However, when applied to a general population with low prevalence rates, such as that in primary care, the positive predictive value for even the best model would be 1.4%. Multiple models may be needed to narrow down the population over time to improve the predictive value and therefore reduce the time to diagnosis of ankylosing spondylitis.
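The drop in positive predictive value between a matched test set and the general population follows from Bayes' rule. The sketch below shows the dependence on prevalence; the sensitivity and specificity of 0.75 and the 1:1 case-control ratio are hypothetical values chosen for illustration, not figures reported by the study. Only the 0.09% prevalence comes from the abstract.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """PPV = P(disease | positive test), computed with Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical model performance (assumption, not from the study).
SENSITIVITY = 0.75
SPECIFICITY = 0.75

# Matched test set, assuming one control per case (prevalence 50%).
ppv_test_set = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.5)

# General population, using the 0.09% prevalence quoted in the abstract.
ppv_population = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.0009)

print(f"PPV in matched test set:   {100 * ppv_test_set:.2f}%")
print(f"PPV in general population: {100 * ppv_population:.2f}%")
```

With the same sensitivity and specificity, the second figure falls well below 1% because false positives among the large unaffected majority swamp the true positives.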