117 research outputs found

    Mining and Representing Unstructured Nicotine Use Data in a Structured Format for Secondary Use

    The objective of this study was to use rules, NLP and machine learning to address the problem of clinical data interoperability across healthcare providers. Addressing this problem has the potential to make clinical data comparable, retrievable and exchangeable between healthcare providers. Our focus was on giving structure to unstructured patient smoking information. We collected our data from the MIMIC-III database. We wrote rules for annotating the data, then trained a CRF sequence classifier. We obtained an F-measure of 86%, 72%, 69%, 80%, and 12% for substance smoked, frequency, amount, temporal, and duration respectively. Amount smoked yielded a small value due to scarcity of related data. For smoking status we obtained an F-measure of 94.8% for the non-smoker class, 83.0% for current-smoker, and 65.7% for past-smoker. We created a FHIR profile for mapping the extracted data based on openEHR reference models; in future work we will explore mapping to CIMI models.
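
    The per-class F-measures quoted in this abstract combine precision and recall into one number. As a minimal sketch of that calculation (the function name and the counts in the usage note are illustrative assumptions, not values from the study):

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from raw counts for one class.

    tp, fp, fn are the true-positive, false-positive and false-negative
    counts; beta=1.0 gives the usual balanced F1 score.
    """
    if tp == 0:
        # No correct predictions: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

    For instance, a hypothetical class with 8 true positives, 2 false positives and 2 false negatives has precision and recall of 0.8 each, so `f_measure(8, 2, 2)` is also 0.8.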

    Text analysis of user-generated contents for health-care applications: case study on smoking status classification

    Text mining techniques have demonstrated a potential to unlock significant patient health information from unstructured text. However, most of the published work has been done using clinical reports, which are difficult to access due to patient confidentiality. In this paper, we present an investigation of text analysis for smoking status classification from User-Generated Contents (UGC), such as online forum discussions. UGC are more widely available than clinical reports. Based on analyzing the properties of UGC, we propose the use of Linguistic Inquiry and Word Count (LIWC), an approach being used for the first time for such a health-related task. We also explore various factors that affect the classification performance. The experimental results and evaluation indicate that the forum classification performs well with the proposed features, achieving an accuracy of up to 75% for smoking status prediction. Furthermore, the feature set used is compact (88 features only) and independent of the dataset size.

    Genetic risk prediction of atrial fibrillation

    Background: Atrial fibrillation (AF) has a substantial genetic basis. Identification of individuals at greatest AF risk could minimize the incidence of cardioembolic stroke. Methods: To determine whether genetic data can stratify risk for development of AF, we examined associations between AF genetic risk scores and incident AF in five prospective studies comprising 18,919 individuals of European ancestry. We examined associations between AF genetic risk scores and ischemic stroke in a separate study of 509 ischemic stroke cases (202 cardioembolic [40%]) and 3,028 referents. Scores were based on 11 to 719 common variants (≥5%) associated with AF at P-values ranging from <1×10⁻³ to <1×10⁻⁸ in a prior independent genetic association study. Results: Incident AF occurred in 1,032 (5.5%) individuals. AF genetic risk scores were associated with new-onset AF after adjusting for clinical risk factors. The pooled hazard ratio for incident AF for the highest versus lowest quartile of genetic risk scores ranged from 1.28 (719 variants; 95%CI, 1.13-1.46; P=1.5×10⁻⁴) to 1.67 (25 variants; 95%CI, 1.47-1.90; P=9.3×10⁻¹⁵). Discrimination of combined clinical and genetic risk scores varied across studies and scores (maximum C statistic, 0.629-0.811; maximum ΔC statistic from clinical score alone, 0.009-0.017). AF genetic risk was associated with stroke in age- and sex-adjusted models. For example, individuals in the highest versus lowest quartile of a 127-variant score had a 2.49-fold increased odds of cardioembolic stroke (95%CI, 1.39-4.58; P=2.7×10⁻³). The effect persisted after excluding individuals (n=70) with known AF (odds ratio, 2.25; 95%CI, 1.20-4.40; P=0.01). Conclusions: Comprehensive AF genetic risk scores were associated with incident AF beyond associations for clinical AF risk factors, though they offered small improvements in discrimination. AF genetic risk was also associated with cardioembolic stroke in age- and sex-adjusted analyses. Efforts are warranted to determine whether AF genetic risk may improve identification of subclinical AF or help distinguish between stroke mechanisms.

    Natural Language Processing for Medical Texts – A Taxonomy to Inform Integration Decisions into Clinical Practice

    Electronic health records (EHR) have significantly amplified the volume of information accessible in the healthcare sector. Nevertheless, this information load also translates into elevated workloads for clinicians engaged in extracting and generating patient information. Natural Language Processing (NLP) aims to overcome this problem by automatically extracting and structuring relevant information from medical texts. While other methods related to artificial intelligence have been implemented successfully in healthcare (e.g., computer vision in radiology), NLP still lacks commercial success in this domain. The lack of a structured overview of NLP systems is exacerbating the problem, especially with the emergence of new technologies like generative pre-trained transformers. Against this background, this paper presents a taxonomy to inform integration decisions of NLP systems into healthcare IT landscapes. We contribute to a better understanding of how NLP systems can be integrated into daily clinical contexts. In total, we reviewed 29 papers and 36 commercial NLP products.

    A polymorphism in HLA-G modifies statin benefit in asthma

    Several reports have shown that statin treatment benefits patients with asthma; however, inconsistent effects have been observed. The mir-152 family (148a, 148b and 152) has been implicated in asthma. These microRNAs suppress HLA-G expression, and rs1063320, a common SNP in the HLA-G 3’UTR that is associated with asthma risk, modulates miRNA binding. We report that statins up-regulate mir-148b and 152, and affect HLA-G expression in an rs1063320-dependent fashion. In addition, we found that individuals who carried the G minor allele of rs1063320 had reduced asthma-related exacerbations (emergency department visits, hospitalizations or oral steroid use) compared to non-carriers (p=0.03) in statin users ascertained in the Personalized Medicine Research Project at the Marshfield Clinic (n=421). These findings support the hypothesis that rs1063320 modifies the effect of statin benefit in asthma, and thus may contribute to variation in statin efficacy for the management of this disease.

    Health systems data interoperability and implementation

    Objective: The objective of this study was to use machine learning and health standards to address the problem of clinical data interoperability across healthcare institutions. Addressing this problem has the potential to make clinical data comparable, searchable and exchangeable between healthcare providers. Data sources: Structured and unstructured data were used to conduct the experiments in this study. The data were collected from two disparate data sources, namely MIMIC-III and NHANES. The MIMIC-III database stores data from two electronic health record systems, CareVue and MetaVision. The data stored in these systems were not recorded with the same standards and were therefore not comparable: some values conflicted, one system would store an abbreviation of a clinical concept where the other stored the full concept name, and some attributes contained missing information. These issues make this form of data a good candidate for this study. From the identified data sources, laboratory, physical examination, vital signs, and behavioural data were used. Methods: This research employed the CRISP-DM framework as a guideline for all the stages of data mining. Two sets of classification experiments were conducted, one for structured data and the other for unstructured data. For the first experiment, edit distance, TF-IDF and Jaro-Winkler were used to calculate the similarity weights between two datasets, one coded with the LOINC terminology standard and the other not coded. Similar sets of data were classified as matches while dissimilar sets were classified as non-matches. The Soundex indexing method was used to reduce the number of potential comparisons. Thereafter, three classification algorithms were trained and tested, and the performance of each was evaluated through the ROC curve.
The second experiment was aimed at extracting patients’ smoking status information from a clinical corpus. A sequence-oriented classification algorithm called CRF was used for learning related concepts from the given clinical corpus. Word embedding, random indexing, and word shape features were used to capture meaning in the corpus. Results: After all the model’s parameters were optimized through v-fold cross validation on a sampled training set of structured data, only 8 of 24 features were selected for the classification task. RapidMiner was used to train and test all the classification algorithms. On the final run of the classification process, the last contenders were SVM and the decision tree classifier. SVM yielded an accuracy of 92.5% when the and parameters were set to and . These results were obtained after more relevant features were identified, having observed that the classifiers were biased on the initial data. The unstructured data, on the other hand, was annotated via the UIMA Ruta scripting language, and a model was then trained with CRFSuite, which comes with the CLAMP toolkit. The CRF classifier obtained an F-measure of 94.8% for the “nonsmoker” class, 83.0% for “currentsmoker”, and 65.7% for “pastsmoker”. It was observed that as more relevant data was added, the performance of the classifier improved. The results show that there is a need for the use of FHIR resources for exchanging clinical data between healthcare institutions. FHIR is free and uses profiles to extend coding standards, a RESTful API to exchange messages, and JSON, XML and Turtle to represent messages. Data could be stored in JSON format in a NoSQL database such as CouchDB, which makes it available for further post-extraction exploration.
Conclusion: This study has provided a method for learning a clinical coding standard by a computer algorithm, then applying that learned standard to unstandardized data so that unstandardized data could be easily exchangeable, comparable and searchable and ultimately achieve data interoperability. Even though this study was applied on a limited scale, in future the study would explore the standardization of patients’ long-lived data from multiple sources using the SHARPn open-sourced tools and data scaling platforms. Information Science, M. Sc. (Computing)
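
    The structured-data experiment in this abstract pairs uncoded names with coded ones using string similarity, with Soundex used to limit the number of comparisons. The following is a minimal sketch of that idea, not the study's pipeline: it uses only edit distance (no TF-IDF or Jaro-Winkler), a fixed similarity threshold stands in for the trained classifiers, and every function name and the threshold value are assumptions made for illustration.

```python
from collections import defaultdict

# Standard American Soundex digit groups.
_CODES = {c: d for d, letters in {
    "1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R",
}.items() for c in letters}


def soundex(word: str) -> str:
    """Return the 4-character Soundex key of a word ('' if it has no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(c, "")
        if code and code != prev:
            out.append(code)
        prev = code  # vowels reset prev, so a repeated code after a vowel is kept
    return ("".join(out) + "000")[:4]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a.lower(), b.lower()) / longest


def match(local_names, coded_names, threshold=0.8):
    """Map each local name to its most similar coded name within the same Soundex block."""
    blocks = defaultdict(list)
    for name in coded_names:
        blocks[soundex(name)].append(name)
    result = {}
    for name in local_names:
        candidates = blocks.get(soundex(name), [])  # blocking: skip all other names
        if candidates:
            best = max(candidates, key=lambda c: similarity(name, c))
            if similarity(name, best) >= threshold:
                result[name] = best
    return result
```

    With hypothetical inputs, `match(["Haemoglobin", "Glucose"], ["Hemoglobin", "Glucose", "Sodium"])` pairs "Haemoglobin" with "Hemoglobin" and "Glucose" with itself, while "Sodium" falls into a different Soundex block and is never compared.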

    Repeatable and reusable research - Exploring the needs of users for a Data Portal for Disease Phenotyping

    Background: Big data research in the field of health sciences is hindered by a lack of agreement on how to identify and define different conditions and their medications. This means that researchers and health professionals often have different phenotype definitions for the same condition. This lack of agreement makes it hard to compare different study findings and hinders the ability to conduct repeatable and reusable research. Objective: This thesis aims to examine the requirements of various users, such as researchers, clinicians, machine learning experts, and managers, for both new and existing data portals for phenotypes (concept libraries). Methods: Exploratory sequential mixed methods were used in this thesis to look at which concept libraries are available, how they are used, what their characteristics are, where there are gaps, and what needs to be done in the future from the point of view of the people who use them. This thesis consists of three phases: 1) two qualitative studies, including one-to-one interviews with researchers, clinicians, machine learning experts, and senior research managers in health data science, as well as focus group discussions with researchers working with the Secured Anonymized Information Linkage databank, 2) the creation of an email survey (i.e., the Concept Library Usability Scale), and 3) a quantitative study with researchers, health professionals, and clinicians. Results: Most of the participants thought that the prototype concept library would be a very helpful resource for conducting repeatable research, but they specified that many requirements are needed before its development. Although all the participants stated that they were aware of some existing concept libraries, most of them expressed negative perceptions about them. 
The participants mentioned several facilitators that would encourage them to: 1) share their work, such as receiving citations from other researchers; and 2) reuse the work of others, such as saving the considerable time and effort they frequently spend on creating new code lists from scratch. They also pointed out several barriers that could inhibit them from: 1) sharing their work, such as concerns about intellectual property (e.g., if they shared their methods before publication, other researchers would use them as their own); and 2) reusing others' work, such as a lack of confidence in the quality and validity of their code lists. Participants suggested some developments that they would like to see in order to make research done with routine data more reproducible, such as a drive for more transparency in research methods documentation, including the publication of complete phenotype definitions and clear code lists. Conclusions: The findings of this thesis indicated that most participants valued a concept library for phenotypes. However, only half of the participants felt that they would contribute by providing definitions for the concept library, and they reported many barriers regarding sharing their work on a publicly accessible platform such as the CALIBER research platform. Analysis of interviews, focus group discussions, and qualitative studies revealed that different users have different requirements, facilitators, barriers, and concerns about concept libraries. This work set out to investigate whether concept libraries should be developed in Kuwait to facilitate improved data sharing. However, the recommendation at the end of this thesis is that this would be unlikely to be cost effective or highly valued by users, and that investment in open access research publications may be of more value to the Kuwait research/academic community.

    Risk Factors of Postoperative Delirium After Hip Fracture Repair

    A statement of the problem: Postoperative delirium is associated with poor functional recovery, institutionalization and high medical expenditure. The objectives of the study were to i) identify risk factors of postoperative delirium by performing a systematic review; ii) develop a clinical prediction model of postoperative delirium; iii) determine whether cerebrospinal fluid (CSF) biomarkers of Alzheimer’s disease (AD) are associated with postoperative delirium. All of the objectives were explored in a hip fracture population. Methods: A systematic review was conducted using prospective observational studies that estimated the association between preoperative risk factors and incident postoperative delirium in multivariable models. Risk factors identified as significant predictors of postoperative delirium in a hip fracture dataset of 429 individuals with acute hip fracture were combined with the risk factors identified from the systematic review. A clinical prediction model was developed and internally and externally validated. CSF was collected from individuals with hip fracture enrolled in a clinical trial and analyzed for biomarkers of AD. Results: The search yielded 6,380 titles and abstracts from electronic databases and 72 titles from hand searches, and 10 studies met inclusion criteria. Cognitive impairment most consistently remained statistically significant after adjusting for other risk factors in multivariable models, followed by BMI/albumin and multiple co-morbidities. The independent variables for predicting postoperative delirium in the prediction model were age, gender, dementia, Parkinson’s disease, American Society of Anesthesiologists (ASA) Physical Status Classification, and albumin level. Our postoperative delirium risk prediction model (RPM) had discrimination (receiver operating characteristic, ROC, curve) of 0.72 with external validation ROC of 0.62. There was no association of CSF AD biomarkers with postoperative delirium.
Conclusions: Cognitive impairment was identified as one of the strongest risk factors for postoperative delirium in a hip fracture population. Risk stratification may be performed using the RPM developed in a hip fracture population, but the discrimination ability of the RPM in external validation was less than optimal. Although cognitive impairment was strongly associated with postoperative delirium, CSF AD biomarkers were not associated with postoperative delirium in a small group of hip fracture patients.