
    Performance Measures Using Electronic Health Records: Five Case Studies

    Presents the experiences of five provider organizations in developing, testing, and implementing four types of electronic quality-of-care indicators based on EHR data. Discusses the challenges involved and compares the results with those from traditional indicators.

    Doctor of Philosophy

    Dissertation: The primary objective of cancer registries is to capture clinical care data on cancer populations in order to aid prevention, enable early detection, determine prognosis, and assess the quality of treatments and interventions. Cancer registries also play a central role in supporting cancer epidemiology studies and medical research. Existing cancer registries depend mostly on human abstractors, known as Certified Tumor Registrars (CTRs), who manually review electronic health records to find reportable cancer cases and extract the other data elements required for regulatory reporting. This task is time-consuming, laborious, and prone to human error, which affects the quality, completeness, and timeliness of registry data. Central state cancer registries are responsible for consolidating the data received from multiple sources for each cancer case and assigning the most accurate information. The Utah Cancer Registry (UCR) at the University of Utah, for instance, leads and oversees more than 70 cancer treatment facilities in the state of Utah, collecting data for each diagnosed cancer case and consolidating multiple sources of information. Although software tools that assist with the manual abstraction process exist, they focus mainly on cancer case finding from pathology reports and do not support automatic extraction of other data elements such as TNM cancer stage, an important prognostic factor required before clinical treatment begins. In this study, I present novel applications of natural language processing (NLP) and machine learning (ML) to automatically extract clinical and pathological TNM stage information from unconsolidated clinical records of cancer patients available at the central Utah Cancer Registry. To further support CTRs in their manual efforts, I demonstrate a new machine learning approach for consolidating TNM stages from multiple records at the patient level.
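
    To make the two steps above (extracting TNM stages from free text, then consolidating them per patient) concrete, here is a minimal sketch. It is NOT the dissertation's NLP/ML method: it assumes a hypothetical hand-written regular expression (`TNM_PATTERN`) that only recognises compact strings such as "pT2N0M0" or "cT1b N1 M0", and it consolidates by simple majority vote, which is the kind of naive baseline a learned consolidation model would be compared against.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern for compact TNM strings such as "pT2N0M0", "cT3 N1 M0" or "ypT0 N0 M0".
# Real clinical text is far messier; this only illustrates the extraction step.
TNM_PATTERN = re.compile(
    r"\b(?:y?[cp])?T(is|[0-4][a-d]?|X)\s*N([0-3][a-c]?|X)\s*M([01][a-c]?|X)\b",
    re.IGNORECASE,
)


def _norm(part):
    """Normalise one TNM component: 'x' -> 'X', everything else lower-case (e.g. '1b', 'is')."""
    return "X" if part.lower() == "x" else part.lower()


def extract_tnm(text):
    """Return every (T, N, M) triple found in one free-text record."""
    return [tuple(_norm(p) for p in m.groups()) for m in TNM_PATTERN.finditer(text)]


def consolidate(records):
    """Majority vote per patient over all TNM triples found in that patient's records.

    records: iterable of (patient_id, text) pairs.
    Ties go to the triple seen first, because Counter.most_common orders
    equal counts by first insertion.
    """
    votes = defaultdict(Counter)
    for patient_id, text in records:
        for triple in extract_tnm(text):
            votes[patient_id][triple] += 1
    return {pid: counter.most_common(1)[0][0] for pid, counter in votes.items()}


records = [
    ("p1", "Path report: pT2N0M0 adenocarcinoma."),
    ("p1", "Oncology note: staged pT2 N0 M0."),
    ("p1", "Outside record lists T3N0M0."),
    ("p2", "Clinical stage cT1b N1 M0."),
]
print(consolidate(records))  # {'p1': ('2', '0', '0'), 'p2': ('1b', '1', '0')}
```

    Majority voting ignores source reliability and the clinical versus pathological distinction; weighing those factors is exactly where a learned consolidation model could do better.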

    Stroke outcome measurements from electronic medical records: cross-sectional study on the effectiveness of neural and nonneural classifiers

    Background: With the rapid adoption of electronic medical records (EMRs), there is an ever-increasing opportunity to collect data and extract knowledge from EMRs to support patient-centered stroke management. Objective: This study aims to compare the effectiveness of state-of-the-art automatic text classification methods in classifying data to support the prediction of clinical patient outcomes and the extraction of patient characteristics from EMRs. Methods: Our study addressed the computational problems of information extraction and automatic text classification. We identified essential tasks to be considered in an ischemic stroke value-based program. The 30 selected tasks were classified (manually labeled by specialists) according to the following value agenda: tier 1 (achieved health care status), tier 2 (recovery process), care related (clinical management and risk scores), and baseline characteristics. The analyzed data set was retrospectively extracted from the EMRs of patients with stroke from a private Brazilian hospital between 2018 and 2019. A total of 44,206 sentences from free-text medical records in Portuguese were used to train and develop 10 supervised machine learning methods, including state-of-the-art neural and nonneural methods, along with ontological rules. As an experimental protocol, we used a 5-fold cross-validation procedure repeated 6 times, along with subject-wise sampling. A heatmap was used to compare the results of the methods by effectiveness (F1 score), supported by statistical significance tests. A feature importance analysis was conducted to provide insights into the results. Results: The top-performing models were support vector machines trained with lexical and semantic textual features, showing the importance of dealing with noise in EMR textual representations. 
The support vector machine models produced statistically superior results in 71% (17/24) of tasks, with an F1 score >80% for care-related tasks (patient treatment location, fall risk, thrombolytic therapy, and pressure ulcer risk), the process of recovery (ability to feed orally or ambulate and communicate), health care status achieved (mortality), and baseline characteristics (diabetes, obesity, dyslipidemia, and smoking status). Neural methods were largely outperformed by more traditional nonneural methods, given the characteristics of the data set. Ontological rules were also effective in tasks such as baseline characteristics (alcoholism, atrial fibrillation, and coronary artery disease) and the Rankin scale. The complementarity in effectiveness among models suggests that a combination of models could enhance the results and cover more tasks in the future. Conclusions: Advances in information technology capacity are essential for scalability and agility in measuring health status outcomes. This study allowed us to measure effectiveness and identify opportunities for automating the classification of outcomes of specific tasks related to the clinical conditions of stroke victims, and thus ultimately to assess the possibility of proactively using these machine learning techniques in real-world situations.
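
    The protocol above (5-fold cross-validation repeated 6 times with subject-wise sampling, scored by F1) can be sketched as follows. This is an illustration under assumptions, not the study's code: "subject-wise" is taken to mean that all sentences from one patient land in the same fold, patients are assigned to folds round-robin after a seeded shuffle (so folds are balanced by patient count, not sentence count), and `f1_score` is a plain binary F1 written from its definition.

```python
import random
from collections import defaultdict


def subject_wise_repeated_kfold(patient_ids, k=5, repeats=6, seed=0):
    """Yield (repetition, fold, train_idx, test_idx) for repeated k-fold CV.

    patient_ids[i] is the patient that sample (e.g. sentence) i came from.
    Every patient's samples fall entirely in either train or test, so the
    model is never evaluated on a patient it has already seen.
    """
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)
    patients = sorted(by_patient)
    rng = random.Random(seed)
    for rep in range(repeats):
        shuffled = patients[:]
        rng.shuffle(shuffled)  # a different patient-to-fold assignment each repetition
        folds = [shuffled[f::k] for f in range(k)]  # round-robin split of patients
        for f, test_patients in enumerate(folds):
            test_set = set(test_patients)
            test_idx = [i for p in test_patients for i in by_patient[p]]
            train_idx = [i for p in shuffled if p not in test_set for i in by_patient[p]]
            yield rep, f, sorted(train_idx), sorted(test_idx)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Ten sentences from seven hypothetical patients.
pids = ["a", "a", "b", "c", "c", "c", "d", "e", "f", "g"]
splits = list(subject_wise_repeated_kfold(pids, k=5, repeats=6))
print(len(splits))  # 30 train/test splits (6 repetitions x 5 folds)
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

    Splitting by patient rather than by sentence matters here because sentences from the same record share vocabulary and writing style; a sentence-level split would let that overlap leak into the test folds and inflate F1.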

    Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

    The free-text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult for existing natural language analysis tools to process, since they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free-text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
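
    Statistical chunkers of the kind described above are commonly trained on per-token tags and then decoded back into chunks. The sketch below assumes a BIO encoding ("B-" begins a chunk, "I-" continues it, "O" is outside); the abstract does not say which encoding the Harvey corpus uses, and the tokens and "NP"/"VP" labels are hypothetical examples of telegraphic note text.

```python
def bio_to_spans(tokens, tags):
    """Decode BIO tags into (label, start, end, text) chunks, end exclusive.

    An "I-" tag whose label differs from the open chunk (or with no open
    chunk) is treated leniently as the start of a new chunk.
    """
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the final chunk
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if label is not None:
                spans.append((label, start, i, " ".join(tokens[start:i])))
            label, start = (tag[2:], i) if tag != "O" else (None, None)
    return spans


tokens = ["pt", "c/o", "sore", "throat", "x2", "days"]
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "B-NP", "I-NP"]
print(bio_to_spans(tokens, tags))
# [('NP', 0, 1, 'pt'), ('VP', 1, 2, 'c/o'), ('NP', 2, 4, 'sore throat'), ('NP', 4, 6, 'x2 days')]
```

    Lenient handling of stray "I-" tags is a design choice: a statistical tagger can emit them, and treating them as chunk starts avoids silently dropping tokens.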

    Calibrating mini-mental state examination scores to predict misdiagnosed dementia patients

    The Mini-Mental State Examination (MMSE) is used as a diagnostic test for dementia to screen a patient's cognitive function and assess disease severity. However, these examinations are often inaccurate and unreliable, either because of human error or because patients have physical disabilities, such as motor deficits, that prevent them from correctly interpreting or answering the questions. Erroneous data may lead to a wrong assessment of a specific patient. Therefore, other clinical factors recorded in electronic health records (e.g., gender and comorbidities) can also play a significant role when interpreting a patient's examination results. This work considers various clinical attributes of dementia patients to accurately determine their cognitive status in terms of the MMSE score. We employ machine learning models to calibrate MMSE scores and classify the correctness of diagnosis among patients, in order to help clinicians better understand the progression of cognitive impairment and plan subsequent treatment. For this purpose, we use curated data from a real-world ageing study. A random forest prediction model is employed to estimate the MMSE score, related to the diagnostic classification of patients. This model uses various clinical attributes to provide accurate MMSE predictions, succeeding in correcting a substantial percentage of the cases in our dataset that contain previously identified miscalculated scores. Furthermore, we provide an effective classification mechanism for automatically identifying, with high confidence, patient episodes with inaccurate MMSE values. These tools can be combined to help clinicians automatically find episodes within patient medical records where the MMSE score is probably miscalculated and estimate what the correct value should be. This provides valuable support in the decision-making process for diagnosing potential dementia patients. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.

    Doctor of Philosophy

    Dissertation: Electronic Health Records (EHRs) provide a wealth of information for secondary uses. Methods are developed to improve the usefulness of free-text query and text processing, and their advantages are demonstrated for clinical research, specifically cohort identification and enhancement. Cohort identification is a critical early step in clinical research. Problems may arise when too few patients are identified or the cohort consists of a nonrepresentative sample. Methods of improving query formation through query expansion are described. Inclusion of free-text search in addition to structured data search is investigated to determine the incremental improvement of adding unstructured text search over structured data search alone. Query expansion using topic- and synonym-based expansion improved information retrieval performance. An ensemble method was not successful. Adding free-text search to structured data search increased cohort size in all cases, with dramatic increases in some. Improved representation of patients from subpopulations that might otherwise have been underrepresented is also shown. We demonstrate clinical impact by showing that a serious clinical condition, scleroderma renal crisis, can be predicted by adding free-text search. A novel information extraction algorithm, Regular Expression Discovery for Extraction (REDEx), is developed and evaluated for cohort enrichment. The REDEx algorithm is shown to accurately extract information from free-text clinical narratives. Temporal expressions as well as bodyweight-related measures are extracted. Using these extracted values, additional patients and additional measurement occurrences are identified that were not identifiable through structured data alone. The REDEx algorithm transfers the burden of machine learning training from annotators to domain experts. 
We developed automated query expansion methods that greatly improve the performance of keyword-based information retrieval. We also developed NLP methods for unstructured data and demonstrate that cohort size can be greatly increased, a more complete population can be identified, and important clinical conditions that are often missed otherwise can be detected. We found that a much more complete representation of patients can be obtained. We also developed a novel machine learning algorithm for information extraction, REDEx, that efficiently extracts clinical values from unstructured clinical text, adding information and observations beyond what is available in structured data alone.
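
    To show what regular-expression extraction of bodyweight values from clinical narratives looks like in practice, here is a minimal sketch. It is NOT REDEx: REDEx discovers its expressions from annotated examples, whereas `WEIGHT_PATTERN` below is a hypothetical hand-written pattern covering only a few phrasings ("Wt: 82.5 kg", "weight 180 lbs", "weighs 75kg"), with pound values converted to kilograms so they can be compared with structured data.

```python
import re

# Hand-written illustration (not REDEx output): a trigger word, an optional
# ":" or "=", a 2-3 digit value with optional decimals, then a unit.
WEIGHT_PATTERN = re.compile(
    r"\b(?:wt|weight|weighs)\b\s*[:=]?\s*(\d{2,3}(?:\.\d+)?)\s*(kg|kgs|kilograms?|lbs?|pounds?)\b",
    re.IGNORECASE,
)
LB_TO_KG = 0.45359237  # exact definition of the international avoirdupois pound


def extract_weights_kg(text):
    """Return every bodyweight mention in text, converted to kilograms (1 decimal)."""
    values = []
    for number, unit in WEIGHT_PATTERN.findall(text):
        kg = float(number)
        if unit.lower().startswith(("lb", "pound")):
            kg *= LB_TO_KG
        values.append(round(kg, 1))
    return values


note = "Wt: 82.5 kg. Pt reports weight 180 lbs last year; now weighs 75kg."
print(extract_weights_kg(note))  # [82.5, 81.6, 75.0]
```

    Each hand-written alternative like this has to be anticipated by a developer; learning such patterns from labeled snippets is what moves that burden to domain experts, as the abstract describes for REDEx.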

    Towards the development of data governance standards for using clinical free-text data in health research: a position paper

    Background: Free-text clinical data (such as outpatient letters or nursing notes) represent a vast, untapped source of rich information that, if more accessible for research, would clarify and supplement information coded in structured data fields. Data usually need to be de-identified or anonymised before they can be reused for research, but there is a lack of established guidelines to govern effective de-identification and use of free-text information and avoid damaging data utility as a by-product. / Objective: We set out to work towards data governance standards to integrate with existing frameworks for personal data use, to enable free-text data to be used safely for research for patient/public benefit. / Methods: We outlined (UK) data protection legislation and regulations for context, and conducted a rapid literature review and UK-based case studies to explore data governance models used in working with free-text data. We also engaged with stakeholders including text mining researchers and the general public to explore perceived barriers and solutions in working with clinical free-text. / Results: We propose a set of recommendations, including the need: for authoritative guidance on data governance for the reuse of free-text data; to ensure public transparency in data flows and uses; to treat de-identified free-text as potentially identifiable with use limited to accredited data safe-havens; and, to commit to a culture of continuous improvement to understand the relationships between efficacy of de-identification and re-identification risks, so this can be communicated to all stakeholders. / Conclusions: By drawing together the findings of a combination of activities, our unique study has added new knowledge towards the development of data governance standards for the reuse of clinical free-text data for secondary purposes. 
Whilst working in accord with existing data governance frameworks, there is a need for further work to take forward the recommendations we have proposed, with commitment and investment, to assure and expand the safe reuse of clinical free-text data for public benefit.