
    What does validation of cases in electronic record databases mean? The potential contribution of free text

    Electronic health records are increasingly used for research. The definition of cases or endpoints often relies on coded diagnostic data, using a pre-selected group of codes. Validation of these cases as ‘true’ cases of the disease is crucial. There are, however, ambiguities in what is meant by validation in the context of electronic records. Validation usually implies comparison of a definition against a gold standard of diagnosis and the ability to identify false negatives (‘true’ cases that were not detected) as well as false positives (detected cases that did not have the condition). We argue that two separate concepts of validation are often conflated in existing studies: first, whether the GP thought the patient was suffering from a particular condition (which we term confirmation or internal validation); and second, whether the patient really had the condition (external validation). Few studies are able to detect false negatives, that is, patients with the condition who have not received a diagnostic code. Natural language processing is likely to open up the use of free text within the electronic record, which will facilitate both the validation of the coded diagnosis and the search for false negatives.
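
    To make the distinction concrete, the Python sketch below compares a code-based case list with a gold-standard list of true cases. The function name `validation_metrics` and its set-of-patient-ID inputs are hypothetical, and it assumes gold-standard status is known for every patient, which is precisely what most studies lack. A confirmation study that reviews only the coded cases can estimate the positive predictive value, but sensitivity also needs the false negatives, i.e. true cases that never received a qualifying code.

```python
def validation_metrics(coded: set[str], gold: set[str], population: set[str]) -> dict[str, float]:
    """Compare a code-based case definition with a (hypothetical) gold standard.

    coded:      patient IDs detected by the pre-selected diagnostic codes
    gold:       patient IDs who truly have the condition
    population: every patient ID in the database extract
    """
    tp = len(coded & gold)               # detected and truly diseased
    fp = len(coded - gold)               # detected but without the condition
    fn = len(gold - coded)               # true cases with no qualifying code
    tn = len(population - coded - gold)  # neither coded nor diseased
    nan = float("nan")
    return {
        # Estimable from confirmation (review of coded cases) alone.
        "ppv": tp / (tp + fp) if tp + fp else nan,
        # Requires finding uncoded true cases, e.g. via free text.
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }
```

    For example, with ten patients, codes flagging `p0`, `p1`, `p2` and true cases `p0`, `p1`, `p3`, the PPV and sensitivity are both 2/3; the missed case `p3` is invisible unless uncoded records are searched.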

    The early presentation and management of rheumatoid arthritis cases in primary care

    Recent NICE guidance has emphasised the importance of early recognition and referral of patients with inflammatory arthritis so that disease-modifying treatment can be promptly initiated. Given the large numbers of patients consulting with musculoskeletal complaints, the timely identification of such patients is a considerable challenge, and descriptive data from primary care are sparse. Our objective was to examine GP records from three years before to 14 days after the first coded diagnosis of rheumatoid arthritis in order to describe the early course and management of the disease.
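
    The record window described in this abstract can be expressed as a simple date filter. This is a minimal sketch, assuming each record carries a single event date; the `Record` type and the `records_in_window` and `years_earlier` functions are hypothetical, and treating both window boundaries as inclusive is an assumption rather than something the abstract specifies.

```python
from collections.abc import Iterable
from datetime import date, timedelta
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical minimal GP record: an event date plus a code or free-text entry."""
    event_date: date
    entry: str


def years_earlier(d: date, years: int) -> date:
    """Same calendar day `years` years before d (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # 29 February mapped onto a non-leap year
        return d.replace(year=d.year - years, day=28)


def records_in_window(records: Iterable[Record], first_dx: date,
                      years_before: int = 3, days_after: int = 14) -> list[Record]:
    """Records from `years_before` years before to `days_after` days after the
    first coded diagnosis, both ends inclusive, in date order."""
    start = years_earlier(first_dx, years_before)
    end = first_dx + timedelta(days=days_after)
    return sorted((r for r in records if start <= r.event_date <= end),
                  key=lambda r: r.event_date)
```

    Subtracting calendar years with `date.replace` rather than a fixed number of days keeps the window start aligned with the diagnosis date across leap years.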

    Data quality in European primary care research databases. Report of a workshop held in London September 2013

    Primary care research databases provide a significant resource for health services and epidemiological research. However, since data are recorded primarily for clinical care, their suitability for research may vary widely according to the research application or the recording practices of individual general practitioners. A methodological approach for characterising data quality is required. We describe a one-day workshop entitled “Towards a common protocol for measuring and monitoring data quality in European primary care research databases”. Researchers, database experts and clinicians were invited to give their perspectives on data quality and to exchange ideas on which data quality metrics should be made available to researchers. We report the main outcomes of this workshop, including a summary of the presentations and discussions and a suggested way forward.

    A pragmatic approach for measuring data quality in primary care databases

    There is currently no widely recognised methodology for undertaking data quality assessment in electronic health records used for research. To address this, we have developed a protocol for measuring and monitoring data quality in primary care research databases, whereby practice-based data quality measures are tailored to the intended use of the data. Our approach was informed by an in-depth investigation of aspects of data quality in the Clinical Practice Research Datalink Gold database and by presentations of the results to data users. Although based on a primary care database, much of our proposed approach would be equally applicable to other health care databases.

    Annotating a corpus of clinical text records for learning to recognize symptoms automatically

    We report on a research effort to create a corpus of clinical free-text records enriched with annotation for symptoms of a particular disease (ovarian cancer). We describe the original data, the annotation procedure and the resulting corpus. The data (approximately 192K words) were annotated by three clinicians, and a procedure was devised to resolve disagreements. We are using the corpus to investigate the amount of symptom-related information in clinical records that is not coded, and to develop techniques for recognising these symptoms automatically in unseen text.
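
    The abstract does not say how disagreements between the three annotators were resolved. Purely as an illustration, the sketch below assumes a simple majority vote over exactly matching spans, with spans marked by fewer annotators set aside for adjudication; the `Span` representation and `merge_annotations` are hypothetical.

```python
from collections import Counter

# Hypothetical span representation: (start offset, end offset, symptom label).
Span = tuple[int, int, str]


def merge_annotations(annotators: list[set[Span]], min_votes: int = 2) -> tuple[set[Span], set[Span]]:
    """Majority-vote merge of span annotations (an assumed procedure).

    Each annotator contributes a set, so a span is counted at most once per
    annotator. Spans with at least `min_votes` votes are accepted; the rest
    are returned separately for adjudication.
    """
    votes = Counter(span for spans in annotators for span in spans)
    accepted = {s for s, n in votes.items() if n >= min_votes}
    disputed = {s for s, n in votes.items() if n < min_votes}
    return accepted, disputed
```

    Exact span matching is the simplest choice; partially overlapping spans that differ by a word would count as disagreements under this scheme.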

    Exploiting the potential of large databases of electronic health records for research using rapid search algorithms and an intuitive query interface.

    Objective: UK primary care databases, which contain diagnostic, demographic and prescribing information for millions of patients geographically representative of the UK, represent a significant resource for health services and clinical research. They can be used to identify patients with a specified disease or condition (phenotyping) and to investigate patterns of diagnosis and symptoms. Currently, extracting such information manually is time-consuming and requires considerable expertise. To exploit the potential of these large and complex databases more fully, our interdisciplinary team developed generic methods allowing access to different types of user. Materials and methods: Using the Clinical Practice Research Datalink database, we have developed an online user-focused system (TrialViz), which enables users to interactively select suitable general practices based on two criteria: suitability of the patient base for the intended study (phenotyping) and measures of data quality. Results: An end-to-end system, underpinned by an innovative search algorithm, allows the user to extract information in near real time via an intuitive query interface and to explore this information using interactive visualisation tools. A usability evaluation of this system produced positive results. Discussion: We present the challenges and results in the development of TrialViz and our plans for its extension to wider applications of clinical research. Conclusions: Our fast search algorithms and simple query algorithms represent a significant advance for users of clinical research databases.
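
    The two selection criteria can be illustrated as a filter over per-practice summaries. This sketch does not reproduce TrialViz or its search algorithm, which the abstract does not describe; the `Practice` fields, the single `quality_score` and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Practice:
    practice_id: str
    n_matching_patients: int  # patients meeting the study's phenotype definition
    quality_score: float      # hypothetical summary data-quality measure in [0, 1]


def select_practices(practices: list[Practice], min_patients: int,
                     min_quality: float) -> list[Practice]:
    """Keep practices meeting both criteria, largest eligible patient base first."""
    chosen = [p for p in practices
              if p.n_matching_patients >= min_patients and p.quality_score >= min_quality]
    return sorted(chosen, key=lambda p: p.n_matching_patients, reverse=True)
```

    In practice the quality criterion would more likely be several measures tailored to the study than one score; a single number keeps the sketch short.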

    Quality of recording of diabetes in the UK: how does the GP’s method of coding clinical data affect incidence estimates? Cross-sectional study using the CPRD database

    Objective: To assess the effect of coding quality on estimates of the incidence of diabetes in the UK between 1995 and 2014. Design: A cross-sectional analysis examining diabetes coding from 1995 to 2014 and how the choice of codes (diagnosis codes vs codes which suggest diagnosis) and quality of coding affect estimated incidence. Setting: Routine primary care data from 684 practices contributing to the UK Clinical Practice Research Datalink (data contributed from Vision (INPS) practices). Main outcome measure: Incidence rates of diabetes and how they are affected by (1) GP coding and (2) excluding ‘poor’ quality practices with at least 10% of incident patients inaccurately coded between 2004 and 2014. Results: Incidence rates and accuracy of coding varied widely between practices, and the trends differed according to the selected category of code. If diagnosis codes were used, the incidence of type 2 diabetes increased sharply until 2004 (when the UK Quality and Outcomes Framework was introduced), then flattened off until 2009, after which it decreased. If non-diagnosis codes were included, the numbers continued to increase until 2012. Although coding quality improved over time, 15% of the 666 practices that contributed data between 2004 and 2014 were labelled ‘poor’ quality. When these practices were dropped from the analyses, the downward trend in the incidence of type 2 diabetes after 2009 became less marked and incidence rates were higher. Conclusions: In contrast to some previous reports, diabetes incidence (based on diagnostic codes) appears not to have increased since 2004 in the UK. Choice of codes can make a significant difference to incidence estimates, as can quality of recording. Codes and data quality should be checked when assessing incidence rates using GP data.
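
    The exclusion rule in the main outcome measure (at least 10% of incident patients inaccurately coded) and its effect on an incidence estimate can be sketched as follows. How a patient is judged inaccurately coded is study-specific and is taken here as a precomputed count; the function names and the per-1,000 person-years scale are assumptions for illustration, and the numbers in the test are invented.

```python
from collections.abc import Container


def poor_quality_practices(incident_counts: dict[str, tuple[int, int]],
                           threshold: float = 0.10) -> set[str]:
    """Flag practices where at least `threshold` of incident patients are inaccurately coded.

    incident_counts maps practice ID -> (inaccurately coded incident patients,
    all incident patients).
    """
    return {pid for pid, (bad, total) in incident_counts.items()
            if total and bad / total >= threshold}


def incidence_per_1000(cases: dict[str, int], person_years: dict[str, float],
                       exclude: Container[str] = frozenset()) -> float:
    """Pooled incidence per 1,000 person-years, optionally dropping some practices."""
    kept = [pid for pid in cases if pid not in exclude]
    total_py = sum(person_years[pid] for pid in kept)
    if total_py == 0:
        raise ValueError("no person-time left after exclusions")
    return 1000 * sum(cases[pid] for pid in kept) / total_py
```

    Comparing `incidence_per_1000(cases, py)` with `incidence_per_1000(cases, py, poor)` mirrors the sensitivity analysis in the abstract: the estimate can move in either direction depending on which practices are dropped.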

    Classification of brain tumours from MR spectra: the INTERPRET collaboration and its outcomes.

    The INTERPRET project was a multicentre European collaboration, carried out from 2000 to 2002, which developed a decision-support system (DSS) to help neuroradiologists with no experience of MRS use spectroscopic data for the diagnosis and grading of human brain tumours. INTERPRET gathered a large collection of MR spectra of brain tumours and pseudotumoural lesions from seven centres. Consensus acquisition protocols, a standard processing pipeline and strict methods for quality control of the acquired data were put in place. Particular emphasis was placed on ensuring the diagnostic certainty of each case, for which all cases were evaluated by a clinical data validation committee. One outcome of the project is a database of 304 fully validated spectra from brain tumours, pseudotumoural lesions and normal brains, along with their associated images and clinical data, which remains available to the scientific and medical community. The second is the INTERPRET DSS, which has continued to be developed and clinically evaluated since the project ended. We also review here the results of the post-INTERPRET period. We evaluate the results of studies by other consortia or research groups that used the INTERPRET database. A summary of the clinical evaluations that have been performed on the post-INTERPRET DSS versions is also presented. Several of these evaluations have shown that diagnostic certainty can be improved for certain tumour types when the INTERPRET DSS is used in conjunction with conventional radiological image interpretation. About 30 papers concerned with the INTERPRET single-voxel dataset have so far been published. We discuss strengths and weaknesses of the DSS and the lessons learned. Finally, we speculate on how the INTERPRET concept might be carried into the future.

    This work was funded by project MARESCAN (SAF2011-23870) from the Ministerio de Economia y Competitividad in Spain. It was also partially funded by CIBER-BBN, which is an initiative of the VI National R&D&i Plan 2008-2011, CIBER Actions and financed by the Instituto de Salud Carlos III with assistance from the European Regional Development Fund. JRG acknowledges support from Cancer Research UK, the University of Cambridge and Hutchison Whampoa Ltd. This is the author accepted manuscript. The final version is available from Wiley via http://dx.doi.org/10.1002/nbm.343