Introduction: Electronic health records from primary care are now aggregated in a number of large datasets, containing both coded data and free text. Secondary users can easily undertake analyses using the coded data. However, although the balance of information between codes and free text is variable, secondary users rarely draw on the information contained in doctors’ free-text notes, because of its ‘messy’ nature and the costs of ensuring anonymity. Our epidemiological studies within the Patient Records Enhancement Project have demonstrated that free text contains important information that is often ignored.
Method: Human-computer interaction (HCI) studies, using qualitative approaches, can help us understand the reasons for variability in the balance of coded and free-text data. We undertook field studies in six GP surgeries, which included observations of record use across the surgery, video analysis of real patient consultations, and interviews with a range of surgery staff. We also undertook ‘simulated’ consultations, with two medical actors playing the part of the patient, allowing us to standardise the patient across doctors and software systems.
Results: Preliminary results suggest several reasons for variation in data recording. Doctors create notes in order to best manage patients, with little consideration for use by others, and report limited awareness of secondary uses of the information. Doctors often record and “read” a picture painted by the overall record of a consultation, or record symptoms and signs in free-text notes and choose not to code a definite diagnosis. When they do code, they often choose a more general, non-specific code, even when they have inferred and acted on a clear diagnosis. These approaches reflect the process of progressing from a differential to a definite diagnosis, as well as the surgery’s administrative and consultation processes.
Conclusion: Our findings may explain apparent delays in diagnosis often observed in epidemiological analyses. The picture portrayed within records may not be at all clear to researchers relying on coded data. Our results have implications for secondary users of data and for the assessment of quality of care from these data. Follow-on work might produce typologies of diseases liable to coded-data deficits and support software development.