66 research outputs found
Experimental Evaluation and Development of a Silver-Standard for the MIMIC-III Clinical Coding Dataset
Clinical coding is currently a labour-intensive, error-prone, but critical
administrative process whereby hospital patient episodes are manually assigned
codes by qualified staff from large, standardised taxonomic hierarchies of
codes. Automating clinical coding has a long history in NLP research and has
recently seen novel developments setting new state-of-the-art results. A
popular dataset used in this task is MIMIC-III, a large intensive care database
that includes clinical free text notes and associated codes. We argue for the
reconsideration of the validity of MIMIC-III's assigned codes, which are often
treated as a gold standard even though MIMIC-III has not undergone secondary
validation. This work presents an open-source, reproducible experimental
methodology for assessing the validity of codes derived from EHR discharge
summaries. We exemplify the methodology with MIMIC-III discharge summaries and
show that the most frequently assigned codes in MIMIC-III are under-coded by up to 35%.
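The under-coding measurement at the heart of this kind of methodology can be sketched as a set comparison between the codes assigned to an episode and a reference set independently derived from its discharge summary. The ICD-9-CM codes below are illustrative examples, not actual MIMIC-III assignments:

```python
# Sketch of an under-coding check: compare the codes assigned to a patient
# episode against a reference set derived from its discharge summary.
# Codes are illustrative ICD-9-CM examples, not MIMIC-III data.
def under_coding_rate(assigned, reference):
    """Fraction of reference codes missing from the assigned set."""
    missing = reference - assigned
    return len(missing) / len(reference)

assigned = {"401.9", "428.0"}            # hypertension, congestive heart failure
reference = {"401.9", "428.0", "584.9"}  # reference also supports acute kidney failure
print(under_coding_rate(assigned, reference))  # one of three reference codes missing
```

Aggregating this rate over all episodes carrying a given code is one way to estimate how often that code is under-assigned.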
Hospital-wide natural language processing summarising the health data of 1 million patients
Electronic health records (EHRs) represent a major repository of real-world clinical trajectories, interventions and outcomes. While modern enterprise EHRs aim to capture data in structured, standardised formats, a significant proportion of the information captured in the EHR is still recorded only as unstructured text and can only be transformed into structured codes by manual processes. Recently, Natural Language Processing (NLP) algorithms have reached a level of performance suitable for large-scale, accurate information extraction from clinical text. Here we describe the application of open-source named-entity recognition and linkage (NER+L) methods (CogStack, MedCAT) to the entire text content of a large UK hospital trust (King's College Hospital, London). The resulting dataset contains 157M SNOMED concepts generated from 9.5M documents for 1.07M patients over a period of 9 years. We present a summary of prevalence and disease onset as well as a patient embedding that captures major comorbidity patterns at scale. NLP has the potential to transform the health data lifecycle through large-scale automation of a traditionally manual task.
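The NER+L idea, mapping free-text spans to standardised SNOMED CT concepts, can be illustrated with a minimal dictionary-based sketch. This is not MedCAT's actual pipeline (which uses learned disambiguation at scale); the tiny concept dictionary here is an assumption for illustration, though the SNOMED CT codes shown are real:

```python
# Minimal sketch of named-entity recognition + linkage (NER+L): map text
# spans to SNOMED CT concept codes via dictionary lookup. Real systems
# such as MedCAT add context-based disambiguation and learned matching.
import re

CONCEPTS = {
    "atrial fibrillation": "49436004",  # SNOMED CT: atrial fibrillation
    "type 2 diabetes": "44054006",      # SNOMED CT: diabetes mellitus type 2
    "hypertension": "38341003",         # SNOMED CT: hypertensive disorder
}

def ner_link(text):
    """Return (matched span, concept id) pairs found in the text."""
    hits = []
    lowered = text.lower()
    for phrase, concept_id in CONCEPTS.items():
        for m in re.finditer(re.escape(phrase), lowered):
            hits.append((text[m.start():m.end()], concept_id))
    return hits

note = "Patient has a history of hypertension and type 2 diabetes."
print(ner_link(note))
```

Running such a linker over every document in a trust-wide text store is what yields structured concept counts like the 157M reported above.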
Charting a Course for Smartphones and Wearables to Transform Population Health Research
The use of data from smartphones and wearable devices has huge potential for population health research, given the high level of device ownership; the range of novel health-relevant data types available from consumer devices; and the frequency and duration with which data are, or could be, collected. Yet, the uptake and success of large-scale mobile health research in the last decade have not met this intensely promoted opportunity. We make the argument that digital person-generated health data are required and necessary to answer many top priority research questions, using illustrative examples taken from the James Lind Alliance Priority Setting Partnerships. We then summarize the findings from 2 UK initiatives that considered the challenges and possible solutions for what needs to be done and how such solutions can be implemented to realize the future opportunities of digital person-generated health data for clinically important population health research. Examples of important areas that must be addressed to advance the field include digital inequality and possible selection bias; easy access for researchers to the appropriate data collection tools, including how best to harmonize data items; analysis methodologies for time series data; patient and public involvement and engagement methods for optimizing recruitment, retention, and public trust; and methods for providing research participants with greater control over their data. There is also a major opportunity, provided through the linkage of digital person-generated health data to routinely collected data, to support novel population health research, bringing together clinician-reported and patient-reported measures. 
We recognize that well-conducted studies require a wide range of diverse challenges to be addressed skillfully and in unison (eg, challenges regarding epidemiology, data science and biostatistics, psychometrics, behavioral and social science, software engineering, user interface design, information governance, data management, and patient and public involvement and engagement). Consequently, progress would be accelerated by the establishment of a new interdisciplinary community where all relevant and necessary skills are brought together to allow for excellence throughout the life cycle of a research study. This will require a partnership of diverse people, methods, and technologies. If done right, the synergy of such a partnership has the potential to transform many millions of people’s lives for the better.
Barriers to and Facilitators of Using Remote Measurement Technology in the Long-Term Monitoring of Individuals With ADHD: Interview Study
BACKGROUND: Remote measurement technology (RMT) has the potential to address current research and clinical challenges of attention-deficit/hyperactivity disorder (ADHD) symptoms and its co-occurring mental health problems. Despite research using RMT already being successfully applied to other populations, adherence and attrition are potential obstacles when applying RMT to a disorder such as ADHD. Hypothetical views and attitudes toward using RMT in a population with ADHD have previously been explored; however, to our knowledge, there is no previous research that has used qualitative methods to understand the barriers to and facilitators of using RMT in individuals with ADHD following participation in a remote monitoring period. OBJECTIVE: We aimed to evaluate the barriers to and facilitators of using RMT in individuals with ADHD compared with a group of people who did not have a diagnosis of ADHD. We also aimed to explore participants' views on using RMT for 1 or 2 years in future studies. METHODS: In total, 20 individuals with ADHD and 20 individuals without ADHD were followed up for 10 weeks using RMT that involved active (questionnaires and cognitive tasks) and passive (smartphone sensors and wearable devices) monitoring; 10 adolescents and adults with ADHD and 12 individuals in a comparison group completed semistructured qualitative interviews at the end of the study period. The interviews focused on potential barriers to and facilitators of using RMT in adults with ADHD. A framework methodology was used to explore the data qualitatively. RESULTS: Barriers to and facilitators of using RMT were categorized as health-related, user-related, and technology-related factors across both participant groups. When comparing themes that emerged across the participant groups, both individuals with and without ADHD experienced similar barriers and facilitators in using RMT. The participants agreed that RMT can provide useful objective data. 
However, slight differences between the participant groups were identified as barriers to RMT across all major themes. Individuals with ADHD described the impact that their ADHD symptoms had on participating (health-related theme), commented on the perceived cost of completing the cognitive tasks (user-related theme), and described more technical challenges (technology-related theme) than individuals without ADHD. Hypothetical views on future studies using RMT in individuals with ADHD for 1 or 2 years were positive. CONCLUSIONS: Individuals with ADHD agreed that RMT, which uses repeated measurements with ongoing active and passive monitoring, can provide useful objective data. Although themes overlapped with previous research on barriers to and facilitators of engagement with RMT (eg, depression and epilepsy) and with a comparison group, there are unique considerations for people with ADHD, for example, understanding the impact that ADHD symptoms may have on engaging with RMT. Researchers need to continue working with people with ADHD to develop future RMT studies for longer periods.
GEOexplorer: a webserver for gene expression analysis and visualisation
Gene Expression Omnibus (GEO) is a database repository hosting a substantial proportion of publicly available high-throughput gene expression data. Gene expression analysis is a powerful tool to gain insight into the mechanisms and processes underlying the biological and phenotypic differences between sample groups. Despite the wide availability of gene expression datasets, their access, analysis, and integration are not trivial and require specific expertise and programming proficiency. We developed the GEOexplorer webserver to allow scientists to access, integrate and analyse gene expression datasets without requiring programming proficiency. Via its user-friendly graphical interface, users can easily apply GEOexplorer to perform interactive and reproducible gene expression analysis of microarray and RNA-seq datasets, while producing a wealth of interactive visualisations to facilitate data exploration and interpretation, and generating a range of publication-ready figures. The webserver allows users to search and retrieve datasets from GEO as well as to upload user-generated data and combine and harmonise two datasets to perform joint analyses. GEOexplorer, available at https://geoexplorer.rosalind.kcl.ac.uk, provides a solution for performing interactive and reproducible analyses of microarray and RNA-seq gene expression data, empowering life scientists to perform exploratory data analysis and differential gene expression analysis on-the-fly without informatics proficiency.
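The core computation behind differential gene expression analysis can be sketched per gene as a fold change plus a two-sample test statistic. This is a simplified illustration with made-up expression values; production pipelines (such as the limma and DESeq2 workflows a webserver like GEOexplorer wraps) add normalisation, variance moderation, and multiple-testing correction:

```python
# Per-gene differential expression sketch: log2 fold change between two
# sample groups, and Welch's t statistic for unequal-variance comparison.
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def log2_fold_change(a, b):
    """log2 ratio of group means (expression values assumed positive)."""
    return math.log2(mean(a) / mean(b))

# Hypothetical normalised expression values for one gene in two groups.
control = [5.1, 4.8, 5.3, 5.0]
treated = [8.9, 9.4, 8.7, 9.1]
print(log2_fold_change(treated, control), welch_t(treated, control))
```

Repeating this over all genes and ranking by statistic (with a p-value correction such as Benjamini-Hochberg) yields the differential expression tables such tools report.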
Genetic Risk as a Marker of Amyloid-β and Tau Burden in Cerebrospinal Fluid.
BACKGROUND: The search for a biomarker of Alzheimer's disease (AD) pathology (amyloid-β (Aβ) and tau) is ongoing, with the best markers currently being measurements of Aβ and tau in cerebrospinal fluid (CSF) and via positron emission tomography (PET) scanning. These methods are relatively invasive, costly, and often have high screening failure rates. Consequently, research is aiming to elucidate blood biomarkers of Aβ and tau. OBJECTIVE: This study aims to investigate a case/control polygenic risk score (PGRS) as a marker of tau and investigate blood markers of a combined Aβ and tau outcome for the first time. A sub-study also considers plasma tau as markers of Aβ and tau pathology in CSF. METHODS: We used data from the EDAR*, DESCRIPA**, and Alzheimer's Disease Neuroimaging Initiative (ADNI) cohorts in a logistic regression analysis to investigate blood markers of Aβ and tau in CSF. In particular, we investigated the extent to which a case/control PGRS is predictive of CSF tau, CSF amyloid, and a combined amyloid and tau outcome. The predictive ability of models was compared to that of age, gender, and APOE genotype ('basic model'). RESULTS: In EDAR and DESCRIPA test data, inclusion of a case/control PGRS was no more predictive of Aβ, and a combined Aβ and tau endpoint than the basic models (accuracies of 66.0%, and 73.3% respectively). The tau model saw a small increase in accuracy compared to basic models (59.6%). ADNI 2 test data also showed a slight increase in accuracy for the Aβ model when compared to the basic models (61.4%). CONCLUSION: We see some evidence that a case/control PGRS is marginally more predictive of Aβ and tau pathology than the basic models. The search for predictive factors of Aβ and tau pathologies, above and beyond demographic information, is still ongoing. Better understanding of AD risk alleles, development of more sensitive assays, and studies of larger sample size are three avenues that may provide such factors. 
However, the clinical utility of possible predictors of brain Aβ and tau pathologies must also be investigated.
*'Beta amyloid oligomers in the early diagnosis of AD and as marker for treatment response'
**'Development of screening guidelines and criteria for pre-dementia Alzheimer's disease'
Multiple funders listed on paper.
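A polygenic risk score of the kind used above is, at its core, a weighted sum of risk-allele counts, with per-allele weights taken from GWAS effect sizes. The SNP identifiers and weights below are made up for illustration and do not correspond to any published AD score:

```python
# Illustrative polygenic risk score (PGRS): weighted sum of risk-allele
# counts. SNP ids and effect sizes are hypothetical, not real GWAS values.
def polygenic_risk_score(genotypes, weights):
    """genotypes: SNP id -> risk-allele count (0, 1, or 2).
    weights: SNP id -> per-allele effect size (e.g. log odds ratio)."""
    return sum(weights[snp] * count
               for snp, count in genotypes.items() if snp in weights)

weights = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}  # hypothetical
person = {"rs0001": 2, "rs0002": 1, "rs0003": 0}
print(polygenic_risk_score(person, weights))
```

In a case/control PGRS, the weights come from a case/control GWAS; the score then enters a logistic regression alongside age, gender, and APOE genotype, as in the 'basic model' comparison described above.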
Remote Administration of ADHD-Sensitive Cognitive Tasks: A Pilot Study
Objective: We assessed the feasibility and validity of remote researcher-led administration and self-administration of modified versions of two cognitive tasks sensitive to ADHD, a four-choice reaction time task (Fast task) and a combined Continuous Performance Test/Go No-Go task (CPT/GNG), through a new remote measurement technology system. Method: We compared the cognitive performance measures (mean and variability of reaction times (MRT, RTV), omission errors (OE) and commission errors (CE)) at a remote baseline researcher-led administration and three remote self-administration sessions between participants with and without ADHD (n = 40). Results: The most consistent group differences were found for RTV, MRT and CE at the baseline researcher-led administration and the first self-administration, with 8 of the 10 comparisons statistically significant and all comparisons indicating medium to large effect sizes. Conclusion: Remote administration of cognitive tasks successfully captured the difficulties with response inhibition and regulation of attention, supporting the feasibility and validity of remote assessments.
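The four performance measures named in the abstract can be computed from a per-trial log. The `go`/`nogo` labels and the trial tuple format below are simplifying assumptions for illustration, not the study's actual data format:

```python
# Sketch of CPT/GNG performance measures: mean reaction time (MRT),
# reaction-time variability (RTV), omission errors (OE: missed go trials),
# and commission errors (CE: responses to no-go trials).
from statistics import mean, stdev

def task_measures(trials):
    """trials: list of (kind, responded, rt) where kind is 'go' or 'nogo',
    responded is a bool, and rt is reaction time in ms (None if no response)."""
    rts = [rt for kind, responded, rt in trials if kind == "go" and responded]
    omission = sum(1 for kind, responded, _ in trials
                   if kind == "go" and not responded)
    commission = sum(1 for kind, responded, _ in trials
                     if kind == "nogo" and responded)
    return {"MRT": mean(rts), "RTV": stdev(rts),
            "OE": omission, "CE": commission}

trials = [("go", True, 420), ("go", True, 510), ("go", False, None),
          ("nogo", True, 380), ("nogo", False, None), ("go", True, 450)]
print(task_measures(trials))
```

Elevated RTV in particular, captured here as the standard deviation of go-trial reaction times, is among the measures most consistently separating the ADHD and comparison groups.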
Temporal Evolution of Multiday, Epileptic Functional Networks Prior to Seizure Occurrence
Epilepsy is one of the most common neurological disorders, characterized by the occurrence of repeated seizures. Given that epilepsy is considered a network disorder, tools derived from network neuroscience may confer the valuable ability to quantify the properties of epileptic brain networks. In this study, we use well-established brain network metrics (i.e., mean strength, variance of strength, eigenvector centrality, betweenness centrality) to characterize the temporal evolution of epileptic functional networks over several days prior to seizure occurrence. We infer the networks using long-term electroencephalographic recordings from 12 people with epilepsy. We found that brain network metrics are variable across days and show a circadian periodicity. In addition, we found that in 9 out of 12 patients the distribution of the variance of strength on the day (or even the last two days) prior to seizure occurrence is significantly different from the corresponding distributions on all previous days. Our results suggest that brain network metrics computed from electroencephalographic recordings could potentially be used to characterize brain network changes that occur prior to seizures, and ultimately contribute to seizure warning systems.
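Two of the metrics above (mean and variance of node strength) plus eigenvector centrality can be sketched on a small symmetric functional-connectivity matrix. The edge weights below are illustrative, not derived from the study's EEG data:

```python
# Brain network metrics on a weighted, symmetric connectivity matrix:
# node strength (row sum), its mean and variance across nodes, and
# eigenvector centrality via power iteration.
from statistics import mean, pvariance

W = [[0.0, 0.8, 0.3],   # illustrative 3-node functional network
     [0.8, 0.0, 0.5],
     [0.3, 0.5, 0.0]]

strengths = [sum(row) for row in W]      # strength: total edge weight per node
mean_strength = mean(strengths)
strength_variance = pvariance(strengths)

def eigenvector_centrality(m, iters=200):
    """Leading eigenvector by power iteration, scaled so max entry is 1."""
    v = [1.0] * len(m)
    for _ in range(iters):
        v = [sum(m[i][j] * v[j] for j in range(len(m))) for i in range(len(m))]
        top = max(v)
        v = [x / top for x in v]
    return v

print(mean_strength, strength_variance)
print(eigenvector_centrality(W))
```

Tracking how a distributional summary such as the variance of strength shifts day by day, as the study does, is what allows the pre-seizure days to be flagged as statistically distinct.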