14 research outputs found
Co-occurrence search.
<p>Faceted search allows users to apply multiple filters – here we have selected Hydralazine Hydrochloride as an Active Ingredient and started typing ‘AST’ in the Applicant column.</p>
An annotation graph.
<p>In GATE, annotations are encoded by associating features with character offsets, indicating the text to which they pertain.</p>
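The idea of attaching features to character offsets can be illustrated with a minimal sketch. This is not GATE's API: the `Annotation` class, the `covered_text` helper, the sample sentence and the feature values are all hypothetical, and the end offset is assumed to be exclusive.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A stand-off annotation: a typed character span plus free-form features."""
    start: int                      # offset of the first character (inclusive)
    end: int                        # offset just past the last character (exclusive)
    type: str                       # annotation type, e.g. "Drug"
    features: dict = field(default_factory=dict)


def covered_text(text: str, ann: Annotation) -> str:
    """Return the stretch of text that the annotation points at."""
    return text[ann.start:ann.end]


# Hypothetical sample text and annotations, for illustration only.
text = "Hydralazine lowers blood pressure."
annotations = [
    Annotation(0, 11, "Drug", {"kind": "ingredient"}),
    Annotation(19, 33, "Finding"),
]

for ann in annotations:
    print(ann.type, ann.start, ann.end, repr(covered_text(text, ann)), ann.features)
```

Because the annotations only refer to offsets, the text itself is never modified, and several annotations may overlap the same span.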
GATE embedded APIs.
<p>GATE provides a set of Java APIs, called GATE Embedded. This figure summarises the modules provided. Language resources (LRs) are data-only resources such as lexica, corpora or ontologies. Processing Resources (PRs) are principally programmatic or algorithmic. Visual resources (VRs) allow users to interact visually with other resources.</p>
ANNIC (ANNotations In Context).
<p>Complex queries are supported, such as a query that searches for person annotations followed by past tense verbs followed by organisation names, as shown in this figure. The query appears in the third line from the top; the patterns described are for person annotations followed by organisation annotations. All matching text ranges then appear in the lower half of the tool, with a graphical representation of the individual annotations concerned in the middle part.</p>
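A query over a sequence of annotation types can be sketched roughly as follows. This is an illustrative toy, not ANNIC's query language or implementation: the tuple layout `(start, end, type)`, the type names, the sample text and the `find_sequences` function are all assumptions made for the example.

```python
def find_sequences(annotations, pattern):
    """Return (start, end) spans where consecutive annotations match the pattern.

    annotations: iterable of (start, end, type) tuples.
    pattern: list of annotation type names, in the order they must occur.
    """
    ordered = sorted(annotations, key=lambda a: a[0])
    hits = []
    for i in range(len(ordered) - len(pattern) + 1):
        window = ordered[i:i + len(pattern)]
        if all(ann[2] == wanted for ann, wanted in zip(window, pattern)):
            # The match runs from the first annotation's start to the last one's end.
            hits.append((window[0][0], window[-1][1]))
    return hits


# Hypothetical text and annotations, for illustration only.
text = "Alice joined Acme. Bob left Initech."
annotations = [
    (0, 5, "Person"), (6, 12, "PastVerb"), (13, 17, "Organization"),
    (19, 22, "Person"), (23, 27, "PastVerb"), (28, 35, "Organization"),
]

for start, end in find_sequences(annotations, ["Person", "PastVerb", "Organization"]):
    print(text[start:end])
```

The sketch only compares annotation types in order; a real query tool would also constrain feature values and handle overlapping annotations.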
Chinese annotations.
<p>In GATE's document view, annotations are shown as highlighted sections of text. This figure shows Chinese text with highlighted annotations. The annotations are listed at the bottom, showing their type, offsets and features.</p>
Table_1_Identifying features of risk periods for suicide attempts using document frequency and language use in electronic health records.DOCX
<p>Background: Individualising mental healthcare at times when a patient is most at risk of suicide involves shifting research emphasis from static risk factors to those that may be modifiable with interventions. Currently, risk assessment is based on a range of extensively reported stable risk factors, but critical to dynamic suicide risk assessment is an understanding of each individual patient’s health trajectory over time. The use of electronic health records (EHRs) and analysis using machine learning has the potential to accelerate progress in developing early warning indicators. Setting: EHR data from the South London and Maudsley NHS Foundation Trust (SLaM) which provides secondary mental healthcare for 1.8 million people living in four South London boroughs. Objectives: To determine whether the time window proximal to a hospitalised suicide attempt can be discriminated from a distal period of lower risk by analysing the documentation and mental health clinical free text data from EHRs and (i) investigate whether the rate at which EHR documents are recorded per patient is associated with a suicide attempt; (ii) compare document-level word usage between documents proximal and distal to a suicide attempt; and (iii) compare n-gram frequency related to third-person pronoun use proximal and distal to a suicide attempt using machine learning. Methods: The Clinical Record Interactive Search (CRIS) system allowed access to de-identified information from the EHRs. CRIS has been linked with Hospital Episode Statistics (HES) data for Admitted Patient Care. We analysed document and event data for patients who had at some point between 1 April 2006 and 31 March 2013 been hospitalised with a HES ICD-10 code related to attempted suicide (X60–X84; Y10–Y34; Y87.0/Y87.2). Findings: n = 8,247 patients were identified to have made a hospitalised suicide attempt. Of these, n = 3,167 (39.8%) of patients had at least one document available in their EHR prior to their first suicide attempt. N = 1,424 (45.0%) of these patients had been “monitored” by mental healthcare services in the past 30 days. From 60 days prior to a first suicide attempt, there was a rapid increase in the monitoring level (document recording of the past 30 days) increasing from 35.1 to 45.0%. Documents containing words related to prescribed medications/drugs/overdose/poisoning/addiction had the highest odds of being a risk indicator used proximal to a suicide attempt (OR 1.88; precision 0.91 and recall 0.93), and documents with words citing a care plan were associated with the lowest risk for a suicide attempt (OR 0.22; precision 1.00 and recall 1.00). Function words, word sequence, and pronouns were most common in all three representations (uni-, bi-, and tri-gram). Conclusion: EHR documentation frequency and language use can be used to distinguish periods distal from and proximal to a suicide attempt. However, in our study 55.0% of patients with documentation, prior to their first suicide attempt, did not have a record in the preceding 30 days, meaning that there are a high number who are not seen by services at their most vulnerable point.</p>
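For readers unfamiliar with the statistics quoted in the findings, the sketch below shows how an odds ratio, precision and recall are conventionally computed from counts. It is not the study's analysis pipeline: the function names are hypothetical and every count is invented for illustration, so the printed values do not correspond to the reported results.

```python
def odds_ratio(prox_with, prox_without, dist_with, dist_without):
    """Odds ratio from a 2x2 table of document counts.

    prox_with / prox_without: proximal documents with / without the term.
    dist_with / dist_without: distal documents with / without the term.
    """
    return (prox_with * dist_without) / (prox_without * dist_with)


def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    return tp / (tp + fp), tp / (tp + fn)


# Made-up counts, purely illustrative.
print(odds_ratio(30, 70, 20, 80))       # odds of the term proximal vs distal
print(precision_recall(tp=9, fp=1, fn=3))
```

An odds ratio above 1 means the term is relatively more frequent in proximal documents, and one below 1 means it is relatively more frequent in distal documents.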
Understanding views around the creation of a consented, donated databank of clinical free text to develop and train natural language processing models for research: focus group interviews with stakeholders
Background: Information stored within electronic health records is often recorded as unstructured text. Special computerized natural language processing (NLP) tools are needed to process this text; however, complex governance arrangements make such data in the National Health Service hard to access, and therefore, it is difficult to use for research in improving NLP methods. The creation of a donated databank of clinical free text could provide an important opportunity for researchers to develop NLP methods and tools and may circumvent delays in accessing the data needed to train the models. However, to date, there has been little or no engagement with stakeholders on the acceptability and design considerations of establishing a free-text databank for this purpose. Objective: This study aimed to ascertain stakeholder views around the creation of a consented, donated databank of clinical free text to help create, train, and evaluate NLP for clinical research and to inform the potential next steps for adopting a partner-led approach to establish a national, funded databank of free text for use by the research community. Methods: Web-based in-depth focus group interviews were conducted with 4 stakeholder groups (patients and members of the public, clinicians, information governance leads and research ethics members, and NLP researchers). Results: All stakeholder groups were strongly in favor of the databank and saw great value in creating an environment where NLP tools can be tested and trained to improve their accuracy. Participants highlighted a range of complex issues for consideration as the databank is developed, including communicating the intended purpose, the approach to access and safeguarding the data, who should have access, and how to fund the databank. 
Participants recommended that a small-scale, gradual approach be adopted to start to gather donations and encouraged further engagement with stakeholders to develop a road map and set of standards for the databank. Conclusions: These findings provide a clear mandate to begin developing the databank and a framework for stakeholder expectations, which we would aim to meet with the databank delivery.
Association results of the SNPs included in the GWAS of oral cancer (by p-values), pair-wise r<sup>2</sup> estimates with rs991316, and recombination rates, for SNPs in the <i>ADH</i> gene region on 4q23.
<p>P-values indicating the strength of association for each SNP in the GWAS with oral cancer are shown on the −log10 scale (left Y-axis), against their positions on chromosome 4 (Build 36.3). The color of each point and SNP represent the degree of linkage disequilibrium (r<sup>2</sup>) with rs991316 according to HapMap phase II CEU data. Highlighted in the figure are rs1229984, rs1789924 and rs971074, which have been reported to be associated with UADT cancers previously, as well as the rs991316 SNP, which was discovered to be associated specifically with oral cancer in the current study. rs1229984 was neither genotyped nor tagged by a proxy variant on the HumanHap300 BeadChip, but was genotyped by Taqman assay in the same samples from the Central Europe and ARCAGE studies as included in the discovery phase of the current GWAS, and r<sup>2</sup> between rs1229984 and rs991316 was estimated in the 3,513 controls from the Central European and ARCAGE studies. Recombination rates across the region are shown by the light blue line plotted against the right Y-axis. Genes in the region are represented with arrowheads indicating the direction of transcription.</p>
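The −log10 scale used on the left Y-axis can be illustrated with a short sketch. The function name and the p-values below are hypothetical and are not taken from the study.

```python
import math


def neg_log10(p_value: float) -> float:
    """Map a p-value in (0, 1] onto the -log10 scale used in association plots."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p_value)


# Made-up p-values: smaller p-values map to taller points on the plot.
for p in (0.05, 1e-4, 1e-8):
    print(p, round(neg_log10(p), 2))
```

The transform makes very small p-values easy to compare visually, since each additional unit on the axis corresponds to a tenfold smaller p-value.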