Search CORE

14 research outputs found

Co-occurrence search.

Author: Angus Roberts (162102)
Hamish Cunningham (162314)
Kalina Bontcheva (280262)
Valentin Tablan (280261)

Faceted search allows users to apply multiple filters – here we have selected Hydralazine Hydrochloride as an Active Ingredient and started typing ‘AST’ in the Applicant column.

FigShare

The GATE developer interface.

Author: Angus Roberts (162102)
Hamish Cunningham (162314)
Kalina Bontcheva (280262)
Valentin Tablan (280261)

The GATE developer interface.

FigShare

An annotation graph.

Author: Angus Roberts (162102)
Hamish Cunningham (162314)
Kalina Bontcheva (280262)
Valentin Tablan (280261)

In GATE, annotations are encoded by associating features with character offsets, indicating the text to which they pertain.
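
The offset-based encoding described in this caption can be illustrated with a minimal Python sketch. It is not GATE's own data model or API: the `Annotation` class, its field names, the `covered_text` helper, and the sample sentence are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A standoff annotation: a type, a character span, and free-form features."""
    type: str
    start: int  # offset of the first covered character
    end: int    # offset one past the last covered character
    features: dict = field(default_factory=dict)


def covered_text(text: str, ann: Annotation) -> str:
    """Return the stretch of text an annotation pertains to."""
    return text[ann.start:ann.end]


# Hypothetical sample document and annotations.
text = "Ada Lovelace visited London."
annotations = [
    Annotation("Person", 0, 12, {"kind": "full name"}),
    Annotation("Location", 21, 27),
]

for ann in annotations:
    print(ann.type, ann.start, ann.end, repr(covered_text(text, ann)), ann.features)
```

Because the annotations only point at offsets, the text itself is never modified, and several annotations may cover the same or overlapping spans.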

FigShare

Comparison of P-Value and BFDP ranking.

Author: Angus Roberts (162102)
Hamish Cunningham (162314)
Kalina Bontcheva (280262)
Valentin Tablan (280261)

Comparison of P-Value and BFDP ranking.

FigShare

GATE embedded APIs.

Author: Angus Roberts (162102)
Hamish Cunningham (162314)
Kalina Bontcheva (280262)
Valentin Tablan (280261)

GATE provides a set of Java APIs, called GATE Embedded. This figure summarises the modules provided. Language resources (LRs) are data-only resources such as lexica, corpora or ontologies. Processing resources (PRs) are principally programmatic or algorithmic. Visual resources (VRs) allow users to interact visually with other resources.

FigShare

ANNIC (ANNotations In Context).

Author: Angus Roberts (162102)
Hamish Cunningham (162314)
Kalina Bontcheva (280262)
Valentin Tablan (280261)

Complex queries are supported, such as a query that searches for person annotations followed by past tense verbs followed by organisation names, as shown in this figure. The query appears in the third line from the top; the pattern described is a person annotation followed by organisation annotations. All matching text ranges then appear in the lower half of the tool, with a graphical representation of the individual annotations concerned in the middle part.
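
The kind of sequence query described in this caption can be approximated with a short Python sketch. It is neither ANNIC's query syntax nor its search engine: the tuple layout, the `find_sequences` helper, the type names, and the sample text are hypothetical, and the sketch treats annotations as adjacent whenever they are consecutive in offset order.

```python
def find_sequences(annotations, pattern):
    """Return (start, end) spans where consecutive annotations match `pattern`.

    `annotations` is a list of (type, start, end) tuples; `pattern` is a
    sequence of annotation type names to be matched in order.
    """
    ordered = sorted(annotations, key=lambda a: a[1])  # sort by start offset
    hits = []
    for i in range(len(ordered) - len(pattern) + 1):
        window = ordered[i:i + len(pattern)]
        if [a[0] for a in window] == list(pattern):
            hits.append((window[0][1], window[-1][2]))
    return hits


# Hypothetical sample text with hand-assigned annotations.
text = "Ada joined Acme. Bob left Initech."
annotations = [
    ("Person", 0, 3), ("PastVerb", 4, 10), ("Organisation", 11, 15),
    ("Person", 17, 20), ("PastVerb", 21, 25), ("Organisation", 26, 33),
]

matches = find_sequences(annotations, ["Person", "PastVerb", "Organisation"])
for start, end in matches:
    print(start, end, text[start:end])
```

Each hit is reported as a text range, which mirrors how the matching ranges are listed in the lower half of the tool in the figure.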

FigShare

Chinese annotations.

Author: Angus Roberts (162102)
Hamish Cunningham (162314)
Kalina Bontcheva (280262)
Valentin Tablan (280261)

In GATE's document view, annotations are shown as highlighted sections of text. This figure shows Chinese text with highlighted annotations. The annotations are listed at the bottom, showing their type, offsets and features.

FigShare

Table_1_Identifying features of risk periods for suicide attempts using document frequency and language use in electronic health records.DOCX

Author: Angus Roberts (162102)
George Gkotsis (10827670)
Johnny Downs (4922650)
Matthew Hotopf (51508)
Rina Dutta (10827667)
Robert Stewart (34430)
Sumithra U. Velupillai (17612808)
Publication date: 11/12/2023

Background: Individualising mental healthcare at times when a patient is most at risk of suicide involves shifting research emphasis from static risk factors to those that may be modifiable with interventions. Currently, risk assessment is based on a range of extensively reported stable risk factors, but critical to dynamic suicide risk assessment is an understanding of each individual patient’s health trajectory over time. The use of electronic health records (EHRs) and analysis using machine learning has the potential to accelerate progress in developing early warning indicators. Setting: EHR data from the South London and Maudsley NHS Foundation Trust (SLaM), which provides secondary mental healthcare for 1.8 million people living in four South London boroughs. Objectives: To determine whether the time window proximal to a hospitalised suicide attempt can be discriminated from a distal period of lower risk by analysing the documentation and mental health clinical free text data from EHRs and (i) investigate whether the rate at which EHR documents are recorded per patient is associated with a suicide attempt; (ii) compare document-level word usage between documents proximal and distal to a suicide attempt; and (iii) compare n-gram frequency related to third-person pronoun use proximal and distal to a suicide attempt using machine learning. Methods: The Clinical Record Interactive Search (CRIS) system allowed access to de-identified information from the EHRs. CRIS has been linked with Hospital Episode Statistics (HES) data for Admitted Patient Care. We analysed document and event data for patients who had at some point between 1 April 2006 and 31 March 2013 been hospitalised with a HES ICD-10 code related to attempted suicide (X60–X84; Y10–Y34; Y87.0/Y87.2). Findings: A total of 8,247 patients were identified as having made a hospitalised suicide attempt. Of these, 3,167 (39.8%) had at least one document available in their EHR prior to their first suicide attempt, and 1,424 (45.0%) of those patients had been “monitored” by mental healthcare services in the past 30 days. From 60 days prior to a first suicide attempt, there was a rapid increase in the monitoring level (document recording of the past 30 days), from 35.1 to 45.0%. Documents containing words related to prescribed medications/drugs/overdose/poisoning/addiction had the highest odds of being a risk indicator used proximal to a suicide attempt (OR 1.88; precision 0.91 and recall 0.93), and documents with words citing a care plan were associated with the lowest risk for a suicide attempt (OR 0.22; precision 1.00 and recall 1.00). Function words, word sequence, and pronouns were most common in all three representations (uni-, bi-, and tri-gram). Conclusion: EHR documentation frequency and language use can be used to distinguish periods distal from and proximal to a suicide attempt. However, in our study 55.0% of patients with documentation prior to their first suicide attempt did not have a record in the preceding 30 days, meaning that a high number are not seen by services at their most vulnerable point.

FigShare

Understanding views around the creation of a consented, donated databank of clinical free text to develop and train natural language processing models for research: focus group interviews with stakeholders

Author: Angus Roberts (162102)
Anoop D Shah (4335883)
Elizabeth Ford (4461778)
Goran Nenadic (29385)
Kerina Jones (663671)
Natalie K Fitzpatrick (16301378)
Richard Dobson (133966)
Publication date: 03/05/2023

Background: Information stored within electronic health records is often recorded as unstructured text. Special computerized natural language processing (NLP) tools are needed to process this text; however, complex governance arrangements make such data in the National Health Service hard to access, and therefore, it is difficult to use for research in improving NLP methods. The creation of a donated databank of clinical free text could provide an important opportunity for researchers to develop NLP methods and tools and may circumvent delays in accessing the data needed to train the models. However, to date, there has been little or no engagement with stakeholders on the acceptability and design considerations of establishing a free-text databank for this purpose. Objective: This study aimed to ascertain stakeholder views around the creation of a consented, donated databank of clinical free text to help create, train, and evaluate NLP for clinical research and to inform the potential next steps for adopting a partner-led approach to establish a national, funded databank of free text for use by the research community. Methods: Web-based in-depth focus group interviews were conducted with 4 stakeholder groups (patients and members of the public, clinicians, information governance leads and research ethics members, and NLP researchers). Results: All stakeholder groups were strongly in favor of the databank and saw great value in creating an environment where NLP tools can be tested and trained to improve their accuracy. Participants highlighted a range of complex issues for consideration as the databank is developed, including communicating the intended purpose, the approach to access and safeguarding the data, who should have access, and how to fund the databank. 
Participants recommended that a small-scale, gradual approach be adopted to start to gather donations and encouraged further engagement with stakeholders to develop a road map and set of standards for the databank. Conclusions: These findings provide a clear mandate to begin developing the databank and a framework for stakeholder expectations, which we would aim to meet with the databank delivery.

Sussex Research Online

Association results of the SNPs included in the GWAS of oral cancer (by p-values), pair-wise r2 estimates with rs991316, and recombination rates, for SNPs in the ADH gene region on 4q23.

Author: Ana Menezes (162276)
Angus Roberts (162102)
Antonio Agudo (115099)
Ariana Znaor (162203)
Claire M. Healy (162207)
Cristina Canova (162189)
Dan Chen (32468)
David I. Conway (162201)
David Zaridze (162220)
Diana Zelenika (69803)
Eleonóra Fabiánová (162234)
Graham Byrnes (66091)
Hamish Cunningham (162314)
Ioan Nicolae Mates (162237)
Ivana Holcátová (162146)
James D. Mckay (162321)
Jolanta Lissowska (89931)
Jon Wakefield (162318)
Jose Eluf-Neto (162283)
Kristina Kjaerheim (162159)
Lars Vatten (148142)
Lenka Foretova (162255)
Leticia Fernandez Garrote (162298)
Lorenzo Richiardi (158819)
Luigi Barzan (162183)
Manon Delahaye-Sourdeix (162111)
Maria Paula Curado (162267)
Mark A. Greenwood (162131)
Mark Lathrop (17971)
Mattias Johansson (158504)
Nalin S. Thakker (162194)
Neonilia Szeszenia-Dabrowska (162227)
Niraj Aswani (162121)
Pagona Lagiou (151309)
Paolo Boffetta (63968)
Paul Brennan (63967)
Peter Thomson (162310)
Pilar Galan (126838)
Renato Talamini (56133)
Rolando Herrero (106155)
Sergio Koifman (162270)
Silvia Franceschi (162292)
Simone Benhamou (111834)
Stefania Boccia (162303)
Tatiana V. Macfarlane (162176)
Victor Wünsch-Filho (113615)
Vladimir Bencko (162246)
Vladimir Janout (162261)
Wolfgang Ahrens (162215)
Xavier Castellsagué (162168)
Yaoyong Li (162109)

P-values indicating the strength of association for each SNP in the GWAS with oral cancer are shown on the −log10 scale (left Y-axis), against their positions on chromosome 4 (Build 36.3). The color of each point (SNP) represents the degree of linkage disequilibrium (r2) with rs991316 according to HapMap phase II CEU data. Highlighted in the figure are rs1229984, rs1789924 and rs971074, which have previously been reported to be associated with UADT cancers, as well as the rs991316 SNP, which was discovered to be associated specifically with oral cancer in the current study. rs1229984 was neither genotyped nor tagged by a proxy variant on the HumanHap300 BeadChip, but was genotyped by Taqman assay in the same samples from the Central Europe and ARCAGE studies as included in the discovery phase of the current GWAS, and r2 between rs1229984 and rs991316 was estimated in the 3,513 controls from the Central European and ARCAGE studies. Recombination rates across the region are shown by the light blue line plotted against the right Y-axis. Genes in the region are represented with arrow heads indicating the direction of transcription.
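
The −log10 scale used on the left Y-axis can be illustrated with a minimal Python sketch. The helper name `neg_log10` and the p-values below are hypothetical placeholders, not results from the study; the point is only that smaller p-values map to larger plotted values.

```python
import math


def neg_log10(p: float) -> float:
    """Map a p-value in (0, 1] to the -log10 scale used on association plots."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p)


# Hypothetical p-values for illustration only.
example_p_values = {"snp_a": 0.05, "snp_b": 1e-4, "snp_c": 1e-8}

for name, p in example_p_values.items():
    print(f"{name}: p = {p:g} -> -log10(p) = {neg_log10(p):.2f}")
```

On this scale a tenfold decrease in the p-value raises the plotted point by exactly one unit, which is why strongly associated SNPs stand out as peaks.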

FigShare