
    Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error

    BACKGROUND: Here, we outline a method of applying existing machine learning (ML) approaches to aid citation screening in an ongoing broad and shallow systematic review of preclinical animal studies. The aim is to achieve a high-performing algorithm, comparable to human screening, that can reduce the human resources required for this step of a systematic review. METHODS: We applied ML approaches at the citation screening stage of a broad systematic review of animal models of depression. We tested two independently developed ML approaches which used different classification models and feature sets. We recorded the performance of the ML approaches on an unseen validation set of papers using sensitivity, specificity and accuracy. We aimed to achieve 95% sensitivity and to maximise specificity. The classification model providing the most accurate predictions was applied to the remaining unseen records in the dataset and will be used in the next stage of the preclinical biomedical sciences systematic review. We used a cross-validation technique to assign ML inclusion likelihood scores to the human-screened records, to identify potential errors made during the human screening process (error analysis). RESULTS: ML approaches reached 98.7% sensitivity based on learning from a training set of 5749 records with an inclusion prevalence of 13.2%. The highest specificity reached was 86%. Performance was assessed on an independent validation dataset. Human errors in the training and validation sets were successfully identified using the inclusion likelihoods assigned by the ML model to highlight discrepancies. Training the ML algorithm on the corrected dataset improved the specificity of the algorithm without compromising sensitivity. Error analysis correction led to a 3% improvement in sensitivity and specificity, increasing the precision and accuracy of the ML algorithm. CONCLUSIONS: This work has confirmed the performance and applicability of ML algorithms for screening in systematic reviews of preclinical animal studies, and has highlighted the novel use of ML algorithms to identify human error. This approach needs to be confirmed in other reviews with different inclusion prevalence levels, but represents a promising way to integrate human decisions and automation in systematic review methodology.
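    The abstract does not name the classification models or feature sets used, so the following is only a minimal sketch of the general technique it describes: train a text classifier on human-screened records, pick an inclusion threshold that preserves the target sensitivity (95% here), and flag records where the model's inclusion likelihood strongly disagrees with the human decision. All function names, parameters and thresholds below are illustrative assumptions, not the review's actual pipeline.

```python
# Hypothetical sketch of threshold-tuned citation screening with error
# analysis; the review's actual models and feature sets are not specified.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def fit_screener(train_texts, train_labels):
    """Fit a classifier on human-screened titles/abstracts."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
    model = LogisticRegression(max_iter=1000, class_weight="balanced")
    model.fit(vectorizer.fit_transform(train_texts), train_labels)
    return vectorizer, model

def pick_threshold(model, vectorizer, val_texts, val_labels, target_sens=0.95):
    """Choose the inclusion threshold that retains >= target_sens of the
    true inclusions in the validation set; specificity is then whatever
    that threshold allows."""
    scores = model.predict_proba(vectorizer.transform(val_texts))[:, 1]
    positives = scores[np.asarray(val_labels) == 1]
    return np.quantile(positives, 1.0 - target_sens)

def flag_possible_errors(scores, human_labels, low=0.1, high=0.9):
    """Error analysis: flag records whose ML inclusion likelihood strongly
    contradicts the human decision, as candidate screening errors."""
    return [i for i, (s, y) in enumerate(zip(scores, human_labels))
            if (s > high and y == 0) or (s < low and y == 1)]
```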

    Text Mining the History of Medicine

    Historical text archives constitute a rich and diverse source of information, which is becoming increasingly accessible thanks to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data efficiently. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestion of synonyms of user-entered query terms, exploration of different concepts mentioned within search results, or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences in, and the evolution of, vocabulary, terminology, language structure and style compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and the relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible, semantically oriented search system. The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform.
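    As a toy illustration of the variant-aware concept recognition the article describes (the real pipeline runs as configurable modules in the Argo TM platform), the sketch below maps a few well-known archaic medical terms to canonical modern concepts via dictionary lookup. The dictionary entries and function names are illustrative assumptions, not the article's actual resources.

```python
# Toy sketch of variant-aware concept recognition in historical text; the
# article's real pipeline is a modular set of Argo TM components, and the
# dictionary below holds only a few illustrative archaic-to-modern mappings.
import re

VARIANTS = {
    "phthisis": "tuberculosis",
    "consumption": "tuberculosis",
    "dropsy": "oedema",
    "ague": "malaria",
}

def annotate_concepts(text):
    """Return (surface form, canonical concept, offset) tuples, so a search
    system can index historical documents under modern concept labels."""
    hits = []
    for variant, concept in VARIANTS.items():
        for m in re.finditer(r"\b%s\b" % re.escape(variant), text.lower()):
            hits.append((m.group(0), concept, m.start()))
    return sorted(hits, key=lambda h: h[2])

print(annotate_concepts("The patient, weakened by ague, later died of phthisis."))
```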

    Constructing a biodiversity terminological inventory

    The continued growth of the biodiversity literature presents challenges to users who need to discover pertinent information in an efficient and timely manner. In response, text mining techniques offer solutions by facilitating the automated discovery of knowledge from large textual data. An important step in text mining is the recognition of concepts via their linguistic realisation, i.e., terms. However, a given concept may be referred to in text using various synonyms or term variants, making search systems likely to overlook documents mentioning lesser-known variants which are nevertheless relevant to a query term. Domain-specific terminological resources, which include term variants, synonyms and related terms, are thus important in supporting semantic search over large textual archives. This article describes the use of text mining methods for the automatic construction of a large-scale biodiversity term inventory. The inventory consists of names of species, amongst which naming variation is prevalent. We apply a number of distributional semantic techniques to all of the titles in the Biodiversity Heritage Library to compute semantic similarity between species names and support the automated construction of the resource. With the construction of our biodiversity term inventory, we demonstrate that distributional semantic models are able to identify semantically similar names that are not yet recorded in existing taxonomies. Such methods can thus be used to update existing taxonomies semi-automatically, by deriving semantically related taxonomic names from a text corpus and allowing expert curators to validate them. We also evaluate our inventory as a means to improve search by facilitating automatic query expansion. Specifically, we developed a visual search interface that suggests semantically related species names, which are available in our inventory but not always in other repositories, for incorporation into the search query. An assessment of the interface by domain experts reveals that our query expansion based on related names is useful for increasing the number of relevant documents retrieved. Its exploitation can benefit both users and developers of search engines and text mining applications.
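    A minimal sketch of the core idea, distributional similarity over a tokenised corpus of titles driving query expansion, follows. It assumes a word2vec model (via gensim) as a stand-in for the article's distributional techniques, and the titles and species names are placeholders rather than BHL data.

```python
# Minimal sketch: distributional similarity over tokenised titles used for
# query expansion. Titles and names are placeholders, not BHL data, and
# gensim word2vec stands in for the article's distributional models.
from gensim.models import Word2Vec

tokenised_titles = [
    ["rana", "temporaria", "tadpole", "development"],
    ["rana", "esculenta", "frogs", "of", "europe"],
    ["bufo", "bufo", "and", "rana", "temporaria", "habitats"],
]

model = Word2Vec(tokenised_titles, vector_size=50, window=5, min_count=1, sg=1)

def expand_query(term, topn=5):
    """Suggest distributionally similar names to add to a search query."""
    if term not in model.wv:
        return []
    return [name for name, _ in model.wv.most_similar(term, topn=topn)]

print(expand_query("rana"))
```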

    Towards a Better Understanding of Discourse: Integrating Multiple Discourse Annotation Perspectives Using UIMA

    Various discourse annotation schemes exist, varying both in the perspectives of discourse structure considered and in the granularity of the textual units annotated. Comparison and integration of multiple schemes have the potential to provide enhanced information. However, the differing formats of the corpora and tools that contain or produce such schemes can be a barrier to their integration. U-Compare is a graphical, UIMA-based workflow construction platform for combining interoperable natural language processing (NLP) resources without the need for programming skills. In this paper, we present an extension of U-Compare that allows the easy comparison, integration and visualisation of resources that contain or output annotations based on multiple discourse annotation schemes. The extension works by allowing the construction of parallel sub-workflows for each scheme within a single U-Compare workflow. The different types of discourse annotations produced by each sub-workflow can be either merged or visualised side by side for comparison. We demonstrate this new functionality by using it to compare annotations belonging to two different approaches to discourse analysis, namely discourse relations and functional discourse annotations. Integrating these different annotation types within an interoperable environment allows us to study the correlations between different types of discourse and report on the new insights that this allows us to discover. The authors have contributed equally to the development of this work and the production of the manuscript.
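    U-Compare itself is a UIMA/Java platform, so the following is only a conceptual Python model of the merge step described above: two parallel sub-workflows produce annotations over the same text, and the results are combined with their scheme tags preserved so overlapping analyses can be compared side by side. The types and labels are illustrative assumptions.

```python
# Conceptual sketch only: U-Compare is UIMA/Java-based. This toy model
# illustrates merging annotations from two parallel sub-workflows while
# preserving the originating scheme for side-by-side comparison.
from dataclasses import dataclass

@dataclass
class Annotation:
    begin: int   # character offset where the span starts
    end: int     # character offset where the span ends
    scheme: str  # e.g. "discourse-relation" or "functional"
    label: str

def merge(*annotation_lists):
    """Combine annotation lists from parallel sub-workflows, sorted by
    span, so overlapping analyses from different schemes line up."""
    return sorted((a for lst in annotation_lists for a in lst),
                  key=lambda a: (a.begin, a.end, a.scheme))

relations = [Annotation(0, 42, "discourse-relation", "Contrast")]
functional = [Annotation(0, 42, "functional", "Background")]
for ann in merge(relations, functional):
    print(f"{ann.scheme:>18}: {ann.label} [{ann.begin}, {ann.end})")
```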