UCL Discovery

Using social connection information to improve opinion mining: Identifying negative sentiment about HPV vaccines on Twitter

Author: Arachi Diana
Coiera Enrico
Dunn Adam G.
Ong Mei-Sing
Tsafnat Guy
Zhou Xujuan
Publication venue: 'IOS Press'
Publication date: 01/01/2015

The manner in which people preferentially interact with others like themselves suggests that information about social connections may be useful in the surveillance of opinions for public health purposes. We examined whether social connection information from tweets about human papillomavirus (HPV) vaccines could be used to train classifiers that identify anti-vaccine opinions. From 42,533 tweets posted between October 2013 and March 2014, 2,098 were sampled at random and two investigators independently identified anti-vaccine opinions. Machine learning methods were used to train classifiers using the first three months of data, including content (8,261 text fragments) and social connections (10,758 relationships). Connection-based classifiers performed similarly to content-based classifiers on the first three months of training data, and performed more consistently than content-based classifiers on test data from the subsequent three months. The most accurate classifier achieved an accuracy of 88.6% on the test data set, and used only social connection features. Information about how people are connected, rather than what they write, may be useful for improving public health surveillance methods on Twitter.

University of Southern Queensland ePrints

Making progress with the automation of systematic reviews: principles of the International Collaboration for the Automation of Systematic Reviews (ICASR)

Author: Adams Clive E.
Beller Elaine
Clark Justin
Diehl Heinz
Glasziou Paul
ICASR Group
Lund Hans
Ouzzani Mourad
Robinson Karen
Thayer Kristina
Thomas James
Tsafnat Guy
Turner Tari
Xia Jun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Systematic reviews (SR) are vital to health care, but have become complicated and time-consuming, due to the rapid expansion of evidence to be synthesised. Fortunately, many tasks of systematic reviews have the potential to be automated or may be assisted by automation. Recent advances in natural language processing, text mining and machine learning have produced new algorithms that can accurately mimic human endeavour in systematic review activity, faster and more cheaply. Automation tools need to be able to work together, to exchange data and results. Therefore, we initiated the International Collaboration for the Automation of Systematic Reviews (ICASR), to successfully put all the parts of automation of systematic review production together. The first meeting was held in Vienna in October 2015. We established a set of principles to enable tools to be developed and integrated into toolkits. This paper sets out the principles devised at that meeting, which cover the need for improvement in efficiency of SR tasks, automation across the spectrum of SR tasks, continuous improvement, adherence to high quality standards, flexibility of use and combining components, the need for a collaboration and varied skills, the desire for open source, shared code and evaluation, and a requirement for replicability through rigorous and open evaluation. Automation has a great potential to improve the speed of systematic reviews. Considerable work is already being done on many of the steps involved in a review. The ‘Vienna Principles’ set out in this paper aim to guide a more coordinated effort which will allow the integration of work by separate teams and build on the experience, code and evaluations done by the many teams working across the globe.

Nottingham ePrints

Bond University Research Portal

Nottingham eTheses

Repository@Nottingham

NORA - Norwegian Open Research Archives

Monash University Research Portal

The implications of biomarker evidence for systematic reviews

Author: AJ Atkinson
AJ Butte
AM Cohen
AS Ptolemy
D Moher
DL Sackett
GC Siontis
Guy Tsafnat
H Bastian
J de Leon
J Ioannidis
JPA Ioannidis
JW Lee
Miew Keen Choong
MJ Khoury
MK Choong
NS Sung
OA Panagiotou
P de Solla DJ
RC Bast Jr
W Burke
X Bosch-Capblanch
Publication venue: 'Springer Science and Business Media LLC'

Context-driven discovery of gene cassettes in mobile integrons using a computational grammar

Author: A Moura
ACE Darling
AL Delcher
CJ van Rijsbergen
D Frishman
DA Rowe-Magnus
DB Searls
E Rivas
Enrico Coiera
F Baquero
F Meyer
Guy Tsafnat
H Quesneville
HW Stokes
IT Paulsen
J Fleiss
J Landis
Jaron Schaeffer
Jon R Iredell
K Rutherford
L Stein
M Ashburner
M Kanehisa
MA Andrade
MJ Joss
R Overbeek
RM Hall
RS Levings
S Ji
S Leung
Sally R Partridge
SF Altschul
SR Partridge
U Bohnebeck
WR Pearson
Y Boucher
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Gene discovery algorithms typically examine sequence data for low level patterns. A novel method to computationally discover higher order DNA structures is presented, using a context sensitive grammar. The algorithm was applied to the discovery of gene cassettes associated with integrons. The discovery and annotation of antibiotic resistance genes in such cassettes is essential for effective monitoring of antibiotic resistance patterns and formulation of public health antibiotic prescription policies. Results: We discovered two new putative gene cassettes using the method, from 276 integron features and 978 GenBank sequences. The system achieved κ = 0.972 annotation agreement with an expert gold standard of 300 sequences. In rediscovery experiments, we deleted 789,196 cassette instances over 2030 experiments and correctly relabelled 85.6% (α ≥ 95%, E ≤ 1%, mean sensitivity = 0.86, specificity = 1, F-score = 0.93), with no false positives. Error analysis demonstrated that for 72,338 missed deletions, two adjacent deleted cassettes were labeled as a single cassette, increasing performance to 94.8% (mean sensitivity = 0.92, specificity = 1, F-score = 0.96). Conclusion: Using grammars we were able to represent heuristic background knowledge about large and complex structures in DNA. Importantly, we were also able to use the context embedded in the model to discover new putative antibiotic resistance gene cassettes. The method is complementary to existing automatic annotation systems which operate at the sequence level.
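
The step from 85.6% to 94.8% in the rediscovery experiments can be checked with simple arithmetic. The sketch below assumes that the 72,338 merged-cassette cases are simply counted as correct on top of the initially relabelled instances; the abstract does not spell out the calculation, and the function name `adjusted_rate` is illustrative only.

```python
def adjusted_rate(total: int, initial_rate: float, reclassified: int) -> float:
    """Fraction correct after counting `reclassified` misses as correct.

    Assumption: the reclassified cases are disjoint from the cases that
    were already counted as correct.
    """
    initially_correct = initial_rate * total
    return (initially_correct + reclassified) / total


total_deleted = 789_196   # cassette instances deleted (from the abstract)
initial_rate = 0.856      # fraction correctly relabelled (from the abstract)
merged_misses = 72_338    # adjacent cassettes labelled as one (from the abstract)

rate = adjusted_rate(total_deleted, initial_rate, merged_misses)
print(f"{rate:.1%}")
```

Under that assumption the result agrees with the reported 94.8% to one decimal place.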

Springer - Publisher Connector

PubMed Central

BICEPP: an example-based statistical text mining method for predicting the binary characteristics of drugs

Author: A Korhonen
A Koussounadis
C Perez-Iratxeta
CB Giles
D Fourches
DR Swanson
EA Adie
EC Fieller
F Hammann
Frank PY Lin
FS Turner
GR Grimes
Guy Tsafnat
H Gurulingappa
J Freudenberg
JA Hanley
KJ Gaulton
L Màrquez
M Hall
M Krallinger
Matthew P Doogue
MF Porter
N López-Bigas
N Tiffin
P Srinivasan
RJ Epstein
S Aerts
S Raychaudhuri
S Rossi
S Tatar
S Yu
Stephen Anthony
Thomas M Polasek
TM Polasek
V Sintchenko
Y Garten
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The identification of drug characteristics is a clinically important task, but it requires much expert knowledge and consumes substantial resources. We have developed a statistical text-mining approach (BInary Characteristics Extractor and biomedical Properties Predictor: BICEPP) to help experts screen drugs that may have important clinical characteristics of interest. Results: BICEPP first retrieves MEDLINE abstracts containing drug names, then selects tokens that best predict the list of drugs which represents the characteristic of interest. Machine learning is then used to classify drugs using a document frequency-based measure. Evaluation experiments were performed to validate BICEPP's performance on 484 characteristics of 857 drugs, identified from the Australian Medicines Handbook (AMH) and the PharmacoKinetic Interaction Screening (PKIS) database. Stratified cross-validations revealed that BICEPP was able to classify drugs into all 20 major therapeutic classes (100%) and 157 (of 197) minor drug classes (80%) with areas under the receiver operating characteristic curve (AUC) > 0.80. Similarly, AUC > 0.80 could be obtained in the classification of 173 (of 238) adverse events (73%), up to 12 (of 15) groups of clinically significant cytochrome P450 enzyme (CYP) inducers or inhibitors (80%), and up to 11 (of 14) groups of narrow therapeutic index drugs (79%). Interestingly, it was observed that the keywords used to describe a drug characteristic were not necessarily the most predictive ones for the classification task. Conclusions: BICEPP has sufficient classification power to automatically distinguish a wide range of clinical properties of drugs. This may be used in pharmacovigilance applications to assist with rapid screening of large drug databases to identify important characteristics for further evaluation.
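
The AUC > 0.80 criterion used throughout these results can be illustrated with the rank-based definition of AUC: the probability that a randomly chosen positive drug receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below is a generic pairwise implementation, not BICEPP's own evaluation code; the scores and labels are made-up toy values.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for five drugs; 1 = has the characteristic.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0]

value = auc(toy_scores, toy_labels)
print(value, value > 0.80)
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a sketch; a sort-based rank sum gives the same value more efficiently on large data sets.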

Springer - Publisher Connector

PubMed Central

The Field Representation Language

Author: Tsafnat Guy
Publication venue: 'Elsevier BV'
Publication date: 01/01/2008

The complexity of quantitative biomedical models, and the rate at which they are published, is increasing to a point where managing the information has become all but impossible without automation. International efforts are underway to standardise representation languages for a number of mathematical entities that represent a wide variety of physiological systems. This paper presents the Field Representation Language (FRL), a portable representation of values that change over space and/or time. FRL is an extensible mark-up language (XML) derivative with support for large numeric data sets in Hierarchical Data Format version 5 (HDF5). Components of FRL can be reused through uniform resource identifiers (URI) that point to external resources such as custom basis functions, boundary geometries and numerical data. To demonstrate the use of FRL as an interchange format, we present three models that study hyperthermia cancer treatment: a fractal model of liver tumour microvasculature; a probabilistic model simulating the deposition of magnetic microspheres throughout it; and a finite element model of hyperthermic treatment. The microsphere distribution field was used to compute the heat generation rate field around the tumour. We used FRL to convey results from the microsphere simulation to the treatment model. FRL facilitated the conversion of the coordinate systems and approximated the integral over regions of the microsphere deposition field.

A three-dimensional fractal model of tumour vasculature

Author: Lambert Tim D
Tsafnat Guy
Tsafnat Naomi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2004

We constructed a three-dimensional fractal model of the vascular network in a tumour periphery. We model the highly disorganised structure of the neoplastic vasculature by using a high degree of variation in segment properties such as length, diameter and branching angle. The overall appearance of the vascular tree is subjectively similar to that of the disorganised vascular network which encapsulates tumours. The fractal dimension of the model is within the range of clinically measured values.
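
The idea of a branching tree whose segment properties vary widely can be sketched as a recursive generator. This is only an illustration of the general technique: the binary branching, the scaling factors (0.8 and 0.7), the angle range and the `spread` parameter are all assumptions made here, not values from the paper, and the sketch records segment properties without placing them in three-dimensional space.

```python
import random


def grow_tree(length, diameter, angle, depth, rng, spread=0.5):
    """Return a flat list of (length, diameter, angle_deg, remaining_depth) segments.

    Each segment spawns two children. A child's length and diameter are the
    parent's, scaled down and multiplied by a random factor drawn from
    [1 - spread, 1 + spread]; its branching angle is drawn uniformly.
    All numeric constants are illustrative assumptions.
    """
    segments = [(length, diameter, angle, depth)]
    if depth == 0:
        return segments
    for _ in range(2):
        child_length = 0.8 * length * rng.uniform(1 - spread, 1 + spread)
        child_diameter = 0.7 * diameter * rng.uniform(1 - spread, 1 + spread)
        child_angle = rng.uniform(10.0, 90.0)
        segments.extend(
            grow_tree(child_length, child_diameter, child_angle, depth - 1, rng, spread)
        )
    return segments


rng = random.Random(0)  # fixed seed so the run is repeatable
tree = grow_tree(length=1.0, diameter=0.1, angle=0.0, depth=4, rng=rng)
print(len(tree))  # a full binary tree of depth 4 has 2**5 - 1 = 31 segments
```

Raising `spread` increases the variation between sibling segments, which is the property the abstract associates with disorganised neoplastic vasculature.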