Search CORE

11,877 research outputs found

Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes

Author: Acevedo Francisco
Armengol Victor Diego
Bao Yujia
Barzilay Regina
Braun Danielle
Deng Zhengyi
Hughes Kevin S
Kim Heeyoon
Ouardaoui Nofal
Parmigiani Giovanni
Wang Cathy
Wang Yan
Publication date: 24/04/2019

PURPOSE: The medical literature relevant to germline genetics is growing exponentially. Clinicians need tools for monitoring and prioritizing this literature to understand the clinical implications of pathogenic genetic variants. We developed and evaluated two machine learning models to classify abstracts as relevant to the penetrance (risk of cancer for germline mutation carriers) or prevalence of germline genetic mutations. METHODS: We conducted literature searches in PubMed and retrieved paper titles and abstracts to create an annotated dataset for training and evaluating the two classification models. Our first model is a support vector machine (SVM) that learns a linear decision rule based on the bag-of-ngrams representation of each title and abstract. Our second model is a convolutional neural network (CNN) that learns a complex nonlinear decision rule based on the raw title and abstract. We evaluated the performance of both models on classifying papers as relevant to penetrance or prevalence. RESULTS: For penetrance classification, we annotated 3740 paper titles and abstracts and used 60% for training the model, 20% for tuning it, and 20% for evaluating it. The SVM model achieves 89.53% accuracy (percentage of papers correctly classified), while the CNN model achieves 88.95% accuracy. For prevalence classification, we annotated 3753 paper titles and abstracts. The SVM model achieves 89.14% accuracy, while the CNN model achieves 89.13% accuracy. CONCLUSION: Our models achieve high accuracy in classifying abstracts as relevant to penetrance or prevalence. By facilitating literature review, this tool could help clinicians and researchers keep abreast of the burgeoning knowledge of gene-cancer associations and keep the knowledge bases for clinical decision support tools up to date.

arXiv.org e-Print Archive

DSpace@MIT
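The SVM pipeline described in the abstract above (bag-of-ngrams features, a linear decision rule, and a 60/20/20 train/tune/test split scored by accuracy) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the tokenizer, the unigram+bigram range, the Pegasos-style stochastic sub-gradient trainer standing in for whatever SVM solver the paper used, and all function names (`bag_of_ngrams`, `train_linear_svm`, `tune_and_evaluate`) are hypothetical. The CNN model is not shown.

```python
import math
import random
import re
from collections import Counter


def bag_of_ngrams(text, n_max=2):
    """Map a title+abstract string to L2-normalised counts of word 1..n_max-grams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    feats = {gram: c / norm for gram, c in counts.items()}
    feats["<bias>"] = 1.0  # constant feature so the linear rule can learn an offset
    return feats


def dot(w, x):
    """Dot product of a sparse weight dict and a sparse feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the L2-regularised hinge loss.

    X is a list of sparse feature dicts, y a list of labels in {-1, +1}.
    (Hypothetical stand-in for the paper's SVM solver; the bias is regularised too.)
    """
    rng = random.Random(seed)
    w, t = {}, 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # one shuffled pass per epoch
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * dot(w, X[i])  # margin under the current weights
            for k in w:  # regularisation step: shrink every weight
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss active: move towards this example
                for k, v in X[i].items():
                    w[k] = w.get(k, 0.0) + eta * y[i] * v
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


def split_60_20_20(data, seed=0):
    """Shuffle, then cut into 60% training, 20% tuning and 20% evaluation."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n_train = len(data) * 60 // 100
    n_dev = len(data) * 20 // 100
    return data[:n_train], data[n_train:n_train + n_dev], data[n_train + n_dev:]


def accuracy(w, data):
    """Percentage of papers whose predicted label matches the annotation."""
    hits = sum(predict(w, bag_of_ngrams(text)) == label for text, label in data)
    return 100.0 * hits / len(data)


def tune_and_evaluate(data, lams=(0.001, 0.01, 0.1)):
    """Train on 60%, pick lam on the 20% tuning split, report accuracy on the last 20%."""
    train, dev, test = split_60_20_20(data)
    X = [bag_of_ngrams(text) for text, _ in train]
    y = [label for _, label in train]
    best = max((train_linear_svm(X, y, lam=lam) for lam in lams),
               key=lambda w: accuracy(w, dev))
    return accuracy(best, test), (len(train), len(dev), len(test))
```

Call `tune_and_evaluate(papers)` with a list of `(title_and_abstract, label)` pairs, labels in {-1, +1} (for example +1 for penetrance-relevant); it returns the evaluation-split accuracy as a percentage together with the split sizes. A real pipeline would more likely rely on an established SVM library; the hand-written trainer only keeps the sketch dependency-free.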

Using a neural network-based feature extraction method to facilitate citation screening for systematic reviews

Author: KONTONATSIOS GEORGIOS
KORKONTZELOS YANNIS
MATTHEW PETER
SPENCER SALLY
Publication venue: 'Elsevier BV'
Publication date: 01/07/2020

Edge Hill University Research Information Repository

A semi-supervised approach using label propagation to support citation screening

Author: Allen
Austin J. Brockmeier
Bastian
Bekhuis
Bennett
Blei
Chapelle
Cohen
Cortes
Freund
Fukunaga
García Adeva
Georgios Kontonatsios
Hashimoto
Higgins
Howard
Jaidee
John McNaught
John Y. Goulermas
Krempl
Liu
Matwin
McGowan
Miwa
Ng
O’Mara-Eves
Paynter
Piotr Przybyła
Schapire
Shemilt
Sophia Ananiadou
Timsina
Tingting Mu
Tong
Tsafnat
van der Maaten
Wallace
Zhou
Publication venue: 'Elsevier BV'
Publication date: 01/08/2017

University of Liverpool Repository

Crossref

Edge Hill University Research Information Repository

The University of Manchester - Institutional Repository

The GUIDES checklist: development of a tool to improve the successful use of guideline-based computerised clinical decision support

Author: Aertgeerts Bert
Flottorp Signe
GUIDES expert panel
Kortteisto Tiina
Kunnamo Ilkka
Roshanov Pavel
Van de Velde Stijn
Vandvik Per Olav
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Background: Computerised decision support (CDS) based on trustworthy clinical guidelines is a key component of a learning healthcare system. Research shows that the effectiveness of CDS is mixed. Multifaceted context, system, recommendation and implementation factors may potentially affect the success of CDS interventions. This paper describes the development of a checklist intended to support professionals in implementing CDS successfully. Methods: We developed the checklist through an iterative process that involved a systematic review of evidence and frameworks, a synthesis of the success factors identified in the review, feedback from an international expert panel that evaluated the checklist against a list of desirable framework attributes, consultations with patients and healthcare consumers, and pilot testing of the checklist. Results: We screened 5347 papers and selected 71 papers with relevant information on success factors for guideline-based CDS. From the selected papers, we developed a 16-factor checklist divided into four domains: the CDS context, content, system and implementation domains. The panel of experts evaluated the checklist positively as an instrument that could support people implementing guideline-based CDS across a wide range of settings globally. Patients and healthcare consumers identified guideline-based CDS as an important quality improvement intervention and perceived the GUIDES checklist as a suitable and useful strategy. Conclusions: The GUIDES checklist can support professionals in considering the factors that affect the success of CDS interventions. It may facilitate a deeper and more accurate understanding of the factors shaping CDS effectiveness. Relying on a structured approach may prevent important factors from being missed.

Directory of Open Access Journals

EUR Research Repository

Enlighten

NORA - Norwegian Open Research Archives

FigShare

Topic detection using paragraph vectors to support active learning in systematic reviews

Author: Beahler
Blei
Chalmers
Cohen
Collobert
Dai
Dhillon
Fan
Fineout-Overholt
Fu
Georgios Kontonatsios
Gough
Hofmann
Kazuma Hashimoto
Le
Makoto Miwa
Mikolov
Miwa
O’Mara-Eves
Sophia Ananiadou
Turian
Turney
Wallace
Wallach
Wang
Yu
Publication venue: 'Elsevier BV'
Publication date: 09/06/2016

Systematic reviews require expert reviewers to manually screen thousands of citations in order to identify all articles relevant to the review. Active learning text classification is a supervised machine learning approach that has been shown to significantly reduce the manual annotation workload by semi-automating the citation screening process of systematic reviews. In this paper, we present a new topic detection method that induces an informative representation of studies to improve the performance of the underlying active learner. Our proposed topic detection method uses a neural network-based vector space model to capture semantic similarities between documents. We first represent documents within the vector space and cluster them into a predefined number of clusters. The centroids of the clusters are treated as latent topics. We then represent each document as a mixture of latent topics. For evaluation purposes, we employ the active learning strategy using both our novel topic detection method and a baseline topic model (i.e., Latent Dirichlet Allocation). The results demonstrate that our method achieves high sensitivity for eligible studies and a significantly reduced manual annotation cost compared with the baseline method. This observation is consistent across two clinical and three public health reviews. The tool introduced in this work is available from https://nactem.ac.uk/pvtopic/

Elsevier - Publisher Connector

Crossref

Edge Hill University Research Information Repository

PubMed Central

The University of Manchester - Institutional Repository
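The topic detection procedure in the "Topic detection using paragraph vectors" abstract above (embed documents, cluster them, treat cluster centroids as latent topics, and describe each document as a mixture of those topics) can be sketched in plain Python. Assumptions: the document vectors are given (the paper derives them with a neural, paragraph-vector model, which is not reproduced here), clustering is a simple cosine-similarity k-means, and the mixture weights come from a softmax over cosine similarities to the centroids with a hypothetical `temperature` parameter; the paper's exact weighting may differ. The function names are illustrative only.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def kmeans(vectors, k, iters=50, seed=0):
    """Cluster document vectors by cosine similarity; centroids act as latent topics."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            best = max(range(k), key=lambda j: cosine(v, centroids[j]))
            clusters[best].append(v)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def topic_mixture(v, centroids, temperature=0.1):
    """Softmax over cosine similarities: one document as a mixture of latent topics."""
    scores = [cosine(v, c) / temperature for c in centroids]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]
```

The resulting mixture vectors (one weight per topic, summing to 1) are the kind of compact representation that would then be handed to the active learner. A smaller `temperature` makes each document's mixture concentrate on its nearest topic.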

Using text mining for study identification in systematic reviews: a systematic review of current approaches

Publication venue: BioMed Central
Publication date: 14/01/2015

Springer - Publisher Connector

Prioritising references for systematic reviews with RobotAnalyst: A user study

Author: Brockmeier AJ
Fan RE
Fleuret F
Howard BE
Lefebvre C
Liu J
O'Mara‐Eves A
Ouzzani M
Rathbone J
Salton G
Shemilt I
Thomas J
Wallace BC
Publication venue: 'Wiley'
Publication date: 28/06/2018

Screening references is a time-consuming step necessary for systematic reviews and guideline development. Previous studies have shown that human effort can be reduced by using machine learning software to prioritise large reference collections such that most of the relevant references are identified before screening is completed. We describe and evaluate RobotAnalyst, a Web-based software system that combines text-mining and machine learning algorithms for organising references by their content and actively prioritising them based on a relevancy classification model trained and updated throughout the process. We report an evaluation over 22 reference collections (most related to public health topics) screened using RobotAnalyst, with a total of 43,610 abstract-level decisions. The number of references that needed to be screened to identify 95% of the abstract-level inclusions for the evidence review was reduced on 19 of the 22 collections. Significant gains over random sampling were achieved for all reviews conducted with active prioritisation, compared with only two of five when prioritisation was not used. RobotAnalyst's descriptive clustering and topic modelling functionalities were also evaluated by public health analysts. Descriptive clustering provided more coherent organisation than topic modelling, and the content of the clusters was apparent to the users across a varying number of clusters. This is the first large-scale study using technology-assisted screening to perform new reviews, and the positive results provide empirical evidence that RobotAnalyst can accelerate the identification of relevant studies. The results also highlight the issue of user complacency and the need for a stopping criterion to realise the work savings.

Crossref

Serveur académique lausannois

Edge Hill University Research Information Repository

The University of Manchester - Institutional Repository
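The evaluation measure in the RobotAnalyst abstract above, the number of references that must be screened to find 95% of the abstract-level inclusions, can be computed from a ranked list of screening decisions. The sketch below assumes a hypothetical list of 0/1 labels in the order the tool presented the references, and compares the result against the expected count under a uniformly random order, using the fact that the r-th of K relevant items among N shuffled items is expected at position r(N + 1)/(K + 1). It illustrates the metric only; it is not RobotAnalyst's implementation and performs none of the significance testing reported in the study.

```python
def _target(recall_pct, n_included):
    """Integer ceiling of recall_pct% of n_included (avoids float rounding issues)."""
    return -(-recall_pct * n_included // 100)


def screened_to_reach_recall(ranked_labels, recall_pct=95):
    """Count references read, in ranked order, until recall_pct% of inclusions are found.

    ranked_labels holds 1 for an included reference and 0 otherwise.
    """
    total = sum(ranked_labels)
    if total == 0:
        return 0
    target = _target(recall_pct, total)
    found = 0
    for n, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= target:
            return n
    return len(ranked_labels)  # defensive: target <= total, so the loop always returns


def expected_random_screened(n_refs, n_included, recall_pct=95):
    """Expected references screened under a uniformly random order.

    The r-th of K relevant items among N shuffled items sits, on average,
    at position r * (N + 1) / (K + 1).
    """
    return _target(recall_pct, n_included) * (n_refs + 1) / (n_included + 1)
```

For example, with inclusions at positions 1, 2, 4 and 7 of ten references, all four (95% of 4, rounded up) are found after 7 references, against 8.8 expected under a random order.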

Aligning Pathology Assessment in a Learner-Centered Undergraduate Medical Curriculum

Author: Doshi Neelam
Tepper Carmel
Wright Robert Gordon
Publication venue: 'Biomedical Research Network, LLC'
Publication date: 12/10/2017

Bond University Research Portal

Supporting systematic reviews using LDA-based document representations

Author: AM Cohen
BC Wallace
C Counsell
CC Chang
D Demner-Fushman
DM Blei
E Linstead
F Boudin
FR Octaviano
G Maskeri
J García Adeva
K Frantzi
K Henderson
K Romero Felizardo
L Hunter
M Barza
M Fiszman
M Miwa
MW Berry
O Frunza
R Akbani
RA Redner
S Ananiadou
S Arora
S Jonnalagadda
S Kotsiantis
S Matwin
SK Lukins
T Bekhuis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

BACKGROUND: Identifying relevant studies for inclusion in a systematic review (i.e. screening) is a complex, laborious and expensive task. Recently, a number of studies have shown that the use of machine learning and text mining methods to automatically identify relevant studies has the potential to drastically decrease the workload involved in the screening phase. The vast majority of these machine learning methods exploit the same underlying principle, i.e. a study is modelled as a bag-of-words (BOW). METHODS: We explore the use of topic modelling methods to derive a more informative representation of studies. We apply Latent Dirichlet allocation (LDA), an unsupervised topic modelling approach, to automatically identify topics in a collection of studies. We then represent each study as a distribution of LDA topics. Additionally, we enrich topics derived using LDA with multi-word terms identified by an automatic term recognition (ATR) tool. For evaluation purposes, we carry out automatic identification of relevant studies using support vector machine (SVM)-based classifiers that employ both our novel topic-based representation and the BOW representation. RESULTS: Our results show that the SVM classifier identifies a greater number of relevant studies when using the LDA representation than when using the BOW representation. These observations hold for two systematic reviews in the clinical domain and three reviews in the social science domain. CONCLUSIONS: A topic-based feature representation of documents outperforms the BOW representation when applied to the task of automatic citation screening. The proposed term-enriched topics are more informative and less ambiguous to systematic reviewers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13643-015-0117-0) contains supplementary material, which is available to authorized users.

Crossref

Springer - Publisher Connector

Edge Hill University Research Information Repository

PubMed Central

The University of Manchester - Institutional Repository
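The representation step in the "Supporting systematic reviews using LDA-based document representations" abstract above, turning each study into a distribution over LDA topics that an SVM can use instead of bag-of-words counts, can be sketched with collapsed Gibbs sampling. This is an assumption-laden illustration: the abstract does not say which LDA inference method or toolkit was used, the hyperparameters `alpha`, `beta` and `iters` are arbitrary, the enrichment with automatic term recognition is omitted, and `lda_gibbs` is a hypothetical name.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic distributions (theta).

    docs is a list of token lists (e.g. tokenised titles and abstracts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]       # topic counts per document
    nkw = [[0] * V for _ in range(n_topics)]   # word counts per topic
    nk = [0] * n_topics                        # total tokens assigned to each topic
    z = []                                     # current topic of every token
    for d, doc in enumerate(docs):             # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], wid[w]
                # remove this token's assignment from the counts
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # full conditional p(z = t | all other assignments), up to a constant
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # smoothed document-topic proportions: each row sums to 1
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
            for d, doc in enumerate(docs)]
```

Each returned row is one study's topic distribution; those rows would replace the bag-of-words vectors as input to the classifier. Because the sampler is stochastic, topic assignments on tiny toy inputs can change with the seed.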