11 research outputs found

    Supporting systematic reviews using LDA-based document representations

    BACKGROUND: Identifying relevant studies for inclusion in a systematic review (i.e. screening) is a complex, laborious and expensive task. Recently, a number of studies have shown that the use of machine learning and text mining methods to automatically identify relevant studies has the potential to drastically decrease the workload involved in the screening phase. The vast majority of these machine learning methods exploit the same underlying principle, i.e. a study is modelled as a bag-of-words (BOW).

    METHODS: We explore the use of topic modelling methods to derive a more informative representation of studies. We apply Latent Dirichlet allocation (LDA), an unsupervised topic modelling approach, to automatically identify topics in a collection of studies. We then represent each study as a distribution over LDA topics. Additionally, we enrich the topics derived using LDA with multi-word terms identified using an automatic term recognition (ATR) tool. For evaluation purposes, we carry out automatic identification of relevant studies using support vector machine (SVM)-based classifiers that employ both our novel topic-based representation and the BOW representation.

    RESULTS: Our results show that the SVM classifier identifies a greater number of relevant studies when using the LDA representation than when using the BOW representation. These observations hold for two systematic reviews from the clinical domain and three from the social science domain.

    CONCLUSIONS: A topic-based feature representation of documents outperforms the BOW representation when applied to the task of automatic citation screening. The proposed term-enriched topics are more informative and less ambiguous to systematic reviewers.

    ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13643-015-0117-0) contains supplementary material, which is available to authorized users.
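    The pipeline described in the abstract (a BOW baseline and an LDA topic-distribution representation, each feeding an SVM classifier for citation screening) can be sketched as follows. This is a minimal illustration assuming scikit-learn, with placeholder studies and relevance labels; it is not the authors' implementation and omits the ATR-based term enrichment of topics.

    # Minimal sketch (not the authors' code): compare a bag-of-words SVM with an
    # SVM trained on LDA doc-topic distributions for screening candidate studies.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Placeholder screening data: study abstracts and relevance labels (1 = include).
    studies = [
        "randomised controlled trial of cognitive behavioural therapy for depression",
        "statin use and cardiovascular outcomes in a retrospective cohort study",
        "qualitative interviews on housing policy and social exclusion",
        "pilot trial of mindfulness based therapy for anxiety in adolescents",
    ]
    labels = [1, 0, 0, 1]

    # Baseline: raw bag-of-words counts fed directly to a linear SVM.
    bow_clf = make_pipeline(CountVectorizer(stop_words="english"), LinearSVC())

    # Topic-based representation: fit LDA on the term counts, then use each
    # study's topic distribution (doc-topic proportions) as the SVM features.
    lda_clf = make_pipeline(
        CountVectorizer(stop_words="english"),
        LatentDirichletAllocation(n_components=10, random_state=0),
        LinearSVC(),
    )

    unseen = ["trial of group based therapy for postnatal depression"]
    for name, clf in [("BOW", bow_clf), ("LDA topics", lda_clf)]:
        clf.fit(studies, labels)
        print(name, "->", clf.predict(unseen))

    In practice the number of topics and the classifier parameters would be tuned per review, and the quantity the abstract emphasises is how many relevant studies each representation allows the classifier to identify.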

    On the influence of program constructs on bug localization effectiveness
