Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

Author: AGK Janacek
André Skupin
BC Vanteru
Bob Schijvenaars
Colin Allen
David Newman
DJ Newman
DK Harman
DM Blei
EM Voorhees
EP Jiang
F Janssens
G Gorrell
G Salton
GL Poulter
GR Hjaltason
HM Müller
J Lewis
J Lin
Joseph R. Biberstine
K Börner
K Järvelin
K Sparck Jones
Katy Börner
Kevin W. Boyack
KW Boyack
MA Hearst
MD Cao
Michael Patek
MW Berry
N Jardine
Nianli Ma
NJ Belkin
P Ahlgren
P Calado
P Castells
R Kassab
R Klavans
Richard Klavans
Russell J. Duhon
S Deerwester
S Martin
SE Robertson
T Couto
T Hofmann
T Kohonen
T Theodosiou
TG Kolda
TK Landauer
WS Cooper
Y Aphinyanaphongs
Y Yamamoto
Publication venue: Public Library of Science
Publication date: 01/01/2011

We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents.
We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches combined five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models: BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE.
PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts.
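
The similarity-matrix step described above (tf-idf cosine, then keeping only the top-n similarities per document) and the Jensen-Shannon coherence check can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's pipeline: documents are assumed to be pre-tokenized word lists, idf is assumed to be log(N/df), the function names are hypothetical, and coherence is simplified to the mean divergence between each document's word distribution and its cluster's pooled distribution. An exhaustive all-pairs loop like this one would not scale to 2.15 million records.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse tf-idf vectors (term -> weight) for pre-tokenized documents.

    Assumption: raw term counts times idf = log(N / df).
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * math.log(n_docs / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_n_similarities(docs, n):
    """For each document, keep only its n most similar other documents.

    Exhaustive O(N^2) comparison: fine for a toy corpus, not for millions.
    """
    vecs = tfidf_vectors(docs)
    kept = {}
    for i, u in enumerate(vecs):
        sims = [(j, cosine(u, v)) for j, v in enumerate(vecs) if j != i]
        sims.sort(key=lambda pair: pair[1], reverse=True)
        kept[i] = sims[:n]
    return kept


def jensen_shannon(p, q):
    """Base-2 Jensen-Shannon divergence between dict distributions (0..1)."""
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in set(p) | set(q)}

    def kl_to_m(a):
        return sum(pa * math.log2(pa / m[t]) for t, pa in a.items() if pa > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def cluster_incoherence(cluster_docs):
    """Mean JSD of each document's word distribution from the cluster's pooled
    distribution (a simplified coherence proxy; lower = more coherent)."""
    pooled = Counter(t for doc in cluster_docs for t in doc)
    total = sum(pooled.values())
    centroid = {t: c / total for t, c in pooled.items()}
    divergences = []
    for doc in cluster_docs:
        dist = {t: c / len(doc) for t, c in Counter(doc).items()}
        divergences.append(jensen_shannon(dist, centroid))
    return sum(divergences) / len(divergences)


if __name__ == "__main__":
    docs = [
        ["gene", "expression", "cancer"],
        ["gene", "expression", "tumor"],
        ["protein", "folding"],
    ]
    print(top_n_similarities(docs, n=1))
    print(cluster_incoherence([docs[0], docs[1]]))
```

On this toy corpus, document 0's single retained neighbor is document 1 (they share two terms), while document 2 shares no terms with either. A cluster of identical documents scores 0 on `cluster_incoherence`, and `jensen_shannon` reaches its base-2 maximum of 1 only when two distributions share no terms.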

Public Library of Science (PLOS)

Crossref

IUScholarWorks (Indiana University)

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

113 Years of Physical Review: Using Flow Maps to Show Temporal and Topical Citation Patterns

Author: Bruce W. Herr Ii
Elisha F. Hardy
Katy Börner
Russell J. Duhon
Shashikant Penumarthy
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

We visualize 113 years of bibliographic data from the American Physical Society. The 389,899 documents are laid out in a two-dimensional time-topic reference system. The citations from papers published in 2005 are overlaid as flow maps from each topic to the papers referenced by papers in that topic, making intercitation patterns between topic areas visible. Paper locations of Nobel Prize predictions and winners are marked. Finally, though this cannot be reproduced here, the visualization was rendered to, and is best viewed on, a 24 x 30 inch canvas at 300 dots per inch (DPI).
Keywords: network analysis, domain visualization, physical review

CiteSeerX

Crossref

Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

Author: André Skupin
Bob Schijvenaars
David Newman
Joseph R
Katy Börner
Kevin W. Boyack
Michael Patek
Nianli Ma
Richard Klavans
Russell J. Duhon
Publication date: 08/12/2014

Background: We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents.
Methodology: We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches combined five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models: BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compare
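
BM25, one of the two Poisson-based models named above, can be sketched for document-to-document similarity by treating one document's distinct terms as the query. This is an illustrative assumption, not necessarily how the study configured BM25 (PMRA is not shown): `bm25_scores` is a hypothetical helper, `k1=1.2` and `b=0.75` are commonly used defaults rather than values taken from the paper, and the idf uses a log(1 + ...) variant so that scores stay non-negative.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each pre-tokenized doc against a tokenized query.

    Assumptions: each distinct query term counts once, and idf is
    log(1 + (N - df + 0.5) / (df + 0.5)), which is never negative.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization: b controls how much longer-than-average
        # documents are penalized.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        ["gene", "expression", "cancer"],
        ["gene", "expression", "tumor"],
        ["protein", "folding"],
    ]
    # Use document 0 as the "query" to rank its neighbors.
    print(bm25_scores(docs[0], docs))
```

Sorting these scores and keeping the top n per document would yield filtered similarity lists of the kind the abstract describes. Unlike cosine, raw BM25 is not symmetric: scoring B with A's terms generally gives a different value than scoring A with B's.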

CiteSeerX

Social, economic, and environmental factors influencing the basic reproduction number of COVID-19 across countries

Author: A Adhikari
A Frontera
A Raza
AR Tuite
B Rader
B Ridenhour
B Wang
BJ Cowling
BZ Diop
C Gargiulo
C Scarpone
CIA Oronce
CMMID COVID-19 working group
D Fattorini
D Ghosh
DH Morris
E Petersen
H Li
J Demongeot
J Duhon
J Hilton
J Ma
J Ran
J Sooknanan
J Wallinga
J-T Wei
J. Shaw
JB Dowd
JD Kong
JJV Bavel
K Azuma
K Wu
L Matrajt
LD Martins
LYK Nakada
M Krkošek
M Park
MFF Sobral
MM Sajadi
MS Chan
MS Islam
N Islam
NG Davies
O Singh
P Pequeno
Q Li
R Chaudhry
R Kreutz
R Niehus
S Comunian
S Copiello
S Gangemi
S Lolli
S Sanche
S Vosoughi
S Zhao
SE Haque
SM Kissler
TK Boehmer
TW Russell
X He
Y Jiang
Y Li
Y Yao
Z Du
Z Zhang
Publication venue: Public Library of Science (PLoS)

Crossref

The Effect of Machiavellism on Job Attitude: Organizational Justice as a Moderator

Author: Adams J. Stacy
Andrew Martha C
Austin Elizabeth J
Austin William
Barrett Lisa Feldman
Bruk-Lee Valentina
Byrne Zinta S
Cohen-Charash Y
Colquitt Jason A
Cook Gloria Harrell
Cropanzano Russell
Dahling J. Jason
Deluga Ronald J
Dingler-Duhon Melissa
Drory Amos
Edwards Jeffrey R
Ferris Gerald R
Folger Robert
Gable Myron
Gemmill Gary R
Gonzalez-Roma Vicente
Goodboy Alan K
Gordon Michael E
Greenberg Jerald
Gunnthorsdottir Anna
Hackman J. Richard
Harrell-Cook Gloria
Harrell W. Andrew
Heisler W. J
Hollon Charles J
Howell Jon P
Hu Li
Hunt Shelby D
Hurley Susan
Johnson Palmer O
Judge Timothy Amir
Kacmar K. Michele
Kerr Steven
Kessler Stacey R
Kish-Gephart J.
Miller Brian K
Molm Linda
Naumann Stefanie E
O'Boyle Ernest H
Podsakoff Philip Mackenzie
Potthoff Richard F
Rosen Christopher C
Sakalaki Maria
Schaufeli Wilmar B
Sullivan Sherry E
Sweeney Paul D
Tabibnia Golnaz
Topol Martin T
Valle Matthew
Van Yperen Nico W
Vernon Philip A
Watson David
Wilson David Sloan
Witt L. A
Zettler Ingo
김동조
김영호
박지원
윤영일
이종원
이환범
임유석
임정우
주재진
홍세희
Publication venue: Social Science Research Institute - Locality and Globality - Korean Journal of Social Sciences

Crossref

Public and private mechanisms of life extension in Caenorhabditis elegans

Author: A Antebi
A Berdichevsky
A Bokov
A Brooks
A Butov
A Dillin
A Horst Van Der
A Mukhopadhyay
A Wong
AA Khazaeli
AI Michalski
AI Yashin
AK Hihi
AL Hsu
AM Burnell
AM Giglio
AZ Reznick
B Gerisch
B Hamilton
B Lakowski
B Meissner
B Rogina
BM Zuckerman
BN Marbois
BO Davis
BP Braeckman
C Kenyon
CA Finlayson
CA Wolkow
CE Finch
CM Cahill
CT Murphy
D Barsyte
D Garigan
D Gems
D Heemst Van
D Nakai
DA Birnby
DA Sinclair
DB Friedman
DG Hardie
DJ Clancy
DL Motola
DL Riddle
DS Hwangbo
DW Nelson
E Hansen
E Le Bourg
E Pennisi
EA Malone
EB Gil
EB Kayser
EJ Calabrese
EJ Masoro
EL Arkblad
F Levavasseur
FI Pellerone
FM Gregoire
G Barja
G Walker
GA Walker
GI Patterson
GJ Lithgow
GJ Sarkis
GL Anderson
GV Clokey
H Daitoku
H Hosokawa
H Hsin
H Jiang
H Miyadera
H Sheng
HA Tissenbaum
HK Sharma
HM Brown-Borg
HR Prasanna
J Alcedo
J Apfeld
J Cypser
J Feng
J Loeb
J Lund
J McElwee
J Popham
J Taub
J Wang
Jacques R. Vanfleteren
JB Dorman
JC Schafer
JF Morley
JG Wood
JH Thomas
JJ Ewbank
JJ Mcelwee
JJ Vowels
JN Sampayo
JP Rouault
JR Berman
JR Cypser
JR Vanfleteren
JW Golden
JW Vaupel
JZ Morris
K Flurkey
K Houthoofd
K Jia
K Lin
K Nehrke
KD Kimura
KL Guan
Koen Houthoofd
KT Coschigano
KT Howitz
KZ Pan
L Duret
LA Herndon
M Ailion
M Baudry
M Bluher
M Boehm
M Fujii
M Hansen
M Hertweck
M Holzenberger
M Kaeberlein
M Keaney
M Klass
M Nanji
M Rothstein
M Tatar
MA Bolanowski
MA Junger
MC Haigis
ME Giannakou
MJ Kisiel
MJ Munoz
MP Giglio
MP Tu
MR Klass
MT Borra
N Arantes-Oliveira
N Hay
N Ishii
N Kimura
N Libina
N Senoo-Matsuda
N Suzuki
N Ventura
NA Croll
OI Petriv
P Babar
P Narbonne
P Stenmark
PL Larsen
PY Jeong
Q Chen
QX Hua
R Branicky
R Hosono
RC Cassada
RL Foll
RL Russell
RYN Lee
S Asaumi
S Felkai
S Hekimi
S Himmelhoch
S Melov
S Miwa
S Murakami
S Ogg
S Oldham
S Ookuma
S Paradis
S Rea
S Wolff
SA Duhon
SB Pierce
SJ Broughton
SJ Holt
SJ Jones
SJ Lin
SK Gupta
SL Rea
SS Lee
ST Henderson
ST Lamitina
SW Oh
T Furuyama
T Hunter
T Inoue
T Jonassen
T Kawano
T Vellai
Tatar m
TBL Kirkwood
TE Adams
TE Johnson
TJ Fabian
U Reiss
V Ambros
V Cherkasova
V Gorbunova
V Matyash
V Rottiers
VB Oriordan
VT Mihaylova
W Li
WA Voorhies Van
WB Wood
X Liu
Y Honda
Y Kushnareva
Y Wang
YJ Fei
Publication venue: Springer Science and Business Media LLC

Crossref