15,227 research outputs found

Text mining of biomedical literature: discovering new knowledge

Author: Chakrabarty Sumana
Goswami Saikat
Mazumder Sourav
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 16/01/2021

Biomedical literature is growing daily, and the volume of literature on “coronavirus” has expanded especially quickly. In this study, text mining techniques are employed to discover new knowledge from the published literature. The main objectives are to show the growth of the literature (Jan-Jun 2020), extract document sections, identify latent topics, find the most frequent word, build a bag-of-words representation, and perform hierarchical clustering. We collected 16,500 documents from PubMed. Most of the documents (11,499) date from May and June. We identify “betacoronavirus” as the leading document section (3,837), “covid” (29,890) as the most frequent word in the abstracts, and the positive and negative weights of topics. Further, we measure the term frequency (TF) of a document title in the bag-of-words model and then compute a hierarchical clustering of the document titles. It reveals that the lowest distance of the selected cluster (C133) is 0.30. We also discuss future prospects and note that this paper can be useful to researchers and library professionals for knowledge management.
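
The abstract names two steps, term frequency over a bag of words and hierarchical clustering of titles, without giving details. The following is a minimal sketch of how such a pipeline might look, not the authors' implementation. The titles are invented for illustration, and the whitespace tokenizer, cosine distance, and single-linkage merging are all assumptions; the abstract does not state which tokenizer, distance, or linkage was used.

```python
import math
from collections import Counter

# Hypothetical titles, invented for illustration (not taken from the study's data).
titles = [
    "covid pneumonia in adults",
    "covid pneumonia in children",
    "text mining of pubmed abstracts",
]

def term_frequency(title):
    """Bag-of-words term frequency: count of each term divided by total terms."""
    tokens = title.lower().split()  # assumption: plain whitespace tokenization
    counts = Counter(tokens)
    return {term: n / len(tokens) for term, n in counts.items()}

def cosine_distance(a, b):
    """1 minus the cosine similarity of two sparse TF vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (norm_a * norm_b)

def cluster_distance(c1, c2, vectors):
    """Single linkage (an assumption): smallest distance between any two members."""
    return min(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)

def agglomerate(vectors):
    """Merge the closest pair of clusters until one remains; return the merge history."""
    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        pairs = [
            (cluster_distance(a, b, vectors), a, b)
            for k, a in enumerate(clusters)
            for b in clusters[k + 1:]
        ]
        dist, a, b = min(pairs)
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append((merged, dist))
    return merges

vectors = [term_frequency(t) for t in titles]
merges = agglomerate(vectors)
for members, dist in merges:
    print(members, round(dist, 2))
```

Each entry in the merge history pairs a cluster's member indices with the distance at which it was formed, which is the kind of value the abstract reports for cluster C133.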


Clustering cliques for graph-based summarization of the biomedical research literature

Author: A Naud
A Nenkova
A Ozgür
A Pons-Porrata
AR Aronson
AT McCray
Bartlomiej Wilkowski
C Wartena
Dongwook Shin
F Lerch
G Erkan
G Liu
GC Stein
H Kilicoglu
H Yu
H Zhang
Han Zhang
I Mani
I Yoo
J Ah-Pine
J Goodwin
J Yang
JB Kruskal
K Sparck Jones
KW Boyack
L Smith
LH Reeve
M Bundschus
M Fiszman
M Kan
M Lee
Marcelo Fiszman
MG Everett
MJ Norusis
O Bodenreider
P Langfelder
P Tan
PJ Rousseeuw
R Mihalcea
SP Borgatti
T Matsunage
TC Rindflesch
Thomas C Rindflesch
V an der Spek P Klusener S
V Batagelj
VD Blondel
X Liu
X Zhang
Y Yamamoto
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

BACKGROUND: Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts). RESULTS: SemRep was used to extract semantic predications from the citations returned by a PubMed search. Cliques were identified from frequently occurring predications with highly connected arguments filtered by degree centrality. Themes contained in the summary were identified with a hierarchical clustering algorithm based on common arguments shared among cliques. The validity of the clusters in the summaries produced was compared to the Silhouette-generated baseline for cohesion, separation and overall validity. The theme labels were also compared to a reference standard produced with major MeSH headings. CONCLUSIONS: For 11 topics in the testing data set, the overall validity of clusters from the system summary was 10 percentage points better than the baseline (43% versus 33%). When compared to the reference standard from MeSH headings, the results for recall, precision and F-score were 0.64, 0.65, and 0.65 respectively.
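
The clique step described in the abstract can be illustrated with a small sketch. This is not the paper's system: the graph below is invented (node labels stand in for predication arguments, and no SemRep output is reproduced), the centrality threshold of 0.5 is an arbitrary assumption, and Bron-Kerbosch is simply one standard way to enumerate maximal cliques, since the abstract does not name the algorithm used.

```python
# Hypothetical argument graph, invented for illustration: each edge links two
# arguments that co-occur in a predication.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]

def build_adjacency(edge_list):
    """Undirected adjacency sets from a list of edges."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {node: len(neigh) / (n - 1) for node, neigh in adj.items()}

def bron_kerbosch(r, p, x, adj, out):
    """Enumerate maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    if not p and not x:
        out.append(r)
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}

adj = build_adjacency(edges)
centrality = degree_centrality(adj)

# Keep only highly connected arguments; the 0.5 cutoff is an assumption.
threshold = 0.5
kept = {node for node, c in centrality.items() if c >= threshold}
filtered = {node: adj[node] & kept for node in kept}

cliques = []
bron_kerbosch(set(), set(filtered), set(), filtered, cliques)
for clique in sorted(sorted(c) for c in cliques):
    print(clique)
```

In a fuller pipeline, the resulting cliques would then be grouped by the arguments they share, which is the hierarchical clustering step the abstract mentions.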
