4 research outputs found

Word add-in for ontology recognition: semantic enrichment of scientific literature

Abstract

Background: In the current era of scientific research, efficient communication of information is paramount. As such, the nature of scholarly and scientific communication is changing; cyberinfrastructure is now absolutely necessary and new media are allowing information and knowledge to be more interactive and immediate. One approach to making knowledge more accessible is the addition of machine-readable semantic data to scholarly articles.

Results: The Word add-in presented here will assist authors in this effort by automatically recognizing and highlighting words or phrases that are likely information-rich, allowing authors to associate semantic data with those words or phrases, and to embed that data in the document as XML. The add-in and source code are publicly available at http://www.codeplex.com/UCSDBioLit.

Conclusions: The Word add-in for ontology term recognition makes it possible for an author to add semantic data to a document as it is being written, and it encodes these data using XML tags that are effectively a standard in life sciences literature. Allowing authors to mark up their own work will help increase the amount and quality of machine-readable literature metadata.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

Author: AGK Janacek
André Skupin
BC Vanteru
Bob Schijvenaars
Colin Allen
David Newman
DJ Newman
DK Harman
DM Blei
EM Voorhees
EP Jiang
F Janssens
G Gorrell
G Salton
GL Poulter
GR Hjaltason
HM Müller
J Lewis
J Lin
Joseph R. Biberstine
K Börner
K Järvelin
K Sparck Jones
Katy Börner
Kevin W. Boyack
KW Boyack
MA Hearst
MD Cao
Michael Patek
MW Berry
N Jardine
Nianli Ma
NJ Belkin
P Ahlgren
P Calado
P Castells
R Kassab
R Klavans
Richard Klavans
Russell J. Duhon
S Deerwester
S Martin
SE Robertson
T Couto
T Hofmann
T Kohonen
T Theodosiou
TG Kolda
TK Landauer
WS Cooper
Y Aphinyanaphongs
Y Yamamoto
Publication venue: Public Library of Science
Publication date: 01/01/2011

We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents.

We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches comprised five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models, BM25 and PMRA (PubMed Related Articles). The two data sources were (a) MeSH subject headings and (b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE.

PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts.
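Three of the building blocks named in this abstract (tf-idf cosine similarity, top-n filtering of the similarity matrix, and the Jensen-Shannon divergence) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the weighting formula (raw term count times log inverse document frequency), the function names, and the dense all-pairs loop are assumptions chosen for readability, and the loop would not scale to millions of records.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per tokenized document.

    Assumed weighting: raw term count * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_n_similarities(vectors, n):
    """Keep only the n most similar other documents for each document."""
    result = {}
    for i, u in enumerate(vectors):
        scored = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        # Highest similarity first; ties broken by document index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[i] = [(j, s) for s, j in scored[:n]]
    return result


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two word distributions.

    p and q are dicts mapping term -> probability, each summing to 1.
    """
    terms = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in terms}

    def kl_to_mixture(a):
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Toy, made-up documents used only to exercise the functions.
docs = [
    ["gene", "expression", "cancer"],
    ["gene", "expression", "tumor"],
    ["graph", "layout", "clustering"],
]
vectors = tfidf_vectors(docs)
neighbours = top_n_similarities(vectors, n=1)
```

In this toy run, `neighbours[0]` points at document 1, since those two documents share terms and the third shares none. A divergence of 0 means two word distributions are identical, and 1 bit is the maximum, reached when they share no terms.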

Public Library of Science (PLOS)

Crossref

IUScholarWorks (University of Indiana)

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

The proportion of cancer-related entries in PubMed has increased considerably; is cancer truly "The Emperor of All Maladies"?

Author: A Abbott
A Doms
A Gompel
A Gupta
A Molarius
AA Hedley
AB Mitchell
AL MacLean
AP Czernilofsky
AP Polednak
B Aschebrook-Kilfoy
B Pérez-Gómez
BC Vanteru
C Dehlendorff
C Erika Hayden
C Grimmett
CC Murphy
CI Szabo
CJ Evans
CM Groh
CM Morel
Constantino Carlos Reyes-Aldasoro
D Barbolosi
D Lyrdal
D Stehelin
DD Richman
DE Riesenberg
DM Pigott
DR Youlden
E Crocetti
ES Ford
FJ González-Gómez
G Yamey
GG Giles
GG Powathil
Giuseppe Novelli
GP Figueredo
H van Weert
HL Crowell
HM Byrne
J Sanz
JF Gusella
JG Scott
JL Anderson
JM Ramos
JR Fitchett
JR Jenkins
JR Starke
K Abbasi
K Annertz
K Marsh
M Søgaard
M Worobey
MAJ Chaplain
ME Anders
MI Harris
MJ Haley
MJ Plank
MM Le Beau
N Hou
N Pandeya
PA McAuley
R Jacobsen
R Lozano
RE Gerszten
RG Kyle
RK Saiki
RP Dikshit
S Anderson
SK Kershaw
SR Weiss
T Ruzicka
T Theodosiou
TR Rebbeck
Y Akachi
Y Guo
Y Wang
Publication venue: Public Library of Science (PLoS)
Publication date: 10/03/2017

In this work, the public database of biomedical literature PubMed was mined using queries with combinations of keywords and year restrictions. It was found that the proportion of Cancer-related entries per year in PubMed has risen from around 6% in 1950 to more than 16% in 2016. This increase is not shared by other conditions such as AIDS, Malaria, Tuberculosis, Diabetes, Cardiovascular, Stroke and Infection, some of which have, on the contrary, decreased as a proportion of the total entries per year. Organ-related queries were performed to analyse the variation of some specific cancers. A series of queries related to incidence, funding, and relationship with DNA, Computing and Mathematics were performed to test correlation between the keywords, with the hope of elucidating the cause behind the rise of Cancer in PubMed. Interestingly, the proportion of Cancer-related entries that contain "DNA", "Computational" or "Mathematical" has increased, which suggests that the impact of these scientific advances on Cancer has been stronger than on other conditions. It is important to highlight that the results obtained with the data mining approach presented here are limited to the presence or absence of the keywords in a single, yet extensive, database. Therefore, the results should be interpreted with caution. All the data used for this work is publicly available through PubMed and the UK's Office for National Statistics. All queries and figures were generated with the software platform Matlab and the files are available as supplementary material.
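The central quantity in this abstract is a per-year proportion: the number of entries matching a keyword divided by the total number of entries for that year. The sketch below shows that arithmetic in Python rather than the Matlab used by the authors. The function name is hypothetical and the counts are made-up placeholders, not values retrieved from PubMed.

```python
def yearly_proportions(keyword_counts, total_counts):
    """Return {year: keyword entries / total entries}.

    Years with a missing or zero total are skipped to avoid division by zero.
    """
    proportions = {}
    for year in sorted(keyword_counts):
        total = total_counts.get(year, 0)
        if total > 0:
            proportions[year] = keyword_counts[year] / total
    return proportions


# Placeholder counts for illustration only; these are NOT real PubMed figures.
example_keyword_counts = {2000: 50, 2001: 60, 2002: 10}
example_total_counts = {2000: 500, 2001: 400}

example = yearly_proportions(example_keyword_counts, example_total_counts)
for year, share in example.items():
    print(f"{year}: {share:.1%}")
```

Comparing such proportions across years, rather than raw counts, separates a keyword's growth from the overall growth of the database.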

Public Library of Science (PLOS)

City Research Online

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Recent highlights of Chinese medicine for advanced lung cancer

Author: A Doms
A Flower
BC Vanteru
D Fajardo-Ortiz
D Gompelmann
F Perlikos
H Chen
H Huang
HS Lin
J Dong
J Nikles
J Ning
JH Cheng
JH Tian
K Takeda
L Du
L Gao
L Guo
L Shamseer
L Zhang
L Zhou
LA Torre
M McCulloch
MH Bilsky
MK Garcia
NB Gabler
NZ Xue
Ping-ping Li
R Kin
R Liu
R Xu
RA Gatenby
SF Cao
Shu-yan Han
SL Wood
T Kamei
T Sun
X Chen
XB Yang
Xi-ran He
XR He
Y Ishiura
YL Wu
YZ Chen
ZL Liu
ZY Xu
Publication venue: Springer Science and Business Media LLC
Publication date

Crossref