2,349 research outputs found

    Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

    We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications, such as collection management, navigation, summarization, and analysis. The few existing comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts, and subject headings. The nine approaches comprised five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models, BM25 and PMRA (PubMed Related Articles). The two data sources were (a) MeSH subject headings and (b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE. PubMed's own related-article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts.
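
    As a rough illustration of one of the nine approaches, the sketch below computes a tf-idf cosine document-document similarity matrix with scikit-learn and applies the top-n filtering step described above. The toy corpus, the value of n, and the choice of library are illustrative assumptions, not the study's actual pipeline.

    # Minimal sketch (not the study's pipeline): tf-idf cosine similarity between
    # documents, keeping only the top-n highest similarities per document.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    import numpy as np

    docs = [                                              # toy corpus
        "amyloid beta aggregation in alzheimer disease",
        "tau phosphorylation and neurodegeneration",
        "information retrieval for clinical patient records",
    ]
    n = 1  # keep the top-n highest similarities per document

    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    sim = cosine_similarity(tfidf)      # document-document similarity matrix
    np.fill_diagonal(sim, 0.0)          # ignore self-similarity

    # Top-n filtering: zero out everything except each row's n largest entries.
    top_idx = np.argsort(sim, axis=1)[:, -n:]
    mask = np.zeros_like(sim, dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=1)
    filtered = np.where(mask, sim, 0.0)
    print(filtered)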

    Investigating Citation Linkage as a Sentence Similarity Measurement Task using Deep Learning

    Research publications reflect advancements in the corresponding research domain. In these publications, scientists use citations to support the presented research findings, to portray the improvements that come with these findings, and, at the same time, to make the content easier to follow by signposting the flow of information. In the science domain, a citation refers to the document from which the information originates but doesn't specify the text span that is actually being cited; a more precise reference would indicate the text being referenced. This thesis develops a framework that links citing sentences in a research article to the related cited sentences in the corresponding referenced documents. The citation linkage problem is modeled as a semantic relatedness task: given a citing sentence, the framework pairs it with each sentence from the reference document and then determines which sentence pairs are semantically similar and which are not. Constructing the citation linkage framework involves corpus creation and the use of deep-learning models for semantic similarity measurement.
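
    A minimal sketch of the sentence-pair formulation is shown below. It substitutes a pretrained sentence encoder (all-MiniLM-L6-v2 from the sentence-transformers library) for the thesis's own deep-learning models, and the citing and reference sentences are invented examples, so it illustrates the task setup rather than the thesis's implementation.

    # Sketch only: pair a citing sentence with each sentence of the referenced
    # document and rank the pairs by embedding cosine similarity.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")   # stand-in encoder

    citing = "Prior work showed that enzyme X cleaves substrate Y under acidic conditions."
    reference_sentences = [
        "We measured cleavage of substrate Y by enzyme X at several pH values.",
        "Cleavage activity peaked at pH 5, indicating a preference for acidic conditions.",
        "Funding was provided by the national research council.",
    ]

    citing_emb = model.encode(citing, convert_to_tensor=True)
    ref_embs = model.encode(reference_sentences, convert_to_tensor=True)
    scores = util.cos_sim(citing_emb, ref_embs)[0]    # one score per sentence pair

    # Pairs scoring above a chosen threshold would be treated as citation linkages.
    for score, sent in sorted(zip(scores.tolist(), reference_sentences), reverse=True):
        print(f"{score:.3f}  {sent}")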

    Biomedical informatics and translational medicine

    Biomedical informatics involves a core set of methodologies that can provide a foundation for crossing the "translational barriers" associated with translational medicine. To this end, the fundamental aspects of biomedical informatics (e.g., bioinformatics, imaging informatics, clinical informatics, and public health informatics) may be essential in helping improve the ability to bring basic research findings to the bedside, evaluate the efficacy of interventions across communities, and enable the assessment of the eventual impact of translational medicine innovations on health policies. Here, a brief description is provided for a selection of key biomedical informatics topics (Decision Support, Natural Language Processing, Standards, Information Retrieval, and Electronic Health Records) and their relevance to translational medicine. Based on contributions and advancements in each of these topic areas, the article proposes that biomedical informatics practitioners ("biomedical informaticians") can be essential members of translational medicine teams

    Exploiting semantics for improving clinical information retrieval

    Clinical information retrieval (IR) presents several challenges, including terminology mismatch and granularity mismatch. One of the main objectives in clinical IR is to bridge the semantic gap between queries and documents and to go beyond keyword matching. To address these issues, we use semantic information to improve the performance of clinical IR systems by representing queries in an expressive and meaningful context, and we propose two novel approaches to modeling medical query contexts. The first approach models the query context by mining semantic-based association rules: the context is derived from the rules that cover the query and weighted according to their semantic relatedness to the query concepts. In the second approach, we model a representative query context by developing a query domain ontology: we extract all concepts that have a semantic relationship with the query concept(s) in the UMLS ontologies, and the query context consists of concepts extracted from this ontology, weighted according to their semantic relatedness to the query concept(s). The query context is then exploited for query expansion and re-ranking over patient records to improve clinical retrieval performance. We evaluate this approach on the TREC Medical Records dataset. Results show that the proposed approach significantly improves retrieval performance compared with a classic keyword-based IR model.
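
    The sketch below illustrates only the generic expansion-and-weighting step described above: hard-coded relatedness scores stand in for the concepts mined from association rules or the UMLS-based query domain ontology, and the function name and threshold are illustrative.

    # Sketch of weighted query expansion: original terms keep full weight,
    # context concepts are down-weighted by their semantic relatedness score.
    def expand_query(query_terms, related_concepts, min_relatedness=0.4):
        expanded = [(term, 1.0) for term in query_terms]
        for concept, relatedness in related_concepts.items():
            if relatedness >= min_relatedness and concept not in query_terms:
                expanded.append((concept, relatedness))
        return expanded

    query = ["myocardial", "infarction"]
    # Relatedness scores would come from UMLS-based measures; these are invented.
    concepts = {"heart attack": 0.95, "troponin": 0.62, "chest pain": 0.48, "fracture": 0.05}

    for term, weight in expand_query(query, concepts):
        print(f"{term}: {weight:.2f}")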

    Data preparation for biomedical knowledge domain visualization: a probabilistic record linkage and information fusion approach to citation data

    This thesis presents a methodology of data preparation with probabilistic record linkage and information fusion for improving and enriching information visualizations of biomedical citation data. The problem of linking records across citation databases, where only non-unique identifiers such as author names and document titles are available as common identifiers, was investigated. This problem in citation data parallels problems in clinical data, and Knowledge Discovery in Databases (KDD) methods from clinical data mining are evaluated. Probabilistic and deterministic (exact-match) record linkage models were developed and compared through the use of a gold-standard (truth) dataset. Empirical comparison of the record linkage models with ROC analysis showed a significant difference (p=.000) in performance of a probabilistic model over deterministic models. The methodology was evaluated with probabilistic linkage of records from the Web of Science, Medline, and CINAHL citation databases in the knowledge domains of medical informatics, HIV/AIDS, and nursing informatics. Data quality metrics for datasets prepared with probabilistic record linkage and information fusion showed improvement in the completeness of key variables and a reduction in sample bias. The resulting visualizations offered a richer information space for users through an increase in terms entering the visualization. The significant contributions of this work include the development of a novel model of probabilistic record linkage for biomedical citation databases which improves upon existing deterministic models. In addition, a methodology for improving and enriching knowledge domain visualizations through a data preparation approach has been validated with analyses of multiple citation databases and knowledge domains. The data preparation methodology of probabilistic record linkage with information fusion offers a remedy for data quality problems and the opportunity to enrich visualizations with added content for user exploration, which in turn improves the utility of knowledge domain visualizations as a medium for assessing available evidence and forming hypotheses. Ph.D., Information Science, Drexel University, 200
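
    A minimal sketch of probabilistic record linkage in the Fellegi-Sunter style is shown below; the m/u probabilities, the string-similarity agreement test, and the example records are illustrative assumptions rather than the thesis's fitted model.

    # Sketch: score a candidate record pair by summing per-field log-likelihood
    # weights; pairs above a chosen cut-off are linked.
    import math
    from difflib import SequenceMatcher

    # (m, u): probability a field agrees among true matches vs. among non-matches.
    FIELD_PARAMS = {"author": (0.95, 0.01), "title": (0.90, 0.001)}

    def agrees(a, b, threshold=0.9):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

    def linkage_score(rec_a, rec_b):
        score = 0.0
        for field, (m, u) in FIELD_PARAMS.items():
            if agrees(rec_a[field], rec_b[field]):
                score += math.log2(m / u)              # agreement weight
            else:
                score += math.log2((1 - m) / (1 - u))  # disagreement weight
        return score

    wos = {"author": "Smith, J", "title": "Visualizing the medical informatics domain"}
    medline = {"author": "Smith J", "title": "Visualizing the Medical Informatics Domain"}
    print(linkage_score(wos, medline))   # above a chosen cut-off => link the records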

    Patent citation analysis with Google

    This is an accepted manuscript of an article published by Wiley-Blackwell in Journal of the Association for Information Science and Technology on 23/09/2015, available online: https://doi.org/10.1002/asi.23608 The accepted version of the publication may differ from the final published version. Citations from patents to scientific publications provide useful evidence about the commercial impact of academic research, but automatically searchable databases are needed to exploit this connection for large-scale patent citation evaluations. Google covers multiple different international patent office databases but does not index patent citations or allow automatic searches. In response, this article introduces a semiautomatic indirect method via Bing to extract and filter patent citations from Google to academic papers with an overall precision of 98%. The method was evaluated with 322,192 science and engineering Scopus articles from every second year for the period 1996–2012. Although manual Google Patent searches give more results, especially for articles with many patent citations, the difference is not large enough to be a major problem. Within Biomedical Engineering, Biotechnology, and Pharmacology & Pharmaceutics, 7% to 10% of Scopus articles had at least one patent citation but other fields had far fewer, so patent citation analysis is only relevant for a minority of publications. Low but positive correlations between Google Patent citations and Scopus citations across all fields suggest that traditional citation counts cannot substitute for patent citations when evaluating research.
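
    As a small illustration of the field-level comparison mentioned above, the snippet below computes a rank correlation between patent citation counts and Scopus citation counts; the counts are invented and the choice of Spearman correlation is an assumption, not necessarily the statistic used in the article.

    # Illustrative only: correlate patent citation counts with Scopus citations.
    from scipy.stats import spearmanr

    patent_citations = [0, 0, 1, 3, 0, 2, 0, 5, 1, 0]    # invented counts
    scopus_citations = [4, 12, 30, 55, 2, 41, 9, 80, 25, 7]

    rho, p_value = spearmanr(patent_citations, scopus_citations)
    print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")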

    Discovering lesser known molecular players and mechanistic patterns in Alzheimer's disease using an integrative disease modelling approach

    Convergence of exponentially advancing technologies is driving medical research towards life-changing discoveries. In contrast, the repeated failures of high-profile drugs to battle Alzheimer's disease (AD) have made it one of the least successful therapeutic areas. This failure pattern has provoked researchers to grapple with their beliefs about Alzheimer's aetiology. Thus, the growing realisation that amyloid-β and tau are not 'the' factors but rather 'some of the' factors necessitates the reassessment of pre-existing data to add new perspectives. To enable a holistic view of the disease, integrative modelling approaches are emerging as a powerful technique. Combining data at different scales and modes can considerably increase the predictive power of an integrative model by filling biological knowledge gaps. However, the reliability of the derived hypotheses largely depends on the completeness, quality, consistency, and context-specificity of the data. Thus, there is a need for agile methods and approaches that efficiently interrogate and utilise existing public data. This thesis presents the development of novel approaches and methods that address intrinsic issues of data integration and analysis in AD research. It aims to prioritise lesser-known AD candidates using highly curated and precise knowledge derived from integrated data, with much of the emphasis put on quality, reliability, and context-specificity. The work showcases the benefit of integrating well-curated and disease-specific heterogeneous data in a semantic web-based framework for mining actionable knowledge, and it introduces the challenges encountered while harvesting information from literature and transcriptomic resources. A state-of-the-art text-mining methodology is developed to extract miRNAs and their regulatory roles in genes and diseases from the biomedical literature. To enable meta-analysis of biologically related transcriptomic data, a highly curated metadata database has been developed that explicates annotations specific to human and animal models. Finally, to corroborate common mechanistic patterns, embedded with novel candidates, across large-scale AD transcriptomic data, a new approach to generating gene regulatory networks has been developed. The work presented here has demonstrated its capability to identify testable mechanistic hypotheses containing previously unknown or emerging knowledge from public data in two major publicly funded projects on Alzheimer's disease, Parkinson's disease, and epilepsy.
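
    As a rough sketch of the network-building step, the snippet below assembles a small directed gene regulatory network from curated regulator-target interactions using networkx; the interactions and gene names are invented, and the thesis's actual network-generation approach is not reproduced here.

    # Sketch: build a directed gene regulatory network from curated interactions
    # and inspect the regulators and targets of a gene of interest.
    import networkx as nx

    interactions = [                       # (regulator, target, effect), invented
        ("miR-132", "FOXO3", "represses"),
        ("FOXO3", "BCL2L11", "activates"),
        ("APP", "CASP3", "activates"),
    ]

    grn = nx.DiGraph()
    for regulator, target, effect in interactions:
        grn.add_edge(regulator, target, effect=effect)

    gene = "FOXO3"
    print("Regulators of", gene, ":", sorted(grn.predecessors(gene)))
    print("Targets of", gene, ":", sorted(grn.successors(gene)))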