58 research outputs found

Query Expansion for Survey Question Retrieval in the Social Sciences

Author: B Zapilko
C Carpineto
D Hienert
DC Blair
E Brent
GW Furnas
J Xu
K Järvelin
P Schaer
S Dallmeier-Tiessen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/06/2015

In recent years, the importance of research data and the need to archive and share it in the scientific community have increased enormously. This introduces a whole new set of challenges for digital libraries. In the social sciences, typical research data sets consist of surveys and questionnaires. In this paper we focus on the use case of social science survey question reuse and on mechanisms to support users in the query formulation for data sets. We describe and evaluate thesaurus- and co-occurrence-based approaches for query expansion to improve retrieval quality in digital libraries and research data archives. The challenge here is to translate the information need and the underlying sociological phenomena into proper queries. As we show, retrieval quality can be improved by adding related terms to the queries. In a direct comparison, automatically expanded queries using extracted co-occurring terms can provide better results than queries manually reformulated by a domain expert and better results than a keyword-based BM25 baseline. Comment: to appear in Proceedings of the 19th International Conference on Theory and Practice of Digital Libraries 2015 (TPDL 2015).

arXiv.org e-Print Archive

Crossref

Finding related sentence pairs in MEDLINE

Author: C Friedman
CJ Rijsbergen van
DK Milton
EW Sayers
GW Furnas
H Zou
KL Currie
L Smith
L Smith
Larry H. Smith
P Langley
Q Ma
R Artstein
R Wadden
S Jellali
T Dietterich
V Vapnik
W Wilbur
W. John Wilbur
WG Kim
WJ Wilbur
WJ Wilbur
Z Lu
Publication venue: Springer Netherlands
Publication date: 01/01/2010

We explore the feasibility of automatically identifying sentences in different MEDLINE abstracts that are related in meaning. We compared traditional vector space models with machine learning methods for detecting relatedness, and found that machine learning was superior. The Huber method, a variant of Support Vector Machines which minimizes the modified Huber loss function, achieves 73% precision when the score cutoff is set high enough to identify about one related sentence per abstract on average. We illustrate how an abstract viewed in PubMed might be modified to present the related sentences found in other abstracts by this automatic procedure.

Crossref

Springer - Publisher Connector

PubMed Central

CAMbase – A XML-based bibliographical database on Complementary and Alternative Medicine (CAM)

Author: A Rauber
A Smith
Arndt Buessing
B Druss
B Rosslenbroich
B Rosslenbroich
C Calvanese
Christa K Raak
GW Furnas
H Han
H Zillmann
HA Tindle
Hartmut Zillmann
J Barnes
J Ezzo
LS Murphy
Peter F Matthiessen
T Ostermann
T Ostermann
T Ostermann
T Ostermann
T Shahar
Thomas Ostermann
U Hartel
X Lin
Publication venue: BioMed Central
Publication date: 01/01/2007

The term "Complementary and Alternative Medicine (CAM)" covers a variety of approaches to medical theory and practice which are not commonly accepted by representatives of conventional medicine. In the past two decades, these approaches have been studied in various areas of medicine. Although there appears to be a growing number of scientific publications on CAM, the complete spectrum of complementary therapies still requires more information about published evidence. A majority of these research publications are still not listed in electronic bibliographical databases such as MEDLINE. However, with a growing demand by patients for such therapies, physicians increasingly need an overview of scientific publications on CAM. Bearing this in mind, CAMbase, a bibliographical database on CAM, was launched in order to close this gap. It can be accessed online free of charge and without additional costs. The user can peruse more than 80,000 records from over 30 journals and periodicals on CAM, which are stored in CAMbase. A special search engine performing syntactical and semantical analysis of textual phrases allows the user to quickly find relevant bibliographical information on CAM. Between August 2003 and July 2006, 43,299 search queries, an average of 38 search queries per day, were registered, focussing on CAM topics such as acupuncture, cancer or general safety aspects. Analysis of the requests led to the conclusion that CAMbase is used not only by scientists and researchers but also by physicians and patients who want to find out more about CAM. Closely related to this effort is our aim to establish a modern library center on Complementary Medicine which offers the complete spectrum of a modern digital library, including a document delivery service for physicians, therapists, scientists and researchers.

Crossref

Springer - Publisher Connector

PubMed Central

What Google Maps can do for biomedical data dissemination: examples and a design study

Author: A Sinha
A Skupin
B Gretarsson
BB Bederson
BB Bederson
D Johnson
David H Laidlaw
E Demir
F Paulovich
F van Ham
G Aravindhan
GW Furnas
H Kuehn
J Heer
J Seo
K Arakawa
M Bostock
M Eisen
M Hegarty
M Meyer
N Elmqvist
N Henr
P Eades
P Shannon
R Jianu
R Jianu
R Jianu
R Jianu
Radu Jianu
S Berger
S Jul
S Saalfeld
Skupin A
T Munzner
T Yates
Z Hu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

BACKGROUND: Biologists often need to assess whether unfamiliar datasets warrant the time investment required for more detailed exploration. Basing such assessments on brief descriptions provided by data publishers is unwieldy for large datasets that contain insights dependent on specific scientific questions. Alternatively, using complex software systems for a preliminary analysis may be deemed too time-consuming in itself, especially for unfamiliar data types and formats. This may lead to wasted analysis time and discarding of potentially useful data. RESULTS: We present an exploration of design opportunities that the Google Maps interface offers to biomedical data visualization. In particular, we focus on synergies between visualization techniques and Google Maps that facilitate the development of biological visualizations which have both low overhead and sufficient expressivity to support the exploration of data at multiple scales. The methods we explore rely on displaying pre-rendered visualizations of biological data in browsers, with sparse yet powerful interactions, by using the Google Maps API. We structure our discussion around five visualizations: a gene co-regulation visualization, a heatmap viewer, a genome browser, a protein interaction network, and a planar visualization of white matter in the brain. Feedback from collaborative work with domain experts suggests that our Google Maps visualizations offer multiple, scale-dependent perspectives and can be particularly helpful for unfamiliar datasets due to their accessibility. We also find that users, particularly those less experienced with computer use, are attracted by the familiarity of the Google Maps API. Our five implementations introduce design elements that can benefit visualization developers. CONCLUSIONS: We describe a low-overhead approach that lets biologists access readily analyzed views of unfamiliar scientific datasets. We rely on pre-computed visualizations prepared by data experts, accompanied by sparse and intuitive interactions, and distributed via the familiar Google Maps framework. Our contributions are an evaluation demonstrating the validity and opportunities of this approach, a set of design guidelines benefiting those wanting to create such visualizations, and five concrete example visualizations.

City Research Online

Crossref

Springer - Publisher Connector

PubMed Central

DigitalCommons@Florida International University

The UniFrac significance test is sensitive to tree topology

Author: AP Martin
C Lozupone
CA Lozupone
Catherine A. Lozupone
E Abouheif
E Stackebrandt
GW Furnas
JB Losos
JG Caporaso
JR Long
KR Clarke
P Dixon
PD Schloss
PD Schloss
PJ Turnbaugh
Rob Knight
WP Maddison
Y Benjamini
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/07/2015

Long et al. (BMC Bioinformatics 2014, 15(1):278) describe a “discrepancy” in using UniFrac to assess statistical significance of community differences. Specifically, they find that weighted UniFrac results differ between input trees where (a) replicate sequences each have their own tip, or (b) all replicates are assigned to one tip with an associated count. We argue that these are two distinct cases that differ in the probability distribution on which the statistical test is based, because of the differences in tree topology. Further study is needed to understand which randomization procedure best detects different aspects of community dissimilarities.

Crossref

PubMed Central

eScholarship - University of California

Analyzing and mining a code search engine usage log

Author: A Aula
A Kuhn
AJ Ko
Bajracharya S
BJ Jansen
C Silverstein
Cristina Videira Lopes
D Andrzejewski
D Poshyvanyk
DM Blei
E Linstead
E Linstead
F McCarey
F Zazo
G Maskeri
GC Murphy
GW Furnas
H Cui
H Liu
Holmes R
J Brandt
J Koenemann
J Xu
JI Maletic
M Umarji
M Whittle
Mandelin D
O Hummel
PF Baldi
R Hoffmann
S Bajracharya
S Henninger
S Kawaguchi
S Thummalapenta
Sillito J
ST Dumais
Sushil Krishna Bajracharya
T Joachims
T Joachims
TL Griffiths
Y Ye
Y Zhu
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Das 3D User Interface zSpace

Author: GW Furnas
KH Höhne
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Experiments in term expansion using thesauri in Spanish

Author: A Smeaton
D Wolfram
G Salton
G Salton
GW Furnas
H Billhardt
HJ Peat
J Minker
J Xu
Publication date: 01/01/2003

This paper presents some experiments carried out this year in the Spanish monolingual task at CLEF2002. The objective is to continue our research on term expansion. Last year we presented results regarding stemming. Now, our effort is centred on term expansion using thesauri. Many words that derive from the same stem have a close semantic content. However, other words with very different stems also have semantically close senses. In this case, the analysis of the relationships between words in a document collection can be used to construct a thesaurus of related terms. The thesaurus can then be used to expand a term with the best related terms. This paper describes some experiments carried out to study term expansion using association and similarity thesauri.

E-LIS

Crossref

Gestion del Repositorio Documental de la Universidad de Salamanca

Boosting Connectivity in a Student Generated Collaborative Database

Author: BK Oldroyd
GW Furnas
JF Tinker
LM Gomez
LM Gomez
M Scardamalia
M Scardamalia
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/1991

Crossref

Retrieving web search results using Max–Max soft clustering for Hindi query

Author: Christiane Fellbaum
D Marco
GA Miller
GW Furnas
KS Dwivedi
N Roberto
Ruihua Song
Publication venue: 'Springer Science and Business Media LLC'

Crossref