58 research outputs found
Query Expansion for Survey Question Retrieval in the Social Sciences
In recent years, the importance of research data and the need to archive and
to share it in the scientific community have increased enormously. This
introduces a whole new set of challenges for digital libraries. In the social
sciences, typical research data sets consist of surveys and questionnaires. In
this paper we focus on the use case of social science survey question reuse and
on mechanisms to support users in the query formulation for data sets. We
describe and evaluate thesaurus- and co-occurrence-based approaches for query
expansion to improve retrieval quality in digital libraries and research data
archives. The challenge here is to translate the information need and the
underlying sociological phenomena into proper queries. We show that retrieval
quality can be improved by adding related terms to the queries. In a direct
comparison, automatically expanded queries using extracted co-occurring terms
can provide better results than queries manually reformulated by a domain
expert and better results than a keyword-based BM25 baseline.
Comment: to appear in Proceedings of the 19th International Conference on Theory
and Practice of Digital Libraries 2015 (TPDL 2015).
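The co-occurrence-based expansion mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the toy corpus, the stopword list, the function names and the raw-count scoring are all assumptions made for illustration, and a real system would typically use a weighted association measure and a proper retrieval model such as BM25.

```python
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"in", "and", "with"}


def build_cooccurrence(docs):
    """Count how often two distinct non-stopword terms share a document."""
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()) - STOPWORDS)
        for a, b in combinations(terms, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def expand_query(query, pair_counts, k=2):
    """Append up to k terms that co-occur most often with the query terms."""
    query_terms = query.lower().split()
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a in query_terms and b not in query_terms:
            scores[b] += n
        elif b in query_terms and a not in query_terms:
            scores[a] += n
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return query_terms + [term for term, _ in ranked[:k]]


# Invented survey-question snippets (hypothetical data).
docs = [
    "trust in government institutions",
    "trust in parliament and government",
    "satisfaction with government performance",
    "trust in police",
]
counts = build_cooccurrence(docs)
expanded = expand_query("trust", counts, k=2)
# expanded == ["trust", "government", "institutions"]
```

Here "government" is added because it shares two documents with "trust"; the second slot is a three-way tie that is broken alphabetically.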
Finding related sentence pairs in MEDLINE
We explore the feasibility of automatically identifying sentences in different MEDLINE abstracts that are related in meaning. We compared traditional vector space models with machine learning methods for detecting relatedness, and found that machine learning was superior. The Huber method, a variant of Support Vector Machines that minimizes the modified Huber loss function, achieves 73% precision when the score cutoff is set high enough to identify about one related sentence per abstract on average. We illustrate how an abstract viewed in PubMed might be modified to present the related sentences found in other abstracts by this automatic procedure.
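The abstract names the modified Huber loss without defining it. As a hedged illustration, the sketch below uses the commonly cited form for a label y in {-1, +1} and a classifier score f(x): with z = y * f(x), the loss is max(0, 1 - z)^2 when z >= -1 and -4z otherwise. That the paper uses exactly this form is an assumption, and the function name is hypothetical.

```python
def modified_huber_loss(y, score):
    """Modified Huber loss for a label y in {-1, +1} and a real-valued score."""
    z = y * score  # margin: positive when the prediction has the right sign
    if z >= -1.0:
        # Quadratically smoothed hinge: zero once the margin reaches 1.
        return max(0.0, 1.0 - z) ** 2
    # Linear growth for badly misclassified points limits outlier influence.
    return -4.0 * z


# Loss for a positive example at several scores.
losses = [modified_huber_loss(1, s) for s in (2.0, 0.0, -1.0, -2.0)]
# losses == [0.0, 1.0, 4.0, 8.0]
```

The two branches meet at z = -1, where both give 4, so the loss is continuous.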
CAMbase – A XML-based bibliographical database on Complementary and Alternative Medicine (CAM)
The term "Complementary and Alternative Medicine (CAM)" covers a variety of approaches to medical theory and practice that are not commonly accepted by representatives of conventional medicine. In the past two decades, these approaches have been studied in various areas of medicine. Although there appears to be a growing number of scientific publications on CAM, the complete spectrum of complementary therapies still requires more information about published evidence. A majority of these research publications are still not listed in electronic bibliographical databases such as MEDLINE. However, with a growing demand by patients for such therapies, physicians increasingly need an overview of scientific publications on CAM. Bearing this in mind, CAMbase, a bibliographical database on CAM, was launched to close this gap. It can be accessed online free of charge. The user can peruse more than 80,000 records from over 30 journals and periodicals on CAM stored in CAMbase. A special search engine performing syntactical and semantical analysis of textual phrases allows the user to find relevant bibliographical information on CAM quickly. Between August 2003 and July 2006, 43,299 search queries (an average of 38 per day) were registered, focussing on CAM topics such as acupuncture, cancer or general safety aspects. Analysis of the requests led to the conclusion that CAMbase is used not only by scientists and researchers but also by physicians and patients who want to find out more about CAM. Closely related to this effort is our aim to establish a modern library center on Complementary Medicine that offers the complete spectrum of a modern digital library, including a document delivery service for physicians, therapists, scientists and researchers.
What Google Maps can do for biomedical data dissemination: examples and a design study
BACKGROUND: Biologists often need to assess whether unfamiliar datasets warrant the time investment required for more detailed exploration. Basing such assessments on brief descriptions provided by data publishers is unwieldy for large datasets that contain insights dependent on specific scientific questions. Alternatively, using complex software systems for a preliminary analysis may itself be deemed too time-consuming, especially for unfamiliar data types and formats. This may lead to wasted analysis time and discarding of potentially useful data.
RESULTS: We present an exploration of design opportunities that the Google Maps interface offers to biomedical data visualization. In particular, we focus on synergies between visualization techniques and Google Maps that facilitate the development of biological visualizations that have both low overhead and sufficient expressivity to support the exploration of data at multiple scales. The methods we explore rely on displaying pre-rendered visualizations of biological data in browsers, with sparse yet powerful interactions, by using the Google Maps API. We structure our discussion around five visualizations: a gene co-regulation visualization, a heatmap viewer, a genome browser, a protein interaction network, and a planar visualization of white matter in the brain. Feedback from collaborative work with domain experts suggests that our Google Maps visualizations offer multiple, scale-dependent perspectives and can be particularly helpful for unfamiliar datasets due to their accessibility. We also find that users, particularly those less experienced with computer use, are attracted by the familiarity of the Google Maps API. Our five implementations introduce design elements that can benefit visualization developers.
CONCLUSIONS: We describe a low-overhead approach that lets biologists access readily analyzed views of unfamiliar scientific datasets. We rely on pre-computed visualizations prepared by data experts, accompanied by sparse and intuitive interactions, and distributed via the familiar Google Maps framework. Our contributions are an evaluation demonstrating the validity and opportunities of this approach, a set of design guidelines benefiting those wanting to create such visualizations, and five concrete example visualizations.
The UniFrac significance test is sensitive to tree topology
Long et al. (BMC Bioinformatics 2014, 15(1):278) describe a “discrepancy” in using UniFrac to assess statistical significance of community differences. Specifically, they find that weighted UniFrac results differ between input trees where (a) replicate sequences each have their own tip, or (b) all replicates are assigned to one tip with an associated count. We argue that these are two distinct cases that differ in the probability distribution on which the statistical test is based, because of the differences in tree topology. Further study is needed to understand which randomization procedure best detects different aspects of community dissimilarities.
Experiments in term expansion using thesauri in Spanish
This paper presents some experiments carried out this year in the Spanish monolingual task at CLEF2002. The objective is to continue our research on term expansion. Last year we presented results regarding stemming. Now, our effort is centred on term expansion using thesauri. Many words that derive from the same stem have a close semantic content. However, other words with very different stems also have semantically close senses. In this case, the analysis of the relationships between words in a document collection can be used to construct a thesaurus of related terms. The thesaurus can then be used to expand a term with the best related terms. This paper describes some experiments carried out to study term expansion using association and similarity thesauri.
- …