Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning
While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of data sets suitable for training ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use it to predict relevance scores for query-document pairs in languages never seen during training. Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and Spanish. We also show that augmenting the English training collection with some examples from the target language can sometimes improve performance.
Comment: ECIR 2020 (short paper)
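The zero-shot transfer setup above can be sketched with a toy ranker. The character n-gram scorer below is only a language-agnostic stand-in for the pre-trained multilingual model (an assumption for illustration); the point is that the same scorer, fitted on nothing from the target language, is applied unchanged to non-English query-document pairs.

```python
# Toy sketch of zero-shot multilingual ranking: one scorer, applied to
# languages never seen during training. The n-gram overlap stands in for
# the multilingual language model described in the abstract.

def ngrams(text, n=3):
    """Character n-grams; works for any script without tokenization."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def relevance_score(query, document, n=3):
    """Stand-in scorer: Jaccard overlap of character n-grams."""
    q, d = ngrams(query, n), ngrams(document, n)
    return len(q & d) / len(q | d) if q | d else 0.0

def rank(query, documents):
    """Rank documents by predicted relevance, highest first."""
    return sorted(documents, key=lambda d: relevance_score(query, d),
                  reverse=True)

# Zero-shot: the scorer was never tuned on Spanish, yet ranks Spanish text.
docs = ["recuperación de información multilingüe",
        "historia del arte barroco"]
print(rank("recuperación multilingüe", docs)[0])
```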
Query Expansion for Survey Question Retrieval in the Social Sciences
In recent years, the importance of research data and the need to archive and
to share it in the scientific community have increased enormously. This
introduces a whole new set of challenges for digital libraries. In the social
sciences typical research data sets consist of surveys and questionnaires. In
this paper we focus on the use case of social science survey question reuse and
on mechanisms to support users in the query formulation for data sets. We
describe and evaluate thesaurus- and co-occurrence-based approaches for query
expansion to improve retrieval quality in digital libraries and research data
archives. The challenge here is to translate the information need and the
underlying sociological phenomena into proper queries. As we show, retrieval quality can be improved by adding related terms to the queries. In a direct comparison, automatically expanded queries using extracted co-occurring terms provide better results than queries manually reformulated by a domain expert, and also outperform a keyword-based BM25 baseline.
Comment: to appear in Proceedings of the 19th International Conference on Theory and Practice of Digital Libraries 2015 (TPDL 2015)
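The co-occurrence-based expansion described above can be sketched in a few lines: count how often terms appear together across documents, then extend a query with its most frequent co-occurring terms. The tiny corpus is a made-up stand-in for survey questions, not the paper's data.

```python
# Minimal sketch of co-occurrence-based query expansion.
from collections import Counter
from itertools import combinations

corpus = [  # each string stands in for one survey question
    "income household employment",
    "income employment satisfaction",
    "health income household",
]

def cooccurrence_counts(docs):
    """Symmetric counts of term pairs appearing in the same document."""
    counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc.split())), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def expand(query_term, docs, k=2):
    """Add the k terms most often co-occurring with the query term."""
    counts = cooccurrence_counts(docs)
    related = [(t, c) for (q, t), c in counts.items() if q == query_term]
    related.sort(key=lambda x: (-x[1], x[0]))  # frequency, then alphabetic
    return [query_term] + [t for t, _ in related[:k]]

print(expand("income", corpus))  # query plus two related terms
```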
Improving ranking for systematic reviews using query adaptation
Identifying relevant studies for inclusion in systematic reviews requires significant effort from human experts, who manually screen large numbers of studies. The problem is made more difficult by the growing volume of medical literature, and Information Retrieval techniques have proved useful for reducing the workload. Reviewers are often interested in particular types of evidence, such as Diagnostic Test Accuracy studies. This paper explores the use of query adaptation to identify particular types of evidence and thereby reduce the workload placed on reviewers. A simple retrieval system that ranks studies using TF.IDF weighted cosine similarity was implemented. The Log-Likelihood, Chi-Squared and Odds-Ratio lexical statistics and relevance feedback were used to generate sets of terms that indicate evidence relevant to Diagnostic Test Accuracy reviews. Experiments using a set of 80 systematic reviews from the CLEF 2017 and CLEF 2018 eHealth tasks demonstrate that the approach improves retrieval performance.
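The ranking step described above, TF.IDF weighted cosine similarity, can be sketched directly. The "studies" below are placeholder strings, not CLEF data, and the term-statistics step for generating adapted query terms is omitted.

```python
# Sketch of TF.IDF weighted cosine-similarity ranking of candidate studies.
import math
from collections import Counter

def tfidf(text, docs):
    """TF.IDF vector of `text`, with IDF taken from the collection `docs`."""
    tf = Counter(text.split())
    n = len(docs)
    vec = {}
    for term, f in tf.items():
        df = sum(1 for d in docs if term in d.split())
        if df:  # terms absent from the collection carry no weight
            vec[term] = f * math.log(n / df)
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

studies = [
    "diagnostic test accuracy of ultrasound screening",
    "randomised controlled trial of a new drug",
    "sensitivity and specificity of a diagnostic test",
]
query = "diagnostic test accuracy"
ranked = sorted(studies,
                key=lambda s: cosine(tfidf(query, studies),
                                     tfidf(s, studies)),
                reverse=True)
print(ranked[0])
```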
Probabilistic models of information retrieval based on measuring the divergence from randomness
We introduce a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose--Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document--query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.
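The two-component structure of the divergence-from-randomness weights can be summarized compactly. Here Prob1 is the probability of the observed term frequency under the random model and Prob2 comes from the first (information-gain) normalization; the Laplace after-effect is shown as one concrete choice, and the length normalization rescales the raw term frequency.

```latex
% Divergence-from-randomness term weight: information gain times
% divergence from the random model.
\[
  w(t, d)
  \;=\;
  \underbrace{\bigl(1 - \mathrm{Prob}_2(t \mid d)\bigr)}_{\text{information gain}}
  \cdot
  \underbrace{\bigl(-\log_2 \mathrm{Prob}_1(t)\bigr)}_{\text{divergence from randomness}}
\]
% With the Laplace after-effect model, Prob_2 = tf / (tf + 1), so the
% first factor becomes 1 / (tf + 1). The second (length) normalization
% replaces tf by
\[
  tfn \;=\; tf \cdot \log_2\!\Bigl(1 + c \cdot \frac{\bar{\ell}}{\ell(d)}\Bigr),
\]
% where \ell(d) is the document length, \bar{\ell} the average length,
% and c a tuning constant.
```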
Querying a Bioinformatic Data Sources Registry with Concept Lattices
Bioinformatic data sources available on the web are multiple and heterogeneous. The lack of documentation and the difficulty of interacting with these data banks require competence in both informatics and biology for optimal use of source contents, which remain rather underexploited. In this paper we present an approach based on formal concept analysis to classify and search relevant bioinformatic data sources for a given user query. It consists of building the concept lattice from the binary relation between bioinformatic data sources and their associated metadata. The concept built from a given user query is then merged into the concept lattice. The result is given by extracting the set of sources belonging to the extents of the query concept's subsumers in the resulting concept lattice. The ranking of sources is given by the concept specificity order in the concept lattice. An improvement of the approach consists of automatically refining the query with domain ontologies. Two forms of refinement are possible: by generalisation and by specialisation.
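The lattice-based lookup described above rests on the extent/intent pair of formal concept analysis. The sketch below is a drastically simplified, two-level version of the subsumer traversal: exact matches of the query's attribute set first (most specific concept), then sources matching any single attribute (more general subsumers). The relation content is a toy assumption, not the registry's real metadata.

```python
# Minimal formal-concept-analysis lookup over a source/metadata relation.

relation = {  # source -> set of metadata terms (hypothetical examples)
    "GenBank":   {"sequence", "nucleotide", "annotation"},
    "SwissProt": {"sequence", "protein", "annotation"},
    "PDB":       {"protein", "structure"},
}

def extent(attrs):
    """Extent of an attribute set: all sources carrying every attribute."""
    return {s for s, meta in relation.items() if attrs <= meta}

def intent(sources):
    """Intent of a source set: attributes shared by all its sources."""
    metas = [relation[s] for s in sources]
    return set.intersection(*metas) if metas else set()

def answer(query_attrs):
    """Sources for the query, most specific first: the query concept's
    extent, then extents of single-attribute generalisations."""
    exact = extent(query_attrs)
    broader = set().union(*(extent({a}) for a in query_attrs)) - exact
    return sorted(exact) + sorted(broader)

print(answer({"protein", "annotation"}))
```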
Intrasession and Between-Visit Variability of Sector Peripapillary Angioflow Vessel Density Values Measured with the Angiovue Optical Coherence Tomograph in Different Retinal Layers in Ocular Hypertension and Glaucoma
PURPOSE: To evaluate intrasession and between-visit reproducibility of sector peripapillary angioflow vessel-density (PAFD, %) values in the optic nerve head (ONH) and radial peripapillary capillaries (RPC) layers, respectively, and to analyze the influence of the corresponding sector retinal nerve fiber layer thickness (RNFLT) on the results. METHODS: High-quality images acquired with the Angiovue/RTVue-XR Avanti optical coherence tomograph (Optovue Inc., Fremont, USA) on 1 eye of 18 stable glaucoma and ocular hypertension patients were analyzed using the Optovue 2015.100.0.33 software version. Three images were acquired in one visit and 1 image 3 months later. RESULTS: PAFD image quality for all images necessary to calculate reproducibility was sufficient for analysis in only 18 of the 83 participants (21.7%) who were successfully imaged for RNFLT. Intrasession coefficient of variation (CV) ranged between 2.30 and 3.89%, and between 3.51 and 5.12%, for the peripapillary sectors in the ONH and RPC layers, respectively. The corresponding between-visit CV values ranged between 3.05 and 4.26%, and between 4.99 and 6.90%, respectively. Intrasession SD did not correlate with the corresponding RNFLT in any sector in either layer (P ≥ 0.170). In the ONH layer, sector PAFD values did not correlate with the corresponding RNFLT values (P ≥ 0.100). In contrast, in the RPC layer a significant positive correlation between the corresponding sector PAFD and RNFLT values was found for all but one peripapillary sector (Pearson r range: 0.652 to 0.771, P ≤ 0.0046). CONCLUSION: Though in several patients routine use of PAFD measurement may be limited by suboptimal image quality, in the successfully imaged cases (21.7% of the study eyes in the current investigation) the reproducibility of sector PAFD values seems to be sufficient for clinical research. In stable patients, intrasession variability explains most of the between-visit variability. Sector PAFD variability is independent of sector RNFLT, a marker of glaucoma severity. In the RPC layer, sector PAFD and RNFLT show strong to very strong positive correlation.
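The reproducibility figures above are coefficients of variation (CV), i.e. the standard deviation expressed as a percentage of the mean over repeated measurements. A minimal sketch, with made-up PAFD values rather than study data:

```python
# Coefficient of variation over repeated measurements: 100 * SD / mean.
import statistics

def cv_percent(measurements):
    """CV (%) using the sample standard deviation."""
    return 100.0 * statistics.stdev(measurements) / statistics.mean(measurements)

# three hypothetical same-visit acquisitions of one sector's PAFD value (%)
print(round(cv_percent([52.1, 53.0, 51.6]), 2))
```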
FloatingCanvas: quantification of 3D retinal structures from spectral-domain optical coherence tomography
Spectral-domain optical coherence tomography (SD-OCT) provides volumetric images of retinal structures with unprecedented detail. Accurate segmentation algorithms and feature quantification in these images, however, are needed to realize the full potential of SD-OCT. The fully automated segmentation algorithm, FloatingCanvas, serves this purpose and performs a volumetric segmentation of retinal tissue layers in a three-dimensional image volume acquired around the optic nerve head without requiring any pre-processing. The reconstructed layers are analysed to extract features such as blood vessels and retinal nerve fibre layer thickness. Findings from images obtained with the RTVue-100 SD-OCT (Optovue, Fremont, CA, USA) indicate that FloatingCanvas is computationally efficient and is robust to the noise and low contrast in the images. The FloatingCanvas segmentation demonstrated good agreement with human manual grading. The retinal nerve fibre layer thickness maps obtained with this method are clinically realistic and highly reproducible compared with the time-domain StratusOCT™.