Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning
While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of data sets suitable for training ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use it to predict relevance scores for query-document pairs in languages never seen during training. Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and Spanish. We also show that augmenting the English training collection with some examples from the target language can sometimes improve performance.
Comment: ECIR 2020 (short paper)
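The zero-shot transfer setup above can be sketched with a toy ranker. The character n-gram scorer below is only a language-agnostic stand-in for the pre-trained multilingual model (an assumption for illustration); the point is that the same scorer, fitted on nothing from the target language, is applied unchanged to non-English query-document pairs.

```python
# Toy sketch of zero-shot multilingual ranking: one scorer, applied to
# languages never seen during training. The n-gram overlap stands in for
# the multilingual language model described in the abstract.

def ngrams(text, n=3):
    """Character n-grams; works for any script without tokenization."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def relevance_score(query, document, n=3):
    """Stand-in scorer: Jaccard overlap of character n-grams."""
    q, d = ngrams(query, n), ngrams(document, n)
    return len(q & d) / len(q | d) if q | d else 0.0

def rank(query, documents):
    """Rank documents by predicted relevance, highest first."""
    return sorted(documents, key=lambda d: relevance_score(query, d),
                  reverse=True)

# Zero-shot: the scorer was never tuned on Spanish, yet ranks Spanish text.
docs = ["recuperación de información multilingüe",
        "historia del arte barroco"]
print(rank("recuperación multilingüe", docs)[0])
```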
Query Expansion for Survey Question Retrieval in the Social Sciences
In recent years, the importance of research data and the need to archive and
to share it in the scientific community have increased enormously. This
introduces a whole new set of challenges for digital libraries. In the social
sciences typical research data sets consist of surveys and questionnaires. In
this paper we focus on the use case of social science survey question reuse and
on mechanisms to support users in the query formulation for data sets. We
describe and evaluate thesaurus- and co-occurrence-based approaches for query
expansion to improve retrieval quality in digital libraries and research data
archives. The challenge here is to translate the information need and the
underlying sociological phenomena into proper queries. As we show, retrieval quality can be improved by adding related terms to the queries. In a direct comparison, automatically expanded queries using extracted co-occurring terms provide better results than queries manually reformulated by a domain expert, and also outperform a keyword-based BM25 baseline.
Comment: to appear in Proceedings of the 19th International Conference on Theory and Practice of Digital Libraries 2015 (TPDL 2015)
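The co-occurrence-based expansion described above can be sketched in a few lines: count how often terms appear together across documents, then extend a query with its most frequent co-occurring terms. The tiny corpus is a made-up stand-in for survey questions, not the paper's data.

```python
# Minimal sketch of co-occurrence-based query expansion.
from collections import Counter
from itertools import combinations

corpus = [  # each string stands in for one survey question
    "income household employment",
    "income employment satisfaction",
    "health income household",
]

def cooccurrence_counts(docs):
    """Symmetric counts of term pairs appearing in the same document."""
    counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc.split())), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def expand(query_term, docs, k=2):
    """Add the k terms most often co-occurring with the query term."""
    counts = cooccurrence_counts(docs)
    related = [(t, c) for (q, t), c in counts.items() if q == query_term]
    related.sort(key=lambda x: (-x[1], x[0]))  # frequency, then alphabetic
    return [query_term] + [t for t, _ in related[:k]]

print(expand("income", corpus))  # query plus two related terms
```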
Improving ranking for systematic reviews using query adaptation
Identifying relevant studies for inclusion in systematic reviews requires significant effort from human experts, who manually screen large numbers of studies. The problem is made more difficult by the growing volume of medical literature, and Information Retrieval techniques have proved useful for reducing the workload. Reviewers are often interested in particular types of evidence, such as Diagnostic Test Accuracy studies. This paper explores the use of query adaptation to identify particular types of evidence and thereby reduce the workload placed on reviewers. A simple retrieval system that ranks studies using TF.IDF weighted cosine similarity was implemented. The Log-Likelihood, Chi-Squared and Odds-Ratio lexical statistics and relevance feedback were used to generate sets of terms that indicate evidence relevant to Diagnostic Test Accuracy reviews. Experiments using a set of 80 systematic reviews from the CLEF 2017 and CLEF 2018 eHealth tasks demonstrate that the approach improves retrieval performance.
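The ranking step described above, TF.IDF weighted cosine similarity, can be sketched directly. The "studies" below are placeholder strings, not CLEF data, and the term-statistics step for generating adapted query terms is omitted.

```python
# Sketch of TF.IDF weighted cosine-similarity ranking of candidate studies.
import math
from collections import Counter

def tfidf(text, docs):
    """TF.IDF vector of `text`, with IDF taken from the collection `docs`."""
    tf = Counter(text.split())
    n = len(docs)
    vec = {}
    for term, f in tf.items():
        df = sum(1 for d in docs if term in d.split())
        if df:  # terms absent from the collection carry no weight
            vec[term] = f * math.log(n / df)
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

studies = [
    "diagnostic test accuracy of ultrasound screening",
    "randomised controlled trial of a new drug",
    "sensitivity and specificity of a diagnostic test",
]
query = "diagnostic test accuracy"
ranked = sorted(studies,
                key=lambda s: cosine(tfidf(query, studies),
                                     tfidf(s, studies)),
                reverse=True)
print(ranked[0])
```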
Probabilistic models of information retrieval based on measuring the divergence from randomness
We introduce a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose--Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document--query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.
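The two-component structure of the divergence-from-randomness weights can be summarized compactly. Here Prob1 is the probability of the observed term frequency under the random model and Prob2 comes from the first (information-gain) normalization; the Laplace after-effect is shown as one concrete choice, and the length normalization rescales the raw term frequency.

```latex
% Divergence-from-randomness term weight: information gain times
% divergence from the random model.
\[
  w(t, d)
  \;=\;
  \underbrace{\bigl(1 - \mathrm{Prob}_2(t \mid d)\bigr)}_{\text{information gain}}
  \cdot
  \underbrace{\bigl(-\log_2 \mathrm{Prob}_1(t)\bigr)}_{\text{divergence from randomness}}
\]
% With the Laplace after-effect model, Prob_2 = tf / (tf + 1), so the
% first factor becomes 1 / (tf + 1). The second (length) normalization
% replaces tf by
\[
  tfn \;=\; tf \cdot \log_2\!\Bigl(1 + c \cdot \frac{\bar{\ell}}{\ell(d)}\Bigr),
\]
% where \ell(d) is the document length, \bar{\ell} the average length,
% and c a tuning constant.
```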
Querying a Bioinformatic Data Sources Registry with Concept Lattices
Bioinformatic data sources available on the web are multiple and heterogeneous. The lack of documentation and the difficulty of interacting with these data banks require competence in both informatics and biology for optimal use of source contents, which remain rather underexploited. In this paper we present an approach based on formal concept analysis to classify and search relevant bioinformatic data sources for a given user query. It consists of building the concept lattice from the binary relation between bioinformatic data sources and their associated metadata. The concept built from a given user query is then merged into the concept lattice. The result is given by extracting the set of sources belonging to the extents of the query concept's subsumers in the resulting concept lattice. The ranking of sources is given by the concept specificity order in the concept lattice. An improvement of the approach consists of automatically refining the query with domain ontologies. Two forms of refinement are possible: by generalisation and by specialisation.
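The lattice-based lookup described above rests on the extent/intent pair of formal concept analysis. The sketch below is a drastically simplified, two-level version of the subsumer traversal: exact matches of the query's attribute set first (most specific concept), then sources matching any single attribute (more general subsumers). The relation content is a toy assumption, not the registry's real metadata.

```python
# Minimal formal-concept-analysis lookup over a source/metadata relation.

relation = {  # source -> set of metadata terms (hypothetical examples)
    "GenBank":   {"sequence", "nucleotide", "annotation"},
    "SwissProt": {"sequence", "protein", "annotation"},
    "PDB":       {"protein", "structure"},
}

def extent(attrs):
    """Extent of an attribute set: all sources carrying every attribute."""
    return {s for s, meta in relation.items() if attrs <= meta}

def intent(sources):
    """Intent of a source set: attributes shared by all its sources."""
    metas = [relation[s] for s in sources]
    return set.intersection(*metas) if metas else set()

def answer(query_attrs):
    """Sources for the query, most specific first: the query concept's
    extent, then extents of single-attribute generalisations."""
    exact = extent(query_attrs)
    broader = set().union(*(extent({a}) for a in query_attrs)) - exact
    return sorted(exact) + sorted(broader)

print(answer({"protein", "annotation"}))
```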
Intrasession and Between-Visit Variability of Sector Peripapillary Angioflow Vessel Density Values Measured with the Angiovue Optical Coherence Tomograph in Different Retinal Layers in Ocular Hypertension and Glaucoma
PURPOSE: To evaluate intrasession and between-visit reproducibility of sector peripapillary angioflow vessel-density (PAFD, %) values in the optic nerve head (ONH) and radial peripapillary capillaries (RPC) layers, respectively, and to analyze the influence of the corresponding sector retinal nerve fiber layer thickness (RNFLT) on the results. METHODS: High-quality images acquired with the Angiovue/RTVue-XR Avanti optical coherence tomograph (Optovue Inc., Fremont, USA) on 1 eye of 18 stable glaucoma and ocular hypertension patients were analyzed using the Optovue 2015.100.0.33 software version. Three images were acquired in one visit and 1 image 3 months later. RESULTS: PAFD image quality for all images necessary to calculate reproducibility was sufficient for analysis in only 18 of the 83 participants (21.7%) who were successfully imaged for RNFLT. Intrasession coefficient of variation (CV) ranged between 2.30 and 3.89%, and between 3.51 and 5.12%, for the peripapillary sectors in the ONH and RPC layers, respectively. The corresponding between-visit CV values ranged between 3.05 and 4.26%, and between 4.99 and 6.90%, respectively. Intrasession SD did not correlate with the corresponding RNFLT in any sector in either layer (P ≥ 0.170). In the ONH layer, sector PAFD values did not correlate with the corresponding RNFLT values (P ≥ 0.100). In contrast, in the RPC layer a significant positive correlation between the corresponding sector PAFD and RNFLT values was found for all but one peripapillary sector (Pearson r range: 0.652 to 0.771, P ≤ 0.0046). CONCLUSION: Though in several patients routine use of PAFD measurement may be limited by suboptimal image quality, in the successfully imaged cases (21.7% of the study eyes in the current investigation) the reproducibility of sector PAFD values seems to be sufficient for clinical research. In stable patients, intrasession variability explains most of the between-visit variability. Sector PAFD variability is independent of sector RNFLT, a marker of glaucoma severity. In the RPC layer, sector PAFD and RNFLT show strong to very strong positive correlation.
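The reproducibility figures above are coefficients of variation (CV), i.e. the standard deviation expressed as a percentage of the mean over repeated measurements. A minimal sketch, with made-up PAFD values rather than study data:

```python
# Coefficient of variation over repeated measurements: 100 * SD / mean.
import statistics

def cv_percent(measurements):
    """CV (%) using the sample standard deviation."""
    return 100.0 * statistics.stdev(measurements) / statistics.mean(measurements)

# three hypothetical same-visit acquisitions of one sector's PAFD value (%)
print(round(cv_percent([52.1, 53.0, 51.6]), 2))
```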
FloatingCanvas: quantification of 3D retinal structures from spectral-domain optical coherence tomography
Spectral-domain optical coherence tomography (SD-OCT) provides volumetric images of retinal structures with unprecedented detail. Accurate segmentation algorithms and feature quantification in these images, however, are needed to realize the full potential of SD-OCT. The fully automated segmentation algorithm, FloatingCanvas, serves this purpose and performs a volumetric segmentation of retinal tissue layers in a three-dimensional image volume acquired around the optic nerve head without requiring any pre-processing. The reconstructed layers are analysed to extract features such as blood vessels and retinal nerve fibre layer thickness. Findings from images obtained with the RTVue-100 SD-OCT (Optovue, Fremont, CA, USA) indicate that FloatingCanvas is computationally efficient and is robust to the noise and low contrast in the images. The FloatingCanvas segmentation demonstrated good agreement with human manual grading. The retinal nerve fibre layer thickness maps obtained with this method are clinically realistic and highly reproducible compared with the time-domain StratusOCT™.