A Study of Retrieval Models for Long Documents and Queries in Information Retrieval

Author: Cummins Ronan
Publication venue: Proceedings of the 25th International Conference on World Wide Web
Publication date: 11/04/2016

Recent research has shown that long documents are unfairly penalised by a number of current retrieval methods. In this paper, we formally analyse two important but distinct reasons for normalising documents with respect to length, namely verbosity and scope, and discuss the practical implications of not normalising accordingly. We review a number of language modelling approaches and a range of recently developed retrieval methods, and show that most do not correctly model both phenomena, thus limiting their retrieval effectiveness in certain situations. Furthermore, the retrieval characteristics of long natural language queries have not traditionally had the same attention as short keyword queries. We develop a new discriminative query language modelling approach that demonstrates improved performance on long verbose queries by appropriately weighting salient aspects of the query. When combined with query expansion, we show that our new approach yields state-of-the-art performance for long verbose queries.

Apollo (Cambridge)

Acrylamide formation in potato products

Author: Brunton Nigel
Butler Francis
Cummins Enda
Danaher Martin
Gormley Ronan T.
O'Keeffe Michael
Publication venue: Teagasc
Publication date: 01/08/2006

End of Project Report. Acrylamide, a substance classified as a potential carcinogen, occurs in heated starchy foods at concentrations many times in excess of levels permitted in drinking water. Early surveys indicated that levels of acrylamide in potato products such as French fries and potato crisps were the highest of the foodstuffs investigated. The present project addressed this issue by determining levels of acrylamide precursors (asparagine and reducing sugars) in raw potatoes and levels of acrylamide in (i) potato products from different storage regimes, (ii) spot-sampled potatoes purchased from a local supermarket, (iii) samples that received pre-treatments and were fried at different temperatures and (iv) French fries reheated in different ovens. A risk assessment of the estimated acrylamide intake from potato products for various cohorts of the Irish population was also conducted.

T-Stór

Sentence Similarity Measures for Fine-Grained Estimation of Topical Relevance in Learner Essays

Author: Cummins Ronan
Rei Marek
Publication venue: https://aclweb.org/anthology/volumes/proceedings-of-the-11th-workshop-on-innovative-use-of-nlp-for-building-educational-applications/
Publication date: 09/06/2016

We investigate the task of assessing sentence-level prompt relevance in learner essays. Various systems using word overlap, neural embeddings and neural compositional models are evaluated on two datasets of learner writing. We propose a new method for sentence-level similarity calculation, which learns to adjust the weights of pre-trained word embeddings for a specific task, achieving substantially higher accuracy compared to other relevant baselines.

arXiv.org e-Print Archive

Apollo (Cambridge)

A Constraint to Automatically Regulate Document-Length Normalisation

Author: O'Riordan
Ronan Cummins
Publication date: 11/04/2020

Retrieval functions in information retrieval (IR) are fundamental to the effectiveness of search systems. However, considerable parameter tuning is often needed to increase the effectiveness of the retrieval. Document length normalisation is one such aspect that requires tuning on a per-query and per-collection basis for many retrieval functions. In this paper, we develop an approach that regularises the level of normalisation to apply on a per-query basis. We formally describe the interaction between query-terms and document length normalisation using a constraint. We then develop a general pre-retrieval approach to adapt a number of state-of-the-art ranking functions so that they adhere to the constraint. Finally, we empirically demonstrate that the adapted retrieval functions outperform default versions of the original retrieval functions, and perform at least comparably to tuned versions of the original functions, on a number of datasets. Essentially this regulates the normalisation parameter in a number of retrieval functions on a per-query basis in a principled manner.
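
For readers unfamiliar with what a "normalisation parameter" looks like, the following is a minimal sketch using the widely known BM25 term-frequency component. BM25 is an assumption chosen here for illustration; the abstract does not name the retrieval functions it adapts, and the paper's constraint and per-query adaptation are not reproduced. The function name and default values are hypothetical.

```python
def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 term-frequency component with document-length normalisation.

    b is the normalisation parameter: b = 0 ignores document length,
    b = 1 normalises fully by doc_len / avg_doc_len.
    (Illustrative only; not the method proposed in the paper.)
    """
    # Length-normalisation factor; equals 1.0 for an average-length document.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * norm)


# Same raw term frequency, documents of different lengths.
short_doc = bm25_tf(tf=3, doc_len=50, avg_doc_len=100)
long_doc = bm25_tf(tf=3, doc_len=200, avg_doc_len=100)
```

With the default b, the longer document receives the lower score for the same term frequency. Choosing b is the kind of tuning the abstract says is normally done per query and per collection.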

CiteSeerX

An Analysis of Learned Proximity Functions

Author: Mounia Lalmas
O'Riordan
Ronan Cummins
Publication date: 23/04/2020

A lot of recent work has shown that the proximity of terms can be exploited to improve the performance of information retrieval systems. We review a recent approach that uses an intuitive framework to incorporate proximity functions into vector-based information retrieval systems. More importantly, we present several proximity functions that were learned within this framework and show that they adhere to previously developed constraints regarding the shape of a good proximity function. Finally, we include results for all of the learned functions on unseen test data, which show the consistency of the learning approach used.

CiteSeerX

Cluster Based Term Weighting Model for Web Document Clustering

Author: Amit Singhal
Gerard Salton
JC David MacKay
MF Porter
PN Tan
Ronan Cummins
Sudipto Guha
Ying Zhao
Publication venue: Springer Science and Business Media LLC
Publication date: 04/03/2014

A term's weight is based on the frequency with which the term appears in a document. The term weighting scheme measures the importance of a term with respect to a document and a collection. A term with a higher weight is more important than a term with a lower weight. A document ranking model uses these term weights to find the rank of a document in a collection. We propose a cluster-based term weighting model based on the TF-IDF model. This term weighting model updates the inter-cluster and intra-cluster frequency components, using the generated clusters as a reference to improve the retrieval of relevant documents. These inter-cluster and intra-cluster frequency components are used to weight the importance of a term, in addition to the term and document frequency components.
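
As a point of reference for the baseline this abstract builds on, here is a minimal sketch of plain TF-IDF term weighting and a simple additive ranking score. It covers only the term and document frequency components. The inter-cluster and intra-cluster components are not specified in the abstract and are not attempted here. The function names and the raw-count/log-IDF variant are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {term: weight} dict per tokenised document.

    Uses raw term counts and idf = log(N / df); this is one common
    TF-IDF variant, assumed here for illustration.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        term_freq = Counter(doc)
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights


def score(query_terms, doc_weights):
    """Rank score: sum of the document's weights for the query terms."""
    return sum(doc_weights.get(term, 0.0) for term in query_terms)


docs = [["term", "weight", "weight"], ["term", "cluster"]]
weights = tf_idf_weights(docs)
ranking = sorted(range(len(docs)),
                 key=lambda i: score(["weight"], weights[i]),
                 reverse=True)
```

A term that appears in every document ("term" above) gets weight zero under this variant, while a term concentrated in one document ("weight") dominates that document's score.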

Crossref

ePrints@Bangalore University