A Study of Retrieval Models for Long Documents and Queries in Information Retrieval
Recent research has shown that long documents are unfairly penalised by a number of current retrieval methods. In this paper, we formally analyse two important but distinct reasons for normalising documents with respect to length, namely verbosity and scope, and discuss the practical implications of not normalising accordingly. We review a number of language modelling approaches and a range of recently developed retrieval methods, and show that most do not correctly model both phenomena, thus limiting their retrieval effectiveness in certain situations. Furthermore, the retrieval characteristics of long natural language queries have not traditionally received the same attention as those of short keyword queries. We develop a new discriminative query language modelling approach that demonstrates improved performance on long verbose queries by appropriately weighting salient aspects of the query. When combined with query expansion, we show that our new approach yields state-of-the-art performance for long verbose queries.
Acrylamide formation in potato products
End of Project Report. Acrylamide, a substance classified as a potential carcinogen, occurs in heated
starchy foods at concentrations many times in excess of levels permitted in
drinking water. Early surveys indicated that levels of acrylamide in potato
products such as French fries and potato crisps were the highest of the
foodstuffs investigated. The present project addressed this issue by
determining levels of acrylamide precursors (asparagine and reducing sugars)
in raw potatoes and levels of acrylamide in (i) potato products from different
storage regimes, (ii) spot-sampled potatoes purchased from a local
supermarket, (iii) samples that received pre-treatments and were fried at
different temperatures and (iv) French fries reheated in different ovens. A risk
assessment of the estimated acrylamide intake from potato products for
various cohorts of the Irish population was also conducted.
Sentence Similarity Measures for Fine-Grained Estimation of Topical Relevance in Learner Essays
We investigate the task of assessing sentence-level
prompt relevance in learner essays. Various
systems using word overlap, neural embeddings
and neural compositional models are
evaluated on two datasets of learner writing.
We propose a new method for sentence-level
similarity calculation, which learns to
adjust the weights of pre-trained word embeddings
for a specific task, achieving substantially
higher accuracy compared to other relevant
baselines.
A Constraint to Automatically Regulate Document-Length Normalisation
Retrieval functions in information retrieval (IR) are fundamental to the effectiveness of search systems. However, considerable parameter tuning is often needed to increase the effectiveness of the retrieval. Document length normalisation is one such aspect that requires tuning on a per-query and per-collection basis for many retrieval functions. In this paper, we develop an approach that regularises the level of normalisation to apply on a per-query basis. We formally describe the interaction between query terms and document length normalisation using a constraint. We then develop a general pre-retrieval approach to adapt a number of state-of-the-art ranking functions so that they adhere to the constraint. Finally, we empirically demonstrate that the adapted retrieval functions outperform default versions of the original retrieval functions, and perform at least comparably to tuned versions of the original functions, on a number of datasets. Essentially, this approach regulates the normalisation parameter in a number of retrieval functions on a per-query basis in a principled manner.
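As a rough illustration of what a tunable normalisation parameter looks like, the sketch below uses the widely known BM25 term score, where the parameter `b` controls how strongly document length is normalised. BM25 is chosen here only as a familiar example; the abstract does not name the ranking functions it adapts, and the function name, argument names and default values are assumptions.

```python
def bm25_term_score(tf, doc_len, avg_doc_len, idf, k1=1.2, b=0.75):
    """Score one query term in one document with BM25 (illustrative only).

    tf          -- frequency of the term in the document
    doc_len     -- length of the document in tokens
    avg_doc_len -- average document length in the collection
    idf         -- inverse document frequency of the term (precomputed)
    b           -- length normalisation strength: 0 disables it, 1 applies it fully
    """
    # Length normalisation factor: 1.0 for an average-length document,
    # larger for longer documents when b > 0.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * norm)


# Same term frequency, different document lengths.
short_doc = bm25_term_score(tf=3, doc_len=100, avg_doc_len=500, idf=2.0)
long_doc = bm25_term_score(tf=3, doc_len=2000, avg_doc_len=500, idf=2.0)
```

With `b=0` the two documents above would score identically; with larger `b` the longer document is penalised more, which is why the value usually has to be tuned.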
An Analysis of Learned Proximity Functions
A lot of recent work has shown that the proximity of terms can be exploited to improve the performance of information retrieval systems. We review a recent approach that uses an intuitive framework to incorporate proximity functions into vector-based information retrieval systems. More importantly, we present several proximity functions that were learned within this framework and show that they adhere to previously developed constraints regarding the shape of a good proximity function. Finally, we include results for all of the learned functions on unseen test data, which show the consistency of the learning approach used.
Cluster Based Term Weighting Model for Web Document Clustering
A term's weight is based on the frequency with which the term appears in a document. The term weighting scheme measures the importance of a term with respect to a document and a collection. A term with a higher weight is more important than a term with a lower weight. A document ranking model uses these term weights to find the rank of a document in a collection. We propose a cluster-based term weighting model based on the TF-IDF model. This term weighting model updates the inter-cluster and intra-cluster frequency components, using the generated clusters as a reference, to improve the retrieval of relevant documents. These inter-cluster and intra-cluster frequency components are used for weighting the importance of a term in addition to the term and document frequency components.
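For reference, the baseline TF-IDF weighting that the abstract builds on can be sketched as follows. This is a minimal version of plain TF-IDF only, assuming length-normalised term frequency and a natural-log IDF; it does not implement the cluster-based inter-cluster and intra-cluster components, whose exact form the abstract does not give. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute plain TF-IDF weights (illustrative baseline only).

    docs -- list of documents, each a list of tokens
    Returns one dict per document mapping term -> weight.
    """
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


docs = [["potato", "fries", "potato"], ["fries", "oven"]]
w = tf_idf(docs)
```

Here "fries" occurs in every document, so its weight is zero, while "potato" is frequent in the first document and absent from the second, so it receives a positive weight there.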