
28 research outputs found

Stemming in the language modeling framework

Authors: Giridhar Kumaran, James Allan
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2003

Crossref

Simple Questions to Improve Pseudo-Relevance Feedback Results

Authors: Giridhar Kumaran, James Allan
Publication date: 24/04/2020

CiteSeerX

Reducing long queries using query quality predictors

Authors: Giridhar Kumaran, Vitor R. Carvalho
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2009

Long queries frequently contain many extraneous terms that hinder retrieval of relevant documents. We present techniques to reduce long queries to more effective shorter ones that lack those extraneous terms. Our work is motivated by the observation that perfectly reducing long TREC description queries can lead to an average improvement of 30% in mean average precision. Our approach transforms the reduction problem into a problem of learning to rank all subsets of the original query (sub-queries) based on their predicted quality, and then selects the top-ranked sub-query. We use various measures of query quality described in the literature as features to represent sub-queries, and train a classifier to rank them. Replacing the original long query with the top-ranked sub-query chosen by the ranking classifier results in a statistically significant average improvement of 8% on our test sets. Analysis of the results shows that query reduction is well suited to moderately performing long queries, and that a small set of query quality predictors is well suited to the task of ranking sub-queries.

CiteSeerX

Crossref
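As a rough illustration of the sub-query ranking described in the abstract above, the Python sketch below enumerates every non-empty subset of the query terms and keeps the one with the highest score. It is not the authors' implementation: the two features (average IDF, a common pre-retrieval query quality predictor, and sub-query length) and the fixed linear `weights` are hypothetical stand-ins for the learned ranking classifier and the quality predictors used in the paper, and all function names are invented for this example.

```python
from itertools import combinations
from math import log


def sub_queries(terms, max_len=None):
    """Yield every non-empty subset of the query terms, keeping the original term order."""
    max_len = max_len or len(terms)
    for k in range(1, max_len + 1):
        yield from combinations(terms, k)


def avg_idf(sub_query, doc_freq, num_docs):
    """Average IDF of the sub-query terms (a simple pre-retrieval quality predictor).

    Add-one smoothing keeps the log finite for terms missing from doc_freq.
    """
    idfs = [log((num_docs + 1) / (doc_freq.get(t, 0) + 1)) for t in sub_query]
    return sum(idfs) / len(idfs)


def features(sub_query, doc_freq, num_docs):
    # Hypothetical feature vector: [average IDF, number of terms].
    return [avg_idf(sub_query, doc_freq, num_docs), len(sub_query)]


def reduce_query(terms, weights, doc_freq, num_docs, max_len=None):
    """Return the sub-query with the highest linear score sum(w_i * f_i).

    The fixed weights stand in for a trained ranking model (assumption).
    """
    def score(sub_query):
        feats = features(sub_query, doc_freq, num_docs)
        return sum(w * f for w, f in zip(weights, feats))

    return max(sub_queries(terms, max_len), key=score)


if __name__ == "__main__":
    # Toy document frequencies for a 1,000-document collection (invented numbers).
    df = {"find": 900, "documents": 950, "about": 990, "volcano": 12, "eruptions": 20}
    query = ["find", "documents", "about", "volcano", "eruptions"]
    print(reduce_query(query, weights=[1.0, 0.5], doc_freq=df, num_docs=1000))
```

With these toy numbers, `reduce_query` returns `("volcano", "eruptions")`: the two rare terms give a high average IDF, and the length feature rewards keeping both of them rather than just one. Because the number of sub-queries grows as 2^n in the number of query terms, exhaustive enumeration is only practical for moderately long queries; `max_len` caps the sub-query size.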


Interactive reformulation of long queries

Author: Giridhar Kumaran
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2008

We present new ways of interacting with a user based on query analysis and reformulation. Our goal is not only to improve retrieval performance but also to help the user understand the retrieval process and the collection she is searching. We do this by providing users with information reflecting the potential impact their decisions will have on the retrieval process, so that they can make more informed choices among the options the retrieval system presents. Unlike most previous work on user interaction, which pursued a one-procedure-fits-all strategy, our approach invokes user interaction only when there is potential for improvement. This matters because tedious interaction can have an unfavorable impact on the user experience. We present techniques for selective user interaction and show their utility in the context of two interaction techniques we have developed. Our results show that user interaction can be avoided in a large number of cases without much deterioration in performance. User interaction can be made more productive by providing users with an optimally sized set of high-quality options, and we present efficient techniques to determine such a set. When interacting with a user on a particular query, it is also beneficial to determine which interaction technique best suits that query; we solve this problem by obtaining implicit feedback from the user. By combining all the interaction-related techniques described in this thesis, we show through simulations and user studies that users can obtain better performance with less effort.

ScholarWorks@UMass Amherst

Succinct Queries for Linking and Tracking News in Social Media

Author: Giridhar Kumaran
Publication venue: Association for Computing Machinery (ACM)

Crossref

Text Classification and Named Entities for New Event Detection

Authors: Giridhar Kumaran, James Allan
Publication date: 01/01/2004

New Event Detection (NED) is a challenging task that still offers scope for great improvement after years of effort. In this paper we show how NED performance can be improved by using text classification techniques and by using named entities in a new way. We explore modifications to the document representation in a vector-space-based NED system. We also show that treating named entities preferentially is useful only in certain situations. Combining all of the above yields a multi-stage NED system that performs much better than baseline single-stage NED systems.

CiteSeerX

Crossref

Using Names and Topics for New Event Detection

Authors: Giridhar Kumaran, James Allan
Publication date: 01/01/2005

New Event Detection (NED) involves monitoring chronologically ordered news streams to automatically detect the stories that report on new events. We compare two stories by computing three cosine similarities, based on names, topics, and the full text. These additional comparisons suggest treating NED as a binary classification problem, with the comparison scores serving as features. The classifier models we learned show statistically significant improvements over the baseline vector space model system on all the collections we tested, including the latest TDT5 collection. The presence of automatic speech recognizer (ASR) output of broadcast news in news streams can reduce performance and render our named-entity-recognition-based approaches ineffective. We provide a solution to this problem that achieves statistically significant improvements.

CiteSeerX

Crossref
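The pairwise story comparison in the abstract above can be sketched as follows: each story is split into name terms and the remaining (topic) terms, and three cosine similarities (names, topics, full text) form the feature vector a binary classifier would consume. This is a hedged illustration, not the authors' system: it assumes the set of named entities is supplied by the caller (no named entity recognizer is included), treats every non-name term as a topic term, and uses raw term counts rather than whatever term weighting the paper used. `split_story` and `pair_features` are hypothetical names.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_story(tokens, names):
    """Split a tokenized story into name, topic (non-name), and full-text count vectors."""
    full = Counter(tokens)
    name_vec = Counter({t: c for t, c in full.items() if t in names})
    topic_vec = Counter({t: c for t, c in full.items() if t not in names})
    return name_vec, topic_vec, full


def pair_features(story_a, story_b, names):
    """Return [cos_names, cos_topics, cos_full] for two tokenized stories."""
    vecs_a = split_story(story_a, names)
    vecs_b = split_story(story_b, names)
    return [cosine(va, vb) for va, vb in zip(vecs_a, vecs_b)]


if __name__ == "__main__":
    names = {"obama", "paris", "berlin"}  # assumed output of some NER step
    a = "obama visits paris summit".split()
    b = "obama speaks berlin summit".split()
    print(pair_features(a, b, names))
```

For this toy pair each feature is 0.5, up to floating-point rounding: the stories share one of two names, one of two topic terms, and two of four tokens. In an NED setting, a new story would be compared against earlier stories in this way and the resulting feature vectors passed to the trained classifier.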