6,753 research outputs found

    Unbiased Comparative Evaluation of Ranking Functions

    Full text link
    Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge. Unlike traditional approaches that make this selection deterministically, probabilistic sampling has shown intriguing promise since it enables the design of estimators that are provably unbiased even when reusing data with missing judgments. In this paper, we first unify and extend these sampling approaches by viewing the evaluation problem as a Monte Carlo estimation task that applies to a large number of common IR metrics. Drawing on the theoretical clarity that this view offers, we tackle three practical evaluation scenarios: comparing two systems, comparing kk systems against a baseline, and ranking kk systems. For each scenario, we derive an estimator and a variance-optimizing sampling distribution while retaining the strengths of sampling-based evaluation, including unbiasedness, reusability despite missing data, and ease of use in practice. In addition to the theoretical contribution, we empirically evaluate our methods against previously used sampling heuristics and find that they generally cut the number of required relevance judgments at least in half.Comment: Under review; 10 page

    Improving tag recommendation using social networks

    Get PDF
    In this paper we address the task of recommending additional tags to partially annotated media objects, in our case images. We propose an extendable framework that can recommend tags using a combination of different personalised and collective contexts. We combine information from four contexts: (1) all the photos in the system, (2) a user's own photos, (3) the photos of a user's social contacts, and (4) the photos posted in the groups of which a user is a member. Variants of methods (1) and (2) have been proposed in previous work, but the use of (3) and (4) is novel. For each of the contexts we use the same probabilistic model and Borda Count based aggregation approach to generate recommendations from different contexts into a unified ranking of recommended tags. We evaluate our system using a large set of real-world data from Flickr. We show that by using personalised contexts we can significantly improve tag recommendation compared to using collective knowledge alone. We also analyse our experimental results to explore the capabilities of our system with respect to a user's social behaviour
    corecore