    Investigating Retrieval Method Selection with Axiomatic Features

    We consider algorithm selection in the context of ad-hoc information retrieval. Given a query and a pair of retrieval methods, we propose a meta-learner that predicts how to combine the methods' relevance scores into an overall relevance score. Inspired by neural models' different properties with regard to IR axioms, these predictions are based on features that quantify axiom-related properties of the query and its top-ranked documents. We conduct an evaluation on TREC Web Track data and find that the meta-learner often significantly improves over the individual methods. Finally, we conduct feature and query weight analyses to investigate the meta-learner's behavior.
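
As a rough illustration of what "combining the methods' relevance scores" can look like, the sketch below linearly interpolates two score lists with a per-query weight. This is an assumption made for illustration only: the abstract does not state the combination function, and the names `combine_scores` and `rank` are hypothetical. Scores are assumed to be already normalized to a comparable range.

```python
def combine_scores(scores_a, scores_b, weight):
    """Interpolate two methods' per-document relevance scores.

    `weight` stands in for a per-query value a meta-learner might predict
    (hypothetical; the abstract does not specify the combination function).
    Documents missing from one method are treated as scoring 0.0 there.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    docs = set(scores_a) | set(scores_b)
    return {
        doc: weight * scores_a.get(doc, 0.0)
        + (1.0 - weight) * scores_b.get(doc, 0.0)
        for doc in docs
    }


def rank(scores):
    """Return document ids sorted by descending score, ties broken by id."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))
```

With `weight=1.0` the ranking is that of the first method alone, and with `weight=0.0` that of the second, so selecting a single method is a special case of this kind of combination.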

    iCrawl: Improving the Freshness of Web Collections by Integrating Social Web and Focused Web Crawling

    Researchers in the Digital Humanities and journalists need to monitor, collect and analyze fresh online content regarding current events, such as the Ebola outbreak or the Ukraine crisis, on demand. However, existing focused crawling approaches only consider topical aspects while ignoring temporal aspects and therefore cannot achieve thematically coherent and fresh Web collections. Social Media in particular provide a rich source of fresh content, which is not used by state-of-the-art focused crawlers. In this paper we address the issue of enabling the collection of fresh and relevant Web and Social Web content for a topic of interest through seamless integration of Web and Social Media in a novel integrated focused crawler. The crawler collects Web and Social Media content in a single system and exploits the stream of fresh Social Media content for guiding the crawler. Comment: Published in the Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries 201

    Relation Discovery from Web Data for Competency Management

    This paper describes a technique for automatically discovering associations between people and expertise from an analysis of very large data sources (including web pages, blogs and emails), using a family of algorithms that perform accurate named-entity recognition, assign different weights to terms according to an analysis of document structure, and assess distances between terms in a document. My contribution is to add a social networking approach called BuddyFinder, which relies on associations within a large enterprise-wide "buddy list" to help delimit the search space and also to provide a form of 'social triangulation' whereby the system can discover documents from your colleagues that contain pertinent information about you. This work has been influential in the information retrieval community generally, as it is the basis of a landmark system that achieved overall first place in every category in the Enterprise Search Track of TREC2006.

    Context Models For Web Search Personalization

    We present our solution to the Yandex Personalized Web Search Challenge. The aim of this challenge was to use historical search logs to personalize top-N document rankings for a set of test users. We used over 100 features extracted from user- and query-dependent contexts to train neural net and tree-based learning-to-rank and regression models. Our final submission, which was a blend of several different models, achieved an NDCG@10 of 0.80476 and placed 4th among the 194 teams, winning 3rd prize.
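
For readers unfamiliar with the NDCG@10 metric reported above, the following is a minimal sketch of one common formulation (exponential gain, log2 position discount). The abstract does not say which gain function or relevance grades the challenge used, so those are assumptions here, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k graded relevance labels.

    Assumes the common exponential gain (2^rel - 1) and a log2 discount;
    the challenge's exact formula is not given in the abstract.
    """
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents: define NDCG as 0
    return dcg_at_k(relevances, k) / ideal
```

The input is the list of relevance grades of the returned documents in ranked order; a ranking that already lists documents by decreasing grade scores 1.0.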

    Contextualised Browsing in a Digital Library's Living Lab

    Contextualisation has proven to be effective in tailoring search results towards the users' information need. While this is true for a basic query search, the usage of contextual session information during exploratory search, especially on the level of browsing, has so far been underexposed in research. In this paper, we present two approaches that contextualise browsing on the level of structured metadata in a Digital Library (DL): (1) one variant is based on document similarity and (2) one variant utilises implicit session information, such as queries and different document metadata encountered during the session of a user. We evaluate our approaches in a living lab environment using a DL in the social sciences and compare our contextualisation approaches against a non-contextualised approach. For a period of more than three months we analysed 47,444 unique retrieval sessions that contain search activities on the level of browsing. Our results show that a contextualisation of browsing significantly outperforms our baseline in terms of the position of the first clicked item in the result set. The mean rank of the first clicked document (measured as mean first relevant, MFR) was 4.52 using a non-contextualised ranking compared to 3.04 when re-ranking the result lists based on similarity to the previously viewed document. Furthermore, we observed that both contextual approaches show a noticeably higher click-through rate. A contextualisation based on document similarity leads to almost twice as many document views compared to the non-contextualised ranking. Comment: 10 pages, 2 figures, paper accepted at JCDL 201
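
The MFR figure quoted above can be pictured with a short sketch. It assumes each session is logged as the list of 1-based result positions in the order they were clicked, and that sessions without any click are left out of the average; neither detail is stated in the abstract, and `mean_first_relevant` is an illustrative name.

```python
def mean_first_relevant(sessions):
    """Average result position of the first click across sessions.

    `sessions` is a list of click logs, each a list of 1-based result
    positions in click order (assumed format). Sessions with no clicks
    are skipped (an assumption, not stated in the abstract).
    """
    first_clicks = [clicks[0] for clicks in sessions if clicks]
    if not first_clicks:
        raise ValueError("no session contains a click")
    return sum(first_clicks) / len(first_clicks)
```

Lower values are better: a drop from 4.52 to 3.04 means the first clicked document sat, on average, about one and a half positions higher in the list.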