2,262 research outputs found

    Counterfactual Estimation and Optimization of Click Metrics for Search Engines

    Full text link
    Optimizing an interactive system against a predefined online metric is particularly challenging, when the metric is computed from user feedback such as clicks and payments. The key challenge is the counterfactual nature: in the case of Web search, any change to a component of the search engine may result in a different search result page for the same query, but we normally cannot infer reliably from search log how users would react to the new result page. Consequently, it appears impossible to accurately estimate online metrics that depend on user feedback, unless the new engine is run to serve users and compared with a baseline in an A/B test. This approach, while valid and successful, is unfortunately expensive and time-consuming. In this paper, we propose to address this problem using causal inference techniques, under the contextual-bandit framework. This approach effectively allows one to run (potentially infinitely) many A/B tests offline from search log, making it possible to estimate and optimize online metrics quickly and inexpensively. Focusing on an important component in a commercial search engine, we show how these ideas can be instantiated and applied, and obtain very promising results that suggest the wide applicability of these techniques

    News Session-Based Recommendations using Deep Neural Networks

    Full text link
    News recommender systems are aimed to personalize users experiences and help them to discover relevant articles from a large and dynamic search space. Therefore, news domain is a challenging scenario for recommendations, due to its sparse user profiling, fast growing number of items, accelerated item's value decay, and users preferences dynamic shift. Some promising results have been recently achieved by the usage of Deep Learning techniques on Recommender Systems, specially for item's feature extraction and for session-based recommendations with Recurrent Neural Networks. In this paper, it is proposed an instantiation of the CHAMELEON -- a Deep Learning Meta-Architecture for News Recommender Systems. This architecture is composed of two modules, the first responsible to learn news articles representations, based on their text and metadata, and the second module aimed to provide session-based recommendations using Recurrent Neural Networks. The recommendation task addressed in this work is next-item prediction for users sessions: "what is the next most likely article a user might read in a session?" Users sessions context is leveraged by the architecture to provide additional information in such extreme cold-start scenario of news recommendation. Users' behavior and item features are both merged in an hybrid recommendation approach. A temporal offline evaluation method is also proposed as a complementary contribution, for a more realistic evaluation of such task, considering dynamic factors that affect global readership interests like popularity, recency, and seasonality. Experiments with an extensive number of session-based recommendation methods were performed and the proposed instantiation of CHAMELEON meta-architecture obtained a significant relative improvement in top-n accuracy and ranking metrics (10% on Hit Rate and 13% on MRR) over the best benchmark methods.Comment: Accepted for the Third Workshop on Deep Learning for Recommender Systems - DLRS 2018, October 02-07, 2018, Vancouver, Canada. https://recsys.acm.org/recsys18/dlrs

    Modeling Temporal Evidence from External Collections

    Full text link
    Newsworthy events are broadcast through multiple mediums and prompt the crowds to produce comments on social media. In this paper, we propose to leverage on this behavioral dynamics to estimate the most relevant time periods for an event (i.e., query). Recent advances have shown how to improve the estimation of the temporal relevance of such topics. In this approach, we build on two major novelties. First, we mine temporal evidences from hundreds of external sources into topic-based external collections to improve the robustness of the detection of relevant time periods. Second, we propose a formal retrieval model that generalizes the use of the temporal dimension across different aspects of the retrieval process. In particular, we show that temporal evidence of external collections can be used to (i) infer a topic's temporal relevance, (ii) select the query expansion terms, and (iii) re-rank the final results for improved precision. Experiments with TREC Microblog collections show that the proposed time-aware retrieval model makes an effective and extensive use of the temporal dimension to improve search results over the most recent temporal models. Interestingly, we observe a strong correlation between precision and the temporal distribution of retrieved and relevant documents.Comment: To appear in WSDM 201

    How users assess web pages for information-seeking

    Get PDF
    In this paper, we investigate the criteria used by online searchers when assessing the relevance of web pages for information-seeking tasks. Twenty four participants were given three tasks each, and indicated the features of web pages which they employed when deciding about the usefulness of the pages in relation to the tasks. These tasks were presented within the context of a simulated work-task situation. We investigated the relative utility of features identified by participants (web page content,structure and quality), and how the importance of these features is affected by the type of information-seeking task performed and the stage of the search. The results of this study provide a set of criteria used by searchers to decide about the utility of web pages for different types of tasks. Such criteria can have implications for the design of systems that use or recommend web pages

    Temporal Information Models for Real-Time Microblog Search

    Get PDF
    Real-time search in Twitter and other social media services is often biased towards the most recent results due to the “in the moment” nature of topic trends and their ephemeral relevance to users and media in general. However, “in the moment”, it is often difficult to look at all emerging topics and single-out the important ones from the rest of the social media chatter. This thesis proposes to leverage on external sources to estimate the duration and burstiness of live Twitter topics. It extends preliminary research where itwas shown that temporal re-ranking using external sources could indeed improve the accuracy of results. To further explore this topic we pursued three significant novel approaches: (1) multi-source information analysis that explores behavioral dynamics of users, such as Wikipedia live edits and page view streams, to detect topic trends and estimate the topic interest over time; (2) efficient methods for federated query expansion towards the improvement of query meaning; and (3) exploiting multiple sources towards the detection of temporal query intent. It differs from past approaches in the sense that it will work over real-time queries, leveraging on live user-generated content. This approach contrasts with previous methods that require an offline preprocessing step
    • …
    corecore