5 research outputs found

    Topic modelling of clickthrough data in image search

    In this paper we explore the benefits of latent variable modelling of clickthrough data in the domain of image retrieval. Clicks in image search logs are regarded as implicit relevance judgements that express both user intent and important relations between selected documents. We posit that clickthrough data contains hidden topics and can be used to infer a lower-dimensional latent space that can subsequently be employed to improve various aspects of the retrieval system. We use a subset of a clickthrough corpus from the image search portal of a news agency to evaluate several popular latent variable models in terms of their ability to model topics underlying queries. We demonstrate that latent variable modelling reveals underlying structure in clickthrough data, and our results show that computing document similarities in the latent space improves retrieval effectiveness compared to computing similarities in the original query space. These results are compared with baselines using visual and textual features. We show performance substantially better than the visual baseline, which indicates that content-based image retrieval systems that do not exploit query logs could improve recall and precision by taking this historical data into account.
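
The contrast between similarities in the original query space and in a latent space can be illustrated with a small sketch. The abstract does not name the latent variable models that were evaluated, so this example uses a truncated SVD (as in latent semantic analysis) purely as a stand-in. The click counts, the function names, and the choice of k = 2 are all assumptions made for illustration.

```python
import numpy as np

# Toy click matrix (made-up data): rows are images, columns are queries.
# clicks[i, j] = number of times image i was clicked for query j.
clicks = np.array([
    [3, 0, 0, 0],   # image 0: clicked for query 0 only
    [0, 2, 0, 0],   # image 1: clicked for query 1 only
    [2, 2, 0, 0],   # image 2: clicked for queries 0 and 1, linking them
    [0, 0, 4, 1],   # image 3
    [0, 0, 1, 3],   # image 4
], dtype=float)


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for empty rows
    unit = x / norms
    return unit @ unit.T


def latent_embedding(matrix, k):
    """Project rows onto the top-k singular directions (truncated SVD)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


# Similarities computed directly on click vectors (original query space).
query_space_sim = cosine_similarity_matrix(clicks)

# Similarities computed after reducing to a 2-dimensional latent space.
latent_sim = cosine_similarity_matrix(latent_embedding(clicks, k=2))

print("images 0 and 1, query space: ", round(query_space_sim[0, 1], 3))
print("images 0 and 1, latent space:", round(latent_sim[0, 1], 3))
```

In this toy data, images 0 and 1 share no query, so their query-space similarity is zero. Image 2 was clicked for both queries, which lets the latent space place images 0 and 1 close together. This shows only the general mechanism, not the evaluation reported in the paper.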

    Real-time Selection of Video Streams for Live TV Broadcasting Based on Query-by-Example Using a 3D Model

    The emergence of low-cost cameras with nearly professional features in the consumer market represents an important new source of video information. For example, using an increasing number of these cameras in live TV broadcasts makes it possible to obtain varied content without affecting production costs. However, searching for interesting shots (e.g., a certain view of a specific car in a race) among many video sources in real time can be difficult for a Technical Director (TD). TDs therefore require a mechanism to represent, easily and precisely, the kind of shot they want to obtain, without needing to be aware of all the views provided by the cameras. In this paper we present our proposal to help a TD visually define, using an interface for the definition of 3D scenes, an interesting sample view of one or more objects in the scenario. We recreate the views of the cameras in a 3D engine and apply 3D geometric computations on their virtual views, instead of analyzing the real images they provide, to enable an efficient and precise real-time selection. Specifically, our system computes a similarity measure to rank the candidate cameras. Moreover, we present a prototype of the system and an experimental evaluation that shows the interest of our proposal.
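
The idea of ranking candidate cameras against a sample view using geometry alone can be sketched as follows. The abstract does not define the similarity measure, so the score below (agreement of viewing directions multiplied by a distance penalty) is a hypothetical stand-in, and the `View` class, `view_similarity`, `rank_cameras`, and the `distance_scale` parameter are names invented for this illustration. A real system would presumably also account for which objects are visible in each virtual view.

```python
import math
from dataclasses import dataclass


@dataclass
class View:
    name: str
    position: tuple    # (x, y, z) in scene units
    direction: tuple   # viewing direction, need not be normalised


def _normalise(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def view_similarity(sample, camera, distance_scale=10.0):
    """Hypothetical score in [0, 1]: 1 means identical pose."""
    d1 = _normalise(sample.direction)
    d2 = _normalise(camera.direction)
    # Map the cosine of the angle between directions from [-1, 1] to [0, 1].
    angle_score = (sum(a * b for a, b in zip(d1, d2)) + 1.0) / 2.0
    # Penalise cameras that are far from the sample viewpoint.
    dist = math.dist(sample.position, camera.position)
    position_score = 1.0 / (1.0 + dist / distance_scale)
    return angle_score * position_score


def rank_cameras(sample, cameras):
    """Return cameras ordered from most to least similar to the sample view."""
    return sorted(cameras, key=lambda c: view_similarity(sample, c), reverse=True)


# The view the Technical Director defined in the 3D scene (made-up values).
sample = View("sample", (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
cameras = [
    View("cam_a", (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)),    # close, same direction
    View("cam_b", (0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)),   # same spot, opposite direction
    View("cam_c", (20.0, 0.0, 0.0), (1.0, 0.2, 0.0)),   # far away, similar direction
]

ranking = rank_cameras(sample, cameras)
print([c.name for c in ranking])
```

Because only camera poses are compared and no image analysis is involved, each score costs a handful of arithmetic operations, which is in keeping with the real-time selection the abstract emphasises.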

    Serendipitous Exploration of Large-scale Product Catalogs

    Abstract: Online shopping has developed to a stage where catalogs have become very large and diverse. Thus, it is a challenge to present relevant items to potential customers within very few interactions. This is even more so when users have no defined shopping objectives but operate in an opportunistic mindset. This problem is often tackled by recommender systems. However, these systems rely on consistent user interaction patterns to predict items of interest. In contrast, we propose to adapt the classical information retrieval (IR) paradigm for the purpose of accessing catalog items in a context of unpredictable user interaction. Accordingly, we present a novel information access strategy based on the notion of interest rather than relevance. We detail the design of a scalable browsing system that combines learning capabilities with a limited-memory model. Our approach enables locating interesting items within a few steps while not requiring good-quality descriptions. Our system allows customers to seamlessly change browsing objectives without having to explicitly start a new session. An evaluation of our approach based on both artificial and real-life datasets demonstrates its efficiency in learning and adaptation.

    I. MOTIVATION

    The emergence of online shopping has offered new opportunities to propose services and products to customers. Currently, many online shops are no longer restricted to a certain category of products. For example, Amazon, initially focused on cultural and entertainment media (books, music, and video), now offers products as diverse as home appliances and jewelry. More crucially, we usually find thousands of items within a product category, e.g. 38 million books and 3.5 million jewelry items on Amazon. Both the breadth of product lines and the depth within a product line not only boost the volume of the catalogs but also make it difficult for the customer to find products of interest without an accurate search protocol.

    Presenting relevant products to potential customers is the goal of recommender systems. Independent of their type (collaborative filtering systems, content-based recommenders, etc.), recommender systems usually operate on a user profile gained from previous shopping sessions. For this reason, recommender systems suffer from the cold-start problem when new users and/or new products appear.

    In contrast to the above, our approach neither requires the definition of a user profile nor imposes specific search sessions with pre-defined objectives. In other words, we present an efficient product access strategy enabling intuitive browsing by estimating the user's intention from his/her input to the system and displaying items that are considered most interesting to him/her (and thus likely to be purchased). Our new information access strategy is based on the notion of current interest rather than on the notion of relevance classically used in Information Retrieval.

    (O1) We accommodate serendipity. We assume no pre-defined (fixed) objective of the user's chain of actions;
    (O2) The system matches classic (simple) interaction models;
    (O3) The system is scalable in terms of the volume of the product catalog.

    Our approach results in an interactive navigation system, which lets the user operate naturally over the product catalog while swiftly reacting to changes in the browsing objectives. The major difference from earlier approaches is a rapidly adapting system that copes with radical changes and is scalable enough to operate over realistic-scale product catalogs.

    The remainder of the paper is structured as follows: in section II, we discuss relevant approaches for information characterisation and content access strategies in large repositories. In section III, we present our interaction model, which describes the type of interaction that is expected from the user and what information is carried over with this interaction. In section IV, we formalise our navigation model, anticipating functional issues. In particular, we review its properties ensuring scalability and compatibility with other models. In section V, we propose a comprehensive assessment of the performance of our model in an adaptive browsing scenario. At every browsing step, the system aims at displaying the most useful items to the user with respect to past interaction. Although our study includes an inherent temporal dimension, which makes the evaluation context different from that of classical searc
