5 research outputs found

    Distributed media indexing based on MPI and MapReduce

    Web-scale digital assets comprise millions or billions of documents. At this scale, sequential algorithms cannot cope with the data, and parallel and distributed computing becomes the solution of choice. MapReduce is a programming model proposed by Google for scalable data processing, and it is mainly applicable to data-intensive algorithms. In contrast, the Message Passing Interface (MPI) is suited to high-performance algorithms. This paper proposes an adapted structure of the MapReduce programming model using MPI for multimedia indexing. Experiments on various multimedia applications validate the model. They indicate that the proposed model achieves good speedup compared to the original sequential versions, to Hadoop, and to earlier versions of MapReduce using MPI.

    Serendipitous Exploration of Large-scale Product Catalogs

    Abstract: Online shopping has developed to a stage where catalogs have become very large and diverse. Thus, it is a challenge to present relevant items to potential customers within very few interactions. This is even more so when users have no defined shopping objectives but operate in an opportunistic mindset. This problem is often tackled by recommender systems. However, these systems rely on consistent user interaction patterns to predict items of interest. In contrast, we propose to adapt the classical information retrieval (IR) paradigm for the purpose of accessing catalog items in a context of unpredictable user interaction. Accordingly, we present a novel information access strategy based on the notion of interest rather than relevance. We detail the design of a scalable browsing system that combines learning capabilities with a limited-memory model. Our approach enables locating interesting items within a few steps while not requiring good-quality descriptions. Our system allows customers to seamlessly change browsing objectives without having to explicitly start a new session. An evaluation of our approach based on both artificial and real-life datasets demonstrates its efficiency in learning and adaptation.
    I. MOTIVATION
    The emergence of online shopping has offered new opportunities to propose services and products to customers. Currently, many online shops are no longer restricted to a certain category of products. For example, Amazon, initially focused on cultural and entertainment media (books, music, and video), now offers products as diverse as home appliances and jewelry. Even more crucially, we usually find thousands of items within a product category, e.g. 38 million books and 3.5 million jewelry items on Amazon. Both the breadth of product lines and the depth within a product line not only boost the volume of the catalogs but also make it difficult for the customer to find products of interest without an accurate search protocol.
Presenting relevant products to potential customers is the goal of recommender systems. Independent of their type (collaborative filtering systems, content-based recommenders, etc.), recommender systems usually operate on a user profile gained from previous shopping sessions. For this reason, recommender systems suffer from the cold-start problem when new users and/or new products appear. In contrast to the above, our approach neither requires the definition of a user profile nor imposes specific search sessions with pre-defined objectives. In other words, we present an efficient product access strategy that enables intuitive browsing by estimating the user's intention from his/her input to the system and displaying the items considered most interesting to him/her (and thus likely to be purchased). Our new information access strategy is based on the notion of current interest rather than on the notion of relevance classically used in Information Retrieval. (O1) We accommodate serendipity: we assume no pre-defined (fixed) objective of the user's chain of actions; (O2) The system matches classic (simple) interaction models; (O3) The system is scalable in terms of the volume of the product catalog. Our approach results in an interactive navigation system, which lets the user operate naturally over the product catalog while swiftly reacting to changes in the browsing objectives. The major difference from earlier approaches is a rapidly adapting system that copes with radical changes and scales to realistic product catalogs. The remainder of the paper is structured as follows: in section II, we discuss relevant approaches for information characterisation and content access strategies in large repositories. In section III, we present our interaction model, which describes the type of interaction that is expected from the user and what information is carried by this interaction.
In section IV, we formalise our navigation model, anticipating functional issues. In particular, we review the properties that ensure its scalability and its compatibility with other models. In section V, we propose a comprehensive assessment of the performance of our model in an adaptive browsing scenario. At every browsing step, the system aims at displaying the items most useful to the user with respect to past interaction. Although our study includes an inherent temporal dimension, which makes the evaluation context different from that of classical searc
