    Improving package recommendations through query relaxation

    Recommendation systems aim to identify items that are likely to be of interest to users. In many cases, users are interested in package recommendations, that is, collections of items. For example, a dietitian may wish to derive a dietary plan as a nutritionally balanced collection of recipes, and a travel agent may want to produce a vacation package as a coordinated collection of travel and hotel reservations. Recent work has explored extending recommendation systems to support packages of items. These systems need to solve complex combinatorial problems, enforcing various properties and constraints defined on sets of items. Introducing constraints on packages makes recommendation queries harder to evaluate, but also harder to express: queries that are under-specified produce too many answers, whereas queries that are over-specified frequently miss interesting solutions. In this paper, we study query relaxation techniques that target package recommendation systems. Our work offers three key insights. First, even when the original query result is not empty, relaxing constraints can produce preferable solutions. Second, a solution due to relaxation can only be preferred if it improves some property specified by the query. Third, relaxation should not treat all constraints as equals: some constraints are more important to the users than others. Our contributions are threefold: (a) we define the problem of deriving package recommendations through query relaxation, (b) we design and experimentally evaluate heuristics that relax query constraints to derive interesting packages, and (c) we present a crowd study that evaluates the sensitivity of real users to different kinds of constraints and demonstrates that query relaxation is a powerful tool in diversifying package recommendations.
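
The first two insights above can be illustrated with a minimal sketch. It is not the heuristics evaluated in the paper: the recipe data, the single calorie constraint, the protein objective, and the function names `best_package` and `relax` are all assumptions made for illustration, and the search is brute force.

```python
from itertools import combinations

# Hypothetical recipe data: (name, calories, protein in grams).
RECIPES = [
    ("salad", 300, 10),
    ("pasta", 700, 25),
    ("steak", 900, 60),
    ("soup", 250, 8),
    ("omelette", 450, 30),
]

def best_package(items, size, max_calories):
    """Return (protein, names) of the feasible package with the most protein.

    A package is any subset of `size` items whose total calories do not
    exceed `max_calories`. Returns None if no package is feasible.
    """
    best = None
    for combo in combinations(items, size):
        calories = sum(c for _, c, _ in combo)
        protein = sum(p for _, _, p in combo)
        if calories <= max_calories and (best is None or protein > best[0]):
            best = (protein, tuple(n for n, _, _ in combo))
    return best

def relax(items, size, max_calories, slack):
    """Loosen the calorie bound by the fraction `slack`.

    The relaxed package is kept only if it strictly improves the
    objective (total protein); otherwise the original answer stands.
    """
    original = best_package(items, size, max_calories)
    relaxed = best_package(items, size, max_calories * (1 + slack))
    if relaxed is not None and (original is None or relaxed[0] > original[0]):
        return relaxed
    return original

if __name__ == "__main__":
    print(best_package(RECIPES, 2, 1300))  # answer to the original query
    print(relax(RECIPES, 2, 1300, 0.05))   # answer after a 5% relaxation
```

In this toy data the original query already has an answer, yet a 5% looser calorie bound admits a package with more protein, which is the situation the first insight describes. Weighting constraints by importance (the third insight) is not modelled here.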

    Building Representative Composite Items

    The problem of summarizing a large collection of homogeneous items has been addressed extensively, in particular in the case of geo-tagged datasets (e.g. Flickr photos and tags). In our work, we study the problem of summarizing large collections of heterogeneous items. For example, a user planning to spend extended periods of time in a given city would be interested in seeing a map of that city with item summaries in different geographic areas, each containing a theater, a gym, a bakery, a few restaurants and a subway station. We propose to solve that problem by building representative Composite Items (CIs). To the best of our knowledge, this is the first work that addresses the problem of finding representative CIs for heterogeneous items. Our problem naturally arises when summarizing geo-tagged datasets but also in other settings such as movie or music summarization. We formalize building representative CIs as an optimization problem and propose KFC, an extended fuzzy clustering algorithm, to solve it. We show that KFC converges and run extensive experiments on a variety of real datasets that validate its effectiveness.

    Item Retrieval as Utility Estimation

    Retrieval systems have greatly improved over the last half century, estimating relevance to a latent user need in a wide variety of areas. One area that has not enjoyed such advancements is searching for items by attribute values, a common activity in e-commerce and science, particularly given numeric values. Existing item retrieval systems assume the user has a firm grasp of their own desires and can formulate a good Boolean or SQL-style query to retrieve items, as one would do with a database. A contrasting approach would be to estimate how well items match the user's latent desires and return items ranked by this estimation. Towards this end, we present a retrieval model inspired by multi-criteria decision making theory, concentrating on numeric attributes. We evaluate our novel approach, the de-facto standard of Boolean retrieval, and several models proposed in the literature, in two user studies using Amazon Mechanical Turk. We use a competitive game to motivate test subjects and compare methods based on the results of the subjects' initial query and their success in the game. In our experiments, our new method significantly outperformed the others, whereas the Boolean approaches had the worst performance.
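
The contrast between Boolean retrieval and ranking by estimated utility can be sketched as follows. This is not the retrieval model proposed in the paper: the laptop catalogue, the soft-constraint scoring rule, and the names `boolean_filter` and `rank_by_utility` are assumptions chosen for illustration, using a plain weighted additive utility of the kind common in multi-criteria decision making.

```python
# Hypothetical catalogue: item name -> numeric attributes.
LAPTOPS = {
    "A": {"price": 1010, "weight_kg": 1.2},
    "B": {"price": 950, "weight_kg": 2.4},
    "C": {"price": 700, "weight_kg": 2.6},
}

def boolean_filter(items, bounds):
    """Boolean retrieval: keep only items that satisfy every upper bound."""
    return [name for name, attrs in items.items()
            if all(attrs[a] <= bound for a, bound in bounds.items())]

def soft_score(value, bound):
    """Score 1.0 when the bound is met, decaying linearly to 0 as it is exceeded."""
    if value <= bound:
        return 1.0
    return max(0.0, 1.0 - (value - bound) / bound)

def rank_by_utility(items, bounds, weights):
    """Rank all items by a weighted sum of per-attribute soft scores."""
    def utility(attrs):
        return sum(weights[a] * soft_score(attrs[a], bounds[a]) for a in bounds)
    return sorted(items, key=lambda name: utility(items[name]), reverse=True)

if __name__ == "__main__":
    bounds = {"price": 1000, "weight_kg": 1.5}
    weights = {"price": 0.5, "weight_kg": 0.5}
    print(boolean_filter(LAPTOPS, bounds))            # strict query
    print(rank_by_utility(LAPTOPS, bounds, weights))  # ranked alternatives
```

With these assumed numbers the strict query returns nothing, because item A misses the price bound by a small margin, while the utility ranking still places A first as the closest match.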

    Mining Revenue-Maximizing Bundling Configuration

    With greater prevalence of social media, there is an increasing amount of user-generated data revealing consumer preferences for various products and services. Businesses seek to harness this wealth of data to improve their marketing strategies. Bundling, or selling two or more items for one price, is a highly practiced marketing strategy. In this paper, we address the bundle configuration problem from the data-driven perspective. Given a set of items in a seller's inventory, we seek to determine which items should belong to which bundle so as to maximize the total revenue, by mining consumer preferences data. We show that this problem is NP-hard when bundles are allowed to contain more than two items. Therefore, we describe an optimal solution for bundle sizes up to two items, and propose two heuristic solutions for bundles of any larger size. We investigate the effectiveness and the efficiency of the proposed algorithms through experiments on real-life rating-based preferences data.

    Shortlisting Top-K Assignments

    In this paper we identify a novel query type, the top-K assignment query (αTop-K). Consider a set of objects P and a set of suppliers S, where each object p_i ∈ P must be assigned to one supplier s_j ∈ S. Assume that there is a cost c_ij associated with every object-supplier pair ⟨p_i, s_j⟩. The matching with the smallest total cost would assign each object p_i to the supplier s_j with the minimum c_ij value. In many scenarios, however, runner-up assignments may be required too, for example when a decision maker needs to make additional considerations not captured by the c_ij values. In this case, it is necessary to examine several shortlisted assignments before choosing one. This motivates the αTop-K query, which computes the K best assignments, i.e., those achieving the K smallest total costs. Algorithms for the traditional assignment ranking problem could be adapted to process the query, but their time requirements are prohibitive for large datasets (cubic in the input size). In this work we exploit the specific properties of the αTop-K problem and develop scalable methods for its processing. We also consider its incremental version, where K is not specified in advance; instead, the best assignments are iteratively computed on demand. An empirical evaluation with real data verifies the practicality and efficiency of our framework.
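
The query definition above can be made concrete with a small sketch. It is not the scalable method developed in the paper: it is a generic best-first enumeration, it assumes (as the abstract's description of the cheapest matching suggests) that suppliers have no capacity limits, and the function name `top_k_assignments` and the cost matrix in the example are hypothetical.

```python
import heapq

def top_k_assignments(cost, k):
    """Enumerate the k cheapest assignments in non-decreasing total cost.

    cost[i][j] is the cost of assigning object i to supplier j.
    Each result is (total_cost, assignment), where assignment[i] is the
    supplier index chosen for object i. Assumes every object picks its
    supplier independently (no supplier capacities).
    """
    # For each object, supplier indices sorted from cheapest to dearest.
    order = [sorted(range(len(row)), key=row.__getitem__) for row in cost]

    def total(ranks):
        return sum(cost[i][order[i][r]] for i, r in enumerate(ranks))

    # A state records, per object, the rank of the chosen supplier.
    # The all-zero state is the cheapest assignment.
    start = tuple(0 for _ in cost)
    heap = [(total(start), start)]
    seen = {start}
    results = []
    while heap and len(results) < k:
        c, ranks = heapq.heappop(heap)
        results.append((c, tuple(order[i][r] for i, r in enumerate(ranks))))
        # Successors: move exactly one object to its next-cheapest supplier.
        for i in range(len(ranks)):
            if ranks[i] + 1 < len(order[i]):
                nxt = ranks[:i] + (ranks[i] + 1,) + ranks[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (total(nxt), nxt))
    return results

if __name__ == "__main__":
    # Two objects, two suppliers (hypothetical costs).
    for total_cost, assignment in top_k_assignments([[1, 4], [2, 3]], 3):
        print(total_cost, assignment)
```

Because results are produced one at a time in cost order, the same loop also covers the incremental variant in which K is not fixed in advance: the caller can simply stop consuming results when satisfied.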

    Approaches to implement and evaluate aggregated search

    Aggregated search or aggregated retrieval can be seen as a third paradigm for information retrieval, following the Boolean retrieval paradigm and the ranked retrieval paradigm. In the first two, we are returned sets and ranked lists of search results, respectively. 
It is up to the time-poor user to scroll through this set or list, scan the individual documents and assemble the information needed. Alternatively, aggregated search aims not only at identifying relevant information nuggets, but also at assembling these nuggets into a coherent answer. In this work, we first analyze work related to aggregated search within a general framework composed of three steps: query dispatching, nugget retrieval and result aggregation. Existing work is grouped by related domains such as relational search, federated search, question answering, natural language generation, etc. Among the possible research directions, we then focus on the two we believe are the most promising: relational aggregated search and cross-vertical aggregated search. Relational aggregated search targets relevant information, but also the relations between relevant information nuggets, which are used to assemble the final answer. In particular, three types of queries would readily benefit from this paradigm: attribute queries (e.g. president of France, GDP of Italy, mayor of Glasgow, ...), instance queries (e.g. France, Italy, Glasgow, Nokia e72, ...) and class queries (countries, French cities, Nokia mobile phones, ...). We call these relational queries, and we tackle three important problems concerning information retrieval and aggregation for them. First, we propose an attribute retrieval approach, after arguing that attribute retrieval is one of the crucial problems to be solved. Our approach relies on HTML tables on the Web. It identifies useful and relevant tables, which are then used to extract relevant attributes for arbitrary queries. The experimental results show that our approach is effective: it can answer many queries with high coverage and it outperforms state-of-the-art techniques. 
Second, we deal with result aggregation, where we are given relevant instances and attributes for a given query. The problem is particularly interesting for class queries, where the final answer is a table with many instances (rows) and attributes (columns). To guarantee the quality of the aggregated result, we propose weights on instances and attributes that promote the most representative and important ones. The third problem we deal with concerns instances of the same class (e.g. France, Germany, Italy, ... are all instances of the same class). Here, we propose an approach that can massively extract instances of the same class from HTML lists on the Web. All proposed approaches are applicable at Web scale and can play an important role in relational aggregated search. Finally, we propose 4 prototype applications for relational aggregated search. They can answer different types of queries with relevant and relational information. More precisely, we retrieve not only attributes and their values, but also passages and images, which are assembled into a final focused answer. An example is the query "Nokia e72", which is answered with attributes (e.g. price, weight, battery life, ...), passages (e.g. description, reviews, ...) and images. Results are encouraging and illustrate the utility of relational aggregated search. The second research direction that we pursued concerns cross-vertical aggregated search, which consists of assembling results from different vertical search engines (e.g. image search, video search, traditional Web search, ...) into one single interface. Here, different approaches exist in both research and industry. Our contribution mostly concerns the evaluation and the advantages of this paradigm. We propose 4 studies which simulate different search situations. Each study is tested with 100 different queries and 9 vertical sources. 
With these studies, we could clearly identify new advantages of this paradigm as well as several issues with evaluation setups. In particular, we observe that traditional information retrieval evaluation is not the fastest but remains the most realistic. To conclude, we propose different studies with respect to two promising research directions. On the one hand, we deal with three important problems of relational aggregated search, followed by real prototype applications with encouraging results. On the other hand, we have investigated the interest and evaluation of cross-vertical aggregated search, identifying some of its advantages and evaluation issues. In a long-term perspective, we foresee a possible combination of these two kinds of approaches to provide relational and cross-vertical information retrieval incorporating more focus, structure and multimedia in search results.

    Personalised service discovery in mobile environments

    In recent years, some trends have emerged that pertain both to mobile devices and the Web. On one side, mobile devices have transitioned from being simple wireless phones to become ubiquitous Web-enabled users' companions. On the other side, the Web has evolved from an online one-size-fits-all collection of interlinked documents to become an open platform of personalised services and content. It will not be long before these trends converge and create a Seamless Web: an integrated environment where, besides traditional services delivered by powerful server machines accessible via wide area networks, new services and content will be offered by users to users via their portable devices. As a result, mobile users will soon be exposed, in addition to traditional "on-line" Web services and content, to a parallel universe of pervasive "off-line" services provided by devices in their surroundings. Such circumstances will raise new challenges when it comes to selecting the services to rely on, which will require solutions grounded in the characteristics of mobile environments. Two aspects will require particular attention. First, users will have access to a countless multitude of services impossible to explore; they will need assistance to identify, among this multitude, those services they are most likely to enjoy. Second, while today's services (and their providers) are always-on, 'static' and aiming at Five 9s availability, tomorrow's pervasive services will be mobile (as devices move), fine-grained, increasingly composite (to provide richer functionalities) and so more unreliable by nature. Our research tackles the problem of service discovery in pervasive environments in two ways. On the one hand, we support personalised discovery by means of a mobile recommender system, easing the discovery of pervasive services appealing to end-users. 
On the other hand, we enable reliable discovery by reasoning on the composite nature of pervasive services and the physical availability of their component providers. Overall, we provide a discovery method that enables 'better' pervasive services, where by 'better' we mean both 'more interesting' to the user and 'more reliable'.