452 research outputs found

    Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns

    Understanding customer buying patterns is of great interest to the retail industry and has been shown to benefit a wide variety of goals, ranging from managing stocks to implementing loyalty programs. Association rule mining is a common technique for extracting correlations such as "people in the South of France buy rosé wine" or "customers who buy pâté also buy salted butter and sour bread." Unfortunately, sifting through a large number of buying patterns is not useful in practice, because popular products dominate the top rules. As a result, a number of "interestingness" measures (over 30) have been proposed to rank rules. However, there is no agreement on which measures are most appropriate for retail data. Moreover, since pattern mining algorithms output thousands of association rules for each product, the ability of an analyst to rely on ranking measures to identify the most interesting ones is crucial. In this paper, we develop CAPA (Comparative Analysis of PAtterns), a framework that provides analysts with the ability to compare the outcome of interestingness measures applied to buying patterns in the retail industry. We report on how we used CAPA to compare 34 measures applied to over 1,800 stores of Intermarché, one of the largest food retailers in France.
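As a rough illustration of what an "interestingness" measure computes, the sketch below evaluates three widely used ones (support, confidence, and lift) for a single association rule. It is not CAPA and does not reproduce the 34 measures from the paper; the function name `rule_measures` and the toy baskets are hypothetical, chosen only for this example.

```python
def rule_measures(transactions, antecedent, consequent):
    """Compute support, confidence and lift for the rule antecedent -> consequent.

    `transactions` is a list of sets of product names (hypothetical toy data).
    """
    a = frozenset(antecedent)
    c = frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)          # baskets containing the antecedent
    n_c = sum(1 for t in transactions if c <= t)          # baskets containing the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)   # baskets containing both

    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares the rule's confidence with the consequent's baseline frequency.
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical baskets, not data from the paper.
baskets = [
    {"pate", "salted butter", "bread"},
    {"pate", "salted butter"},
    {"bread", "milk"},
    {"milk", "salted butter"},
    {"pate", "bread", "milk"},
]

measures = rule_measures(baskets, {"pate"}, {"salted butter"})
print(measures)
```

Because lift divides by the consequent's overall frequency, it penalizes rules whose consequent is simply a popular product, which is one reason different measures can rank the same rules differently.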

    Personalizing XML Full Text Search in PIMENTO

    In PIMENTO we advocate a novel approach to XML search that leverages user information to return more relevant query answers. This approach is based on formalizing user profiles in terms of scoping rules, which are used to rewrite an input query, and of ordering rules, which are combined with query scoring to customize the ranking of query answers to specific users.

    Exploration of User Groups in VEXUS

    We introduce VEXUS, an interactive visualization framework for exploring user data to fulfill tasks such as finding a set of experts, forming discussion groups, and analyzing collective behaviors. User data is characterized by a combination of demographics, like age and occupation, and actions, such as rating a movie, writing a paper, following a medical treatment, or buying groceries. The ubiquity of user data requires tools that help explorers, be they specialists or novice users, acquire new insights. VEXUS lets explorers interact with user data via visual primitives and builds an exploration profile to recommend the next exploration steps. VEXUS combines state-of-the-art visualization techniques with appropriate indexing of user data to provide fast and relevant exploration.

    Distributed Evaluation of Top-k Temporal Joins

    To appear in SIGMOD'16. We study a particular kind of join, coined Ranked Temporal Join (RTJ), featuring predicates that compare time intervals and a scoring function associated with each predicate to quantify how well it is satisfied. RTJ queries are prevalent in a variety of applications such as network traffic monitoring, task scheduling, and tweet analysis. RTJ queries are often best interpreted as top-k queries where only the best matches are returned. We show how to exploit the nature of temporal predicates and the properties of their associated scoring semantics to design TKIJ, an efficient query evaluation approach on a distributed Map-Reduce architecture. TKIJ relies on an offline statistics computation that, given a time partitioning into granules, computes the distribution of intervals' endpoints in each granule, and an online computation that generates query-dependent score bounds. Those statistics are used for workload assignment to reducers. This aims at reducing data replication, to limit I/O cost. Additionally, high-scoring results are distributed evenly to enable each reducer to prune unnecessary results. Our extensive experiments on synthetic and real datasets show that TKIJ outperforms state-of-the-art competitors and provides very good performance for n-ary RTJ queries on temporal data.
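To make the notion of a ranked temporal join concrete, here is a minimal single-machine sketch of a top-k join over time intervals. It is not TKIJ: it uses a plain nested loop rather than a distributed Map-Reduce plan with statistics and score bounds, and the scoring function `overlap_score` (overlap length divided by the combined span) is an assumption made for illustration, not the scoring semantics defined in the paper.

```python
import heapq


def overlap_score(a, b):
    """Assumed score in [0, 1] for an 'overlaps' predicate.

    Intervals are (start, end) tuples. The score is the overlap length
    divided by the length of the span covered by both intervals.
    """
    inter = min(a[1], b[1]) - max(a[0], b[0])
    if inter <= 0:
        return 0.0  # intervals do not overlap
    span = max(a[1], b[1]) - min(a[0], b[0])
    return inter / span


def top_k_temporal_join(left, right, k):
    """Return the k best-scoring (score, left_interval, right_interval) triples.

    Naive nested-loop evaluation: every pair is scored, then the k largest
    scores are kept. Pairs with score 0 are discarded.
    """
    scored = (
        (overlap_score(a, b), a, b)
        for a in left
        for b in right
    )
    matches = (triple for triple in scored if triple[0] > 0)
    return heapq.nlargest(k, matches, key=lambda triple: triple[0])


# Hypothetical intervals, for illustration only.
left = [(0, 10), (5, 15)]
right = [(0, 10), (8, 20)]

result = top_k_temporal_join(left, right, k=2)
for score, a, b in result:
    print(round(score, 3), a, b)
```

The nested loop scores every pair, which is exactly the cost that pruning with score bounds, as described in the abstract, is meant to avoid.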

    Crowd4U: An Initiative for Constructing an Open Academic Crowdsourcing Network

    We describe the Crowd4U initiative, which aims at constructing an all-academic, open, and generic platform for microvolunteering and crowdsourcing worldwide. Crowd4U provides a microtask-based platform in which most workers are volunteers at universities and other research institutions. Crowd4U is open in the sense that the platform can interact with other platforms, researchers can register their tasks, and the underlying code is not a black box. It is generic in that it allows virtually any task to be registered. Crowd4U has already been used by several projects for public and academic purposes.

    Profile Diversity for Phenotyping Data Search and Recommendation

    Session: Applications innovantes. In this work, we study profile diversity, a novel approach to searching scientific documents. Many prior works have combined keyword relevance with document popularity in a "social" scoring function. Diversifying the content of the returned documents has also been studied in depth in search, advertising, database queries, and recommendation. We believe our work is the first to address profile diversity in order to tackle the problem of result lists that are highly popular but too narrowly focused. We show how we adapt Fagin's threshold algorithms to return the documents that are the most relevant and the most popular, but also the most diverse, in terms of either content or profiles. We also run a set of simulations on two benchmarks to validate our scoring function.