7 research outputs found

    Distributed Evaluation of Top-k Temporal Joins

    No full text
    To appear in SIGMOD'16We study a particular kind of join, coined Ranked Temporal Join (RTJ), featuring predicates that compare time intervals and a scoring function associated with each predicate to quantify how well it is satisfied. RTJ queries are prevalent in a variety of applications such as network traffic monitoring , task scheduling, and tweet analysis. RTJ queries are often best interpreted as top-k queries where only the best matches are returned. We show how to exploit the nature of temporal predicates and the properties of their associated scoring semantics to design TKIJ , an efficient query evaluation approach on a distributed Map-Reduce architecture. TKIJ relies on an offline statistics computation that, given a time partitioning into granules, computes the distribution of intervals' endpoints in each granule, and an online computation that generates query-dependent score bounds. Those statistics are used for workload assignment to reducers. This aims at reducing data replication, to limit I/O cost. Additionally , high-scoring results are distributed evenly to enable each reducer to prune unnecessary results. Our extensive experiments on synthetic and real datasets show that TKIJ outperforms state-of-the-art competitors and provides very good performance for n-ary RTJ queries on temporal data

    Algorithmes pour le monitoring de traces d'activité à grande échelle

    No full text
    In this thesis, we study scalable algorithms for monitoring activity traces. In several domains, monitoring is a key ability to extract value from data and improve a system. This thesis aims to design algorithms for monitoring two kinds of activity traces. First, we investigate temporal data monitoring. We introduce a new kind of interval join, that features scoring functions reflecting the degree of satisfaction of temporal predicates. We study these joins in the context of batch processing: we formalize Ranked Temporal Join (RTJ), that combine collections of intervals and return the k best results. We show how to exploit the nature of temporal predicates and the properties of their associated scored semantics to design TKIJ , an efficient query evaluation approach on a distributed Map-Reduce architecture. Our extensive experiments on synthetic and real datasets show that TKIJ outperforms state-of-the-art competitors and provides very good performance for n-ary RTJ queries on temporal data. We also propose a preliminary study to extend our work on TKIJ to stream processing. Second, we investigate monitoring in crowdsourcing. We advocate the need to incorporate motivation in task assignment. We propose to study an adaptive approach, that captures workers’ motivation during task completion and use it to revise task assignment accordingly across iterations. We study two variants of motivation-aware task assignment: Individual Task Assignment (Ita) and Holistic Task Assignment (Hta). First, we investigate Ita, where we assign tasks to workers individually, one worker at a time. We model Ita and show it is NP-Hard. We design three task assignment strategies that exploit various objectives. Our live experiments study the impact of each strategy on overall performance. We find that different strategies prevail for different performance dimensions. In particular, the strategy that assigns random and relevant tasks offers the best task throughput and the strategy that assigns tasks that best match a worker’s compromise between task diversity and task payment has the best outcome quality. Our experiments confirm the need for adaptive motivation-aware task assignment. Then, we study Hta, where we assign tasks to all available workers, holistically. We model Hta and show it is both NP-Hard and MaxSNP-Hard. We develop efficient approximation algorithms with provable guarantees. We conduct offline experiments to verify the efficiency of our algorithms. We also conduct online experiments with real workers and compare our approach with various non-adaptive assignment strategies. We find that our approach offers the best compromise between performance dimensions thereby assessing the need for adaptability.Dans cette thèse, nous étudions des algorithmes pour le monitoring des traces d’activité à grande échelle. Le monitoring est une aptitude clé dans plusieurs domaines, permettant d’extraire de la valeur des données ou d’améliorer les performances d’un système. Nous explorons d’abord le monitoring de données temporelles. Nous présentons un nouveau type de jointure sur des intervalles, qui inclut des fonctions de score caractérisant le degré de satisfaction de prédicats temporels. Nous étudions ces jointures dans le contexte du batch processing (traitement par lots). Nous formalisons la Ranked Temporal Join (RTJ), une jointure qui combine des collections d’intervalles et retourne les k meilleurs résultats. Nous montrons comment exploiter les propriétés des prédicats temporels et de la sémantique de score associée afin de concevoir TKIJ , une méthode d’évaluation de requête distribuée basée sur Map-Reduce. Nos expériences sur des données synthétiques et réelles montrent que TKIJ est plus performant que les techniques de l’état de l’art et démontre de bonnes performances sur des requêtes RTJ n-aires sur des données temporelles. Nous proposons également une étude préliminaire afin d’étendre nos travaux sur TKIJ au domaine du stream processing (traitement de flots). Nous explorons ensuite le monitoring dans le crowdsourcing (production participative). Nous soutenons la nécessité d’intégrer la motivation des travailleurs dans le processus d’affectation des tâches. Nous proposons d’étudier une approche adaptative, qui évalue la motivation des travailleurs lors de l’exécution des tâches et l’exploite afin d’améliorer l’affectation de tâches qui est réalisée de manière itérative. Nous explorons une première variante nommée Individual Task Assignment (Ita), dans laquelle les tâches sont affectées individuellement, un travailleur à la fois. Nous modélisons Ita et montrons que ce problème est NP-Difficile. Nous proposons trois méthodes d’affectation de tâches qui poursuivent différents objectifs. Nos expériences en ligne étudient l’impact de chaque méthode sur la performance globale dans l’exécution de tâches. Nous observons que différentes stratégies sont dominantes sur les différentes dimensions de performance. En particulier, la méthode affectant des tâches aléatoires et correspondant aux intérêts d’un travailleur donne le meilleur flux d’exécution de tâches. La méthode affectant des tâches correspondant au compromis d’un travailleur entre diversité et niveau de rémunération des tâches donne le meilleur niveau de qualité. Nos expériences confirment l’utilité d’une affectation de tâches adaptative et tenant compte de la motivation. Nous étudions une deuxième variante nommée Holistic Task Assignment (Hta), où les tâches sont affectées à tous les travailleurs disponibles, de manière holistique. Nous modélisons Hta et montrons que ce problème est NP-Difficile et MaxSNP-Difficile. Nous développons des algorithmes d’approximation pour Hta. Nous menons des expériences sur des données synthétiques pour évaluer l’efficacité de nos algorithmes. Nous conduisons également des expériences en ligne et comparons notre approche avec d’autres stratégies non adaptatives. Nous observons que notre approche présente le meilleur compromis sur les différentes dimensions de performance

    Motivation-Aware Task Assignment in Crowdsourcing

    No full text
    International audienceWe investigate how to leverage the notion of motivation in assigning tasks to workers and improving the performance of a crowdsourcing system. In particular, we propose to model motivation as the balance between task diversity–i.e., the difference in skills among the tasks to complete, and task payment–i.e., the difference between how much a chosen task offers to pay and how much other available tasks pay. We propose to test different task assignment strategies: (1) relevance, a strategy that assigns matching tasks, i.e., those that fit a worker's profile, (2) diversity, a strategy that chooses matching and diverse tasks, and (3) div-pay, a strategy that selects matching tasks that offer the best compromise between diversity and payment. For each strategy , we study multiple iterations where tasks are reassigned to workers as their motivation evolves. At each iteration, relevance and diversity assign tasks to a worker from an available pool of filtered tasks. div-pay, on the other hand, estimates each worker's motivation on-the-fly at each iteration, and uses it to assign tasks to the worker. Our empirical experiments study the impact of each strategy on overall performance. We examine both requester-centric and worker-centric performance dimensions and find that different strategies prevail for different dimensions. In particular, relevance offers the best task throughput while div-pay achieves the best outcome quality

    Task Relevance and Diversity as Worker Motivation in Crowdsourcing

    No full text
    International audienc

    Task Composition in Crowdsourcing

    No full text
    International audienceCrowdsourcing has gained popularity in a variety of domains as an increasing number of jobs are " taskified " and completed independently by a set of workers. A central process in crowdsourcing is the mechanism through which workers find tasks. On popular platforms such as Amazon Mechanical Turk, tasks can be sorted by dimensions such as creation date or reward amount. Research efforts on task assignment have focused on adopting a requester-centric approach whereby tasks are proposed to workers in order to maximize overall task throughput, result quality and cost. In this paper, we advocate the need to complement that with a worker-centric approach to task assignment, and examine the problem of producing, for each worker, a personalized summary of tasks that preserves overall task throughput. We formalize task composition for workers as an optimization problem that finds a representative set of k valid and relevant Composite Tasks (CTs). Validity enforces that a composite task complies with the task arrival rate and satisfies the worker's expected wage. Relevance imposes that tasks match the worker's qualifications. We show empirically that workers' experience is greatly improved due to task homogeneity in each CT and to the adequation of CTs with workers' skills. As a result task throughput is improved
    corecore