102 research outputs found
Deliverable Π.3.2: User modeling and preference management
This deliverable, Π.3.2, contains the results of sub-action ΥΔ3.2: Regulating the evolution of hyperspaces and information ecosystems. Section 1 presents the context and motivation of our research. Section 2 presents the way of modeling user profiles that we propose, together with an application of this model. Section 3 presents the management of the profiles and the algorithms that can operate under such a model, as well as the selective, efficient solution of important personalization algorithms.
On optimality of jury selection in crowdsourcing
Recent advances in crowdsourcing technologies enable computationally challenging tasks (e.g., sentiment analysis and entity resolution) to be performed by Internet workers, driven mainly by monetary incentives. A fundamental question is: how should workers be selected, so that the tasks at hand can be accomplished successfully and economically? In this paper, we study the Jury Selection Problem (JSP): Given a monetary budget, and a set of decision-making tasks (e.g., “Is Bill Gates still the CEO of Microsoft now?”), return the set of workers (called jury), such that their answers yield the highest “Jury Quality” (or JQ). Existing JSP solutions make use of the Majority Voting (MV) strategy, which uses the answer chosen by the largest number of workers. We show that MV does not yield the best solution for JSP. We further prove that among all voting strategies (including deterministic and randomized strategies), Bayesian Voting (BV) can optimally solve JSP. We then examine how to solve JSP based on BV. This is technically challenging, since computing the JQ with BV is NP-hard. We solve this problem by proposing an approximate algorithm that is computationally efficient. Our approximate JQ computation algorithm is also highly accurate, and its error is proved to be bounded within 1%. We extend our solution by considering the task owner’s “belief” (or prior) on the answers of the tasks. Experiments on synthetic and real datasets show that our new approach is consistently better than the best JSP solution known.
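The gap between MV and BV can be illustrated on a tiny jury. The sketch below is not the paper's algorithm: it assumes a binary task and a simple worker model in which each worker answers correctly, independently, with a fixed probability (the `qualities` list), and it computes JQ by exhaustive enumeration, which is exponential in the jury size and only usable for illustration. The function names and the tie-breaking rule (ties resolve to answer 0) are likewise assumptions.

```python
from itertools import product


def jury_quality(qualities, strategy, prior=0.5):
    """Probability that `strategy` outputs the true answer.

    Assumed model: binary task, worker i is correct with probability
    qualities[i], independently of the others; `prior` is P(truth = 1).
    Enumerates every voting, so it is exponential in the jury size.
    """
    total = 0.0
    for truth, p_truth in ((0, 1.0 - prior), (1, prior)):
        for votes in product((0, 1), repeat=len(qualities)):
            p_votes = 1.0
            for q, v in zip(qualities, votes):
                p_votes *= q if v == truth else 1.0 - q
            if strategy(votes, qualities, prior) == truth:
                total += p_truth * p_votes
    return total


def majority_voting(votes, qualities, prior):
    """Return the answer given by more than half of the workers (ties -> 0)."""
    return 1 if 2 * sum(votes) > len(votes) else 0


def bayesian_voting(votes, qualities, prior):
    """Return the answer with the larger posterior probability (ties -> 0)."""
    p1, p0 = prior, 1.0 - prior
    for q, v in zip(qualities, votes):
        p1 *= q if v == 1 else 1.0 - q
        p0 *= q if v == 0 else 1.0 - q
    return 1 if p1 > p0 else 0


if __name__ == "__main__":
    jury = [0.9, 0.6, 0.6]  # hypothetical worker qualities
    print("JQ under MV:", jury_quality(jury, majority_voting))
    print("JQ under BV:", jury_quality(jury, bayesian_voting))
```

With the hypothetical jury above, MV reaches a JQ of about 0.792, while BV effectively follows the single strong worker and reaches about 0.9, which is consistent with the claim that MV is not optimal.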
Investigation of Database Models for Evolving Graphs
We deal with the efficient implementation of storage models for time-varying graphs. To this end, we present an improved approach for the HiNode vertex-centric model based on MongoDB. This approach, apart from its inherent space optimality, exhibits significant improvements in execution times for global queries, which are the most challenging query type for entity-centric approaches. Compared to an implementation based on Cassandra, not only are significant speedups achieved, but more expensive queries can be executed as well, owing to the capability to exploit indices to a larger extent and to benefit from in-database query processing.
Finite Automata Algorithms in Map-Reduce
In this thesis, the intersection of several large nondeterministic finite automata (NFAs) as well as the minimization of a large deterministic finite automaton (DFA) in map-reduce are studied. We have derived a lower bound on the replication rate for computing NFA intersections and provided three concrete algorithms for the problem. Our investigation of the replication rate of each of the three algorithms, through detailed experiments on large datasets of finite automata, shows where each algorithm could be applied. Denoting by n the number of states in a DFA A, we propose an algorithm that minimizes A in n map-reduce rounds in the worst case. Our experiments, however, indicate that the number of rounds is, in practice, much smaller than n for all DFAs we examined. In other words, this algorithm converges in d iterations by computing the equivalence classes of each state, where d is the diameter of the input DFA.
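The round-based computation of state equivalence classes can be sketched sequentially. The code below is a plain, single-machine Moore-style partition refinement, not the thesis's map-reduce algorithm; the function name, the dictionary encoding of the transition function, and the example DFA are all assumptions made for illustration. Each loop iteration corresponds loosely to one round: every state recomputes its class from its current class and the classes of its successors.

```python
def minimize_rounds(states, alphabet, delta, accepting):
    """Moore-style refinement of a complete DFA.

    delta maps (state, symbol) -> state. Returns a pair
    (class id per state, number of refinement rounds performed,
    including the final round that detects stability).
    """
    # Round 0: split states into accepting and non-accepting.
    block = {s: int(s in accepting) for s in states}
    rounds = 0
    while True:
        ids = {}
        new_block = {}
        for s in states:
            # Signature: own class plus the classes reached on each symbol.
            sig = (block[s],) + tuple(block[delta[(s, a)]] for a in alphabet)
            new_block[s] = ids.setdefault(sig, len(ids))
        rounds += 1
        # Refinement only ever splits classes, so an unchanged class count
        # means the partition is stable.
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block, rounds
        block = new_block


if __name__ == "__main__":
    # Hypothetical DFA in which states "B" and "C" are equivalent.
    states = ["A", "B", "C", "D"]
    alphabet = ["0", "1"]
    delta = {
        ("A", "0"): "B", ("A", "1"): "C",
        ("B", "0"): "D", ("B", "1"): "D",
        ("C", "0"): "D", ("C", "1"): "D",
        ("D", "0"): "D", ("D", "1"): "D",
    }
    classes, rounds = minimize_rounds(states, alphabet, delta, {"D"})
    print(classes, rounds)
```

On this small example the partition stabilizes after two rounds with three classes, far fewer rounds than the worst-case bound of n.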
On the Parameterized Complexity of Learning Monadic Second-Order Formulas
Within the model-theoretic framework for supervised learning introduced by Grohe and Turán (TOCS 2004), we study the parameterized complexity of learning concepts definable in monadic second-order logic (MSO). We show that the problem of learning a consistent MSO-formula is fixed-parameter tractable on structures of bounded tree-width and on graphs of bounded clique-width in the 1-dimensional case, that is, if the instances are single vertices (and not tuples of vertices). This generalizes previous results on strings and on trees. Moreover, in the agnostic PAC-learning setting, we show that the result also holds in higher dimensions. Finally, via a reduction to the MSO-model-checking problem, we show that learning a consistent MSO-formula is para-NP-hard on general structures.
Characterizing XML Twig Queries with Examples
Typically, a (Boolean) query is a finite formula that defines a possibly infinite set of database instances that satisfy it (positive examples), and implicitly, the set of instances that do not satisfy the query (negative examples). We investigate the following natural question: for a given class of queries, is it possible to characterize every query with a finite set of positive and negative examples that no other query is consistent with? We study this question for twig queries and XML databases. We show that while twig queries are characterizable, they generally require exponential sets of examples. Consequently, we focus on a practical subclass of anchored twig queries and show that not only are they characterizable, but they are characterizable with polynomially-sized sets of examples. This result is obtained with the use of generalization operations on twig queries, whose application to an anchored twig query yields a properly contained and minimally different query. Our results illustrate further interesting and strong connections between the structure and the semantics of anchored twig queries that the class of arbitrary twig queries does not enjoy. Finally, we show that the class of unions of twig queries is not characterizable.