On the Complexity of Mining Itemsets from the Crowd Using Taxonomies
We study the problem of frequent itemset mining in domains where data is not
recorded in a conventional database but only exists in human knowledge. We
provide examples of such scenarios, and present a crowdsourcing model for them.
The model uses the crowd as an oracle to find out whether an itemset is
frequent or not, and relies on a known taxonomy of the item domain to guide the
search for frequent itemsets. In the spirit of data mining with oracles, we
analyze the complexity of this problem in terms of (i) crowd complexity, that
measures the number of crowd questions required to identify the frequent
itemsets; and (ii) computational complexity, that measures the computational
effort required to choose the questions. We provide lower and upper complexity
bounds in terms of the size and structure of the input taxonomy, as well as the
size of a concise description of the output itemsets. We also provide
constructive algorithms that achieve the upper bounds, and consider more
efficient variants for practical situations.
Comment: 18 pages, 2 figures. To be published to ICDT'13. Added missing
acknowledgemen
Contributions à l'Optimisation de Requêtes Multidimensionnelles (Contributions to the Optimization of Multidimensional Queries)
Analyzing data consists of choosing a subset of the dimensions that describe it in order to extract useful information. However, the "interesting" dimensions are rarely known a priori. Analysis then becomes an exploratory activity in which each pass translates into a query. It is therefore essential to propose query optimization solutions that take a global view of the process rather than trying to optimize each query independently of the others. We present our contributions within this exploratory approach, focusing on three types of queries: (i) the computation of borders, (ii) so-called OLAP (On Line Analytical Processing) queries in data cubes, and (iii) skyline-type preference queries.
Graph-based Modelling of Concurrent Sequential Patterns
Structural relation patterns have been introduced recently to extend the search for complex patterns often hidden behind large sequences of data. This has motivated a novel approach to sequential patterns post-processing and a corresponding data mining method was proposed for Concurrent Sequential Patterns (ConSP). This article refines the approach in the context of ConSP modelling, where a companion graph-based model is devised as an extension of previous work. Two new modelling methods are presented here together with a construction algorithm, to complete the transformation of concurrent sequential patterns to a ConSP-Graph representation. Customer orders data is used to demonstrate the effectiveness of ConSP mining while synthetic sample data highlights the strength of the modelling technique, illuminating the theories developed.