    On the Complexity of Mining Itemsets from the Crowd Using Taxonomies

    We study the problem of frequent itemset mining in domains where data is not recorded in a conventional database but exists only in human knowledge. We provide examples of such scenarios and present a crowdsourcing model for them. The model uses the crowd as an oracle to find out whether an itemset is frequent or not, and relies on a known taxonomy of the item domain to guide the search for frequent itemsets. In the spirit of data mining with oracles, we analyze the complexity of this problem in terms of (i) crowd complexity, which measures the number of crowd questions required to identify the frequent itemsets; and (ii) computational complexity, which measures the computational effort required to choose the questions. We provide lower and upper complexity bounds in terms of the size and structure of the input taxonomy, as well as the size of a concise description of the output itemsets. We also provide constructive algorithms that achieve the upper bounds, and consider more efficient variants for practical situations.
    Comment: 18 pages, 2 figures. To be published to ICDT'13. Added missing acknowledgemen
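The oracle model in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes single items rather than itemsets, a tree-shaped taxonomy, and a monotonicity property (an item can be frequent only if its more general parent is frequent). The names `mine_frequent_items`, `ask_crowd`, and the toy taxonomy are hypothetical. The sketch walks the taxonomy top-down, asks the oracle about an item only when its parent was frequent, and counts the questions asked, which corresponds loosely to the crowd complexity mentioned above.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple


def mine_frequent_items(
    taxonomy: Dict[str, List[str]],
    roots: List[str],
    ask_crowd: Callable[[str], bool],
) -> Tuple[Set[str], int]:
    """Top-down search over a tree taxonomy (parent -> more specific children).

    Assumption (not from the source): frequency is monotone, so the
    descendants of an infrequent item are never asked about.
    Returns the frequent items and the number of oracle questions used.
    """
    frequent: Set[str] = set()
    asked: Set[str] = set()
    queue = deque(roots)
    while queue:
        item = queue.popleft()
        if item in asked:
            continue
        asked.add(item)
        if ask_crowd(item):
            frequent.add(item)
            # Only children of frequent items remain candidates.
            queue.extend(taxonomy.get(item, []))
    return frequent, len(asked)


# Hypothetical toy taxonomy and a simulated crowd oracle.
toy_taxonomy = {
    "sport": ["ball game", "water sport"],
    "ball game": ["tennis", "football"],
    "water sport": ["swimming", "surfing"],
}
truly_frequent = {"sport", "ball game", "football"}

found, questions = mine_frequent_items(
    toy_taxonomy, ["sport"], lambda item: item in truly_frequent
)
```

In this toy run the search asks about five of the seven items, because "swimming" and "surfing" are pruned once "water sport" is reported infrequent. A real crowd oracle would be noisy and costly per question, which this sketch does not model.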

    Towns conquer: a gamified application to collect geographical names (vernacular names/toponyms)

    The traditional model for geospatial crowdsourcing asks the public to use their free time collecting geospatial data for no obvious reward. This model has been shown to work very well on projects such as OpenStreetMap, but it comes with some clear disadvantages, such as reliance on small communities of ‘Neo-geographers’ and variability in the quality and content of the collected data. This project aims to tackle these problems by providing an alternative motivation, specifically a smartphone-based computer game service. Geographical names (vernacular names/toponyms) have been identified as potential targets because they are difficult to collect on a large scale but easy to collect locally, and thus ideal for crowdsourcing. The data set will be a toponyms database provided by the Spanish National Geographic Institute (IGN Spain). A location-based game is targeted because it is easy to guide data collection with in-game rewards (prizes, points, badges, etc.). Android is chosen for its accessible API and wide use