66,765 research outputs found

    How is a data-driven approach better than random choice in label space division for multi-label classification?

    Full text link
    We propose using five data-driven community detection approaches from social networks to partition the label space for the task of multi-label classification as an alternative to random partitioning into equal subsets as performed by RAkELd: modularity-maximizing fastgreedy and leading eigenvector, infomap, walktrap and label propagation algorithms. We construct a label co-occurence graph (both weighted an unweighted versions) based on training data and perform community detection to partition the label set. We include Binary Relevance and Label Powerset classification methods for comparison. We use gini-index based Decision Trees as the base classifier. We compare educated approaches to label space divisions against random baselines on 12 benchmark data sets over five evaluation measures. We show that in almost all cases seven educated guess approaches are more likely to outperform RAkELd than otherwise in all measures, but Hamming Loss. We show that fastgreedy and walktrap community detection methods on weighted label co-occurence graphs are 85-92% more likely to yield better F1 scores than random partitioning. Infomap on the unweighted label co-occurence graphs is on average 90% of the times better than random paritioning in terms of Subset Accuracy and 89% when it comes to Jaccard similarity. Weighted fastgreedy is better on average than RAkELd when it comes to Hamming Loss

    Detecting semantic groups in MIP models

    Get PDF

    Parallel Processing of Large Graphs

    Full text link
    More and more large data collections are gathered worldwide in various IT systems. Many of them possess the networked nature and need to be processed and analysed as graph structures. Due to their size they require very often usage of parallel paradigm for efficient computation. Three parallel techniques have been compared in the paper: MapReduce, its map-side join extension and Bulk Synchronous Parallel (BSP). They are implemented for two different graph problems: calculation of single source shortest paths (SSSP) and collective classification of graph nodes by means of relational influence propagation (RIP). The methods and algorithms are applied to several network datasets differing in size and structural profile, originating from three domains: telecommunication, multimedia and microblog. The results revealed that iterative graph processing with the BSP implementation always and significantly, even up to 10 times outperforms MapReduce, especially for algorithms with many iterations and sparse communication. Also MapReduce extension based on map-side join usually noticeably presents better efficiency, although not as much as BSP. Nevertheless, MapReduce still remains the good alternative for enormous networks, whose data structures do not fit in local memories.Comment: Preprint submitted to Future Generation Computer System
    • …
    corecore