Detecting coherent and well-connected communities inside large-scale graphs is an interesting problem that can provide useful insight into the graph structure and individual communities. It can also serve as the basis for content exploration and discovery within the graph. Clustering is a popular technique for community detection, however, the two main categories of clustering algorithms, i.e, global and local algorithms, have either scalability or usability issues, e.g, global algorithms do not scale well, and local algorithms may cover only a portion of the graph. Such one-stage algorithms typically optimize one objective function and do not work well in settings where we need to optimize various coverage, coherence and connectivity metrics. In this paper, we study large-scale community detection over a real-world graph composed of millions of YouTube videos. In particular, we present a multi-stage scalable clustering algorithm, combining a pre-processing stage, a local clustering stage, and a post-processing stage to generate clusters of YouTube videos with coherent content. We formalize coverage, coherence, and connectivity metrics and evaluate the quality of the proposed multi-stage clustering algorithms for YouTube videos. We also use extracted entities to attach meaningful labels to our clusters. Our use of local algorithms for global clustering, and its implementation and practical evaluation on such a large scale is a first of its kind
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.