    Followers Are Not Enough: A Question-Oriented Approach to Community Detection in Online Social Networks

    Community detection in online social networks is typically based on the analysis of the explicit connections between users, such as "friends" on Facebook and "followers" on Twitter. But online users often have hundreds or even thousands of such connections, and many of these connections do not correspond to real friendships or more generally to accounts that users interact with. We claim that community detection in online social networks should be question-oriented and rely on additional information beyond the simple structure of the network. The concept of 'community' is very general, and different questions such as "whom do we interact with?" and "with whom do we share similar interests?" can lead to the discovery of different social groups. In this paper we focus on three types of communities beyond structural communities: activity-based, topic-based, and interaction-based. We analyze a Twitter dataset using three different weightings of the structural network meant to highlight these three community types, and then infer the communities associated with these weightings. We show that the communities obtained in the three weighted cases are highly different from each other, and from the communities obtained by considering only the unweighted structural network. Our results confirm that asking a precise question is an unavoidable first step in community detection in online social networks, and that different questions can lead to different insights about the network under study.

    Comment: 22 pages, 4 figures, 1 table
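The effect of weighting described in this abstract can be illustrated with a toy example. The sketch below is not the authors' method or data: it assumes an undirected four-user graph, made-up interaction counts, and Newman modularity as the quality score, all chosen here only for illustration. It shows that two partitions which score identically on the unweighted structure are clearly separated once edges carry interaction weights.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity for an undirected weighted graph without self-loops.

    edges: dict mapping (u, v) -> weight, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    total = sum(edges.values())          # m: total edge weight
    strength = defaultdict(float)        # weighted degree of each node
    inside = defaultdict(float)          # edge weight inside each community
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    comm_strength = defaultdict(float)   # summed strength per community
    for node, s in strength.items():
        comm_strength[community[node]] += s
    # Q = sum over communities of (inside/m - (strength/2m)^2)
    return sum(
        inside[c] / total - (comm_strength[c] / (2 * total)) ** 2
        for c in comm_strength
    )


# Hypothetical data: four users connected in a cycle of follow links.
structural = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1, ("d", "a"): 1}
# Same links, weighted by invented interaction counts (e.g. mentions).
interaction = {("a", "b"): 10, ("b", "c"): 1, ("c", "d"): 10, ("d", "a"): 1}

split_ab_cd = {"a": 0, "b": 0, "c": 1, "d": 1}
split_bc_da = {"a": 1, "b": 0, "c": 0, "d": 1}

q_struct_1 = modularity(structural, split_ab_cd)
q_struct_2 = modularity(structural, split_bc_da)
q_inter_1 = modularity(interaction, split_ab_cd)
q_inter_2 = modularity(interaction, split_bc_da)
```

On the unweighted cycle the two splits are indistinguishable, while the interaction weights favor grouping the pairs that actually interact. A real analysis would use a directed follower graph and a full community detection algorithm rather than comparing two hand-picked partitions.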

    Analysis of Retweeting Behavior Using Topic Models

    Online social networks are now a constant presence in daily life and play a growing role in important social and commercial phenomena. Microblogging services such as Twitter appear to play an important role in the dissemination of information on the Internet, making it possible for messages to spread virally within minutes. In this work we study the mechanism of re-broadcasting (“retweeting”) information on Twitter. Specifically, we use Latent Dirichlet Allocation to describe users and messages in terms of the topics that make up their text, and by means of ANOVA we show that the topical distance between users and messages is shorter for tweets that are retweeted than for those that are not. Using decision tree learning, we build several models to assess the accuracy and usefulness of our topic-based model of retweeting. Our results show that the topic-based model slightly outperforms a baseline prediction measure, so we conclude that it is a valid option for predicting retweet behavior, with room for further improvement.
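The notion of topical distance between a user and a message can be sketched as follows. The abstract does not say which distance measure is used, so the Jensen-Shannon distance here is an assumption, and the three-topic vectors are invented stand-ins for the per-user and per-tweet topic distributions that an LDA model would produce.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance (square root of the JS divergence, base 2).

    Ranges from 0 for identical distributions to 1 for disjoint ones.
    """
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding


# Hypothetical topic distributions over three topics (each sums to 1).
user_topics = [0.70, 0.20, 0.10]      # what the user usually writes about
tweet_on_topic = [0.60, 0.30, 0.10]   # a tweet close to the user's interests
tweet_off_topic = [0.05, 0.15, 0.80]  # a tweet far from the user's interests

near = js_distance(user_topics, tweet_on_topic)
far = js_distance(user_topics, tweet_off_topic)
```

Under the abstract's finding, retweeted messages would tend to look like `tweet_on_topic` (small distance to the user) rather than `tweet_off_topic`. A distance of this kind could then serve as one input feature to a decision tree classifier.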