2 research outputs found

    Structural–temporal embedding of large-scale dynamic networks with parallel implementation

    No full text
    Because network data are widespread in the real world, network analysis has attracted increasing attention in recent years. In complex systems such as social networks, entities and their mutual relations can be represented by the nodes and edges of a network, respectively. Because the occurrences of entities and relations in these systems often change over time, their networks are called temporal networks, which describe how connections between nodes evolve. Dynamic network embedding aims to embed the nodes of a temporal network into a low-dimensional semantic space such that the network structures and evolution patterns are preserved as much as possible in the latent space. Most existing methods capture the structural similarities (relations) of strongly connected nodes based on their historical neighborhood information. However, they ignore the structural similarities of weakly connected nodes, which may also represent relations, and they include no explicit temporal information in node embeddings for capturing the periodic dependency of events. To address these issues, we propose a novel temporal network embedding model that extends structural similarity to cover both strong and weak connections among nodes and that includes temporal information in node embeddings. To improve the training efficiency of our model, we present a parallel training strategy to quickly acquire node embeddings. Extensive experiments on several real-world temporal networks demonstrate that our model significantly outperforms state-of-the-art methods in traditional tasks, including link prediction and node classification.

    A longitudinal study of topic classification on Twitter

    No full text
    Twitter represents a massively distributed information source over topics ranging from social and political events to entertainment and sports news. While recent work has suggested this content can be narrowed down to the personalized interests of individual users by training topic filters using standard classifiers, there remain many open questions about the efficacy of such classification-based filtering approaches. For example, over a year or more after training, how well do such classifiers generalize to future novel topical content, and are such results stable across a range of topics? In addition, how robust is a topic classifier over the time horizon, e.g., can a model trained in 1 year be used for making predictions in the subsequent year? Furthermore, what features, feature classes, and feature attributes are most critical for long-term classifier performance? To answer these questions, we collected a corpus of over 800 million English Tweets via the Twitter streaming API during 2013 and 2014 and learned topic classifiers for 10 diverse themes ranging from social issues to celebrity deaths to the “Iran nuclear deal”. The results of this long-term study of topic classifier performance provide a number of important insights, among them that: (i) such classifiers can indeed generalize to novel topical content with high precision over a year or more after training though performance degrades with time, (ii) the classes of hashtags and simple terms contain the most informative feature instances, (iii) removing tweets containing training hashtags from the validation set allows better generalization, and (iv) the simple volume of tweets by a user correlates more with their informativeness than their follower or friend count. 
In summary, this work provides a long-term study of topic classifiers on Twitter that further justifies classification-based topical filtering approaches while providing detailed insight into the feature properties most critical for topic classifier performance.
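
The evaluation setup described in this abstract (train in one year, predict in the next, and remove tweets containing training hashtags from the held-out set) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `Tweet` record, the helper names, and the choice to treat "training hashtags" as the hashtags found in on-topic training tweets are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    # Hypothetical record type; the field names are assumptions for this sketch.
    text: str
    year: int
    on_topic: bool


def hashtags(text):
    """Return the set of lowercased hashtags in a tweet, without the '#'."""
    return {tag.lower() for tag in re.findall(r"#(\w+)", text)}


def temporal_split(tweets, train_year, test_year):
    """Split tweets by year so the model is evaluated only on later data."""
    train = [t for t in tweets if t.year == train_year]
    held_out = [t for t in tweets if t.year == test_year]
    return train, held_out


def drop_seen_hashtags(train, held_out):
    """Remove held-out tweets that share a hashtag with on-topic training tweets.

    Assumption: 'training hashtags' means hashtags appearing in on-topic
    training tweets; the paper may define them differently.
    """
    seen = set()
    for tweet in train:
        if tweet.on_topic:
            seen |= hashtags(tweet.text)
    return [t for t in held_out if not (hashtags(t.text) & seen)]
```

Filtering the held-out set this way means a classifier cannot score well merely by memorizing the hashtags it was trained on, so the remaining tweets measure how well it handles novel topical content.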