WikiLinkGraphs: A Complete, Longitudinal and Multi-Language Dataset of the Wikipedia Link Networks

Authors: Consonni Cristian, Laniado David, Montresor Alberto
Publication date: 04/04/2019

Wikipedia articles contain multiple links connecting a subject to other pages of the encyclopedia. In Wikipedia parlance, these links are called internal links or wikilinks. We present a complete dataset of the network of internal Wikipedia links for the 9 largest language editions. The dataset contains yearly snapshots of the network and spans 17 years, from the creation of Wikipedia in 2001 to March 1st, 2018. While previous work has mostly focused on the complete hyperlink graph which includes also links automatically generated by templates, we parsed each revision of each article to track links appearing in the main text. In this way we obtained a cleaner network, discarding more than half of the links and representing all and only the links intentionally added by editors. We describe in detail how the Wikipedia dumps have been processed and the challenges we have encountered, including the need to handle special pages such as redirects, i.e., alternative article titles. We present descriptive statistics of several snapshots of this network. Finally, we propose several research opportunities that can be explored using this new dataset.

Comment: 10 pages, 3 figures, 7 tables, LaTeX. Final camera-ready version accepted at the 13th International AAAI Conference on Web and Social Media (ICWSM 2019) - Munich (Germany), 11-14 June 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Semantic Unlink Prediction in Evolving Social Networks through Probabilistic Description Logic

Authors: Armada de Oliveira Marcius, Cerqueira Revoredo Kate, Ochoa Luna José Eduardo
Publication venue: Institute of Electrical and Electronics Engineers Inc.

Recently, the prediction of new links between two individuals in social networks has gained a lot of attention. However, to fully understand and predict how the network evolves through time, ending relationships also need to be predicted. Although most approaches use graph-based methods for link prediction, these may not be suited to the unlink prediction task. In this paper, we propose an approach for unlink prediction that uses information about the domain of discourse through a probabilistic ontology, specified in the probabilistic description logic CRALC. We empirically evaluated our approach by comparing it with standard graph-based methods and some state-of-the-art unlink methods. The results show a significant improvement in detecting unlinks when our proposal is used. © 2014 IEEE.

Research work

Repositorio Institucional Universidad Católica San Pablo