
7 research outputs found

Structuring Wikipedia Articles with Section Recommendations

Author: Catasta Michele
Piccardi Tiziano
West Robert
Zia Leila
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/05/2018

Sections are the building blocks of Wikipedia articles. They enhance readability and can be used as a structured entry point for creating and expanding articles. Structuring a new or already existing Wikipedia article with sections is a hard task for humans, especially for newcomers or less experienced editors, as it requires significant knowledge about how a well-written article looks for each possible topic. Inspired by this need, the present paper defines the problem of section recommendation for Wikipedia articles and proposes several approaches for tackling it. Our systems can help editors by recommending what sections to add to already existing or newly created Wikipedia articles. Our basic paradigm is to generate recommendations by sourcing sections from articles that are similar to the input article. We explore several ways of defining similarity for this purpose (based on topic modeling, collaborative filtering, and Wikipedia's category system). We use both automatic and human evaluation approaches for assessing the performance of our recommendation system, concluding that the category-based approach works best, achieving precision@10 of about 80% in the human evaluation.

Comment: SIGIR '18 camera-ready
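
The abstract's basic paradigm (source sections from similar articles, then score the top-k list) can be illustrated with a toy sketch. This is not the authors' system: the function names `recommend_sections` and `precision_at_k`, the three-article corpus, and the rule "similar means sharing at least one category" are all assumptions made for illustration.

```python
from collections import Counter


def recommend_sections(article_categories, corpus, k=10):
    """Rank sections by how many category-sharing articles contain them.

    `corpus` is a list of (categories, sections) pairs. Treating "shares at
    least one category" as similarity is an illustrative assumption.
    """
    counts = Counter()
    for categories, sections in corpus:
        if article_categories & categories:
            counts.update(set(sections))  # count each section once per article
    # Most frequent first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [section for section, _ in ranked[:k]]


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k slots filled with a relevant section."""
    hits = sum(1 for section in recommended[:k] if section in relevant)
    return hits / k


# Hypothetical toy data, not taken from the paper.
corpus = [
    ({"Swiss physicists"}, ["Early life", "Career", "Awards"]),
    ({"Swiss physicists", "Nobel laureates"}, ["Early life", "Research", "Awards"]),
    ({"Italian painters"}, ["Early life", "Works", "Legacy"]),
]
recs = recommend_sections({"Swiss physicists"}, corpus, k=3)
score = precision_at_k(recs, {"Early life", "Awards", "Research"}, k=3)
print(recs, round(score, 2))
```

Here precision@k divides by k even when fewer than k sections are returned, which is one common convention; the paper's exact evaluation protocol is not given in the abstract.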

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

How to choose the most appropriate centrality measure?

Author: Chebotarev Pavel
Gubanov Dmitry
Publication date: 21/03/2020

We propose a new method to select the most appropriate network centrality measure based on the user's opinion on how such a measure should work on a set of simple graphs. The method consists in: (1) forming a set $\mathcal{F}$ of candidate measures; (2) generating a sequence of sufficiently simple graphs that distinguish all measures in $\mathcal{F}$ on some pairs of nodes; (3) compiling a survey with questions on comparing the centrality of test nodes; (4) completing this survey, which provides a centrality measure consistent with all user responses. The developed algorithms make it possible to implement this approach for any finite set $\mathcal{F}$ of measures. This paper presents its realization for a set of 40 centrality measures. The proposed method called culling can be used for rapid analysis or combined with a normative approach by compiling a survey on the subset of measures that satisfy certain normative conditions (axioms). In the present study, the latter was done for the subsets determined by the Self-consistency or Bridge axioms.

Comment: 26 pages, 1 table, 1 algorithm, 8 figures
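
The four steps above can be sketched on a deliberately tiny case. This is a minimal illustration, not the authors' algorithm: it uses only two candidate measures (degree and closeness) instead of 40, a single hand-picked path graph instead of a generated sequence, and the helper names `distinguishing_pairs` and `cull` are hypothetical.

```python
from collections import deque
from itertools import combinations


def degree(graph):
    """Degree centrality: number of neighbours of each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def closeness(graph):
    """Reciprocal of summed BFS distances (assumes a connected graph, 2+ nodes)."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        scores[source] = 1.0 / sum(dist.values())
    return scores


def compare(scores, u, v):
    """Return 1 if u is more central than v, -1 if less, 0 if tied."""
    return (scores[u] > scores[v]) - (scores[u] < scores[v])


def distinguishing_pairs(graph, measures):
    """Node pairs on which the candidate measures give different verdicts."""
    scores = [measure(graph) for measure in measures.values()]
    return [
        (u, v)
        for u, v in combinations(sorted(graph), 2)
        if len({compare(s, u, v) for s in scores}) > 1
    ]


def cull(graph, measures, u, v, answer):
    """Keep only the measures consistent with the user's answer for (u, v)."""
    return {
        name: measure
        for name, measure in measures.items()
        if compare(measure(graph), u, v) == answer
    }


# Path graph a-b-c-d-e: degree ties b with c, closeness ranks c above b.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
MEASURES = {"degree": degree, "closeness": closeness}

pairs = distinguishing_pairs(path, MEASURES)
survivors = cull(path, MEASURES, "b", "c", answer=-1)  # user: "c is more central"
print(pairs, sorted(survivors))
```

Each survey answer removes every measure that disagrees with it, so with a larger candidate set the process would repeat on further graphs until the remaining measures can no longer be told apart.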

arXiv.org e-Print Archive

Between news and history: Identifying networked topics of collective attention on Wikipedia

Author: Gildersleve Patrick
Lambiotte Renaud
Yasseri Taha
Publication date: 14/11/2022

The digital information landscape has introduced a new dimension to understanding how we collectively react to new information and preserve it at the societal level. This, together with the emergence of platforms such as Wikipedia, has challenged traditional views on the relationship between current events and historical accounts of events, with an ever-shrinking divide between "news" and "history". Wikipedia's place as the Internet's primary reference work thus poses the question of how it represents both traditional encyclopaedic knowledge and evolving important news stories. In other words, how is information on and attention towards current events integrated into the existing topical structures of Wikipedia? To address this we develop a temporal community detection approach towards topic detection that takes into account both short term dynamics of attention as well as long term article network structures. We apply this method to a dataset of one year of current events on Wikipedia to identify clusters distinct from those that would be found solely from page view time series correlations or static network structure. We are able to resolve the topics that more strongly reflect unfolding current events vs more established knowledge by the relative importance of collective attention dynamics vs link structures. We also offer important developments by identifying and describing the emergent topics on Wikipedia. This work provides a means of distinguishing how these information and attention clusters are related to Wikipedia's twin faces of encyclopaedic knowledge and current events -- crucial to understanding the production and consumption of knowledge in the digital age

Oxford University Research Archive

Cleansing Wikipedia categories using centrality

Author: C. Monti
P. Boldi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

We propose a novel general technique aimed at pruning and cleansing the Wikipedia category hierarchy, with a tunable level of aggregation. Our approach is endogenous, since it does not use any information coming from Wikipedia articles, but it is based solely on the user-generated (noisy) Wikipedia category folksonomy itself. We show how the proposed techniques can help reduce the level of noise in the hierarchy and discuss how alternative centrality measures can differently impact on the result.

AIR Universita degli studi di Milano