Structuring Wikipedia Articles with Section Recommendations
Sections are the building blocks of Wikipedia articles. They enhance
readability and can be used as a structured entry point for creating and
expanding articles. Structuring a new or already existing Wikipedia article
with sections is a hard task for humans, especially for newcomers or less
experienced editors, as it requires significant knowledge about how a
well-written article looks for each possible topic. Inspired by this need, the
present paper defines the problem of section recommendation for Wikipedia
articles and proposes several approaches for tackling it. Our systems can help
editors by recommending what sections to add to already existing or newly
created Wikipedia articles. Our basic paradigm is to generate recommendations
by sourcing sections from articles that are similar to the input article. We
explore several ways of defining similarity for this purpose (based on topic
modeling, collaborative filtering, and Wikipedia's category system). We use
both automatic and human evaluation approaches for assessing the performance of
our recommendation system, concluding that the category-based approach works
best, achieving precision@10 of about 80% in the human evaluation.
Comment: SIGIR '18 camera-ready
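The basic paradigm above (source sections from similar articles, with similarity defined here through shared categories) and the precision@10 metric can be illustrated with a minimal sketch. The scoring rule (each article votes for its sections with a weight equal to the number of categories it shares with the target), the function names `recommend_sections` and `precision_at_k`, and the toy corpus are all hypothetical assumptions, not the paper's actual model or data.

```python
from collections import Counter


def recommend_sections(target_categories, corpus, k=10):
    """Rank section titles sourced from articles sharing categories with the target.

    `corpus` maps an article title to (set of categories, list of section titles).
    Hypothetical scoring rule: each article votes for its sections with a weight
    equal to the number of categories it shares with the target.
    """
    votes = Counter()
    for categories, sections in corpus.values():
        overlap = len(target_categories & categories)
        if overlap == 0:
            continue  # only category-similar articles contribute sections
        for section in set(sections):  # count each section once per article
            votes[section] += overlap
    # Highest vote first; ties broken alphabetically so the output is deterministic
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [section for section, _ in ranked[:k]]


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommendations that appear in the relevant set."""
    hits = sum(1 for section in recommended[:k] if section in relevant)
    return hits / k


# Toy data (hypothetical): article -> (categories, section titles)
corpus = {
    "Paris": ({"Capitals in Europe", "Cities in France"},
              ["History", "Geography", "Economy", "Culture"]),
    "Lyon": ({"Cities in France"}, ["History", "Geography", "Economy", "Sport"]),
    "Berlin": ({"Capitals in Europe", "Cities in Germany"},
               ["History", "Geography", "Culture", "Transport"]),
    "Mozart": ({"Austrian composers"}, ["Life", "Works"]),
}
target = {"Capitals in Europe", "Cities in France"}
top3 = recommend_sections(target, corpus, k=3)
print(top3)  # ['Geography', 'History', 'Culture']
print(precision_at_k(top3, {"History", "Geography", "Transport"}, k=3))  # 2 of 3 are hits
```

Weighting votes by category overlap is one simple way to let more similar articles count more; a topic-model or collaborative-filtering similarity could replace `overlap` without changing the rest of the sketch.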
How to choose the most appropriate centrality measure?
We propose a new method to select the most appropriate network centrality
measure based on the user's opinion on how such a measure should work on a set
of simple graphs. The method consists of: (1) forming a set of
candidate measures; (2) generating a sequence of sufficiently simple graphs
that distinguish all measures in the set on some pairs of nodes; (3) compiling
a survey with questions on comparing the centrality of test nodes; (4)
completing this survey, which provides a centrality measure consistent with all
user responses. The developed algorithms make it possible to implement this
approach for any finite set of measures. This paper presents its
implementation for a set of 40 centrality measures. The proposed method, called
culling, can be used for rapid analysis or combined with a normative approach by
compiling a survey on the subset of measures that satisfy certain normative
conditions (axioms). In the present study, the latter was done for the subsets
determined by the Self-consistency or Bridge axioms.
Comment: 26 pages, 1 table, 1 algorithm, 8 figures
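The survey-and-cull steps (3) and (4) can be sketched as follows: each answered question compares the centrality of two test nodes, and only the measures that agree with every answer survive. This is a minimal sketch under stated assumptions: it uses only two candidate measures (degree and closeness) instead of the paper's 40, and the distinguishing graph of step (2) is hand-picked rather than generated. The function names `cull` and `compare` and the graph are hypothetical.

```python
from collections import deque


def degree(graph, node):
    """Degree centrality: number of neighbours."""
    return len(graph[node])


def closeness(graph, node):
    """Closeness centrality: (n - 1) / sum of BFS distances (connected graph assumed)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(graph) - 1) / total if total else 0.0


def compare(measure, graph, u, v, tol=1e-12):
    """Return '>', '<' or '=' for the centrality of u relative to v."""
    a, b = measure(graph, u), measure(graph, v)
    if abs(a - b) <= tol:
        return "="
    return ">" if a > b else "<"


def cull(measures, answers):
    """Keep only the measures consistent with every survey answer.

    `answers` is a list of (graph, u, v, answer) tuples, where answer is the
    user's judgement of u's centrality relative to v ('>', '<' or '=').
    """
    survivors = dict(measures)
    for graph, u, v, answer in answers:
        survivors = {name: m for name, m in survivors.items()
                     if compare(m, graph, u, v) == answer}
    return survivors


# Hypothetical test graph on which degree and closeness disagree about u vs v:
# u has more neighbours, but v sits closer to every other node.
graph = {
    "l1": ["u"], "l2": ["u"], "u": ["l1", "l2", "v"],
    "v": ["u", "w"], "w": ["v", "m1", "m2"], "m1": ["w"], "m2": ["w"],
}
measures = {"degree": degree, "closeness": closeness}
# The user judges the bridging node v to be more central than u
print(sorted(cull(measures, [(graph, "u", "v", "<")])))  # ['closeness']
```

Each question with a non-unanimous answer removes at least one candidate, which is why step (2) looks for graphs that split the remaining measures on some pair of nodes.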
Between news and history: Identifying networked topics of collective attention on Wikipedia
The digital information landscape has introduced a new dimension to understanding how we collectively react to new information and preserve it at the societal level. This, together with the emergence of platforms such as Wikipedia, has challenged traditional views on the relationship between current events and historical accounts of events, with an ever-shrinking divide between "news" and "history". Wikipedia's place as the Internet's primary reference work thus raises the question of how it represents both traditional encyclopaedic knowledge and evolving, important news stories. In other words, how are information on and attention towards current events integrated into the existing topical structures of Wikipedia? To address this, we develop a temporal community detection approach to topic detection that takes into account both the short-term dynamics of attention and long-term article network structures. We apply this method to a dataset of one year of current events on Wikipedia to identify clusters distinct from those that would be found solely from page view time series correlations or static network structure. Using the relative importance of collective attention dynamics versus link structures, we are able to distinguish topics that more strongly reflect unfolding current events from those that reflect more established knowledge. We also identify and describe the emergent topics on Wikipedia. This work provides a means of distinguishing how these information and attention clusters relate to Wikipedia's twin faces of encyclopaedic knowledge and current events, which is crucial to understanding the production and consumption of knowledge in the digital age.
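The idea of weighing collective attention against link structure can be illustrated with a deliberately simplified, static sketch. This is not the authors' temporal community detection method: it assumes a hypothetical edge weight `alpha * [linked] + (1 - alpha) * corr(page views)` over whole page-view series (no sliding windows) and takes connected components of edges above a threshold as "topics". The names `attention_topics`, `alpha`, `threshold`, and the toy page-view data are all assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length page-view series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def attention_topics(views, links, alpha=0.5, threshold=0.75):
    """Group pages into topics using a blend of link structure and co-attention.

    Hypothetical edge weight: alpha * [pages are linked] + (1 - alpha) * corr.
    alpha = 1 uses only the static link network; alpha = 0 uses only page-view
    correlations. Topics are the connected components of edges >= threshold.
    """
    linked = {frozenset(pair) for pair in links}
    adj = {page: set() for page in views}
    for a, b in combinations(sorted(views), 2):
        is_linked = 1.0 if frozenset((a, b)) in linked else 0.0
        weight = alpha * is_linked + (1 - alpha) * pearson(views[a], views[b])
        if weight >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    # Connected components via iterative depth-first search
    seen, topics = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            u = stack.pop()
            component.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        topics.append(component)
    return topics


# Toy daily page views (hypothetical): a news spike vs. stable historical interest
views = {
    "Election": [0, 0, 4, 0],
    "Candidate": [0, 0, 2, 0],
    "Roman Empire": [1, 1, 1, 2],
    "Julius Caesar": [2, 2, 2, 4],
}
links = [("Election", "Candidate"), ("Election", "Roman Empire"),
         ("Roman Empire", "Julius Caesar")]
print(attention_topics(views, links, alpha=0.5))  # two topics
print(attention_topics(views, links, alpha=1.0))  # links only: one topic
```

With `alpha=1.0` the incidental link from "Election" to "Roman Empire" merges everything into one cluster; giving attention dynamics some weight separates the news-driven pair from the historical pair.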
Cleansing Wikipedia categories using centrality
We propose a novel general technique aimed at pruning and cleansing the Wikipedia category hierarchy, with a tunable level of aggregation. Our approach is endogenous: it does not use any information from Wikipedia articles and is based solely on the user-generated (noisy) Wikipedia category folksonomy itself. We show how the proposed techniques can help reduce the level of noise in the hierarchy and discuss how alternative centrality measures can affect the result differently.
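A minimal sketch of endogenous, centrality-based pruning could look like this: rank categories using only the category graph, keep the top fraction (the tunable aggregation level), and remap every dropped category to its nearest kept ancestors. The centrality used here (number of descendant categories) is a hypothetical stand-in; the `centrality` parameter lets another measure be swapped in, in the spirit of the paper's comparison of alternative measures. The function names, `keep_fraction`, and the toy hierarchy are assumptions, not the paper's method.

```python
from collections import deque


def descendant_count(children, node):
    """Hypothetical centrality: number of categories reachable below `node`."""
    seen, queue = {node}, deque([node])
    while queue:
        u = queue.popleft()
        for child in children.get(u, ()):
            if child not in seen:  # guards against cycles in the folksonomy
                seen.add(child)
                queue.append(child)
    return len(seen) - 1


def prune_categories(parents, keep_fraction=0.5, centrality=descendant_count):
    """Keep the most central categories and remap the rest to kept ancestors.

    `parents` maps each category to its parent categories; no article data is used.
    `keep_fraction` is the (hypothetical) tunable aggregation level.
    """
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    score = {n: centrality(children, n) for n in nodes}
    ranked = sorted(nodes, key=lambda n: (-score[n], n))  # ties broken by name
    keep = set(ranked[:max(1, int(keep_fraction * len(nodes)))])

    mapping = {}
    for n in nodes:
        if n in keep:
            mapping[n] = {n}
            continue
        # Breadth-first walk up the hierarchy until the first kept ancestors
        found, frontier, seen = set(), list(parents.get(n, [])), set()
        while frontier and not found:
            next_frontier = []
            for p in frontier:
                if p in seen:
                    continue
                seen.add(p)
                if p in keep:
                    found.add(p)
                else:
                    next_frontier.extend(parents.get(p, []))
            frontier = next_frontier
        mapping[n] = found
    return keep, mapping


# Toy category hierarchy (hypothetical): category -> parent categories
parents = {
    "Physics": ["Science"], "Biology": ["Science"],
    "Quantum mechanics": ["Physics"], "Optics": ["Physics"],
    "Genetics": ["Biology"], "Lasers": ["Optics"],
}
keep, mapping = prune_categories(parents, keep_fraction=0.5)
print(sorted(keep))        # ['Biology', 'Physics', 'Science']
print(mapping["Lasers"])   # {'Physics'}
```

Lowering `keep_fraction` aggregates more aggressively: with 0.2 only "Science" survives and every category is remapped to it.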