Eliciting New Wikipedia Users' Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start
Every day, thousands of users sign up as new Wikipedia contributors. Once
they have joined, these users have to decide which articles to contribute to, which users
to seek out and learn from or collaborate with, etc. Any such task is a hard
and potentially frustrating one given the sheer size of Wikipedia. Supporting
newcomers in their first steps by recommending articles they would enjoy
editing or editors they would enjoy collaborating with is thus a promising
route toward converting them into long-term contributors. Standard recommender
systems, however, rely on users' histories of previous interactions with the
platform. As such, these systems cannot make high-quality recommendations to
newcomers without any previous interactions -- the so-called cold-start
problem. The present paper addresses the cold-start problem on Wikipedia by
developing a method for automatically building short questionnaires that, when
completed by a newly registered Wikipedia user, can be used for a variety of
purposes, including article recommendations that can help new editors get
started. Our questionnaires are constructed based on the text of Wikipedia
articles as well as the history of contributions by the already onboarded
Wikipedia editors. We assess the quality of our questionnaire-based
recommendations in an offline evaluation using historical data, as well as an
online evaluation with hundreds of real Wikipedia newcomers, concluding that
our method provides cohesive, human-readable questions that perform well
against several baselines. By addressing the cold-start problem, this work can
help with the sustainable growth and maintenance of Wikipedia's diverse editor
community.
Comment: Accepted at the 13th International AAAI Conference on Web and Social Media (ICWSM-2019)
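The two ingredients named above, article text and onboarded editors' contribution histories, suggest a compact sketch of the idea. The snippet below is a minimal illustration under assumed inputs (a toy article-topic matrix and edit-count matrix; the popularity-based question selection is a simplification, not the paper's actual construction):

```python
import numpy as np

# Hypothetical inputs: article_topics[a, t] = weight of topic t in article a
# (e.g. from a topic model over article text); edit_counts[u, a] = number of
# edits by onboarded editor u to article a. Both are toy stand-ins here.
rng = np.random.default_rng(0)
article_topics = rng.dirichlet(np.ones(8), size=100)   # 100 articles, 8 topics
edit_counts = rng.poisson(0.1, size=(50, 100))         # 50 onboarded editors

# Questionnaire items ask about the topics that best explain the edit
# histories of existing editors.
editor_topics = edit_counts @ article_topics           # editor-topic affinity
question_topics = np.argsort(editor_topics.sum(axis=0))[::-1][:5]

def recommend(answers, k=3):
    """Map yes/no questionnaire answers to article recommendations."""
    liked = [t for t, yes in zip(question_topics, answers) if yes]
    if not liked:
        return []
    scores = article_topics[:, liked].sum(axis=1)
    return list(np.argsort(scores)[::-1][:k])

print(recommend([True, False, True, False, True]))
```

In an actual deployment, a newcomer's answers would replace the hard-coded booleans and the returned article indices would seed their first tasks.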
Structuring Wikipedia Articles with Section Recommendations
Sections are the building blocks of Wikipedia articles. They enhance
readability and can be used as a structured entry point for creating and
expanding articles. Structuring a new or already existing Wikipedia article
with sections is a hard task for humans, especially for newcomers or less
experienced editors, as it requires significant knowledge about how a
well-written article looks for each possible topic. Inspired by this need, the
present paper defines the problem of section recommendation for Wikipedia
articles and proposes several approaches for tackling it. Our systems can help
editors by recommending what sections to add to already existing or newly
created Wikipedia articles. Our basic paradigm is to generate recommendations
by sourcing sections from articles that are similar to the input article. We
explore several ways of defining similarity for this purpose (based on topic
modeling, collaborative filtering, and Wikipedia's category system). We use
both automatic and human evaluation approaches for assessing the performance of
our recommendation system, concluding that the category-based approach works
best, achieving precision@10 of about 80% in the human evaluation.
Comment: SIGIR '18 camera-ready
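Since the category-based similarity performs best, here is a minimal sketch of that variant, assuming toy category and section data: pool sections from articles whose category sets overlap with the input article's, weighted by Jaccard similarity.

```python
from collections import Counter

# Toy data; in the paper the signals come from Wikipedia's category system
# and the sections of existing articles. These dicts are illustrative only.
categories = {
    "Aspirin": {"Drugs", "Analgesics"},
    "Ibuprofen": {"Drugs", "Analgesics"},
    "Paris": {"Cities", "France"},
}
sections = {
    "Aspirin": ["History", "Medical uses", "Side effects"],
    "Ibuprofen": ["Medical uses", "Side effects", "Chemistry"],
    "Paris": ["History", "Geography", "Culture"],
}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend_sections(article_cats, k=3):
    """Pool sections from category-similar articles, weighted by similarity."""
    votes = Counter()
    for title, cats in categories.items():
        w = jaccard(article_cats, cats)
        for s in sections[title]:
            votes[s] += w
    return [s for s, _ in votes.most_common(k)]

# A new drug article with no sections yet:
print(recommend_sections({"Drugs", "Analgesics"}))
# -> ['Medical uses', 'Side effects', 'History']
```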
Network Structure, Efficiency, and Performance in WikiProjects
The internet has enabled collaborations at a scale never before possible, but
the best practices for organizing such large collaborations are still not
clear. Wikipedia is a visible and successful example of such a collaboration
which might offer insight into what makes large-scale, decentralized
collaborations successful. We analyze the relationship between the structural
properties of WikiProject coeditor networks and the performance and efficiency
of those projects. We confirm the existence of an overall
performance-efficiency trade-off, while observing that some projects are higher
than others in both performance and efficiency, suggesting the existence of
factors correlating positively with both. Namely, we find an association
between low-degree coeditor networks and both high performance and high
efficiency. We also confirm results seen in previous numerical and small-scale
lab studies: higher performance with less skewed node distributions, and higher
performance with shorter path lengths. We use agent-based models to explore
possible mechanisms for degree-dependent performance and efficiency. We present
a novel local-majority learning strategy designed to satisfy properties of
real-world collaborations. The local-majority strategy as well as a localized
conformity-based strategy both show degree-dependent performance and
efficiency, but in opposite directions, suggesting that these factors depend on
both network structure and learning strategy. Our results suggest possible
benefits to decentralized collaborations made of smaller, more tightly-knit
teams, and that these benefits may be modulated by the particular learning
strategies in use.
Comment: 11 pages, 5 figures, to appear in ICWSM 201
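As a rough illustration of a local-majority learning strategy (not the paper's exact agent-based model), the sketch below seeds a few informed agents with the correct binary answer and lets the rest repeatedly adopt the majority answer among their neighbors:

```python
import random

random.seed(1)
n, truth, informed = 30, 1, 5
# Random 4-neighbor lookup per agent (a stand-in for a coeditor network).
neighbors = {i: random.sample([j for j in range(n) if j != i], 4)
             for i in range(n)}
# A few informed agents know the answer; the rest guess at random.
state = {i: truth if i < informed else random.randint(0, 1) for i in range(n)}

for _ in range(200):
    i = random.randrange(informed, n)          # informed agents stay fixed
    votes = sum(state[j] for j in neighbors[i])
    if votes * 2 != len(neighbors[i]):         # ties keep the current guess
        state[i] = int(votes * 2 > len(neighbors[i]))

accuracy = sum(v == truth for v in state.values()) / n
print(f"fraction of agents holding the true answer: {accuracy:.2f}")
```

Varying the degree of the network (here fixed at 4) is the kind of manipulation that surfaces the degree-dependent performance and efficiency the abstract describes.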
Can Who-Edits-What Predict Edit Survival?
As the number of contributors to online peer-production systems grows, it
becomes increasingly important to predict whether the edits that users make
will eventually be beneficial to the project. Existing solutions either rely on
a user reputation system or consist of a highly specialized predictor that is
tailored to a specific peer-production system. In this work, we explore a
different point in the solution space that goes beyond user reputation but does
not involve any content-based feature of the edits. We view each edit as a game
between the editor and the component of the project. We posit that the
probability that an edit is accepted is a function of the editor's skill, of
the difficulty of editing the component and of a user-component interaction
term. Our model is broadly applicable, as it only requires observing data about
who makes an edit, what the edit affects and whether the edit survives or not.
We apply our model to Wikipedia and the Linux kernel, two examples of
large-scale peer-production systems, and we seek to understand whether it can
effectively predict edit survival: in both cases, we provide a positive answer.
Our approach significantly outperforms those based solely on user reputation
and bridges the gap with specialized predictors that use content-based
features. It is simple to implement, computationally inexpensive, and in
addition it enables us to discover interesting structure in the data.
Comment: Accepted at KDD 201
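The model as described admits a compact worked example: the survival probability of user u's edit to component c can be written as sigmoid(skill_u - difficulty_c + u_emb · c_emb) and fitted by stochastic gradient descent on the log loss. The sketch below uses toy data and assumed variable names; the paper's actual estimator may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_comps, dim, lr = 20, 15, 4, 0.1
skill = np.zeros(n_users)                        # per-user skill
diff = np.zeros(n_comps)                         # per-component difficulty
U = 0.01 * rng.standard_normal((n_users, dim))   # interaction factors
C = 0.01 * rng.standard_normal((n_comps, dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy observations: (user, component, survived?)
edits = [(rng.integers(n_users), rng.integers(n_comps), int(rng.random() < 0.7))
         for _ in range(500)]

for _ in range(20):                              # SGD on the logistic log loss
    for u, c, y in edits:
        z = skill[u] - diff[c] + U[u] @ C[c]
        g = sigmoid(z) - y                       # d(log loss)/d(logit)
        skill[u] -= lr * g
        diff[c] += lr * g
        U[u], C[c] = U[u] - lr * g * C[c], C[c] - lr * g * U[u]

p = [sigmoid(skill[u] - diff[c] + U[u] @ C[c]) for u, c, _ in edits]
print(f"mean predicted survival: {np.mean(p):.2f}")   # ~0.7 on this toy data
```

Note that only (who, what, survived) triples are consumed, matching the abstract's claim that no content-based features are required.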
Assessing Post Usage for Measuring the Quality of Forum Posts
It has become difficult to discover quality content within forum websites due to the increasing amount of User-Generated Content (UGC) on the Web. Many existing websites have relied on their users to explicitly rate content quality. The main problem with this approach is that the majority of content often receives insufficient ratings. Current automated content rating solutions have evaluated linguistic features of UGC but are less effective across different types of online communities. We propose a novel approach that assesses post usage to measure the quality of forum posts. Post usage can be viewed as implicit user ratings derived from users' usage behaviour. The proposed model is validated against an operational forum, using the Matthews Correlation Coefficient to measure performance. Our model serves as a basis for exploring content usage to measure content quality in forums and other Web 2.0 platforms.
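Because performance is reported with the Matthews Correlation Coefficient, a from-scratch version of that metric is shown below for binary quality labels; the example predictions (as might come from thresholding a usage signal) are illustrative only.

```python
import math

def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (1 = quality post)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum(not t and not p for t, p in zip(y_true, y_pred))
    fp = sum(not t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy ground truth vs. predictions derived from a hypothetical usage threshold.
print(mcc([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]))   # 0.333...
```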
Trust and Reputation for Successful Software Self-Organisation
An increasing number of dynamic software evolution approaches are commonly based on integrating or utilising new pieces of software. This requires resolution of issues such as ensuring awareness of newly available software pieces and selection of the most appropriate software pieces to use. Other chapters in this book discuss dynamic software evolution focusing primarily on awareness, integration and utilisation of new software pieces, paying less attention to how selection among different software pieces is made. The selection issue is quite important since, in the increasingly dynamic software world, quite a few new software pieces occur over time, some of which are of lower utility, lower quality or even potentially harmful and malicious (for example, a new piece of software may contain hidden spyware or it may be a virus). In this chapter, we describe how computational trust and reputation can be used to avoid choosing new pieces of software that may be malicious or of lower quality. We start by describing computational models of trust and reputation, and subsequently we apply them in two application domains: firstly, in quality assessment of open source software, discussing the case where different trustors have different understandings of trust and trust estimation methods; secondly, in protection of open collaborative software, such as Wikipedia.
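As one concrete instance of the computational trust models the chapter surveys, the sketch below implements the standard beta-reputation scheme for choosing among software pieces; it is an assumed stand-in, not the chapter's exact formulation.

```python
# Each positive/negative experience with a software piece updates a
# Beta(a, b) belief; the expected trustworthiness a / (a + b) drives
# selection among candidates.
class BetaReputation:
    def __init__(self):
        self.a, self.b = 1.0, 1.0        # uniform prior

    def record(self, positive: bool):
        if positive:
            self.a += 1
        else:
            self.b += 1

    def score(self) -> float:
        return self.a / (self.a + self.b)

# Hypothetical candidate software pieces and observed outcomes.
candidates = {"plugin-x": BetaReputation(), "plugin-y": BetaReputation()}
for outcome in [True, True, False, True]:
    candidates["plugin-x"].record(outcome)
candidates["plugin-y"].record(False)

best = max(candidates, key=lambda k: candidates[k].score())
print(best, {k: round(v.score(), 2) for k, v in candidates.items()})
```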
Predicting Engagement in Video Lectures
The explosion of Open Educational Resources (OERs) in the recent years
creates the demand for scalable, automatic approaches to process and evaluate
OERs, with the end goal of identifying and recommending the most suitable
educational materials for learners. We focus on building models to find the
characteristics and features involved in context-agnostic engagement (i.e.
population-based), a seldom researched topic compared to other contextualised
and personalised approaches that focus more on individual learner engagement.
Learner engagement is arguably a more reliable measure than popularity/number
of views, is more abundant than user ratings, and has also been shown to be a
crucial component in achieving learning outcomes. In this work, we explore the
idea of building a predictive model for population-based engagement in
education. We introduce a novel, large dataset of video lectures for predicting
context-agnostic engagement and propose both cross-modal and modality-specific
feature sets to achieve this task. We further test different strategies for
quantifying learner engagement signals. We demonstrate the use of our approach
in the case of data scarcity. Additionally, we perform a sensitivity analysis
of the best performing model, which shows promising performance and can be
easily integrated into an educational recommender system for OERs.
Comment: In Proceedings of the International Conference on Educational Data Mining 202
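A minimal sketch of the population-based setup: regress a per-lecture engagement signal (e.g. median watch fraction) on content features. Everything below, the features, target, and model choice, is a synthetic stand-in for the paper's cross-modal feature sets.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
duration = rng.uniform(2, 90, n)            # minutes
word_rate = rng.uniform(90, 200, n)         # words per minute
readability = rng.uniform(20, 80, n)        # e.g. a Flesch-style score
X = np.column_stack([duration, word_rate, readability])
# Toy signal: shorter, more readable lectures are watched further.
y = np.clip(0.9 - 0.006 * duration + 0.003 * readability
            + rng.normal(0, 0.05, n), 0, 1)

model = RandomForestRegressor(n_estimators=200, random_state=0)
print(cross_val_score(model, X, y, cv=5, scoring="r2").mean())
```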
From Discourse Structure To Text Specificity: Studies Of Coherence Preferences
To successfully communicate through text, a writer needs to organize information into an understandable and well-structured discourse for the targeted audience. This involves deciding when to convey general statements, when to elaborate on details, and gauging how much detail to convey, i.e., the level of specificity. This thesis explores the automatic prediction of text specificity, and whether the perception of specificity varies across different audiences.
We characterize text specificity from two aspects: the instantiation discourse relation, and the specificity of sentences and words. We identify characteristics of instantiation that signify a change of specificity between sentences. Features derived from these characteristics substantially improve the detection of the relation. Using instantiation sentences as the basis for training, we propose a semi-supervised system to predict sentence specificity with speed and accuracy. Furthermore, we present insights into the effect of underspecified words and phrases on the comprehension of text, and the prediction of such words.
We show distinct preferences in specificity and discourse structure among different audiences. We investigate these distinctions in both cross-lingual and monolingual contexts. Cross-lingually, we identify discourse factors that significantly impact the quality of text translated from Chinese to English. Notably, a large portion of Chinese sentences are significantly more specific and need to be translated into multiple English sentences. We introduce a system using rich syntactic features to accurately detect such sentences. We also show that simplified text is more general, and that specific sentences are more likely to need simplification. Finally, we present evidence that the perception of sentence specificity differs between male and female readers.
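The semi-supervised sentence-specificity predictor can be caricatured as self-training: fit on seed-labeled sentences, pseudo-label confident predictions on unlabeled text, and refit. The sentences, threshold, and classifier below are illustrative assumptions, not the thesis's system.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Seed labels: 0 = general, 1 = specific. Toy examples only.
seed_texts = ["Many animals are dangerous.", "The cobra's venom kills in hours.",
              "People enjoy music.", "The 1969 album sold 2 million copies."]
seed_labels = np.array([0, 1, 0, 1])
unlabeled = ["Some cities are large.", "Tokyo housed 37.4 million people in 2020."]

vec = TfidfVectorizer().fit(seed_texts + unlabeled)
X, Xu = vec.transform(seed_texts), vec.transform(unlabeled)
clf = LogisticRegression().fit(X, seed_labels)

# Fold confident pseudo-labels back into the training set and refit.
proba = clf.predict_proba(Xu)
confident = proba.max(axis=1) > 0.6
X2 = np.vstack([X.toarray(), Xu[confident].toarray()])
y2 = np.concatenate([seed_labels, proba[confident].argmax(axis=1)])
clf = LogisticRegression().fit(X2, y2)

print(clf.predict(vec.transform(["The report cites 14 incidents."])))
```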
From Review to Rating: Exploring Dependency Measures for Text Classification
Various text analysis techniques exist that attempt to uncover information
from unstructured text. In this work, we explore using statistical dependence
measures for textual classification, representing text as word vectors. Student
satisfaction scores on a 3-point scale and their free text comments written
about university subjects are used as the dataset. We compared two textual
representations, a word-frequency representation and term-frequency-weighted
word vectors, and found that the word vectors provide greater accuracy.
However, these word vectors have a large number of features, which increases
the computational burden. Thus, we explored using a
non-linear dependency measure for feature selection by maximizing the
dependence between the text reviews and corresponding scores. Our quantitative
and qualitative analysis on a student satisfaction dataset shows that our
approach achieves comparable accuracy to the full feature vector, while being
an order of magnitude faster in testing. These text analysis and feature
reduction techniques can be used for other textual data applications such as
sentiment analysis.
Comment: 8 pages
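One widely used non-linear dependency measure that fits this description is the Hilbert-Schmidt Independence Criterion (HSIC); the paper's exact measure may differ. The sketch below scores each feature dimension by its HSIC with the ratings and keeps the top k:

```python
import numpy as np

def rbf_gram(x, gamma=1.0):
    """RBF kernel Gram matrix for a 1-D sample vector."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-gamma * d2)

def hsic(x, y):
    """Biased empirical HSIC estimate: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    K, L = rbf_gram(x), rbf_gram(y)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
n, d = 120, 20
X = rng.standard_normal((n, d))                  # e.g. review feature vectors
y = np.sign(X[:, 3]) + 0.1 * rng.standard_normal(n)  # rating depends on dim 3

scores = np.array([hsic(X[:, j], y) for j in range(d)])
top_k = np.argsort(scores)[::-1][:5]
print("selected dimensions:", top_k)             # dim 3 should rank highly
```

Keeping only the top-scoring dimensions is what buys the order-of-magnitude speedup at test time that the abstract reports, at the cost of the one-off selection pass.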