Eliciting New Wikipedia Users' Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start
Every day, thousands of users sign up as new Wikipedia contributors. Once
joined, these users have to decide which articles to contribute to, which users
to seek out and learn from or collaborate with, etc. Any such task is a hard
and potentially frustrating one given the sheer size of Wikipedia. Supporting
newcomers in their first steps by recommending articles they would enjoy
editing or editors they would enjoy collaborating with is thus a promising
route toward converting them into long-term contributors. Standard recommender
systems, however, rely on users' histories of previous interactions with the
platform. As such, these systems cannot make high-quality recommendations to
newcomers without any previous interactions -- the so-called cold-start
problem. The present paper addresses the cold-start problem on Wikipedia by
developing a method for automatically building short questionnaires that, when
completed by a newly registered Wikipedia user, can be used for a variety of
purposes, including article recommendations that can help new editors get
started. Our questionnaires are constructed based on the text of Wikipedia
articles as well as the history of contributions by the already onboarded
Wikipedia editors. We assess the quality of our questionnaire-based
recommendations in an offline evaluation using historical data, as well as an
online evaluation with hundreds of real Wikipedia newcomers, concluding that
our method provides cohesive, human-readable questions that perform well
against several baselines. By addressing the cold-start problem, this work can
help with the sustainable growth and maintenance of Wikipedia's diverse editor
community.
Comment: Accepted at the 13th International AAAI Conference on Web and Social
Media (ICWSM-2019)
Analysis of gas chromatography/mass spectrometry data for catalytic lignin depolymerization using positive matrix factorization
Various catalytic technologies are being developed to efficiently convert lignin into renewable chemicals. However, due to its complexity, catalytic lignin depolymerization often generates a wide and complex distribution of product compounds. Gas chromatography/mass spectrometry (GC-MS) is a common analytical technique to profile the compounds that comprise lignin depolymerization products. GC-MS is applied not only to determine the product composition, but also to develop an understanding of the catalytic reaction pathways and of the relationships among catalyst structure, reaction conditions, and the resulting compounds generated. Although a very useful tool, the analysis of lignin depolymerization products with GC-MS is limited by the quality and scope of the available mass spectral libraries and the ability to correlate changes in GC-MS chromatograms to changes in lignin structure, catalyst structure, and other reaction conditions. In this study, the GC-MS data of the depolymerization products generated from organosolv hybrid poplar lignin using a copper-doped porous metal oxide catalyst and a methanol/dimethyl carbonate co-solvent were analyzed by applying a factor analysis technique, positive matrix factorization (PMF). Several different solutions for the PMF model were explored. A 13-factor solution sufficiently explains the chemical changes occurring to lignin depolymerization products as a function of lignin, reaction time, catalyst, and solvent. Overall, seven factors were found to represent aromatic compounds, while one factor was defined by aliphatic compounds.
Algorithms and Architecture for Real-time Recommendations at News UK
Recommendation systems are recognised as being hugely important in industry,
and the area is now well understood. At News UK, there is a requirement to be
able to quickly generate recommendations for users on news items as they are
published. However, little has been published about systems that can generate
recommendations in response to changes in recommendable items and user
behaviour in a very short space of time. In this paper we describe a new
algorithm for updating collaborative filtering models incrementally, and
demonstrate its effectiveness on clickstream data from The Times. We also
describe the architecture that allows recommendations to be generated on the
fly, and how we have made each component scalable. The system is currently
being used in production at News UK.
Comment: Accepted for presentation at AI-2017 Thirty-seventh SGAI
International Conference on Artificial Intelligence. Cambridge, England 12-14
December 201
Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization
Protecting vast quantities of data poses a daunting challenge for the growing
number of organizations that collect, stockpile, and monetize it. The ability
to distinguish data that is actually needed from data collected "just in case"
would help these organizations to limit the latter's exposure to attack. A
natural approach might be to monitor data use and retain only the working-set
of in-use data in accessible storage; unused data can be evicted to a highly
protected store. However, many of today's big data applications rely on machine
learning (ML) workloads that are periodically retrained by accessing, and thus
exposing to attack, the entire data store. Training set minimization methods,
such as count featurization, are often used to limit the data needed to train
ML workloads to improve performance or scalability. We present Pyramid, a
limited-exposure data management system that builds upon count featurization to
enhance data protection. As such, Pyramid uniquely introduces both the idea and
proof-of-concept for leveraging training set minimization methods to instill
rigor and selectivity into big data management. We integrated Pyramid into
Spark Velox, a framework for ML-based targeting and personalization. We
evaluate it on three applications and show that Pyramid approaches
state-of-the-art models while training on less than 1% of the raw data.
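For readers unfamiliar with count featurization, the general idea can be sketched as follows. This is a minimal illustration of the generic technique, not Pyramid's implementation: each categorical value is replaced by the number of times it was seen with each label, so a model can train on the compact count tables plus a small window of raw rows. The function names (`build_count_tables`, `featurize`), the binary labels, and the add-one smoothing are assumptions made for this example.

```python
from collections import defaultdict


def build_count_tables(rows, labels):
    """Build per-feature count tables (hypothetical helper).

    rows   : list of dicts mapping feature name -> categorical value
    labels : list of binary labels (0 or 1), one per row
    Returns {feature: {value: [count_label_0, count_label_1]}}.
    """
    tables = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            tables[feature][value][label] += 1
    return tables


def featurize(row, tables):
    """Replace each categorical value with its counts and a smoothed rate.

    Unseen values fall back to zero counts; the rate uses add-one
    smoothing (an assumption of this sketch), so it is 0.5 when no
    observations exist.
    """
    features = []
    for feature in sorted(row):
        neg, pos = tables.get(feature, {}).get(row[feature], (0, 0))
        rate = (pos + 1) / (neg + pos + 2)
        features.extend([neg, pos, rate])
    return features


# Toy historical data: which ad was shown and whether it was clicked.
history = [{"ad": "a1"}, {"ad": "a1"}, {"ad": "a2"}]
clicks = [1, 0, 1]
tables = build_count_tables(history, clicks)
vector = featurize({"ad": "a1"}, tables)
```

Once the tables are built, the raw historical rows are no longer needed to compute features for new observations, which is the property that a limited-exposure design can build on.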