Eliciting New Wikipedia Users' Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start

Author: Mansurov Bahodir
Morgan Jonathan
West Robert
Yazdanian Ramtin
Zia Leila
Publication date: 08/04/2019

Every day, thousands of users sign up as new Wikipedia contributors. Once joined, these users have to decide which articles to contribute to, which users to seek out and learn from or collaborate with, etc. Any such task is a hard and potentially frustrating one given the sheer size of Wikipedia. Supporting newcomers in their first steps by recommending articles they would enjoy editing or editors they would enjoy collaborating with is thus a promising route toward converting them into long-term contributors. Standard recommender systems, however, rely on users' histories of previous interactions with the platform. As such, these systems cannot make high-quality recommendations to newcomers without any previous interactions: the so-called cold-start problem. The present paper addresses the cold-start problem on Wikipedia by developing a method for automatically building short questionnaires that, when completed by a newly registered Wikipedia user, can be used for a variety of purposes, including article recommendations that can help new editors get started. Our questionnaires are constructed based on the text of Wikipedia articles as well as the history of contributions by the already onboarded Wikipedia editors. We assess the quality of our questionnaire-based recommendations in an offline evaluation using historical data, as well as an online evaluation with hundreds of real Wikipedia newcomers, concluding that our method provides cohesive, human-readable questions that perform well against several baselines. By addressing the cold-start problem, this work can help with the sustainable growth and maintenance of Wikipedia's diverse editor community.

Comment: Accepted at the 13th International AAAI Conference on Web and Social Media (ICWSM-2019)

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Recommended from our members

Analysis of gas chromatography/mass spectrometry data for catalytic lignin depolymerization using positive matrix factorization

Author: Barrett JA
Ford PC
Foston MB
Gao Y
Harper DP
Hosseinaei O
Walker MJ
Williams BJ
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

Various catalytic technologies are being developed to efficiently convert lignin into renewable chemicals. However, due to its complexity, catalytic lignin depolymerization often generates a wide and complex distribution of product compounds. Gas chromatography/mass spectrometry (GC-MS) is a common analytical technique to profile the compounds that comprise lignin depolymerization products. GC-MS is applied not only to determine the product composition, but also to develop an understanding of the catalytic reaction pathways and of the relationships among catalyst structure, reaction conditions, and the resulting compounds generated. Although a very useful tool, the analysis of lignin depolymerization products with GC-MS is limited by the quality and scope of the available mass spectral libraries and the ability to correlate changes in GC-MS chromatograms to changes in lignin structure, catalyst structure, and other reaction conditions. In this study, the GC-MS data of the depolymerization products generated from organosolv hybrid poplar lignin using a copper-doped porous metal oxide catalyst and a methanol/dimethyl carbonate co-solvent was analyzed by applying a factor analysis technique, positive matrix factorization (PMF). Several different solutions for the PMF model were explored. A 13-factor solution sufficiently explains the chemical changes occurring to lignin depolymerization products as a function of lignin, reaction time, catalyst, and solvent. Overall, seven factors were found to represent aromatic compounds, while one factor was defined by aliphatic compounds.

eScholarship - University of California

Algorithms and Architecture for Real-time Recommendations at News UK

Author: Bailey Dion
Clarke Daoud
Pajak Tom
Rodriguez Carlos
Publication date: 15/09/2017

Recommendation systems are recognised as being hugely important in industry, and the area is now well understood. At News UK, there is a requirement to be able to quickly generate recommendations for users on news items as they are published. However, little has been published about systems that can generate recommendations in response to changes in recommendable items and user behaviour in a very short space of time. In this paper we describe a new algorithm for updating collaborative filtering models incrementally, and demonstrate its effectiveness on clickstream data from The Times. We also describe the architecture that allows recommendations to be generated on the fly, and how we have made each component scalable. The system is currently being used in production at News UK.

Comment: Accepted for presentation at AI-2017 Thirty-seventh SGAI International Conference on Artificial Intelligence. Cambridge, England 12-14 December 2017

arXiv.org e-Print Archive

Crossref

Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization

Author: Geambasu Roxana
Huang Tzu-Kuo
Lecuyer Mathias
Sen Siddhartha
Spahn Riley
Publication date: 21/05/2017

Protecting vast quantities of data poses a daunting challenge for the growing number of organizations that collect, stockpile, and monetize it. The ability to distinguish data that is actually needed from data collected "just in case" would help these organizations to limit the latter's exposure to attack. A natural approach might be to monitor data use and retain only the working-set of in-use data in accessible storage; unused data can be evicted to a highly protected store. However, many of today's big data applications rely on machine learning (ML) workloads that are periodically retrained by accessing, and thus exposing to attack, the entire data store. Training set minimization methods, such as count featurization, are often used to limit the data needed to train ML workloads to improve performance or scalability. We present Pyramid, a limited-exposure data management system that builds upon count featurization to enhance data protection. As such, Pyramid uniquely introduces both the idea and proof-of-concept for leveraging training set minimization methods to instill rigor and selectivity into big data management. We integrated Pyramid into Spark Velox, a framework for ML-based targeting and personalization. We evaluate it on three applications and show that Pyramid approaches state-of-the-art models while training on less than 1% of the raw data.

arXiv.org e-Print Archive

Crossref
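
The Pyramid abstract above names count featurization as a training set minimization method without describing it. As a rough illustration of the general technique only (not Pyramid's implementation, which the abstract does not detail), the sketch below replaces each categorical value with a smoothed estimate of the positive-label rate computed from per-value counts, so a model can train on compact count tables plus a small sample of rows. The field names (`ad`, `site`, `click`), the binary label, and the `smoothing` parameter are all assumptions made for this example.

```python
from collections import defaultdict


def build_count_tables(rows, label_key):
    """Count negative/positive labels for every (feature, value) pair.

    Assumes a binary label stored as 0 or 1 (an assumption of this sketch).
    """
    tables = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for row in rows:
        label = row[label_key]
        for feature, value in row.items():
            if feature == label_key:
                continue
            tables[feature][value][label] += 1
    return tables


def featurize(row, tables, label_key, smoothing=1.0):
    """Replace each categorical value with a smoothed P(label = 1 | value)."""
    features = {}
    for feature, value in row.items():
        if feature == label_key:
            continue
        # Unseen values fall back to zero counts, i.e. the smoothed prior 0.5.
        neg, pos = tables[feature].get(value, (0, 0))
        features[feature] = (pos + smoothing) / (neg + pos + 2 * smoothing)
    return features


# Hypothetical click log used only to exercise the functions.
rows = [
    {"ad": "a1", "site": "news", "click": 1},
    {"ad": "a1", "site": "sports", "click": 0},
    {"ad": "a2", "site": "news", "click": 1},
    {"ad": "a1", "site": "news", "click": 1},
]
tables = build_count_tables(rows, "click")
features = featurize({"ad": "a1", "site": "news"}, tables, "click")
unseen = featurize({"ad": "a9", "site": "blog"}, tables, "click")
```

Once the count tables are built, the raw rows are no longer needed to compute features for new observations, which is the property the abstract relies on when it speaks of limiting the data exposed during retraining.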