Language Use Matters: Analysis of the Linguistic Structure of Question Texts Can Characterize Answerability in Quora
Quora is one of the most popular community Q&A sites of recent times.
However, many question posts on this Q&A site often do not get answered. In
this paper, we quantify various linguistic activities that discriminates an
answered question from an unanswered one. Our central finding is that the way
users use language while writing the question text can be a very effective
means to characterize answerability. This characterization helps us predict
early whether a question that has remained unanswered for a time period t will
eventually be answered, achieving an accuracy of 76.26% (t = 1 month)
and 68.33% (t = 3 months). Notably, features representing the language use
patterns of the users are most discriminative and alone account for an accuracy
of 74.18%. We also compare our method with similar prior works (Dror et
al., Yang et al.), achieving a maximum improvement of ~39% in accuracy.
Comment: 1 figure, 3 tables, ICWSM 2017 as a poster
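The abstract's core idea, that the wording of a question alone can signal answerability, can be sketched as a toy scoring function. The feature names and weights below are invented for illustration; the paper's actual feature set and trained model are not reproduced here.

```python
import math

# Hypothetical linguistic features of a question text (illustrative only,
# not the paper's feature set).
def features(text):
    words = text.split()
    return {
        "length": len(words),  # very long questions may deter answerers
        "wh_start": words[0].lower() in {"what", "why", "how", "when", "who"},
        "polite": any(w.lower().strip("?.,") in {"please", "thanks"} for w in words),
        "question_mark": text.strip().endswith("?"),
    }

# A toy logistic score combining the features; the weights are made up
# for illustration, not learned from Quora data.
def answerability_score(text):
    f = features(text)
    z = (-0.02 * f["length"]
         + 0.8 * f["wh_start"]
         + 0.5 * f["polite"]
         + 0.6 * f["question_mark"])
    return 1.0 / (1.0 + math.exp(-z))

print(answerability_score("How do I start learning Python, please?"))
```

In the paper such language-use features are fed to a trained classifier; the point of the sketch is only that simple surface cues of the question text suffice to produce a ranking signal.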
Cultures in Community Question Answering
CQA services are collaborative platforms where users ask and answer
questions. We investigate the influence of national culture on people's online
questioning and answering behavior. For this, we analyzed a sample of 200
thousand users in Yahoo Answers from 67 countries. We empirically measure a set
of cultural metrics drawn from Geert Hofstede's cultural dimensions and Robert
Levine's Pace of Life, and show that behavioral cultural differences exist in
community question answering platforms. We find that national cultures differ
in Yahoo Answers along a number of dimensions such as temporal predictability
of activities, contribution-related behavioral patterns, privacy concerns, and
power inequality.
Comment: Published in the proceedings of the 26th ACM Conference on Hypertext and Social Media (HT '15)
The Social World of Content Abusers in Community Question Answering
Community-based question answering platforms can be rich sources of
information on a variety of specialized topics, from finance to cooking. The
usefulness of such platforms depends heavily on user contributions (questions
and answers), but also on respecting the community rules. As a crowd-sourced
service, such platforms rely on their users for monitoring and flagging content
that violates community rules.
Common wisdom is to eliminate the users who receive many flags. Our analysis
of a year of traces from a mature Q&A site shows that the number of flags does
not tell the full story: on one hand, users with many flags may still
contribute positively to the community. On the other hand, users who never get
flagged are found to violate community rules and get their accounts suspended.
This analysis, however, also shows that abusive users are betrayed by their
network properties: we find strong evidence of homophilous behavior and use
this finding to detect abusive users who go under the community radar. Based on
our empirical observations, we build a classifier that is able to detect
abusive users with an accuracy as high as 83%.
Comment: Published in the proceedings of the 24th International World Wide Web Conference (WWW 2015)
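The homophily evidence the abstract mentions can be illustrated with a minimal check on a toy flag network: do abusive users connect to other abusive users more often than random labeling would predict? The labels and edges below are invented for illustration.

```python
# Toy interaction network; labels and edges are made up.
edges = [("a", "b"), ("a", "c"), ("b", "c"),   # abusive cluster
         ("d", "e"), ("d", "f"), ("e", "f"),   # benign cluster
         ("c", "d")]                           # one cross-group tie
abusive = {"a", "b", "c"}

# Observed fraction of edges whose endpoints share a label.
same = sum((u in abusive) == (v in abusive) for u, v in edges)
observed = same / len(edges)

# Expected same-label fraction if endpoints were labeled independently
# at random with the empirical abusive rate p.
nodes = {n for e in edges for n in e}
p = len(abusive) / len(nodes)
expected = p * p + (1 - p) * (1 - p)

print(observed, expected)  # observed > expected indicates homophily
```

The paper exploits exactly this kind of gap: when abusive users cluster together, the network neighborhood of a user becomes a detection feature even when the user's own flag count is unremarkable.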
Learning to predict closed questions on stack overflow
The paper deals with the problem of predicting whether a user's question will be closed by a moderator on Stack Overflow, a popular question answering service devoted to software programming. The task, along with data and evaluation metrics, was offered as an open machine learning competition on the Kaggle platform. To solve this problem, we employed a wide range of classification features related to users, their interactions, and post content. Classification was carried out using several machine learning methods. According to the results of the experiment, the most important features are characteristics of the user and topical features of the question. The best results were obtained using Vowpal Wabbit, an implementation of online learning based on stochastic gradient descent. Our results are among the best in the overall ranking, although they were obtained after the official competition was over.
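The learning scheme Vowpal Wabbit implements, logistic regression updated one example at a time by stochastic gradient descent, can be sketched in a few lines. This toy re-implements only the core update rule, not Vowpal Wabbit itself (it omits feature hashing, adaptive learning rates, and the rest); the features in the example stream are invented.

```python
import math

class OnlineLogistic:
    """Minimal logistic regression trained one example at a time by
    stochastic gradient descent."""

    def __init__(self, lr=0.1):
        self.w = {}   # sparse weight vector: feature name -> weight
        self.lr = lr

    def predict(self, x):
        z = sum(self.w.get(f, 0.0) * v for f, v in x.items())
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):  # y in {0, 1}: 1 = question was closed
        err = self.predict(x) - y  # gradient of log-loss w.r.t. the logit
        for f, v in x.items():
            self.w[f] = self.w.get(f, 0.0) - self.lr * err * v

# Toy stream: questions carrying an "offtopic" feature tend to be closed.
model = OnlineLogistic()
stream = [({"offtopic": 1.0}, 1), ({"code_included": 1.0}, 0)] * 200
for x, y in stream:
    model.update(x, y)

print(model.predict({"offtopic": 1.0}))
```

The single-pass, per-example update is what makes this approach practical on competition-scale data: the model never needs the full dataset in memory.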
Determinants of quality, latency, and amount of Stack Overflow answers about recent Android APIs
Stack Overflow is a popular crowdsourced question and answer website for programming-related issues. It is an invaluable resource for software developers; on average, questions posted there get answered in minutes to an hour. Questions about well-established topics, e.g., the coercion operator in C++, or the difference between canonical and class names in Java, get asked often in one form or another, and answered very quickly. On the other hand, questions on previously unseen or niche topics take a while to get a good answer. This is particularly the case with questions about current updates to, or the introduction of, new application programming interfaces (APIs). In a hyper-competitive online market, getting good answers to current programming questions sooner could increase the chances of an app getting released and used. So, can developers do anything to get good answers to questions about new APIs sooner? Here, we empirically study Stack Overflow questions pertaining to new Android APIs and their associated answers. We contrast the interest in these questions, their answer quality, and the timeliness of their answers with questions about old APIs. We find that Stack Overflow answerers in general prioritize with respect to currentness: questions about new APIs do get more answers, but good quality answers take longer. We also find that incentives in terms of question bounties, if used appropriately, can significantly shorten the time to an answer and increase answer quality. Interestingly, no operationalization of bounty amount shows significance in our models. In practice, our findings confirm the value of bounties in enhancing expert participation. In addition, they show that the Stack Overflow style of crowdsourcing, for all its glory in providing answers about established programming knowledge, is less effective with new API questions.
The Size Conundrum: Why Online Knowledge Markets Can Fail at Scale
In this paper, we interpret the community question answering websites on the
StackExchange platform as knowledge markets, and analyze how and why these
markets can fail at scale. A knowledge market framing allows site operators to
reason about market failures, and to design policies to prevent them. Our goal
is to provide insights on large-scale knowledge market failures through an
interpretable model. We explore a set of interpretable economic production
models on a large empirical dataset to analyze the dynamics of content
generation in knowledge markets. Amongst these, the Cobb-Douglas model best
explains empirical data and provides an intuitive explanation for content
generation through concepts of elasticity and diminishing returns. Content
generation depends on user participation and also on how specific types of
content (e.g. answers) depend on other types (e.g. questions). We show that
these factors of content generation have constant elasticity---a percentage
increase in any of the inputs leads to a constant percentage increase in the
output. Furthermore, markets exhibit diminishing returns---the marginal output
decreases as the input is incrementally increased. Knowledge markets also vary
on their returns to scale---the increase in output resulting from a
proportionate increase in all inputs. Importantly, many knowledge markets
exhibit diseconomies of scale---measures of market health (e.g., the percentage
of questions with an accepted answer) decrease as a function of number of
participants. The implications of our work are two-fold: site operators ought
to design incentives as a function of system size (number of participants); the
market lens should shed insight into complex dependencies amongst different
content types and participant actions in general social networks.
Comment: The 27th International Conference on World Wide Web (WWW), 2018
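The Cobb-Douglas form the abstract names makes the elasticity and diminishing-returns claims concrete. A minimal sketch, with exponents chosen for illustration rather than taken from the paper's fitted models:

```python
# Cobb-Douglas content generation: answers as a function of question
# volume and active answerers. Exponents are illustrative.
A, alpha, beta = 1.0, 0.6, 0.7   # alpha, beta < 1 -> diminishing returns

def answers(questions, answerers):
    return A * questions**alpha * answerers**beta

# Constant elasticity: a 1% increase in questions yields ~alpha% more answers,
# regardless of the starting level.
base = answers(1000, 100)
bumped = answers(1010, 100)
print((bumped / base - 1) * 100)  # ~ 0.6 (percent)

# Returns to scale are governed by alpha + beta: here 1.3 > 1 would mean
# economies of scale; the paper finds many markets behave as if the sum
# falls below 1 at large size, i.e., diseconomies of scale.
```

This is why the framing is useful to site operators: the fitted exponents summarize, in two numbers per market, both how sensitive output is to each input and whether growing the community helps or hurts market health.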
Identifying Unclear Questions in Community Question Answering Websites
Thousands of complex natural language questions are submitted to community
question answering websites on a daily basis, making them one of the most
important information sources today. However, submitted questions are often
unclear and cannot be answered without further clarification
questions by expert community members. This study is the first to investigate
the complex task of classifying a question as clear or unclear, i.e., if it
requires further clarification. We construct a novel dataset and propose a
classification approach that is based on the notion of similar questions. This
approach is compared to state-of-the-art text classification baselines. Our
main finding is that the similar questions approach is a viable alternative
that can be used as a stepping stone towards the development of supportive user
interfaces for question formulation.
Comment: Proceedings of the 41st European Conference on Information Retrieval (ECIR '19), 2019
The big five: Discovering linguistic characteristics that typify distinct personality traits across Yahoo! answers members
In psychology, it is widely believed that five big factors determine the different personality traits: Extraversion, Agreeableness, Conscientiousness, and Neuroticism, as well as Openness. In recent years, researchers have started to examine how these factors are manifested across social networks like Facebook and Twitter. However, to the best of our knowledge, other kinds of social networks, such as social/informational question-answering communities (e.g., Yahoo! Answers), have been left unexplored. Therefore, this work explores several predictive models to automatically recognize these factors across Yahoo! Answers members. As a means of devising powerful generalizations, these models were combined with assorted linguistic features. Since we could not ask community members to volunteer for taking the personality test, we built a study corpus by conducting a discourse analysis based on deconstructing the test into 112 adjectives. Our results reveal that it is plausible to lessen the dependency upon answered tests and that effective models for distinct factors are sharply different. Also, sentiment analysis and dependency parsing proved to be fundamental for dealing with extraversion, agreeableness, and conscientiousness. Furthermore, medium and low levels of neuroticism were found to be related to initial stages of depression and anxiety disorders. This work was partially supported by the FONDECYT project “Bridging the Gap between Askers and Answerers in Community Question Answering Services” (11130094), funded by the Chilean Government.
Dual Language and ENL Comprehension: A First Grade Study for Students at Risk for Delayed English Language Development
This research began by asking how dual language programming impacts English comprehension for ENL students. Research was conducted within one first grade dual language cohort with five bilingual students. The data were collected by interviewing teachers and students, utilizing historical comprehension data, observing read alouds, and assessing student comprehension. Findings revealed that comprehension in a participant’s first language was positively related to English comprehension. However, individual student differences impacted the extent of the correlation. Furthermore, dual language teachers implemented common instructional practices to scaffold ENL student comprehension. Therefore, the data implied that native language instruction is integral, that student backgrounds and differences need to be analyzed, and that dual language educators need adequate professional development to best aid ENL comprehension.