6,477 research outputs found
Reinventing the Wheel: Explaining Question Duplication in Question Answering Communities
Duplicate questions are common in Question Answering Communities (QACs) and impede the development of efficacious problem-solving communities. Yet, there is a dearth of research that has sought to shed light on the mechanisms underlying question duplication. Building on the information adoption model, we advance a research model that posits information quality and source credibility as factors deterring users from asking redundant questions within QACs. Furthermore, considering the question-answer dichotomy intrinsic to QACs, we distinguish the quality and credibility of questions from those of answers as distinctive inhibitors of question duplication. We empirically validate our hypotheses on a leading QAC platform by harnessing a deep learning algorithm to detect duplications in over 9,380,000 question pairs. Results revealed that while the credibility of both questions and answers could alleviate question duplication, visual and actionable elements are more effective in preventing question duplication by boosting the quality of questions and answers, respectively.
On the Helpfulness of Answering Developer Questions on Discord with Similar Conversations and Posts from the Past
A big part of software developers’ time is spent finding answers to their coding-task-related questions. To answer their questions, developers usually perform web searches, ask questions on Q&A websites, or, more recently, in chat communities. Yet, many of these questions have frequently already been answered in previous chat conversations or other online communities. Automatically identifying and then suggesting these previous answers to the askers could, thus, save time and effort. In an empirical analysis, we first explored the frequency of repeating questions on the Discord chat platform and assessed our approach to identify them automatically. The approach was then evaluated with real-world developers in a field experiment, through which we received 142 ratings on the helpfulness of the suggestions we provided to help answer 277 questions that developers posted in four Discord communities. We further collected qualitative feedback through 53 surveys and 10 follow-up interviews. We found that the suggestions were considered helpful in 40% of the cases, that suggesting Stack Overflow posts is more often considered helpful than past Discord conversations, and that developers have difficulties describing their problems as search queries and, thus, prefer describing them as natural language questions in online communities.
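The core retrieval step such an approach depends on — scoring a newly posted question against past ones and suggesting the best match only when it is similar enough — can be sketched with a plain bag-of-words cosine similarity. This is a minimal illustration under assumed simplifications; the `suggest` helper and its threshold are hypothetical and do not reproduce the paper's actual method:

```python
import math
from collections import Counter

def cosine(q1, q2):
    # Bag-of-words cosine similarity over lowercased whitespace tokens.
    a, b = Counter(q1.lower().split()), Counter(q2.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def suggest(new_question, past_questions, threshold=0.5):
    # Suggest the most similar past question, but only when the score
    # clears a threshold -- otherwise stay silent to avoid noise.
    scored = [(cosine(new_question, q), q) for q in past_questions]
    best = max(scored, default=(0.0, None))
    return best[1] if best[0] >= threshold else None
```

In practice, staying silent below a similarity threshold matters as much as ranking: the field-experiment setting rewards precision, since unhelpful suggestions cost developer attention.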
Stack Overflow: A Code Laundering Platform?
Developers use Question and Answer (Q&A) websites to exchange knowledge and
expertise. Stack Overflow is a popular Q&A website where developers discuss
coding problems and share code examples. Although all Stack Overflow posts are
free to access, code examples on Stack Overflow are governed by the Creative
Commons Attribute-ShareAlike 3.0 Unported license that developers should obey
when reusing code from Stack Overflow or posting code to Stack Overflow. In
this paper, we conduct a case study with 399 Android apps, to investigate
whether developers respect license terms when reusing code from Stack Overflow
posts (and the other way around). We found 232 code snippets in 62 Android apps
from our dataset that were potentially reused from Stack Overflow, and 1,226
Stack Overflow posts containing code examples that are clones of code released
in 68 Android apps, suggesting that developers may have copied the code of
these apps to answer Stack Overflow questions. We investigated the licenses of
these pieces of code and observed 1,279 cases of potential license violations
(related to code posting to Stack Overflow or code reuse from Stack Overflow).
This paper aims to raise the awareness of the software engineering community
about potential unethical code reuse activities taking place on Q&A websites
like Stack Overflow.
Comment: In proceedings of the 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER)
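Detecting such reused snippets typically relies on token-based clone detection: normalize identifiers, shingle the token stream, and compare shingle sets. The sketch below is a toy illustration under assumed simplifications (a tiny keyword list and a crude tokenizer); it does not reproduce the clone detector, tooling, or thresholds the study actually used:

```python
import re

# Toy keyword subset; a real detector would know the full language grammar.
KEYWORDS = {"int", "return", "for", "while", "if", "else"}

def shingles(code, n=3):
    toks = re.findall(r"[A-Za-z_]\w*|\S", code)
    # Normalize identifiers to a placeholder so renamed variables still
    # match -- a standard trick in token-based clone detection.
    toks = [t if t in KEYWORDS or not re.match(r"[A-Za-z_]", t) else "ID"
            for t in toks]
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def clone_similarity(a, b):
    # Jaccard similarity between the two snippets' shingle sets.
    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0
```

Snippets that differ only in identifier names score high under this measure, while unrelated code scores near zero; a study would then flag pairs above some threshold as potential clones and inspect their licenses.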
Refining GPT-3 Embeddings with a Siamese Structure for Technical Post Duplicate Detection
One goal of technical online communities is to help developers find the right
answer in one place. A single question can be asked in different ways with
different wordings, leading to the existence of duplicate posts on technical
forums. The question of how to discover and link duplicate posts has garnered
the attention of both developer communities and researchers. For example, Stack
Overflow adopts a voting-based mechanism to mark and close duplicate posts.
However, addressing these constantly emerging duplicate posts in a timely
manner continues to pose challenges. Therefore, various approaches have been
proposed to detect duplicate posts on technical forum posts automatically. The
existing methods suffer from limitations: they either rely on handcrafted
similarity metrics, which cannot sufficiently capture the semantics of posts,
or lack the supervision needed to improve performance. Additionally, the
efficiency of these methods is hindered by their dependence on pair-wise
feature generation, which can be impractical for large amounts of
data. In this work, we attempt to employ and refine the GPT-3 embeddings for
the duplicate detection task. We assume that the GPT-3 embeddings can
accurately represent the semantics of the posts. In addition, by training a
Siamese-based network based on the GPT-3 embeddings, we obtain a latent
embedding that accurately captures the duplicate relation in technical forum
posts. Our experiment on a benchmark dataset confirms the effectiveness of our
approach and demonstrates superior performance compared to baseline methods.
When applied to the dataset we constructed with a recent Stack Overflow dump,
our approach attains a Top-1, Top-5, and Top-30 accuracy of 23.1%, 43.9%, and
68.9%, respectively. With a manual study, we confirm our approach's potential
of finding unlabelled duplicates on technical forums.
Comment: SANER 202
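The refinement step described above — training a shared (Siamese) projection on top of frozen embeddings with a contrastive objective, so duplicates land close together and non-duplicates far apart — can be sketched as follows. This is a minimal numpy illustration with tiny synthetic vectors standing in for GPT-3 embeddings; the network, loss, and hyperparameters are assumptions and do not reproduce the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(42)
DIM = 4

# Synthetic stand-ins for post embeddings (real GPT-3 embeddings are
# ~1536-dimensional).  Duplicate pairs differ only in the last two
# "style" dimensions; non-duplicates differ in the first two "topic" ones.
def make_pair(duplicate):
    a = rng.normal(size=DIM)
    b = a.copy()
    if duplicate:
        b[2:] += rng.normal(scale=2.0, size=2)
    else:
        b[:2] += rng.normal(scale=2.0, size=2)
    return a, b, duplicate

pairs = [make_pair(i % 2 == 0) for i in range(200)]

# One shared linear projection applied to both sides -- the "Siamese" part.
W = np.eye(DIM)
margin, lr = 2.0, 0.01

for _ in range(30):
    for a, b, is_dup in pairs:
        x = a - b
        z = x @ W
        d = np.linalg.norm(z) + 1e-9
        if is_dup:                      # contrastive loss: pull duplicates in
            grad = 2.0 * np.outer(x, z)
        elif d < margin:                # push non-duplicates out to the margin
            grad = -2.0 * (margin - d) / d * np.outer(x, z)
        else:
            continue
        W -= lr * grad

def distance(a, b):
    """Distance in the learned space; small means likely duplicates."""
    return np.linalg.norm((a - b) @ W)
```

After training, pairs differing only in the "style" dimensions map much closer together than pairs differing in the "topic" dimensions — the property that a Top-k nearest-neighbour retrieval step, like the one evaluated in the abstract, would then exploit.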
Gender Differences in Participation and Reward on Stack Overflow
Programming is a valuable skill in the labor market, making the
underrepresentation of women in computing an increasingly important issue.
Online question and answer platforms serve a dual purpose in this field: they
form a body of knowledge useful as a reference and learning tool, and they
provide opportunities for individuals to demonstrate credible, verifiable
expertise. Issues such as a male-oriented site design or the overrepresentation
of men among the site's elite may therefore compound the issue of women's
underrepresentation in IT. In this paper, we audit the differences in behavior
and outcomes between men and women on Stack Overflow, the most popular of these
Q&A sites. We observe significant differences in how men and women participate
on the platform and how successful they are. For example, the average woman has
roughly half the reputation points of the average man, reputation being the
primary measure of success on the site. Using an Oaxaca-Blinder decomposition, an econometric
technique commonly applied to analyze differences in wages between groups, we
find that most of the gap in success between men and women can be explained by
differences in their activity on the site and differences in how these
activities are rewarded. Specifically, 1) men give more answers than women and
2) are rewarded more for their answers on average, even when controlling for
possible confounders such as tenure or buy-in to the site. Women ask more
questions and gain more reward per question. We conclude with a hypothetical
redesign of the site's scoring system based on these behavioral differences,
cutting the reputation gap in half.
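The Oaxaca-Blinder decomposition splits a mean outcome gap between two groups into a part explained by differences in observables (e.g. how many answers each group posts) and an unexplained part attributed to differences in how those observables are rewarded. On purely hypothetical data, the two-fold decomposition can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical users: columns = [intercept, answers posted, questions asked],
# outcome = reputation.  All coefficients below are invented for illustration.
def simulate(n, mean_answers, beta):
    X = np.column_stack([
        np.ones(n),
        rng.poisson(mean_answers, n).astype(float),
        rng.poisson(3, n).astype(float),
    ])
    y = X @ beta + rng.normal(scale=1.0, size=n)
    return X, y

# Group m answers more (endowment gap) and earns more per answer
# (coefficient gap) than group w.
Xm, ym = simulate(2000, 8, np.array([1.0, 2.0, 1.0]))
Xw, yw = simulate(2000, 5, np.array([1.0, 1.5, 1.0]))

# Separate OLS fit per group.
bm, *_ = np.linalg.lstsq(Xm, ym, rcond=None)
bw, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

gap = ym.mean() - yw.mean()
explained = (Xm.mean(axis=0) - Xw.mean(axis=0)) @ bm   # activity differences
unexplained = Xw.mean(axis=0) @ (bm - bw)              # reward differences
```

Because OLS with an intercept fits each group's mean exactly, the explained and unexplained parts sum to the raw gap by construction, which is what makes the decomposition an exact accounting rather than an approximation.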