6,477 research outputs found
Reinventing the Wheel: Explaining Question Duplication in Question Answering Communities
Duplicate questions are common in Question Answering Communities (QACs) and impede the development of efficacious problem-solving communities. Yet, there is a dearth of research that has sought to shed light on the mechanisms underlying question duplication. Building on the information adoption model, we advance a research model that posits information quality and source credibility as factors deterring users from asking redundant questions within QACs. Furthermore, considering the question-answer dichotomy intrinsic to QACs, we distinguish the quality and credibility of questions from those of answers as distinctive inhibitors of question duplication. We empirically validate our hypotheses on a leading QAC platform by harnessing a deep learning algorithm to detect duplications in over 9,380,000 question pairs. Results revealed that while the credibility of both questions and answers could alleviate question duplication, visual and actionable elements are more effective in preventing question duplication by boosting the quality of questions and answers, respectively.
On the Helpfulness of Answering Developer Questions on Discord with Similar Conversations and Posts from the Past
A big part of software developers’ time is spent finding answers to their coding-task-related questions. To answer their questions, developers usually perform web searches, ask questions on Q&A websites, or, more recently, in chat communities. Yet, many of these questions have frequently already been answered in previous chat conversations or other online communities. Automatically identifying and then suggesting these previous answers to the askers could, thus, save time and effort. In an empirical analysis, we first explored the frequency of repeating questions on the Discord chat platform and assessed our approach to identify them automatically. The approach was then evaluated with real-world developers in a field experiment, through which we received 142 ratings on the helpfulness of the suggestions we provided to help answer 277 questions that developers posted in four Discord communities. We further collected qualitative feedback through 53 surveys and 10 follow-up interviews. We found that the suggestions were considered helpful in 40% of the cases, that suggesting Stack Overflow posts is more often considered helpful than past Discord conversations, and that developers have difficulties describing their problems as search queries and, thus, prefer describing them as natural language questions in online communities.
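The core retrieval step such an approach depends on — scoring a newly posted question against past ones and suggesting the best match only when it is similar enough — can be sketched with a plain bag-of-words cosine similarity. This is a minimal illustration under assumed simplifications; the `suggest` helper and its threshold are hypothetical and do not reproduce the paper's actual method:

```python
import math
from collections import Counter

def cosine(q1, q2):
    # Bag-of-words cosine similarity over lowercased whitespace tokens.
    a, b = Counter(q1.lower().split()), Counter(q2.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def suggest(new_question, past_questions, threshold=0.5):
    # Suggest the most similar past question, but only when the score
    # clears a threshold -- otherwise stay silent to avoid noise.
    scored = [(cosine(new_question, q), q) for q in past_questions]
    best = max(scored, default=(0.0, None))
    return best[1] if best[0] >= threshold else None
```

In practice, staying silent below a similarity threshold matters as much as ranking: the field-experiment setting rewards precision, since unhelpful suggestions cost developer attention.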
Stack Overflow: A Code Laundering Platform?
Developers use Question and Answer (Q&A) websites to exchange knowledge and
expertise. Stack Overflow is a popular Q&A website where developers discuss
coding problems and share code examples. Although all Stack Overflow posts are
free to access, code examples on Stack Overflow are governed by the Creative
Commons Attribute-ShareAlike 3.0 Unported license that developers should obey
when reusing code from Stack Overflow or posting code to Stack Overflow. In
this paper, we conduct a case study with 399 Android apps, to investigate
whether developers respect license terms when reusing code from Stack Overflow
posts (and the other way around). We found 232 code snippets in 62 Android apps
from our dataset that were potentially reused from Stack Overflow, and 1,226
Stack Overflow posts containing code examples that are clones of code released
in 68 Android apps, suggesting that developers may have copied the code of
these apps to answer Stack Overflow questions. We investigated the licenses of
these pieces of code and observed 1,279 cases of potential license violations
(related to code posting to Stack Overflow or code reuse from Stack Overflow).
This paper aims to raise the awareness of the software engineering community
about potential unethical code reuse activities taking place on Q&A websites
like Stack Overflow.
Comment: In proceedings of the 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER)
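Detecting such reused snippets typically relies on token-based clone detection: normalize identifiers, shingle the token stream, and compare shingle sets. The sketch below is a toy illustration under assumed simplifications (a tiny keyword list and a crude tokenizer); it does not reproduce the clone detector, tooling, or thresholds the study actually used:

```python
import re

# Toy keyword subset; a real detector would know the full language grammar.
KEYWORDS = {"int", "return", "for", "while", "if", "else"}

def shingles(code, n=3):
    toks = re.findall(r"[A-Za-z_]\w*|\S", code)
    # Normalize identifiers to a placeholder so renamed variables still
    # match -- a standard trick in token-based clone detection.
    toks = [t if t in KEYWORDS or not re.match(r"[A-Za-z_]", t) else "ID"
            for t in toks]
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def clone_similarity(a, b):
    # Jaccard similarity between the two snippets' shingle sets.
    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0
```

Snippets that differ only in identifier names score high under this measure, while unrelated code scores near zero; a study would then flag pairs above some threshold as potential clones and inspect their licenses.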
Refining GPT-3 Embeddings with a Siamese Structure for Technical Post Duplicate Detection
One goal of technical online communities is to help developers find the right
answer in one place. A single question can be asked in different ways with
different wordings, leading to the existence of duplicate posts on technical
forums. The question of how to discover and link duplicate posts has garnered
the attention of both developer communities and researchers. For example, Stack
Overflow adopts a voting-based mechanism to mark and close duplicate posts.
However, addressing these constantly emerging duplicate posts in a timely
manner continues to pose challenges. Therefore, various approaches have been
proposed to detect duplicate posts on technical forum posts automatically. The
existing methods suffer from limitations: they either rely on handcrafted
similarity metrics, which cannot sufficiently capture the semantics of posts,
or lack the supervision needed to improve performance. Additionally, the
efficiency of these methods is hindered by their dependence on pair-wise
feature generation, which can be impractical for large amounts of
data. In this work, we attempt to employ and refine the GPT-3 embeddings for
the duplicate detection task. We assume that the GPT-3 embeddings can
accurately represent the semantics of the posts. In addition, by training a
Siamese-based network based on the GPT-3 embeddings, we obtain a latent
embedding that accurately captures the duplicate relation in technical forum
posts. Our experiment on a benchmark dataset confirms the effectiveness of our
approach and demonstrates superior performance compared to baseline methods.
When applied to the dataset we constructed with a recent Stack Overflow dump,
our approach attains a Top-1, Top-5, and Top-30 accuracy of 23.1%, 43.9%, and
68.9%, respectively. With a manual study, we confirm our approach's potential
of finding unlabelled duplicates on technical forums.
Comment: SANER 202
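The refinement step described above — training a shared (Siamese) projection on top of frozen embeddings with a contrastive objective, so duplicates land close together and non-duplicates far apart — can be sketched as follows. This is a minimal numpy illustration with tiny synthetic vectors standing in for GPT-3 embeddings; the network, loss, and hyperparameters are assumptions and do not reproduce the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(42)
DIM = 4

# Synthetic stand-ins for post embeddings (real GPT-3 embeddings are
# ~1536-dimensional).  Duplicate pairs differ only in the last two
# "style" dimensions; non-duplicates differ in the first two "topic" ones.
def make_pair(duplicate):
    a = rng.normal(size=DIM)
    b = a.copy()
    if duplicate:
        b[2:] += rng.normal(scale=2.0, size=2)
    else:
        b[:2] += rng.normal(scale=2.0, size=2)
    return a, b, duplicate

pairs = [make_pair(i % 2 == 0) for i in range(200)]

# One shared linear projection applied to both sides -- the "Siamese" part.
W = np.eye(DIM)
margin, lr = 2.0, 0.01

for _ in range(30):
    for a, b, is_dup in pairs:
        x = a - b
        z = x @ W
        d = np.linalg.norm(z) + 1e-9
        if is_dup:                      # contrastive loss: pull duplicates in
            grad = 2.0 * np.outer(x, z)
        elif d < margin:                # push non-duplicates out to the margin
            grad = -2.0 * (margin - d) / d * np.outer(x, z)
        else:
            continue
        W -= lr * grad

def distance(a, b):
    """Distance in the learned space; small means likely duplicates."""
    return np.linalg.norm((a - b) @ W)
```

After training, pairs differing only in the "style" dimensions map much closer together than pairs differing in the "topic" dimensions — the property that a Top-k nearest-neighbour retrieval step, like the one evaluated in the abstract, would then exploit.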
Gender Differences in Participation and Reward on Stack Overflow
Programming is a valuable skill in the labor market, making the
underrepresentation of women in computing an increasingly important issue.
Online question and answer platforms serve a dual purpose in this field: they
form a body of knowledge useful as a reference and learning tool, and they
provide opportunities for individuals to demonstrate credible, verifiable
expertise. Issues such as a male-oriented site design or the overrepresentation
of men among the site's elite may therefore compound the issue of women's
underrepresentation in IT. In this paper, we audit the differences in behavior
and outcomes between men and women on Stack Overflow, the most popular of these
Q&A sites. We observe significant differences in how men and women participate
on the platform and how successful they are. For example, the average woman has
roughly half the reputation points of the average man, reputation being the
primary measure of success on the site. Using an Oaxaca-Blinder decomposition, an econometric
technique commonly applied to analyze differences in wages between groups, we
find that most of the gap in success between men and women can be explained by
differences in their activity on the site and differences in how these
activities are rewarded. Specifically, 1) men give more answers than women and
2) are rewarded more for their answers on average, even when controlling for
possible confounders such as tenure or buy-in to the site. Women ask more
questions and gain more reward per question. We conclude with a hypothetical
redesign of the site's scoring system based on these behavioral differences,
cutting the reputation gap in half.
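The Oaxaca-Blinder decomposition splits a mean outcome gap between two groups into a part explained by differences in observables (e.g. how many answers each group posts) and an unexplained part attributed to differences in how those observables are rewarded. On purely hypothetical data, the two-fold decomposition can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical users: columns = [intercept, answers posted, questions asked],
# outcome = reputation.  All coefficients below are invented for illustration.
def simulate(n, mean_answers, beta):
    X = np.column_stack([
        np.ones(n),
        rng.poisson(mean_answers, n).astype(float),
        rng.poisson(3, n).astype(float),
    ])
    y = X @ beta + rng.normal(scale=1.0, size=n)
    return X, y

# Group m answers more (endowment gap) and earns more per answer
# (coefficient gap) than group w.
Xm, ym = simulate(2000, 8, np.array([1.0, 2.0, 1.0]))
Xw, yw = simulate(2000, 5, np.array([1.0, 1.5, 1.0]))

# Separate OLS fit per group.
bm, *_ = np.linalg.lstsq(Xm, ym, rcond=None)
bw, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

gap = ym.mean() - yw.mean()
explained = (Xm.mean(axis=0) - Xw.mean(axis=0)) @ bm   # activity differences
unexplained = Xw.mean(axis=0) @ (bm - bw)              # reward differences
```

Because OLS with an intercept fits each group's mean exactly, the explained and unexplained parts sum to the raw gap by construction, which is what makes the decomposition an exact accounting rather than an approximation.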