1,339 research outputs found

Mining Knowledge Bases for Question & Answers Websites

Author: Lima Eduardo Coelho de
Publication venue: RIT Scholar Works
Publication date: 17/05/2016

We studied the problem of searching for answers to questions on a Question-and-Answer Website from knowledge bases. A number of research efforts had been developed using Stack Overflow data, which is available to the public. Surprisingly, only a few papers tried to improve the search for better answers. Furthermore, current approaches for searching a Question-and-Answer Website are usually limited to the question database, which is usually the website's own content. We showed it is feasible to use knowledge bases as sources for answers. We implemented both vector-space and topic-space representations for our datasets and compared these distinct techniques. Finally, we proposed a hybrid ranking approach that took advantage of a machine-learned classifier to incorporate the tag information into the ranking and showed that it was able to improve the retrieval performance.

RIT Scholar Works

Investigating the Quality Aspects of Crowd-Sourced Developer Forum: A Case Study of Stack Overflow

Author: Mondal Saikat
Publication venue: 'University of Saskatchewan Library'
Publication date: 30/10/2020

Technical question and answer (Q&A) websites have changed how developers seek information on the web and become more popular due to the shortcomings in official documentation and alternative knowledge sharing resources. Stack Overflow (SO) is one of the largest and most popular online Q&A websites for developers where they can share knowledge by answering questions and learn new skills by asking questions. Unfortunately, a large number of questions (up to 29%) are not answered at all, which might hurt the quality or purpose of this community-oriented knowledge base. In this thesis, we first attempt to detect the potentially unanswered questions during their submission using machine learning models. We compare unanswered and answered questions quantitatively and qualitatively. The quantitative analysis suggests that topics discussed in the question, the experience of the question submitter, and readability of question texts could often determine whether a question would be answered or not. Our qualitative study also reveals why the questions remain unanswered, which could guide novice users to improve their questions. While analyzing the questions of SO, we see that many of them remain unanswered and unresolved because they contain code segments that could potentially have programming issues (e.g., error, unexpected behavior); unfortunately, the issues could not always be reproduced by other users. This irreproducibility of issues might prevent questions of SO from getting answers or appropriate answers. In our second study, we thus conduct an exploratory study on the reproducibility of the issues discussed in questions and the correlation between issue reproducibility status (of questions) and corresponding answer meta-data such as the presence of an accepted answer. According to our analysis, a question with reproducible issues has at least three times higher chance of receiving an accepted answer than a question with irreproducible issues.
However, users can improve the quality of questions and answers by editing. Unfortunately, such edits may be rejected (i.e., rolled back) due to undesired modifications and ambiguities. We thus offer a comprehensive overview of reasons and ambiguities in the SO rollback edits. We identify 14 reasons for rollback edits and eight ambiguities that are often present in those edits. We also develop algorithms to detect ambiguities automatically. During the above studies, we find that about half of the questions that received working solutions have negative scores. About 18% of the accepted answers also do not score the maximum votes. Furthermore, many users complain about the downvotes that are cast on their questions and answers. All these findings cast serious doubts on the reliability of the evaluation mechanism employed at SO. We thus concentrate on the assessment mechanism of SO to ensure a non-biased, reliable quality assessment mechanism of SO. This study compares the subjective assessment of questions with their objective assessment using 2.5 million questions and ten text analysis metrics. We also develop machine learning models to classify the promoted and discouraged questions and predict them during their submission time. We believe that the findings from our studies and proposed techniques have the potential to (1) help the users to ask better questions with appropriate code examples, and (2) improve the editing and assessment mechanism of SO to promote better content quality.

University of Saskatchewan Research Archive

Automatic Prediction of Rejected Edits in Stack Overflow

Author: Mondal Saikat
Roy Chanchal
Uddin Gias
Publication date: 06/10/2022

The content quality of shared knowledge in Stack Overflow (SO) is crucial in supporting software developers with their programming problems. Thus, SO allows its users to suggest edits to improve the quality of a post (i.e., question and answer). However, existing research shows that many suggested edits in SO are rejected due to undesired contents/formats or violating edit guidelines. Such a scenario frustrates or demotivates users who would like to conduct good-quality edits. Therefore, our research focuses on assisting SO users by offering them suggestions on how to improve their editing of posts. First, we manually investigate 764 (382 questions + 382 answers) rejected edits by rollbacks and produce a catalog of 19 rejection reasons. Second, we extract 15 text- and user-based features to capture those rejection reasons. Third, we develop four machine learning models using those features. Our best-performing model can predict rejected edits with 69.1% precision, 71.2% recall, 70.1% F1-score, and 69.8% overall accuracy. Fourth, we introduce an online tool named EditEx that works with the SO edit system. EditEx can assist users while editing posts by suggesting the potential causes of rejections. We recruit 20 participants to assess the effectiveness of EditEx. Half of the participants (i.e., treatment group) use EditEx and another half (i.e., control group) use the SO standard edit system to edit posts. According to our experiment, EditEx can support SO standard edit system to prevent 49% of rejected edits, including the commonly rejected ones. However, it can prevent 12% rejections even in free-form regular edits. The treatment group finds the potential rejection reasons identified by EditEx influential. Furthermore, the median workload suggesting edits using EditEx is half compared to the SO edit system. Comment: Accepted for publication in Empirical Software Engineering (EMSE) journal
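
As a quick consistency check on the figures quoted in this abstract: F1 is the harmonic mean of precision and recall. The sketch below is only an illustration; the function name `f1_score` is an assumption for this example and is not taken from the paper or its tooling.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (69.1% and 71.2%).
reported_f1 = f1_score(0.691, 0.712)
print(f"{reported_f1:.3f}")
```

Rounded to three decimals, the result agrees with the reported 70.1% F1-score.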

arXiv.org e-Print Archive

Understanding the Role of Images on Stack Overflow

Author: Hata Hideaki
Kamei Yasutaka
Kula Raula Gaikovina
Treude Christoph
Wang Dong
Xiao Tao
Publication date: 27/03/2023

Images are increasingly being shared by software developers in diverse channels including question-and-answer forums like Stack Overflow. Although prior work has pointed out that these images are meaningful and provide complementary information compared to their associated text, how images are used to support questions is empirically unknown. To address this knowledge gap, in this paper we specifically conduct an empirical study to investigate (I) the characteristics of images, (II) the extent to which images are used in different question types, and (III) the role of images on receiving answers. Our results first show that user interface is the most common image content and undesired output is the most frequent purpose for sharing images. Moreover, these images essentially facilitate the understanding of 68% of sampled questions. Second, we find that discrepancy questions are relatively more frequent compared to those without images, but there are no significant differences observed in description length in all types of questions. Third, the quantitative results statistically validate that questions with images are more likely to receive accepted answers, but do not speed up the time to receive answers. Our work demonstrates the crucial role that images play by approaching the topic from a new angle and lays the foundation for future opportunities to use images to assist in tasks like generating questions and identifying question-relatedness.

arXiv.org e-Print Archive

Are Code Examples on an Online Q&A Forum Reliable?

Author: Kim Miryung
Rajan Hridesh
Reinhardt Anastasia
Upadhyaya Ganesha
Zhang Tianyi
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2018

Programmers often consult an online Q&A forum such as Stack Overflow to learn new APIs. This paper presents an empirical study on the prevalence and severity of API misuse on Stack Overflow. To reduce manual assessment effort, we design ExampleCheck, an API usage mining framework that extracts patterns from over 380K Java repositories on GitHub and subsequently reports potential API usage violations in Stack Overflow posts. We analyze 217,818 Stack Overflow posts using ExampleCheck and find that 31% may have potential API usage violations that could produce unexpected behavior such as program crashes and resource leaks. Such API misuse is caused by three main reasons: missing control constructs, missing or incorrect order of API calls, and incorrect guard conditions. Even the posts that are accepted as correct answers or upvoted by other programmers are not necessarily more reliable than other posts in terms of API misuse. This study result calls for a new approach to augment Stack Overflow with alternative API usage details that are not typically shown in curated examples.

Digital Repository @ Iowa State University (ISU)

Crossref

A Near-Infrared Spectroscopic Study of the Accreting Magnetic White Dwarf SDSS J121209.31+013627.7 and its Substellar Companion

Author: D. W. Hoard
Hirst P.
Howell S. B.
J. Farihi
Jeans J. H.
M. R. Burleigh
Publication venue: 'University of Chicago Press'
Publication date: 25/10/2007

The nature of the excess near-infrared emission associated with the magnetic white dwarf commonly known as SDSS 1212 is investigated primarily through spectroscopy, and also via photometry. The inferred low mass secondary in this system has been previously detected by the emission and variation of H$\alpha$, and the $1$-$2.5\,\mu$m spectral data presented here are consistent with the presence of a late L or early T dwarf. The excess flux seen beyond $1.5\,\mu$m in the phase-averaged spectrum is adequately modeled with an L8 dwarf substellar companion and cyclotron emission in a 7 MG magnetic field. This interesting system manifests several observational properties typical of polars, and is most likely an old interacting binary with a magnetic white dwarf and a substellar donor in an extended low state. Comment: 28 pages, 5 figures, Accepted to Ap

arXiv.org e-Print Archive

Crossref

UCL Discovery

Caltech Authors

Insights into software development approaches: mining Q&A repositories

Author: Akbar Muhammad Azeem
Fahmideh Mahdi
Khan Arif Ali
Khan Javed Ali
Zhou Peng
Publication date: 29/02/2024

© 2023 The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/ Context: Software practitioners adopt approaches like DevOps, Scrum, and Waterfall for high-quality software development. However, limited research has been conducted on exploring software development approaches concerning practitioners’ discussions on Q&A forums. Objective: We conducted an empirical study to analyze developers’ discussions on Q&A forums to gain insights into software development approaches in practice. Method: We analyzed 13,903 developers’ posts across Stack Overflow (SO), Software Engineering Stack Exchange (SESE), and Project Management Stack Exchange (PMSE) forums. A mixed method approach, consisting of the topic modeling technique (i.e., Latent Dirichlet Allocation (LDA)) and qualitative analysis, is used to identify frequently discussed topics of software development approaches, trends (popular, difficult topics), and the challenges faced by practitioners in adopting different software development approaches. Findings: We identified 15 frequently mentioned software development approaches topics on Q&A sites and observed an increase in trends for the top-3 most difficult topics requiring more attention. Finally, our study identified 49 challenges faced by practitioners while deploying various software development approaches, and we subsequently created a thematic map to represent these findings. Conclusions: The study findings serve as a useful resource for practitioners to overcome challenges, stay informed about current trends, and ultimately improve the quality of software products they develop. Peer reviewed

University of Hertfordshire Research Archive

Embedded Emotion-based Classification of Stack Overflow Questions Towards the Question Quality Prediction

Author: Amit Kumar
Chanchal K Roy
Masudur Rahman
Mondal Mohammad
Publication date: 11/04/2020

Software developers often ask questions on the Stack Overflow Q&A site, and their posted questions sometimes do not meet the standard guidelines. As a consequence, some of the questions are edited by expert users, some of them are down-voted, or some are even deleted permanently. Besides, the users (i.e., developers) might not get the expected solutions for their problems. In this paper, we study up-voted and down-voted questions from Stack Overflow, and analyze the relationship of embedded emotions with question quality. We use Sentiment140 API for identifying embedded emotions in the question texts, and then apply Feed-Forward Multilayer Perceptron (MLP) and Support Vector Machine (SVM) on the emotion data for developing a quality prediction model. Experiments using 38,920 Stack Overflow questions suggest about 70% precision and about 74% recall for our model with 10-fold cross-validation, and these findings clearly reveal the impact of human emotions upon the quality of a question.

CiteSeerX