Predictability of extreme events in social media
It is part of our daily social-media experience that seemingly ordinary items
(videos, news, publications, etc.) unexpectedly gain an enormous amount of
attention. Here we investigate how unexpected these events are. We propose a
method that, given some information on the items, quantifies the predictability
of events, i.e., the potential of identifying in advance the most successful
items, defined as the upper bound on the quality of any prediction based on the
same information. Applying this method to different data, ranging from views of
YouTube videos to posts in Usenet discussion groups, we invariably find that
the predictability increases for the most extreme events. This indicates that,
despite the inherently stochastic collective dynamics of users, efficient
prediction is possible for the most extreme events. Comment: 13 pages, 3 figures
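The notion of predictability as an upper bound can be illustrated with a toy calculation: given one discrete feature per item, no feature-based predictor can flag the eventual extreme items more precisely than the concentration of those items within feature groups allows. The sketch below is a hypothetical illustration of that idea, not the paper's actual estimator; the function name and the threshold-based definition of "extreme" are our assumptions.

```python
from collections import defaultdict

def predictability_upper_bound(features, successes, threshold):
    """Best achievable precision when flagging the eventual 'extreme' items
    (success >= threshold) using one discrete feature per item.
    A hypothetical toy estimator, not the paper's exact method."""
    extreme = [s >= threshold for s in successes]
    n_extreme = sum(extreme)
    if n_extreme == 0:
        return 1.0  # nothing to predict
    # feature value -> [number of extreme items, total items]
    groups = defaultdict(lambda: [0, 0])
    for f, e in zip(features, extreme):
        groups[f][0] += e
        groups[f][1] += 1
    # The best predictor flags items from the groups richest in extreme
    # items first, until it has flagged n_extreme items.
    flagged = 0
    expected_hits = 0.0
    for ex, tot in sorted(groups.values(), key=lambda g: g[0] / g[1], reverse=True):
        take = min(tot, n_extreme - flagged)
        expected_hits += ex * take / tot  # items within a group are indistinguishable
        flagged += take
        if flagged == n_extreme:
            break
    return expected_hits / n_extreme
```

A perfectly informative feature yields a bound of 1.0, while an uninformative one yields the base rate of extreme items, matching the intuition that predictability depends on what the available information can separate.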
Determinants of quality, latency, and amount of Stack Overflow answers about recent Android APIs.
Stack Overflow is a popular crowdsourced question and answer website for programming-related issues. It is an invaluable resource for software developers; on average, questions posted there get answered within minutes to an hour. Questions about well-established topics, e.g., the coercion operator in C++ or the difference between canonical and class names in Java, get asked often in one form or another and are answered very quickly. On the other hand, questions on previously unseen or niche topics take a while to get a good answer. This is particularly the case with questions about current updates to or the introduction of new application programming interfaces (APIs). In a hyper-competitive online market, getting good answers to current programming questions sooner could increase the chances of an app getting released and used. So, can developers somehow hasten the arrival of good answers to questions about new APIs? Here, we empirically study Stack Overflow questions pertaining to new Android APIs and their associated answers. We contrast the interest in these questions, their answer quality, and the timeliness of their answers with those of questions about old APIs. We find that Stack Overflow answerers in general prioritize by recency: questions about new APIs do get more answers, but good-quality answers take longer. We also find that incentives in the form of question bounties, if used appropriately, can significantly shorten the time to an answer and increase answer quality; interestingly, no operationalization of the bounty amount shows significance in our models. In practice, our findings confirm the value of bounties in enhancing expert participation. They also show that the Stack Overflow style of crowdsourcing, for all its glory in providing answers about established programming knowledge, is less effective with new API questions.
Repairing Deep Neural Networks: Fix Patterns and Challenges
Significant interest in applying Deep Neural Networks (DNNs) has fueled the
need to support engineering of software that uses DNNs. Repairing software that
uses DNNs is one such unmistakable SE need where automated tools could be
beneficial; however, we do not fully understand the challenges of repairing DNNs
or the patterns utilized when repairing them manually. What challenges should
automated repair tools address? What are the repair patterns whose automation
could help developers? Which repair patterns should be assigned a higher
priority for building automated bug repair tools? This work presents a
comprehensive study of bug fix patterns to address these questions. We have
studied 415 repairs from Stack Overflow and 555 repairs from GitHub for five
popular deep learning libraries (Caffe, Keras, Tensorflow, Theano, and Torch) to
understand challenges in repairs and bug repair patterns. Our key findings
reveal that DNN bug fix patterns are distinctive compared to traditional bug
fix patterns; the most common bug fix patterns are fixing data dimensions and
neural network connectivity; DNN bug fixes have the potential to introduce
adversarial vulnerabilities; DNN bug fixes frequently introduce new bugs; and
DNN bug localization, reuse of trained model, and coping with frequent releases
are major challenges faced by developers when fixing bugs. We also contribute a
benchmark of 667 DNN (bug, repair) instances.
Towards understanding the challenges faced by machine learning software developers and enabling automated solutions
Modern software systems increasingly include machine learning (ML) as an integral component. However, we do not yet understand the difficulties faced by software developers when learning about ML libraries and using them within their systems. To fill that gap, this thesis reports on a detailed (manual) examination of 3,243 highly-rated Q&A posts related to ten ML libraries, namely Tensorflow, Keras, scikit-learn, Weka, Caffe, Theano, MLlib, Torch, Mahout, and H2O, on Stack Overflow, a popular online technical Q&A forum. Our findings reveal the urgent need for software engineering (SE) research in this area. The second part of the thesis focuses on understanding the characteristics of Deep Neural Network (DNN) bugs. We study 2,716 high-quality posts from Stack Overflow and 500 bug fix commits from GitHub about five popular deep learning libraries (Caffe, Keras, Tensorflow, Theano, and Torch) to understand the types of bugs, their root causes and impacts, the bug-prone stages of the deep learning pipeline, and whether common antipatterns are found in this buggy software. While exploring the bug characteristics, our findings imply that repairing software that uses DNNs is one such unmistakable SE need where automated tools could be beneficial; however, we do not fully understand the challenges of repairing DNNs or the patterns utilized when repairing them manually. The third part of this thesis therefore presents a comprehensive study of bug fix patterns to address these questions. We have studied 415 repairs from Stack Overflow and 555 repairs from GitHub for the same five deep learning libraries to understand challenges in repairs and bug repair patterns. Our key findings reveal that DNN bug fix patterns are distinctive compared to traditional bug fix patterns, and the most common bug fix patterns are fixing data dimensions and neural network connectivity.
Finally, we propose an automatic technique to detect ML Application Programming Interface (API) misuses. We started with an empirical study to understand ML API misuses. Our study shows that ML API misuse is prevalent and distinct compared to non-ML API misuse. Inspired by these findings, we contributed Amimla (Api Misuse In Machine Learning Apis), an approach and a tool for ML API misuse detection. Amimla relies on several technical innovations. First, we proposed an abstract representation of ML pipelines to use in misuse detection. Second, we proposed an abstract representation of neural networks for deep-learning-related APIs. Third, we developed a representation strategy for constraints on ML APIs. Finally, we developed a misuse detection strategy for both single and multiple APIs. Our experimental evaluation shows that Amimla achieves a high average accuracy of ∼80% on two benchmarks of misuses from Stack Overflow and GitHub.
Investigating the Quality Aspects of Crowd-Sourced Developer Forum: A Case Study of Stack Overflow
Technical question and answer (Q&A) websites have changed how developers seek information on the web and have become popular due to the shortcomings of official documentation and alternative knowledge-sharing resources. Stack Overflow (SO) is one of the largest and most popular online Q&A websites for developers, where they can share knowledge by answering questions and learn new skills by asking questions. Unfortunately, a large number of questions (up to 29%) are not answered at all, which might hurt the quality or purpose of this community-oriented knowledge base. In this thesis, we first attempt to detect potentially unanswered questions during their submission using machine learning models. We compare unanswered and answered questions quantitatively and qualitatively. The quantitative analysis suggests that the topics discussed in a question, the experience of the question submitter, and the readability of the question text could often determine whether a question would be answered or not. Our qualitative study also reveals why questions remain unanswered, which could guide novice users in improving their questions. While analyzing the questions of SO, we see that many of them remain unanswered and unresolved because they contain code segments that could potentially have programming issues (e.g., errors, unexpected behavior); unfortunately, the issues could not always be reproduced by other users. This irreproducibility of issues might prevent SO questions from getting answers or appropriate answers. In our second study, we thus conduct an exploratory study on the reproducibility of the issues discussed in questions and the correlation between a question's issue reproducibility status and corresponding answer metadata, such as the presence of an accepted answer. According to our analysis, a question with reproducible issues has at least a three times higher chance of receiving an accepted answer than a question with irreproducible issues.
However, users can improve the quality of questions and answers by editing. Unfortunately, such edits may be rejected (i.e., rolled back) due to undesired modifications and ambiguities. We thus offer a comprehensive overview of the reasons for, and ambiguities in, SO rollback edits. We identify 14 reasons for rollback edits and eight ambiguities that are often present in those edits. We also develop algorithms to detect ambiguities automatically. During the above studies, we find that about half of the questions that received working solutions have negative scores. About 18% of the accepted answers also do not score the maximum votes. Furthermore, many users complain about the downvotes cast on their questions and answers. All these findings cast serious doubts on the reliability of the evaluation mechanism employed at SO. We thus concentrate on the assessment mechanism of SO to ensure that it is unbiased and reliable. This study compares the subjective assessment of questions with their objective assessment using 2.5 million questions and ten text analysis metrics. We also develop machine learning models to classify promoted and discouraged questions and predict them at submission time.
We believe that the findings from our studies and the proposed techniques have the potential to (1) help users ask better questions with appropriate code examples, and (2) improve the editing and assessment mechanisms of SO to promote better content quality.
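As one concrete example of the kind of text analysis metric such models might use, the readability of question texts can be scored with the Flesch reading-ease formula. The thesis does not name its ten metrics, so this particular metric and its crude vowel-group syllable heuristic are our assumptions, sketched for illustration only.

```python
import re

def flesch_reading_ease(text):
    """Flesch reading-ease score (higher = easier to read), using a crude
    vowel-group heuristic to count syllables."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # Count each maximal run of vowels as one syllable; at least one per word.
    def syllables(word):
        return max(1, len(re.findall(r"[aeiouyAEIOUY]+", word)))
    total_syllables = sum(syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * total_syllables / len(words))
```

Short sentences of short words score high (easy), while long polysyllabic prose scores low, which is the kind of signal a question-quality classifier could consume as a feature.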
Towards the detection and analysis of performance regression introducing code changes
In contemporary software development, developers commonly conduct regression testing to ensure that code changes do not affect software quality. Performance regression testing is an emerging research area within the regression testing domain in software engineering. It aims to maintain the system's performance. Conducting performance regression testing is known to be expensive. It is also complex, given the growing volume of committed code and the number of team members developing simultaneously. Many automated regression testing techniques have been proposed in prior research. However, challenges remain in practice in locating and resolving performance regressions. Directing regression testing to the commit level helps locate the root cause, yet it hinders the development process. This thesis outlines motivations and solutions for locating the root causes of performance regressions. First, we challenge a deterministic state-of-the-art approach by expanding the testing data to find areas for improvement. The deterministic approach was found to be limited in searching for the best regression-locating rule. Thus, we present two stochastic approaches to develop models that can learn from historical commits. The first stochastic approach views the research problem as a search-based optimization problem seeking the highest detection rate. We apply different multi-objective evolutionary algorithms and compare them. This thesis also investigates whether simplifying the search space by combining objectives would achieve comparable results. The second stochastic approach addresses the severe class imbalance any system could have, since code changes introducing regressions are rare but costly. We formulate the identification of problematic commits that introduce performance regressions as a binary classification problem that handles class imbalance.
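A common way to handle such imbalance is to reweight the training loss so the rare regression-introducing class counts for more. The sketch below uses the widely used n / (n_classes * n_c) "balanced" heuristic together with a weighted cross-entropy; it illustrates the general technique only, as the thesis does not specify its exact formulation.

```python
import math

def balanced_class_weights(labels):
    """Per-class weights via the common n / (n_classes * n_c) heuristic,
    so the rare (regression-introducing) class weighs more."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_log_loss(y_true, p_pred, weights):
    """Binary cross-entropy where each sample's term is scaled by its
    class weight, then normalized by the total weight."""
    total = wsum = 0.0
    for y, p in zip(y_true, p_pred):
        w = weights[y]
        total += -w * (math.log(p) if y == 1 else math.log(1.0 - p))
        wsum += w
    return total / wsum
```

With a 1:3 imbalance, the minority class receives weight 2.0 versus 0.67 for the majority, so a classifier trained on this loss cannot trivially score well by always predicting "no regression".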
Further, the thesis provides an exploratory study on the challenges developers face in resolving performance regressions. The study is based on questions posted on a technical forum and directed at performance regression. We collected around 2k questions discussing regressions of software execution time, all of which were manually analyzed. The study resulted in a categorization of the challenges. We also discuss the difficulty level of performance regression issues within the development community. This study provides insights to help developers avoid the causes of regressions during software design and implementation.
Applying Hierarchical Tag-Topic Models to Stack Overflow
Stack Overflow is a question and answer site for programming questions. It has become one of the most widely used resources for programmers, with many accessing the site multiple times per day. A threat to the continued success of Stack Overflow is the ability to efficiently search the site. Existing research suggests that the inability to find certain questions results in unanswered questions, long delays in answering questions, or questions which cannot be found by future visitors to the site. Further research suggests that questions with poor tag quality are particularly vulnerable to these issues. In this thesis, two approaches are considered for improving tag quality and search efficiency: automatic tag recommendations for question authors, and organizing the existing set of tags in a hierarchy from general to specific for Stack Overflow readers. A hierarchical organization is proposed for its ability to assist exploratory searches of the site. L2H, a hierarchical tag-topic model, is a particularly interesting solution because it can address both approaches with the same model. L2H is evaluated in detail against several proposed evaluation criteria to gauge its fitness for addressing these search challenges on Stack Overflow.
Image-based Communication on Social Coding Platforms
Visual content in the form of images and videos has taken over
general-purpose social networks in a variety of ways, streamlining and
enriching online communications. We are interested in understanding if and to what
extent the use of images is popular and helpful on social coding platforms. We
mined nine years of data from two popular software developers' platforms: the
Mozilla issue tracking system, i.e., Bugzilla, and the most well-known platform
for developers' Q/A, i.e., Stack Overflow. We further triangulated and extended
our mining results by performing a survey with 168 software developers. We
observed that, between 2013 and 2022, the number of posts containing image data
on Bugzilla and Stack Overflow doubled. Furthermore, we found that sharing
images makes other developers engage more, and more quickly, with the content. In the
majority of cases in which an image is included in a developer's post, the
information in that image is complementary to the text provided. Finally, our
results showed that when an image is shared, understanding the content without
the information in the image is unlikely for 86.9% of the cases. Based on
these observations, we discuss the importance of considering visual content
when analyzing developers and designing automation tools.
Features that Predict the Acceptability of Java and JavaScript Answers on Stack Overflow
Context: It is not uncommon for a new team member to join an existing Agile
software development team, even after development has started. This new team
member faces a number of challenges before they are integrated into the team
and can contribute productively to team progress. Ideally, each newcomer should
be supported in this transition through an effective team onboarding program,
although prior evidence suggests that this is challenging for many
organisations. Objective: We seek to understand how Agile teams address the
challenge of team onboarding in order to inform future onboarding design.
Method: We conducted an interview survey of eleven participants from eight
organisations to investigate what onboarding activities are common across Agile
software development teams. We also identify common goals of onboarding from a
synthesis of literature. A repertory grid instrument is used to map the
contributions of onboarding techniques to onboarding goals. Results: Our study
reveals that a broad range of team onboarding techniques, both formal and
informal, are used in practice. It also shows that particular techniques have
high contributions to a given goal or set of goals. Conclusions: In
presenting a set of onboarding goals to consider and an evidence-based
mechanism for selecting techniques to achieve the desired goals, it is expected
that this study will contribute to better-informed onboarding design and
planning. An increase in practitioner awareness of the options for supporting
new team members is also an expected outcome. Comment: Conference, 11 pages, 3 figures, 2 tables
A Computational Model of Trust Based on Dynamic Interaction in the Stack Overflow Community
A member's reputation in an online community is a quantified representation of their trustworthiness within the community. Reputation is calculated using rules-based algorithms which are primarily tied to the upvotes or downvotes a member receives on posts. The main drawback of this form of reputation calculation is its inability to consider dynamic factors such as a member's activity (or inactivity) within the community. This research involves the construction of dynamic mathematical models to calculate reputation and then determines to what extent their results compare with those of rules-based models. It begins with exploratory research of the existing corpus of knowledge, followed by constructive research to build dynamic mathematical models, and then empirical research to determine the effectiveness of the models. Data collected from the Stack Overflow (SO) database is used by the models to calculate rules-based and dynamic member reputations, and statistical correlation tests (i.e., Pearson and Spearman) are then used to determine the extent of the relationship between them. Correlation testing between the rules-based and dynamic temporal models found statistically significant results with a moderate relationship size. The significance of the research, and its conclusion that dynamic and temporal models can indeed produce results comparable to those of subjective vote-based systems, is important in the context of building trust in online communities. Developing models that determine reputation in online communities based on member post and comment activity avoids the potential drawbacks associated with vote-based reputation systems.
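For illustration, the two correlation tests the study relies on can be computed directly. This self-contained sketch (pure Python, with average ranks for ties) shows why both are useful: a monotonic but non-linear relationship between two reputation scores yields a perfect Spearman coefficient while Pearson stays below 1.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman rank correlation: Pearson applied to ranks,
    with tied values receiving their average rank."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0.0] * len(values)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie run
            for k in range(i, j + 1):
                result[order[k]] = avg_rank
            i = j + 1
        return result
    return pearson(ranks(xs), ranks(ys))
```

Reporting both statistics, as the study does, distinguishes agreement in exact magnitudes (Pearson) from agreement in how the models order members (Spearman).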