61 research outputs found
Holistic recommender systems for software engineering
The knowledge possessed by developers is often not sufficient to overcome a programming problem. Short of talking to teammates, when available, developers often gather additional knowledge from development artifacts (e.g., project documentation), as well as online resources. The web has become an essential component in the modern developer’s daily life, providing a plethora of information from sources like forums, tutorials, Q&A websites, API documentation, and even video tutorials. Recommender Systems for Software Engineering (RSSE) provide developers with assistance to navigate the information space, automatically suggest useful items, and reduce the time required to locate the needed information. Current RSSEs consider development artifacts as containers of homogeneous information in form of pure text. However, text is a means to represent heterogeneous information provided by, for example, natural language, source code, interchange formats (e.g., XML, JSON), and stack traces. Interpreting the information from a pure textual point of view misses the intrinsic heterogeneity of the artifacts, thus leading to a reductionist approach. We propose the concept of Holistic Recommender Systems for Software Engineering (H-RSSE), i.e., RSSEs that go beyond the textual interpretation of the information contained in development artifacts. Our thesis is that modeling and aggregating information in a holistic fashion enables novel and advanced analyses of development artifacts. To validate our thesis we developed a framework to extract, model and analyze information contained in development artifacts in a reusable meta- information model. We show how RSSEs benefit from a meta-information model, since it enables customized and novel analyses built on top of our framework. The information can be thus reinterpreted from an holistic point of view, preserving its multi-dimensionality, and opening the path towards the concept of holistic recommender systems for software engineering
Stack Overflow in Github: Any Snippets There?
When programmers look for how to achieve certain programming tasks, Stack
Overflow is a popular destination in search engine results. Over the years,
Stack Overflow has accumulated an impressive knowledge base of snippets of code
that are amply documented. We are interested in studying how programmers use
these snippets of code in their projects. Can we find Stack Overflow snippets
in real projects? When snippets are used, is this copy literal or does it
suffer adaptations? And are these adaptations specializations required by the
idiosyncrasies of the target artifact, or are they motivated by specific
requirements of the programmer? The large-scale study presented on this paper
analyzes 909k non-fork Python projects hosted on Github, which contain 290M
function definitions, and 1.9M Python snippets captured in Stack Overflow.
Results are presented as quantitative analysis of block-level code cloning
intra and inter Stack Overflow and GitHub, and as an analysis of programming
behaviors through the qualitative analysis of our findings.Comment: 14th International Conference on Mining Software Repositories, 11
page
On Using Active Learning and Self-Training when Mining Performance Discussions on Stack Overflow
Abundant data is the key to successful machine learning. However, supervised
learning requires annotated data that are often hard to obtain. In a
classification task with limited resources, Active Learning (AL) promises to
guide annotators to examples that bring the most value for a classifier. AL can
be successfully combined with self-training, i.e., extending a training set
with the unlabelled examples for which a classifier is the most certain. We
report our experiences on using AL in a systematic manner to train an SVM
classifier for Stack Overflow posts discussing performance of software
components. We show that the training examples deemed as the most valuable to
the classifier are also the most difficult for humans to annotate. Despite
carefully evolved annotation criteria, we report low inter-rater agreement, but
we also propose mitigation strategies. Finally, based on one annotator's work,
we show that self-training can improve the classification accuracy. We conclude
the paper by discussing implication for future text miners aspiring to use AL
and self-training.Comment: Preprint of paper accepted for the Proc. of the 21st International
Conference on Evaluation and Assessment in Software Engineering, 201
Retrieving curated Stack Overflow Posts of similar project tasks
Software development depends on diverse technologies and methods and as a result, software development teams often handle issues in which team members are not experts. In order to address this lack of expertise, developers typically search for information on web-based Q&A sites such as Stack Overflow, a well-known place to find solutions to specific technology-related problems. Access to these web-based Q&A locations is currently not integrated into the software development environment, and since the associations between software development projects and the supporting sources of known solutions, usually referred to as knowledge, is not explicitly recorded, software developers often need to search for solutions to similar recurring issues multiple times. This lack of integration hinders the reuse of the knowledge obtained, besides not avoiding efforts of search and selection, curation, of this knowledge over and over again. This research aims at proposing a study regarding explicitly associating project elements (such as project tasks) to Stack Overflow posts that have already been curated by developers, and presents a study about Stack Overflow posts suggestions to developers based on similarity of project tasks.O desenvolvimento de software depende de diversas tecnologias e mĂ©todos e, como resultado, as equipes de desenvolvimento de software geralmente lidam com problemas em que nĂŁo sĂŁo especialistas. Para lidar com a falta de conhecimento, desenvolvedores normalmente procuram informações em sites de perguntas e respostas, como o Stack Overflow, um site usado para encontrar soluções para problemas especĂficos relacionados Ă tecnologia. O acesso a esses sites nĂŁo Ă© integrado ao ambiente de desenvolvimento de software e porque as associações entre os projetos de desenvolvimento de software e as fontes de suporte de soluções conhecidas nĂŁo sĂŁo explicitamente registradas. Com isso, desenvolvedores de software podem investir um esforço em procurar soluções para problemas semelhantes várias vezes. Essa falta de integração dificulta o reuso do conhecimento obtido, alĂ©m de nĂŁo evitar esforços de busca e seleção, a curadoria, repetidas vezes. Esta pesquisa tem como objetivo realizar um estudo sobre a associação explicita entre elementos do projeto (como tarefas de projeto) a publicações do Stack Overflow que já sofreram curadoria por desenvolvedores, e apresenta um estudo sobre sugestões de publicações do Stack Overflow a desenvolvedores com base na similaridade de tarefas de projeto
Stack Overflow: A Code Laundering Platform?
Developers use Question and Answer (Q&A) websites to exchange knowledge and
expertise. Stack Overflow is a popular Q&A website where developers discuss
coding problems and share code examples. Although all Stack Overflow posts are
free to access, code examples on Stack Overflow are governed by the Creative
Commons Attribute-ShareAlike 3.0 Unported license that developers should obey
when reusing code from Stack Overflow or posting code to Stack Overflow. In
this paper, we conduct a case study with 399 Android apps, to investigate
whether developers respect license terms when reusing code from Stack Overflow
posts (and the other way around). We found 232 code snippets in 62 Android apps
from our dataset that were potentially reused from Stack Overflow, and 1,226
Stack Overflow posts containing code examples that are clones of code released
in 68 Android apps, suggesting that developers may have copied the code of
these apps to answer Stack Overflow questions. We investigated the licenses of
these pieces of code and observed 1,279 cases of potential license violations
(related to code posting to Stack overflow or code reuse from Stack overflow).
This paper aims to raise the awareness of the software engineering community
about potential unethical code reuse activities taking place on Q&A websites
like Stack Overflow.Comment: In proceedings of the 24th IEEE International Conference on Software
Analysis, Evolution, and Reengineering (SANER
A Behavior-Driven Recommendation System for Stack Overflow Posts
Developers are often tasked with maintaining complex systems. Regardless of prior experience, there will inevitably be times in which they must interact with parts of the system with which they are unfamiliar. In such cases, recommendation systems may serve as a valuable tool to assist the developer in implementing a solution. Many recommendation systems in software engineering utilize the Stack Overflow knowledge-base as the basis of forming their recommendations. Traditionally, these systems have relied on the developer to explicitly invoke them, typically in the form of specifying a query. However, there may be cases in which the developer is in need of a recommendation but unaware that their need exists. A new class of recommendation systems deemed Behavior-Driven Recommendation Systems for Software Engineering seeks to address this issue by relying on developer behavior to determine when a recommendation is needed, and once such a determination is made, formulate a search query based on the software engineering task context. This thesis presents one such system, StackInTheFlow, a plug-in integrating into the IntelliJ family of Java IDEs. StackInTheFlow allows the user to intervi act with it as a traditional recommendation system, manually specifying queries and browsing returned Stack Overflow posts. However, it also provides facilities for detecting when the developer is in need of a recommendation, defined when the developer has encountered an error messages or a difficulty detection model based on indicators of developer progress is fired. Once such a determination has been made, a query formulation model constructed based on a periodic data dump of Stack Overflow posts will automatically form a query from the software engineering task context extracted from source code currently open within the IDE. StackInTheFlow also provides mechanisms to personalize, over time, the results displayed to a specific set of Stack Overflow tags based on the results previously selected by the user. The effectiveness of these mechanisms are examined and results based the collection of anonymous user logs and a small scale study are presented. Based on the results of these evaluations, it was found that some of the queries issued by the tool are effective, however there are limitations regarding the extraction of the appropriate context of the software engineering task yet to overcome
- …