Stack Overflow in Github: Any Snippets There?
When programmers look for how to achieve certain programming tasks, Stack
Overflow is a popular destination in search engine results. Over the years,
Stack Overflow has accumulated an impressive knowledge base of snippets of code
that are amply documented. We are interested in studying how programmers use
these snippets of code in their projects. Can we find Stack Overflow snippets
in real projects? When snippets are used, are they copied literally or are
they adapted? And are these adaptations specializations required by the
idiosyncrasies of the target artifact, or are they motivated by specific
requirements of the programmer? The large-scale study presented in this paper
analyzes 909k non-fork Python projects hosted on GitHub, which contain 290M
function definitions, and 1.9M Python snippets captured in Stack Overflow.
Results are presented as a quantitative analysis of block-level code cloning
within and between Stack Overflow and GitHub, and as an analysis of programming
behaviors through the qualitative analysis of our findings.
Comment: 14th International Conference on Mining Software Repositories, 11
pages
Stack Overflow: A Code Laundering Platform?
Developers use Question and Answer (Q&A) websites to exchange knowledge and
expertise. Stack Overflow is a popular Q&A website where developers discuss
coding problems and share code examples. Although all Stack Overflow posts are
free to access, code examples on Stack Overflow are governed by the Creative
Commons Attribution-ShareAlike 3.0 Unported license that developers should obey
when reusing code from Stack Overflow or posting code to Stack Overflow. In
this paper, we conduct a case study with 399 Android apps, to investigate
whether developers respect license terms when reusing code from Stack Overflow
posts (and the other way around). We found 232 code snippets in 62 Android apps
from our dataset that were potentially reused from Stack Overflow, and 1,226
Stack Overflow posts containing code examples that are clones of code released
in 68 Android apps, suggesting that developers may have copied the code of
these apps to answer Stack Overflow questions. We investigated the licenses of
these pieces of code and observed 1,279 cases of potential license violations
(related to code posting to Stack Overflow or code reuse from Stack Overflow).
This paper aims to raise the awareness of the software engineering community
about potentially unethical code reuse activities taking place on Q&A websites
like Stack Overflow.
Comment: In proceedings of the 24th IEEE International Conference on Software
Analysis, Evolution, and Reengineering (SANER)
Formal verification of a software countermeasure against instruction skip attacks
Fault attacks against embedded circuits have made it possible to define many
new attack paths against secure circuits. Every attack path relies on a
specific fault model, which defines the type of faults that the attacker can
perform. On embedded processors, a fault model consisting of an assembly
instruction skip can be very useful for an attacker and has been obtained using
several fault injection means. To counter this threat, some countermeasure
schemes that rely on temporal redundancy have been proposed. Nevertheless,
double fault injection in a long enough time interval is practical and can
bypass those countermeasure schemes. Some fine-grained countermeasure schemes
have also been proposed for specific instructions. However, to the best of our
knowledge, no approach that secures a generic assembly program by making it
fault-tolerant to instruction skip attacks has been formally proven yet. In
this paper, we provide a fault-tolerant replacement sequence for almost all the
instructions of the Thumb-2 instruction set and provide a formal verification
of this fault tolerance. This simple transformation adds a reasonably good
security level to an embedded program and makes practical fault injection
attacks much harder to achieve.
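As a toy illustration of the idea behind such replacement sequences (not the paper's Thumb-2 transformation or its formal proof), the sketch below models a tiny register machine in Python, duplicates every idempotent instruction, and checks that skipping any single instruction leaves the final state unchanged. The instruction format, the `run`, `harden`, and `tolerates_single_skip` helpers, and the two-opcode machine are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy instruction: (opcode, destination register, source registers)
Instr = Tuple[str, str, Tuple[str, ...]]


def run(program: List[Instr], regs: Dict[str, int],
        skip: Optional[int] = None) -> Dict[str, int]:
    """Execute the program on a copy of regs, skipping index `skip` if given."""
    state = dict(regs)
    for i, (op, dst, srcs) in enumerate(program):
        if i == skip:
            continue  # models a single instruction-skip fault
        if op == "mov":
            state[dst] = state[srcs[0]]
        elif op == "add":
            state[dst] = state[srcs[0]] + state[srcs[1]]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return state


def harden(program: List[Instr]) -> List[Instr]:
    """Duplicate each instruction; valid only when the destination is not a source."""
    hardened: List[Instr] = []
    for op, dst, srcs in program:
        if dst in srcs:
            # Non-idempotent instructions would need a rewritten sequence first.
            raise ValueError("non-idempotent instruction cannot simply be duplicated")
        hardened.extend([(op, dst, srcs), (op, dst, srcs)])
    return hardened


def tolerates_single_skip(program: List[Instr], regs: Dict[str, int]) -> bool:
    """Check that every possible single skip yields the fault-free final state."""
    expected = run(program, regs)
    return all(run(program, regs, skip=i) == expected for i in range(len(program)))


REGS = {"r0": 0, "r1": 5, "r2": 7}
PROGRAM: List[Instr] = [("add", "r0", ("r1", "r2")), ("mov", "r1", ("r0",))]
HARDENED = harden(PROGRAM)

print(tolerates_single_skip(PROGRAM, REGS))   # False: one skip changes the result
print(tolerates_single_skip(HARDENED, REGS))  # True: the duplicate covers the skip
```

This exhaustive check covers only single skips on one input; it is a simulation, not a formal proof, and it says nothing about double faults.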
Beyond Similar Code: Leveraging Social Coding Websites
Programmers often write code that is similar to existing code written elsewhere. Code search tools can help developers find similar solutions and identify possible improvements. For code search tools, good search results rely on valid data collection. Social coding websites, such as the Question & Answer forum Stack Overflow (SO) and the project repository GitHub, are popular destinations when programmers look for how to achieve certain programming tasks. Over the years, SO and GitHub have accumulated an enormous knowledge base of, and around, code. Since these software artifacts are publicly available, it is possible to leverage them in code search tools. This dissertation explores the opportunities of leveraging software artifacts from the social coding websites in searching for not just similar, but related, code. Programmers query SO and GitHub extensively to search for suitable code for reuse; however, not much is known about the usability or quality of the available code from each website. This dissertation first investigates under what circumstances the software artifacts found in social coding websites can be leveraged for purposes other than their immediate use by developers. It points out a number of problems that need to be addressed before those artifacts can be leveraged for code search and development tools. Specifically, triviality, fragility, and duplication dominate these artifacts. However, when these problems are addressed, there remains a considerable number of good-quality artifacts that can be leveraged. SO and GitHub are not just two separate data resources; together, they belong to a larger system of the software development process: the same users who rely on the facilities of GitHub often seek support on SO for their problems, and return to GitHub to apply the knowledge acquired.
This dissertation further studies the crossover of software artifacts between SO and GitHub, and categorizes the adaptations from an SO code snippet to its GitHub counterparts. Existing search tools only recommend other code locations that are syntactically or semantically similar to the given code but do not reason about other kinds of relevant code that a developer should also pay attention to, e.g., auxiliary code to accomplish a complete task. With the good-quality software artifacts and crossover between the two systems available, this dissertation presents two approaches that leverage these artifacts in searching for related code. Aroma indexes GitHub projects, takes a partial code snippet as input, searches the corpus for methods containing the partial code snippet, and clusters and intersects the results of the search to recommend related code. Aroma is evaluated on randomly selected queries created from the GitHub corpus, as well as queries derived from SO code snippets. It recommends related code for error checking and handling, object configuration, etc. Furthermore, a user study is conducted where industrial developers are asked to complete programming tasks using Aroma and provide feedback. The results indicate that Aroma is capable of retrieving and recommending relevant code snippets efficiently. CodeAid reuses the crossover between SO and GitHub and recommends related code outside of a method body. For each SO snippet as a query, CodeAid retrieves the co-occurring code fragments for its GitHub counterparts and clusters them to recommend common ones. 74% of the common co-occurring code fragments represent related functionality that should be included in code search results. Three major types of relevancy (complementary, supplementary, and alternative methods) are identified.
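The search-then-intersect step described for Aroma can be pictured with a deliberately simplified Python sketch. This is not Aroma's actual algorithm: methods are modeled as plain lists of statement strings, containment is exact string matching, and the clustering step is omitted entirely. The `search` and `recommend` functions and the `CORPUS` data are hypothetical.

```python
from typing import Dict, FrozenSet, List


def search(corpus: Dict[str, List[str]], query: List[str]) -> Dict[str, FrozenSet[str]]:
    """Return the methods whose statements include every statement of the query."""
    wanted = frozenset(query)
    return {
        name: frozenset(body)
        for name, body in corpus.items()
        if wanted <= frozenset(body)
    }


def recommend(corpus: Dict[str, List[str]], query: List[str]) -> List[str]:
    """Intersect all matching methods and return the statements the query lacks."""
    hits = search(corpus, query)
    if not hits:
        return []
    common = frozenset.intersection(*hits.values())
    return sorted(common - frozenset(query))


# Hypothetical miniature corpus: method name -> list of statements
CORPUS = {
    "read_config": ["f = open(path)", "data = f.read()", "f.close()", "return parse(data)"],
    "read_log": ["f = open(path)", "data = f.read()", "f.close()", "print(data)"],
    "add": ["return a + b"],
}
QUERY = ["f = open(path)", "data = f.read()"]

print(recommend(CORPUS, QUERY))  # ['f.close()']
```

Here both matching methods close the file after reading, so the intersection surfaces `f.close()` as code that commonly accompanies the partial snippet.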
Automatic Prediction of Rejected Edits in Stack Overflow
The content quality of shared knowledge in Stack Overflow (SO) is crucial in
supporting software developers with their programming problems. Thus, SO allows
its users to suggest edits to improve the quality of a post (i.e., question and
answer). However, existing research shows that many suggested edits in SO are
rejected due to undesired contents/formats or violating edit guidelines. Such a
scenario frustrates or demotivates users who would like to conduct good-quality
edits. Therefore, our research focuses on assisting SO users by offering them
suggestions on how to improve their editing of posts. First, we manually
investigate 764 (382 questions + 382 answers) rejected edits by rollbacks and
produce a catalog of 19 rejection reasons. Second, we extract 15 text- and
user-based features to capture those rejection reasons. Third, we develop four
machine learning models using those features. Our best-performing model can
predict rejected edits with 69.1% precision, 71.2% recall, 70.1% F1-score, and
69.8% overall accuracy. Fourth, we introduce an online tool named EditEx that
works with the SO edit system. EditEx can assist users while editing posts by
suggesting the potential causes of rejections. We recruit 20 participants to
assess the effectiveness of EditEx. Half of the participants (i.e., treatment
group) use EditEx and the other half (i.e., control group) use the SO standard
edit system to edit posts. According to our experiment, EditEx can help the SO
standard edit system prevent 49% of rejected edits, including the commonly
rejected ones. It can also prevent 12% of rejections even in free-form regular
edits. The treatment group finds the potential rejection reasons identified by
EditEx influential. Furthermore, the median workload of suggesting edits with
EditEx is half that of the SO edit system.
Comment: Accepted for publication in Empirical Software Engineering (EMSE)
journal
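As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The short Python sketch below recomputes it from the precision and recall quoted in the abstract; the `f1_score` helper is a hypothetical name, not part of EditEx.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract for the best-performing model
reported_precision = 0.691
reported_recall = 0.712

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.1%}")  # F1 = 70.1%
```

The recomputed value agrees with the 70.1% F1-score stated in the abstract.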