44 research outputs found
NLP2Code: Code Snippet Content Assist via Natural Language Tasks
Developers increasingly take to the Internet for code snippets to integrate
into their programs. To save developers the time required to switch from their
development environments to a web browser in the quest for a suitable code
snippet, we introduce NLP2Code, a content assist for code snippets. Unlike
related tools, NLP2Code integrates directly into the source code editor and
provides developers with a content assist feature to close the vocabulary gap
between developers' needs and code snippet meta data. Our preliminary
evaluation of NLP2Code shows that the majority of invocations lead to code
snippets rated as helpful by users and that the tool is able to support a wide
range of tasks.Comment: tool demo video available at
https://www.youtube.com/watch?v=h-gaVYtCznI; to appear as a tool demo paper
at ICSME 2017 (https://icsme2017.github.io/
Stack Overflow in Github: Any Snippets There?
When programmers look for how to achieve certain programming tasks, Stack
Overflow is a popular destination in search engine results. Over the years,
Stack Overflow has accumulated an impressive knowledge base of snippets of code
that are amply documented. We are interested in studying how programmers use
these snippets of code in their projects. Can we find Stack Overflow snippets
in real projects? When snippets are used, is this copy literal or does it
suffer adaptations? And are these adaptations specializations required by the
idiosyncrasies of the target artifact, or are they motivated by specific
requirements of the programmer? The large-scale study presented on this paper
analyzes 909k non-fork Python projects hosted on Github, which contain 290M
function definitions, and 1.9M Python snippets captured in Stack Overflow.
Results are presented as quantitative analysis of block-level code cloning
intra and inter Stack Overflow and GitHub, and as an analysis of programming
behaviors through the qualitative analysis of our findings.Comment: 14th International Conference on Mining Software Repositories, 11
page
Mining and linking crowd-based software engineering how-to screencasts
In recent years, crowd-based content in the form of screencast videos has gained in popularity among software engineers. Screencasts are viewed and created for different purposes, such as a learning aid, being part of a software project’s documentation, or as a general knowledge sharing resource. For organizations to remain competitive in attracting and retaining their workforce, they must adapt to these technological and social changes in software engineering practices.
In this thesis, we propose a novel methodology for mining and integrating crowd-based multi- media content in existing workflows to help provide software engineers of different levels of experience and roles access to a documentation they are familiar with or prefer. As a result, we first aim to gain insights on how a user’s background and the task to be performed influence the use of certain documentation media. We focus on tutorial screencasts to identify their important information sources and provide insights on their usage, advantages, and disadvantages from a practitioner’s perspective. To that end, we conduct a survey of software engineers. We discuss how software engineers benefit from screencasts as well as challenges they face in using screencasts as project documentation.
Our survey results revealed that screencasts and question and answers sites are among the most popular crowd-based information sources used by software engineers. Also, the level of experience and the role or reason for resorting to a documentation source affects the types of documentation used by software engineers. The results of our survey support our motivation in this thesis and show that for screencasts, high quality content and a narrator are very important components for users.
Unfortunately, the binary format of videos makes analyzing video content difficult. As a result, dissecting and filtering multimedia information based on its relevance to a given project is an inherently difficult task. Therefore, it is necessary to provide automated approaches for mining and linking this crowd-based multimedia documentation to their relevant software artifacts. In this thesis, we apply LDA-based (Latent Dirichlet Allocation) mining approaches that take as input a set of screencast artifacts, such as GUI (Graphical User Interface) text (labels) and spoken words, to perform information extraction and, therefore, increase the availability of both textual and multimedia documentation for various stakeholders of a software product. For example, this allows screencasts to be linked to other software artifacts such as source code to help software developers/maintainers have access to the implementation details of an application feature.
We also present applications of our proposed methodology that include: 1) an LDA-based mining approach that extracts use case scenarios in text format from screencasts, 2) an LDA-based approach that links screencasts to their relevant artifacts (e.g., source code), and 3) a Semantic Web-based approach to establish direct links between vulnerability exploitation screencasts and their relevant vulnerability descriptions in the National Vulnerability Database (NVD) and indirectly link screencasts to their relevant Maven dependencies. To evaluate the applicability of the proposed approach, we report on empirical case studies conducted on existing screencasts that describe different use case scenarios of the WordPress and Firefox open source applications or vulnerability exploitation scenarios