Search CORE

10 research outputs found

NLP2Code: Code Snippet Content Assist via Natural Language Tasks

Author: Campbell Brock Angus
Treude Christoph
Publication venue
Publication date: 02/08/2017
Field of study

Developers increasingly take to the Internet for code snippets to integrate into their programs. To save developers the time required to switch from their development environments to a web browser in the quest for a suitable code snippet, we introduce NLP2Code, a content assist for code snippets. Unlike related tools, NLP2Code integrates directly into the source code editor and provides developers with a content assist feature to close the vocabulary gap between developers' needs and code snippet meta data. Our preliminary evaluation of NLP2Code shows that the majority of invocations lead to code snippets rated as helpful by users and that the tool is able to support a wide range of tasks.Comment: tool demo video available at https://www.youtube.com/watch?v=h-gaVYtCznI; to appear as a tool demo paper at ICSME 2017 (https://icsme2017.github.io/

arXiv.org e-Print Archive

Crossref

Institutional Knowledge at Singapore Management University

A document based traceability model for test management

Author: Ami Azri
Publication venue
Publication date: 01/01/2017
Field of study

Software testing has became more complicated in the emergence of distributed network, real-time environment, third party software enablers and the need to test system at multiple integration levels. These scenarios have created more concern over the quality of software testing. The quality of software has been deteriorating due to inefficient and ineffective testing activities. One of the main flaws is due to ineffective use of test management to manage software documentations. In documentations, it is difficult to detect and trace bugs in some related documents of which traceability is the major concern. Currently, various studies have been conducted on test management, however very few have focused on document traceability in particular to support the error propagation with respect to documentation. The objective of this thesis is to develop a new traceability model that integrates software engineering documents to support test management. The artefacts refer to requirements, design, source code, test description and test result. The proposed model managed to tackle software traceability in both forward and backward propagations by implementing multi-bidirectional pointer. This platform enabled the test manager to navigate and capture a set of related artefacts to support test management process. A new prototype was developed to facilitate observation of software traceability on all related artefacts across the entire documentation lifecycle. The proposed model was then applied to a case study of a finished software development project with a complete set of software documents called the On-Board Automobile (OBA). The proposed model was evaluated qualitatively and quantitatively using the feature analysis, precision and recall, and expert validation. The evaluation results proved that the proposed model and its prototype were justified and significant to support test management

Universiti Teknologi Malaysia Institutional Repository

Analyzing the Predictability of Source Code and its Application in Creating Parallel Corpora for English-to-Code Statistical Machine Translation

Author: Rahman Musfiqur
Publication venue
Publication date: 23/03/2018
Field of study

Analyzing source code using computational linguistics and exploiting the linguistic properties of source code have recently become popular topics in the domain of software engineering. In the first part of the thesis, we study the predictability of source code and determine how well source code can be represented using language models developed for natural language processing. In the second part, we study how well English discussions of source code can be aligned with code elements to create parallel corpora for English-to-code statistical machine translation. This work is organized as a “manuscript” thesis whereby each core chapter constitutes a submitted paper. The first part replicates recent works that have concluded that software is more repetitive and predictable, i.e. more natural, than English texts. We find that much of the apparent “naturalness” is artificial and is the result of language specific tokens. For example, the syntax of a language, especially the separators e.g., semi-colons and brackets, make up for 59% of all uses of Java tokens in our corpus. Furthermore, 40% of all 2-grams end in a separator, implying that a model for autocompleting the next token, would have a trivial separator as top suggestion 40% of the time. By using the standard NLP practice of eliminating punctuation (e.g., separators) and stopwords (e.g., keywords) we find that code is less repetitive and predictable than was suggested by previous work. We replicate this result across 7 programming languages. Continuing this work, we find that unlike the code written for a particular project, API code usage is similar across projects. For example a file is opened and closed in the same manner irrespective of domain. When we restrict our n-grams to those contained in the Java API we find that the entropy for 2-grams is significantly lower than the English corpus. This repetition perhaps explains the successful literature on API usage suggestion and autocompletion. We then study the impact of the representation of code on repetition. The n-gram model assumes that the current token can be predicted by the sequence of n previous tokens. When we extract program graphs of size 2, 3, and 4 nodes we see that the abstract graph representation is much more concise and repetitive than the n-gram representations of the same code. This suggests that future work should focus on graphs that include control and data flow dependencies and not linear sequences of tokens. The second part of this thesis focuses cleaning English and code corpora to aid in machine translation. Generating source code API sequences from an English query using Machine Translation (MT) has gained much interest in recent years. For any kind of MT, the model needs to be trained on a parallel corpus. We clean StackOverflow, one of the most popular online discussion forums for programmers, to generate a parallel English-Code corpora. We contrast three data cleaning approaches: standard NLP, title only, and software task. We evaluate the quality of each corpus for MT. We measure the corpus size, percentage of unique tokens, and per-word maximum likelihood alignment entropy. While many works have shown that code is repetitive and predictable, we find that English discussions of code are also repetitive. Creating a maximum likelihood MT model, we find that English words map to a small number of specific code elements which partially explains the success of using StackOverflow for search and other tasks in the software engineering literature and paves the way for MT. Our scripts and corpora are publicly available

Concordia University Research Repository

Extracting development tasks to navigate software documentation

Author: Dagenais B.
Robillard M.
Treude C.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015
Field of study

Knowledge management plays a central role in many software development organizations. While much of the important technical knowledge can be captured in documentation, there often exists a gap between the information needs of software developers and the documentation structure. To help developers navigate documentation, we developed a technique for automatically extracting tasks from software documentation by conceptualizing tasks as specific programming actions that have been described in the documentation. More than 70 percent of the tasks we extracted from the documentation of two projects were judged meaningful by at least one of two developers. We present TaskNavigator, a user interface for search queries that suggests tasks extracted with our technique in an auto-complete list along with concepts, code elements, and section headers. We conducted a field study in which six professional developers used TaskNavigator for two weeks as part of their ongoing work. We found search results identified through extracted tasks to be more helpful to developers than those found through concepts, code elements, and section headers. The results indicate that task descriptions can be effectively extracted from software documentation, and that they help bridge the gap between documentation structure and the information needs of software developers.Christoph Treude, Martin P. Robillard, and Barthélémy Dagenai

Crossref

Adelaide Research & Scholarship

Institutional Knowledge at Singapore Management University

Génération de modèles et extraction de qualités pour le développement logiciel Agile

Author: Georis François
Publication venue
Publication date: 14/06/2019
Field of study

Repository of the University of Namur

Extracting Development Tasks to Navigate Software Documentation

Author: Barthelemy Dagenais
Christoph Treude
Martin P. Robillard
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref