Structuring Documentation to Support State Search: A Laboratory Experiment about Protocol Programming
Abstract. Application Programming Interfaces (APIs) often define object protocols. Objects with protocols have a finite number of states, and in each state a different set of method calls is valid. Many researchers have developed protocol verification tools because protocols are notoriously difficult to follow correctly. However, recent research suggests that a major challenge for API protocol programmers is effectively searching the state space. Verification is an ineffective guide for this kind of search. In this paper we instead propose Plaiddoc, which is like Javadoc except that it organizes methods by state instead of by class and includes explicit state transitions, state-based type specifications, and rich state relationships. We compare Plaiddoc to a Javadoc control in a between-subjects laboratory experiment. We find that Plaiddoc participants complete state search tasks in significantly less time and with significantly fewer errors than Javadoc participants.
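To make the notion of an object protocol concrete, here is a minimal sketch. It does not come from the paper: the `Connection` class, its two states, and the `valid_methods` helper are hypothetical names chosen for illustration. The `TRANSITIONS` table groups methods by state and records the state each call leads to, which is loosely the kind of state-organized view the abstract attributes to Plaiddoc.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"


class ProtocolError(RuntimeError):
    """Raised when a method is called in a state where it is not valid."""


class Connection:
    """Hypothetical object with a two-state protocol (illustration only)."""

    # State-organized view: for each state, the valid methods and the
    # state that each method transitions to.
    TRANSITIONS = {
        State.CLOSED: {"open": State.OPEN},
        State.OPEN: {"send": State.OPEN, "close": State.CLOSED},
    }

    def __init__(self):
        self.state = State.CLOSED
        self.sent = []

    def _step(self, method):
        # Reject the call if the protocol does not allow it in this state.
        allowed = self.TRANSITIONS[self.state]
        if method not in allowed:
            raise ProtocolError(
                f"{method}() is not valid in state {self.state.name}"
            )
        self.state = allowed[method]

    def open(self):
        self._step("open")

    def send(self, data):
        self._step("send")
        self.sent.append(data)

    def close(self):
        self._step("close")


def valid_methods(state):
    """Answer a simple state-search question: what can be called here?"""
    return sorted(Connection.TRANSITIONS[state])
```

With this layout, a question such as "which methods are valid once the connection is open?" is a single lookup, `valid_methods(State.OPEN)`, rather than a scan of every method's documentation.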
jTutors: A Web-Based Tutoring System for Java APIs
For building robust software applications, it is important for the software engineer to make efficient use of the available building blocks. Learning the basic language constructs is only the first step in this process. It is becoming increasingly important for software engineers, especially students, to get acquainted with the available Application Programming Interfaces (API) and the ways to efficiently use them. With the ever increasing number of APIs, it becomes difficult for teachers to expose students to most of the APIs. Hence it becomes important to complement the traditional methods of instruction, such as lectures or API documentation, with techniques that would ease learning by software engineering students.
We have leveraged the freely available code examples on various Internet communities, blogs and software project hosting websites to create an intelligent tutoring system, jTutors, which would aid the teacher in exposing students to multitudes of the available Java APIs. jTutors uses these code samples to generate intelligent tutorials. We evaluated jTutors by performing realistic test cases and a heuristic assessment.
Statistical Machine Translation of English Text to API Code Usages: A comparison of Word Map, Contextual Graph Ordering, Phrase-based, and Neural Network Translations
Statistical Machine Translation (SMT) has gained enormous popularity in recent years as natural
language translations have become increasingly accurate. In this thesis we apply SMT techniques in
the context of translating English descriptions of programming tasks to source code. We evaluate
four existing approaches: maximum likelihood word maps, ContextualExpansion, phrase-based, and
neural network translation. As a training and test (i.e. reference translation) data set we clean and
align the popular developer discussion forum StackOverflow.
Our baseline approach, WordMapK, uses a simple maximum likelihood word map model which
is then ordered using existing code usage graphs. The approach is quite effective, with a precision
and recall of 20 and 50, respectively. Adding context to the word map model, ContextualExpansion,
is able to increase the precision to 25 with a recall of 40. The traditional phrase-based translation
model, Moses, achieves a similar precision and recall while also incorporating the context of the input text
by mapping English sequences to code sequences. The final approach is neural network translation,
OpenNMT. While the median precision is 100, the recall is only 20. Manual examination of the
output of the neural translation shows that the code usages are very small and obvious. Our results represent
an application of existing natural language strategies in the context of software engineering. We
make our scripts, corpus, and reference translations available in the hope that future work will adapt these
techniques to further increase the quality of English to code statistical machine translation.
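As a rough illustration of the baseline idea, the sketch below builds a maximum likelihood word map from toy data and scores one prediction. It is not the thesis's WordMapK implementation. The assumptions are: the training pairs are invented, each word is mapped to its most frequent co-occurring code element, the graph-based ordering step is omitted, and precision and recall are computed as percentages over sets of code elements (the abstract does not say how they are measured).

```python
from collections import Counter, defaultdict


def train_word_map(pairs):
    """Count how often each English word co-occurs with each code element."""
    counts = defaultdict(Counter)
    for words, code_elements in pairs:
        for word in words:
            counts[word].update(code_elements)
    return counts


def translate(counts, words, k=1):
    """Map each known word to its k most frequent code elements."""
    predicted = set()
    for word in words:
        if word in counts:
            predicted.update(elem for elem, _ in counts[word].most_common(k))
    return predicted


def precision_recall(predicted, reference):
    """Set-based precision and recall, expressed as percentages."""
    hits = len(predicted & reference)
    precision = 100.0 * hits / len(predicted) if predicted else 0.0
    recall = 100.0 * hits / len(reference) if reference else 0.0
    return precision, recall


# Toy aligned corpus (hypothetical): English words paired with code elements.
pairs = [
    (["read", "file"], ["FileReader.new", "BufferedReader.readLine"]),
    (["read", "line"], ["BufferedReader.readLine"]),
    (["write", "file"], ["FileWriter.new", "FileWriter.write"]),
]

counts = train_word_map(pairs)
predicted = translate(counts, ["read", "line"])
reference = {"FileReader.new", "BufferedReader.readLine"}
precision, recall = precision_recall(predicted, reference)
```

On this toy input the prediction contains a single correct element, so precision is high while recall is only half of the reference set. That is the same kind of trade-off the abstract reports between the approaches.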
A Survey of Machine Learning for Big Code and Naturalness
Research at the intersection of machine learning, programming languages, and
software engineering has recently taken important steps in proposing learnable
probabilistic models of source code that exploit code's abundance of patterns.
In this article, we survey this work. We contrast programming languages against
natural languages and discuss how these similarities and differences drive the
design of probabilistic models. We present a taxonomy based on the underlying
design principles of each model and use it to navigate the literature. Then, we
review how researchers have adapted these models to application areas and
discuss cross-cutting and application-specific challenges and opportunities.
Comment: Website accompanying this survey paper can be found at
https://ml4code.github.i