    Structuring Documentation to Support State Search: A Laboratory Experiment about Protocol Programming

    Abstract. Application Programming Interfaces (APIs) often define object protocols. Objects with protocols have a finite number of states, and in each state a different set of method calls is valid. Many researchers have developed protocol verification tools because protocols are notoriously difficult to follow correctly. However, recent research suggests that a major challenge for API protocol programmers is effectively searching the state space, and verification is an ineffective guide for this kind of search. In this paper we instead propose Plaiddoc, which is like Javadoc except that it organizes methods by state instead of by class and includes explicit state transitions, state-based type specifications, and rich state relationships. We compare Plaiddoc to a Javadoc control in a between-subjects laboratory experiment. We find that Plaiddoc participants complete state search tasks in significantly less time and with significantly fewer errors than Javadoc participants.
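    As a concrete illustration of an object protocol (a minimal sketch of my own, not an example from the paper), consider a hypothetical Connection class with two states, CLOSED and OPEN: open() is valid only in CLOSED, while read() and close() are valid only in OPEN. A Plaiddoc-style page would group read() and close() under the OPEN state and document open() as the transition from CLOSED to OPEN, whereas a Javadoc-style page lists all three methods together under the class.

        # Hypothetical two-state object protocol (illustration only, not from the paper).
        class Connection:
            def __init__(self):
                self.state = "CLOSED"  # initial protocol state

            def open(self):
                # Valid only in CLOSED; transitions the object to OPEN.
                if self.state != "CLOSED":
                    raise RuntimeError("open() is only valid in the CLOSED state")
                self.state = "OPEN"

            def read(self):
                # Valid only in OPEN; leaves the state unchanged.
                if self.state != "OPEN":
                    raise RuntimeError("read() is only valid in the OPEN state")
                return "data"

            def close(self):
                # Valid only in OPEN; transitions the object back to CLOSED.
                if self.state != "OPEN":
                    raise RuntimeError("close() is only valid in the OPEN state")
                self.state = "CLOSED"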

    Statistical Machine Translation of English Text to API Code Usages: A comparison of Word Map, Contextual Graph Ordering, Phrase-based, and Neural Network Translations

    Statistical Machine Translation (SMT) has gained enormous popularity in recent years as natural language translations have become increasingly accurate. In this thesis we apply SMT techniques in the context of translating English descriptions of programming tasks to source code. We evaluate four existing approaches: maximum likelihood word maps, ContextualExpansion, phrase-based, and neural network translation. As a training and test (i.e., reference translation) data set, we clean and align posts from the popular developer discussion forum StackOverflow. Our baseline approach, WordMapK, uses a simple maximum likelihood word map model whose output is then ordered using existing code usage graphs. The approach is quite effective, with a precision and recall of 20 and 50, respectively. Adding context to the word map model, ContextualExpansion, increases the precision to 25 with a recall of 40. The traditional phrase-based translation model, Moses, achieves a similar precision and recall, also incorporating the context of the input text by mapping English sequences to code sequences. The final approach is neural network translation, OpenNMT. While its median precision is 100, its recall is only 20; when we manually examine the output of the neural translation, the code usages are very small and obvious. Our results represent an application of existing natural language strategies in the context of software engineering. We make our scripts, corpus, and reference translations available in the hope that future work will adapt these techniques to further increase the quality of English-to-code statistical machine translation.
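    To make the baseline concrete, the following is a minimal sketch (my own illustration, not the thesis code) of a WordMapK-style maximum likelihood word map: for each English token it counts co-occurring code tokens in aligned training pairs, and translation emits the k most likely code tokens for each input word. The reordering of the output using code usage graphs, which the thesis performs as a second step, is omitted here.

        from collections import defaultdict, Counter

        # Minimal sketch of a maximum-likelihood word map (WordMapK-style baseline).
        # Training pairs are (english_tokens, code_tokens); co-occurrence counts
        # approximate P(code_token | english_token).
        def train_word_map(pairs):
            counts = defaultdict(Counter)
            for english_tokens, code_tokens in pairs:
                for e in english_tokens:
                    for c in code_tokens:
                        counts[e][c] += 1
            return counts

        def translate(counts, english_tokens, k=1):
            # Emit the k most likely code tokens per input word; words never seen
            # in training contribute nothing. Usage-graph ordering is omitted.
            output = []
            for e in english_tokens:
                output.extend(c for c, _ in counts[e].most_common(k))
            return output

        # Toy aligned pairs, purely for illustration.
        pairs = [(["read", "file"], ["FileReader", "BufferedReader", "readLine"]),
                 (["write", "file"], ["FileWriter", "write"])]
        model = train_word_map(pairs)
        print(translate(model, ["read", "file"]))  # prints candidate code tokens for the query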

    A Survey of Machine Learning for Big Code and Naturalness

    Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit code's abundance of patterns. In this article, we survey this work. We contrast programming languages against natural languages and discuss how these similarities and differences drive the design of probabilistic models. We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature. Then, we review how researchers have adapted these models to application areas and discuss cross-cutting and application-specific challenges and opportunities. A website accompanying this survey can be found at https://ml4code.github.io.
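    To make the idea of a learnable probabilistic model of source code concrete, here is a minimal sketch (my own illustration, not from the survey) of a token-level bigram model: it estimates the probability of the next code token given the previous one from a small corpus, which is one of the simplest ways to exploit the repetitive patterns in code that the survey discusses.

        from collections import defaultdict, Counter

        # Minimal bigram language model over code tokens (illustration only).
        def train_bigrams(token_sequences):
            counts = defaultdict(Counter)
            for tokens in token_sequences:
                for prev, curr in zip(tokens, tokens[1:]):
                    counts[prev][curr] += 1
            return counts

        def next_token_probability(counts, prev, curr):
            # Maximum-likelihood estimate of P(curr | prev); 0.0 for unseen contexts.
            total = sum(counts[prev].values())
            return counts[prev][curr] / total if total else 0.0

        # Toy corpus of tokenized code fragments, purely for illustration.
        corpus = [["for", "(", "int", "i", "=", "0", ";"],
                  ["for", "(", "int", "j", "=", "0", ";"]]
        model = train_bigrams(corpus)
        print(next_token_probability(model, "for", "("))  # 1.0 in this toy corpus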