Search CORE

9 research outputs found

DeepSoft: A vision for a deep model of software

Author: Dam Hoa Khanh
Ghose Aditya
Grundy John
Tran Truyen
Publication venue
Publication date: 01/01/2016
Field of study

Although software analytics has experienced rapid growth as a research area, it has not yet reached its full potential for wide industrial adoption. Most of the existing work in software analytics still relies heavily on costly manual feature engineering processes, and they mainly address the traditional classification problems, as opposed to predicting future events. We present a vision for \emph{DeepSoft}, an \emph{end-to-end} generic framework for modeling software and its development process to predict future risks and recommend interventions. DeepSoft, partly inspired by human memory, is built upon the powerful deep learning-based Long Short Term Memory architecture that is capable of learning long-term temporal dependencies that occur in software evolution. Such deep learned patterns of software can be used to address a range of challenging problems such as code and task recommendation and prediction. DeepSoft provides a new approach for research into modeling of source code, risk prediction and mitigation, developer modeling, and automatically generating code patches from bug reports.Comment: FSE 201

arXiv.org e-Print Archive

Deakin Research Online

Crossref

Research Online

Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces

Author: Allamanis Miltiadis
Bielik Pavol
Donzeau-Gouge V.
Le Quoc
Merkel Dirk
Mikolov Tomas
Nguyen Anh Tuan
Piech Chris
Szumlanski Sean
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/08/2018
Field of study

With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning technique can be applied. In this paper, we use abstractions of traces obtained from symbolic execution of a program as a representation for learning word embeddings. We trained a variety of word embeddings under hundreds of parameterizations, and evaluated each learned embedding on a suite of different tasks. In our evaluation, we obtain 93% top-1 accuracy on a benchmark consisting of over 19,000 API-usage analogies extracted from the Linux kernel. In addition, we show that embeddings learned from (mainly) semantic abstractions provide nearly triple the accuracy of those learned from (mainly) syntactic abstractions

arXiv.org e-Print Archive

Crossref

Exploring regularities in software with statistical models and their applications

Author: Nguyen Anh Tuan
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2016
Field of study

Software systems are becoming popular. They are used with different platforms for different applications. Software systems are developed with support from programming languages, which help developers work conveniently. Programming languages can have different paradigms with different form, syntactic structures, keywords, and representation ways. In many cases, however, programming languages are similar in different important aspects: 1. They are used to support description of specific tasks, 2. Source codes are written in languages and includes a limit set of distinctive tokens, many tokens are repeated like keywords, function calls, and 3. They follow specific syntactic rules to make machine understanding. Those points also respect the similarity between programming language and natural language. Due to its critical role in many applications, natural language processing (NLP) has been studied much and given many promising results like automatic cross-language translation, speech-to-text, information searching, etc. It is interesting to observe if there are similar characteristics between natural language and programming language and whether techniques in NLP can be reused for programming language processing? Recent works in software engineering (SE) shows that their similarities between NLP and programming language processing and techniques in NLP can be reused for PLP. This dissertation introduces my works with contributions in study of characteristics of programming languages, the models which employed them and the main applications that show the usefulness of the proposed models. Study in both three aspects has drawn interests from software engineering community and received awards due to their innovation and applicability. \u27 I hope that this dissertation will bring a systematic view of how advantage techniques in natural language processing and machine learning can be re-used and give huge benefit for programming language processing, and how those techniques are adapted with characteristics of programming language and software systems

Digital Repository @ Iowa State University (ISU)