
    Natural Language is a Programming Language: Applying Natural Language Processing to Software Development

    A powerful, but limited, way to view software is as source code alone. Treating a program as a sequence of instructions enables it to be formalized and makes it amenable to mathematical techniques such as abstract interpretation and model checking. A program consists of much more than a sequence of instructions. Developers make use of test cases, documentation, variable names, program structure, the version control repository, and more. I argue that it is time to take the blinders off of software analysis tools: tools should use all these artifacts to deduce more powerful and useful information about the program. Researchers are beginning to make progress towards this vision. This paper gives, as examples, four results that find bugs and generate code by applying natural language processing techniques to software artifacts. The four techniques use as input error messages, variable names, procedure documentation, and user questions. They use four different NLP techniques: document similarity, word semantics, parse trees, and neural networks. The initial results suggest that this is a promising avenue for future work.
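Of the four NLP techniques the abstract lists, document similarity is the simplest to illustrate. The following is a minimal sketch, not the paper's method: it assumes a plain bag-of-words cosine similarity, and the helper names (`bag_of_words`, `cosine_similarity`, `best_match`) and the sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphabetic words in a text (a deliberately crude tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(message, passages):
    """Return the passage most similar to the message (hypothetical helper)."""
    query = bag_of_words(message)
    return max(passages, key=lambda p: cosine_similarity(query, bag_of_words(p)))


# Hypothetical documentation passages and error message, for illustration only.
passages = [
    "The port option sets the TCP port the server listens on.",
    "The log level option controls how verbose the output is.",
]
message = "invalid port: server cannot listen on port 99999"
match = best_match(message, passages)
```

A tool built on this idea might flag an error message whose best similarity score is low, since such a message shares little vocabulary with the documentation a user would consult.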

    WikiDo

    Not formally published. The Internet has allowed collaboration on an unprecedented scale. Wikipedia, Luis Von Ahn’s ESP game, and reCAPTCHA have proven that tasks typically performed by expensive in-house or outsourced teams can instead be delegated to the mass of Internet computer users. These success stories show the opportunity for crowdsourcing other tasks, such as allowing computer users to help each other answer questions like “How do I make my computer do X?”. Such a system would reduce IT cost, user frustration, and machine downtime. The current approach to crowdsourcing IT tasks, however, only allows users to collaborate on generating text. Anyone who has searched help wikis and user forums for a solution to a computer problem knows how ineffective and frustrating that process is. Text is ambiguous and often incomplete, particularly when written by non-experts. This paper presents WikiDo, a system that enables the mass of non-expert users to help each other answer how-to computer questions by actually performing the task rather than documenting its solution. National Science Foundation (U.S.) (grant IIS-0835652)

    A Novel Method to Prevent Misconfigurations of Industrial Automation and Control Systems

    Configuration errors are among the dominant causes of system faults in industrial automation and control systems (IACS). Such errors are difficult to detect and correct because IACS comprise many kinds of systems and devices with varied configuration specifications. In this paper, we first propose a streaming algorithm that keeps all configuration changes within limited memory. We then propose a second streaming algorithm that, when a new configuration change is made, searches for and returns all similar historical changes, which can be used to validate the new one. To our knowledge, we are the first to model the configuration changes of IACS as a data stream and to apply streaming similarity search to correcting configuration errors while overcoming the inherent unbounded-memory bottleneck. We present theoretical correctness and complexity analyses. Experiments with real and synthetic datasets confirm the theoretical analyses and demonstrate the effectiveness of the proposed method in preventing misconfigurations of IACS.
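The general idea of validating a new change against similar past changes held in bounded memory can be sketched as follows. This is an illustration under stated assumptions, not the paper's streaming algorithms: it swaps in a fixed-size sliding window and Jaccard similarity over tokens, and the class name `ChangeHistory`, the `key=value` change format, and the device names are all hypothetical.

```python
from collections import deque


def tokens(change):
    """Split a 'device.key=value' style change into a set of lowercase tokens."""
    return frozenset(change.lower().replace("=", " ").replace(".", " ").split())


def jaccard(a, b):
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class ChangeHistory:
    """Keeps only the most recent `capacity` changes (assumption: a sliding window)."""

    def __init__(self, capacity=1000):
        # deque with maxlen silently evicts the oldest entry when full.
        self._window = deque(maxlen=capacity)

    def __len__(self):
        return len(self._window)

    def add(self, change):
        self._window.append((change, tokens(change)))

    def similar(self, change, threshold=0.5):
        """Return (score, past_change) pairs at or above threshold, best first."""
        query = tokens(change)
        scored = [(jaccard(query, toks), old) for old, toks in self._window]
        return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


# Hypothetical stream of configuration changes.
history = ChangeHistory(capacity=2)
for past in ["plc1.scan_rate=10", "plc2.scan_rate=10", "hmi.timeout=30"]:
    history.add(past)
matches = history.similar("plc3.scan_rate=10")
```

With a capacity of 2, the oldest change is evicted before the query runs, so memory stays bounded at the cost of forgetting older history. A real system would need a summary structure that retains more than a recency window.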

    Large Language Models Based Automatic Synthesis of Software Specifications

    Software configurations play a crucial role in determining the behavior of software systems. To ensure safe and error-free operation, it is necessary to identify the correct configurations, along with their valid bounds and rules, which are commonly referred to as software specifications. As software systems grow in complexity and scale, the number of configurations and associated specifications required to ensure correct operation can become large and prohibitively difficult to manipulate manually. Due to the fast pace of software development, correct software specifications are often not thoroughly checked or validated within the software itself. Rather, they are frequently discussed and documented in a variety of external sources, including software manuals, code comments, and online discussion forums. It is therefore hard for a system administrator to know the correct specifications of configurations, because these sources lack clarity and organization and there is no centralized, unified source to consult. To address this challenge, we propose SpecSyn, a framework that leverages a state-of-the-art large language model to automatically synthesize software specifications from natural language sources. Our approach formulates software specification synthesis as a sequence-to-sequence learning problem and investigates the extraction of specifications from large contextual texts. This is the first work that uses a large language model for end-to-end specification synthesis from natural language texts. Empirical results demonstrate that our system outperforms the prior state-of-the-art specification synthesis tool by 21% in terms of F1 score and can find specifications in single as well as multiple sentences.