8 research outputs found

    Learning to Answer Semantic Queries over Code

    During software development, developers need answers to queries about semantic aspects of code. Even though extractive question answering with neural approaches has been studied widely for natural language, the problem of answering semantic queries over code using neural networks has not yet been explored. This is mainly because there is no existing dataset with extractive question and answer pairs over code involving complex concepts and long chains of reasoning. We bridge this gap by building a new, curated dataset called CodeQueries and proposing a neural question-answering methodology over code. We build upon state-of-the-art pre-trained models of code to predict answer and supporting-fact spans. Given a query and code, only some of the code may be relevant to answering the query. We first experiment under an ideal setting where only the relevant code is given to the model and show that our models do well. We then experiment under three pragmatic considerations: (1) scaling to large code, (2) learning from a limited number of examples, and (3) robustness to minor syntax errors in code. Our results show that while a neural model can be resilient to minor syntax errors in code, increasing code size, the presence of code that is not relevant to the query, and a reduced number of training examples limit model performance. We are releasing our data and models to facilitate future work on the proposed problem of answering semantic queries over code.
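    To make the task concrete, the following is a minimal sketch (not the authors' released code) of extractive span prediction over a (query, code) pair with a pre-trained code encoder. The checkpoint name, the example query and code, and the single-span setup are assumptions for illustration; the question-answering head below is untrained and would need fine-tuning on a dataset such as CodeQueries before its predicted spans are meaningful.

    # Minimal sketch of extractive QA over code with a pre-trained code encoder.
    # Assumptions: microsoft/codebert-base as the encoder; the QA head is
    # randomly initialized here and must be fine-tuned before use.
    import torch
    from transformers import AutoTokenizer, AutoModelForQuestionAnswering

    MODEL = "microsoft/codebert-base"  # assumed; the paper only says "pre-trained models of code"
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForQuestionAnswering.from_pretrained(MODEL)

    query = "Which expression may be None when it is dereferenced?"
    code = "def load(path):\n    if path:\n        return open(path).read()\n"

    # Encode the (query, code) pair; the code plays the role of the context passage.
    inputs = tokenizer(query, code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)

    # Pick the most likely answer span from the start/end logits.
    start = int(outputs.start_logits.argmax())
    end = max(int(outputs.end_logits.argmax()), start)
    answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
    print(answer)

    In the actual CodeQueries setting, the model additionally predicts supporting-fact spans and must cope with code that is largely irrelevant to the query, which is where the abstract reports the main performance drops.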

    Efficient Graph-based Computation and Analytics

    With the explosion of data in many domains, such as social media, large code repositories, the Internet of Things (IoT), and inertial sensors, only 32% of the data available to academia and industry is put to work, and the remaining 68% goes unleveraged. Moreover, practitioners face a growing number of obstacles in performing complex analytics at this scale: 1) how to perform dynamic graph analytics in a parallel and robust manner within a reasonable time; 2) how to optimize performance on a property graph that represents the semantics of code, data, and runtime systems for big data applications; and 3) how to adapt neural graph approaches (i.e., Transformers) to realistic research problems such as automated program repair and inertial navigation. To tackle these problems, I present two lines of work: efficient graph-based computation and intelligent graph analytics. Specifically, I first propose two theory-based dynamic graph models to characterize temporal trends in large social media networks, and implement and optimize them atop Apache Spark GraphX to improve their performance. In addition, I investigate a semantics-aware optimization framework, combining offline static analysis and online dynamic analysis over a property graph that represents the skeleton of a data-intensive application, to interactively and semi-automatically help programmers pinpoint performance problems hidden in the source code. On the side of intelligent graph-based algorithms, I develop novel neural graph approaches with multi-task learning to repair a broad range of programming bugs automatically, and to improve the accuracy of pedestrian navigation systems that rely only on sensor data from Inertial Measurement Units (IMUs, i.e., accelerometer, gyroscope, and magnetometer). In this dissertation, I define these research problems and draw on graph computation, program analysis, and deep learning techniques to address them, followed by comprehensive comparisons with state-of-the-art baselines and a discussion of future research.
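    As an illustration of the snapshot style of dynamic graph analytics described above, the sketch below tracks how a centrality measure of a small social network evolves over time. It is an assumption for exposition only: networkx stands in for Apache Spark GraphX, and the edge data and the PageRank metric are hypothetical rather than the dissertation's actual models.

    # Illustrative sketch (not the dissertation's GraphX code): cumulative
    # snapshots of a timestamped interaction graph, with a per-snapshot
    # centrality measure to expose a temporal trend.
    import networkx as nx

    # Hypothetical timestamped interaction edges (user_a, user_b, day).
    edges = [("alice", "bob", 1), ("bob", "carol", 1),
             ("alice", "carol", 2), ("carol", "dave", 2),
             ("dave", "alice", 3), ("bob", "dave", 3)]

    def snapshot(day):
        """Build the cumulative graph of all interactions up to the given day."""
        g = nx.Graph()
        g.add_edges_from((a, b) for a, b, t in edges if t <= day)
        return g

    # Temporal trend: the most central user in each daily snapshot.
    for day in (1, 2, 3):
        ranks = nx.pagerank(snapshot(day))
        top = max(ranks, key=ranks.get)
        print(f"day {day}: most central user = {top} ({ranks[top]:.3f})")

    The distributed implementations in the dissertation perform this kind of computation in parallel on Spark GraphX over much larger social media networks.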