928 research outputs found

    Interactive Query Language for Code Comprehension

    Code comprehension is a fundamental task in software development. Every bug fix, maintenance task, or new feature requires a thorough understanding of the affected code. A number of code comprehension tools exist, but most of them have a limited feature set and are bound to a fixed (usually graphical) user interface, which limits their use. In this thesis we will define a flexible but safe query language for executing the most fundamental comprehension queries against a large code base. We will investigate how language-agnostic this query language can be and how to support language-specific features. I will implement a prototype tool to prove the concept using the open-source CodeCompass code comprehension platform. In this prototype I mainly target the C and C++ languages.

    Multilingual investigation of theory-based intervention for program comprehension

    This thesis is the continuation of an experiment called “Eye-movement Modeling Examples in Source Code Comprehension: A Classroom Study”. That first experiment studied how effective it is to show novice programmers how experts read code, using a video of the expert’s gaze accompanied by a verbal explanation. This thesis uses a similar experiment to study whether verbal explanation and visual stimuli alone, without the expert’s gaze, could also be helpful for programming novices. Grado en Ingeniería Informática de Servicios y Aplicacione

    Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation

    In this work, we evaluate 10 open-source instructed LLMs on four representative code comprehension and generation tasks. We have the following main findings. First, for the zero-shot setting, instructed LLMs are very competitive on code comprehension and generation tasks and sometimes even better than small SOTA models specifically fine-tuned on each downstream task. We also find that larger instructed LLMs are not always better on code-related tasks. Second, for the few-shot setting, we find that adding demonstration examples substantially helps instructed LLMs perform better on most code comprehension and generation tasks; however, the examples sometimes induce unstable or even worse performance. Furthermore, we find that the widely used BM25-based shot selection strategy significantly outperforms basic random selection or fixed selection only on generation problems. Third, for the fine-tuning setting, we find that fine-tuning can further improve model performance on downstream code comprehension and generation tasks compared to the zero-shot/one-shot performance. In addition, after being fine-tuned on the same downstream task dataset, instructed LLMs outperform both the small SOTA models and similar-scaled LLMs without instruction tuning. Based on our findings, we further present practical implications on model and usage recommendation, performance and cost trade-offs, and future direction.

    Understanding Eye Gaze Patterns in Code Comprehension

    Program comprehension is a sub-field of software engineering that seeks to understand how developers understand programs. Comprehension acts as a starting point for many software engineering tasks such as bug fixing, refactoring, and feature creation. The dissertation presents a series of empirical studies to understand how developers comprehend software in realistic settings. The unique aspect of this work is the use of eye tracking equipment to gather fine-grained, detailed information about what developers look at in software artifacts while they perform realistic tasks in an environment familiar to them, namely a context including both the Integrated Development Environment (Eclipse or Visual Studio) and a web browser (Google Chrome). The iTrace eye tracking infrastructure is used for certain eye tracking studies on large code files as it is able to handle page scrolling and context switching. The first study is a classroom-based study on how students actively trained in the classroom understand grouped units of C++ code. Results indicate students made many transitions between lines that were closer together, and were attracted the most to if statements and to a lesser extent assignment code. The second study seeks to understand how developers use Stack Overflow page elements to build summaries of open source project code. Results indicate participants focused more heavily on question and answer text, and the embedded code, than they did on the title, question tags, or votes. The third study presents a larger code summarization study using different information contexts: Stack Overflow, bug repositories and source code. Results show participants tended to visit up to two codebase files in either the combined or isolated codebase session, but visited more bug report pages, and spent longer on new Stack Overflow pages they visited, when given either of these two treatments in isolation.
    In the combined session, time spent on the one or two codebase files they viewed dominated the session time. Information learned from tracking developers' gaze in these studies can form foundations for developer behavior models, which we hope can later inform recommendations for actions one might take to achieve workflow goals in these settings. Advisor: Bonita Shari

    Experience Report: Thinkathon -- Countering an "I Got It Working" Mentality with Pencil-and-Paper Exercises

    Goal-directed problem-solving labs can lead a student to believe that the most important achievement in a first programming course is to get programs working. This is counter to research indicating that code comprehension is an important developmental step for novice programmers. We observed this in our own CS-0 introductory programming course, and furthermore, that students weren't making the connection between code comprehension in labs and a final examination that required solutions to pencil-and-paper comprehension and writing exercises, where sound understanding of programming concepts is essential. Realising these deficiencies late in our course, we put on three 3-hour optional revision evenings just days before the exam. Based on a mastery learning philosophy, students were expected to work through a bank of around 200 pencil-and-paper exercises. By comparison with a machine-based hackathon, we called this a Thinkathon. Students completed pre- and post-event questionnaires about their experience of the Thinkathon. While we find that Thinkathon attendance positively influences final grades, we believe our reflection on the overall experience is of greater value. We report that: respected methods for developing code comprehension may not be enough on their own; novices must exercise their developing skills away from machines; and there are social learning outcomes in programming courses, currently implicit, that we should make explicit.

    An empirical study on code comprehension: DCI compared to OO

    Comprehension of source code affects software development, especially maintenance, where reading code is the most time-consuming activity. A programming paradigm imposes a style of arranging the source code that is aligned with a way of thinking toward a computable solution. A programming paradigm, together with a programming language, is therefore an important factor in source code comprehension. Object-Oriented (OO) is the dominant paradigm today, although it has been criticized from its beginning and an alternative has recently been proposed. In OO source code, system functions cannot escape outside the definition of classes, and their descriptions live inside multiple class declarations. This results in obfuscated code, a lost sense of the run-time, and a lack of global knowledge that weakens the understandability of the source code at system level. A new paradigm is emerging to address these and other OO issues: the Data Context Interaction (DCI) paradigm. We conducted the first controlled experiment with human subjects to evaluate the effects of DCI on code comprehension compared to OO. We looked at correctness, time consumption, and focus of attention during comprehension tasks. We also present a novel approach using metrics from Social Network Analysis to analyze what we call the Cognitive Network of Language Elements (CNLE) that is built by programmers while comprehending a system. We consider this approach useful for understanding source code properties uncovered from code-reading cognitive tasks. The results obtained are preliminary in nature but indicate that the DCI-trygve approach produces more comprehensible source code and promotes a stronger focus of attention on important files when programmers are reading code during program comprehension. Regarding reading time spent on files, we were not able to indicate with statistical significance which approach allows programmers to consume less time.