142 research outputs found

    Understanding Eye Gaze Patterns in Code Comprehension

    Get PDF
    Program comprehension is a sub-field of software engineering that studies how developers understand programs. Comprehension acts as a starting point for many software engineering tasks such as bug fixing, refactoring, and feature creation. This dissertation presents a series of empirical studies of how developers comprehend software in realistic settings. The unique aspect of this work is the use of eye tracking equipment to gather fine-grained information about what developers look at in software artifacts while they perform realistic tasks in an environment familiar to them, namely a context including both an Integrated Development Environment (Eclipse or Visual Studio) and a web browser (Google Chrome). The iTrace eye tracking infrastructure is used for certain eye tracking studies on large code files, as it is able to handle page scrolling and context switching. The first study is a classroom-based study of how students actively trained in the classroom understand grouped units of C++ code. Results indicate that students made many transitions between lines that were close together, and were attracted most to if statements and, to a lesser extent, to assignment code. The second study seeks to understand how developers use Stack Overflow page elements to build summaries of open source project code. Results indicate that participants focused more heavily on the question and answer text, and on the embedded code, than on the title, question tags, or votes. The third study presents a larger code summarization study using different information contexts: Stack Overflow, bug repositories, and source code. Results show that participants tended to visit at most two codebase files in either the combined or isolated codebase session, but visited more bug report pages and spent more time on newly visited Stack Overflow pages when given either of these two treatments in isolation. In the combined session, time spent on the one or two codebase files they viewed dominated the session time. Information learned from tracking developers' gaze in these studies can form foundations for developer behavior models, which we hope can later inform recommendations for actions one might take to achieve workflow goals in these settings. Advisor: Bonita Sharif

    Automatically Generating Documentation for Lambda Expressions in Java

    Full text link
    When lambda expressions were introduced to the Java programming language as part of the release of Java 8 in 2014, they were the language's first step into functional programming. Since lambda expressions are still relatively new, not all developers use or understand them. In this paper, we first present the results of an empirical study to determine how frequently developers of GitHub repositories make use of lambda expressions and how they are documented. We find that 11% of Java GitHub repositories use lambda expressions, and that only 6% of the lambda expressions are accompanied by source code comments. We then present a tool called LambdaDoc which can automatically detect lambda expressions in a Java repository and generate natural language documentation for them. Our evaluation of LambdaDoc with 23 professional developers shows that they perceive the generated documentation to be complete, concise, and expressive, while the majority of the documentation produced by our participants without tool support was inadequate. Our contribution is an important step towards automatically generating documentation for functional programming constructs in an object-oriented language. Comment: to appear as a full paper at MSR 2019, the 16th International Conference on Mining Software Repositories.
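    To make the subject concrete, here is a minimal Java sketch of the kind of lambda expression the study counts, together with the sort of natural-language comment a tool like LambdaDoc aims to produce. The comment wording is illustrative only; the abstract does not show LambdaDoc's actual output format.

    import java.util.Arrays;
    import java.util.List;

    public class LambdaExample {
        public static void main(String[] args) {
            List<String> names = Arrays.asList("Ada", "Grace", "Alan");

            // A generated comment might read: "Keeps the names that start
            // with 'A' and prints each one." (Illustrative wording only.)
            names.stream()
                 .filter(name -> name.startsWith("A"))
                 .forEach(name -> System.out.println(name));
        }
    }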

    Proceedings of the Second Program Visualization Workshop, 2002

    Get PDF
    The Program Visualization Workshops aim to bring together researchers who design and construct program visualizations and, above all, educators who use and evaluate visualizations in their teaching. The first workshop took place in July 2000 in Porvoo, Finland. The second workshop was held in cooperation with ACM SIGCSE and took place at HornstrupCentret, Denmark, in June 2002, immediately following the ITiCSE 2002 Conference in Aarhus, Denmark.

    POC on Credit Card “e-Statement” Details Generation for ANZ Bank

    Get PDF
    The storage and processing of data are major issues in information technology today. Every organization's data grows rapidly day by day, and it becomes hard for information systems to process it and respond to the queries required of them. Banking is one such industry, needing to handle millions of records at a time. Utilizing Hadoop is one way to handle these records more effectively and in less time. This Proof of Concept (POC) shows that queries execute in much less time than on the existing database system.

    The growth of data has challenged cutting-edge companies like Google, Yahoo, Amazon, and Microsoft, which must sift through terabytes and even petabytes of data to understand the behavior of their popular websites. The tools available at the time were not equipped to cope with this scale, so Google presented MapReduce, the system it had built to address it. Most companies faced the same issue as Google and did not want to build another such system themselves, and the model suited them all; in time it became open source, to wide appreciation. That system was named Hadoop, and today it is a major part of the computing world. Due to its efficiency, ever more companies rely on Hadoop and establish it in-house. Hadoop is used for running huge distributed programs, so its simplicity and accessibility give it an edge over writing and running distributed programs by hand. Any good programmer can create a Hadoop instance in minutes, and doing so is also very cheap. Hadoop is, moreover, very scalable and robust. These features have made it very popular in both the academic and industrial worlds.

    MapReduce is a data-processing model in which work scales easily over multiple systems. Processing is expressed as two kinds of functions, mappers and reducers (see the word-count sketch below). Decomposing a data application into mappers and reducers is sometimes nontrivial, but once an application is written in the MapReduce format, scaling it to run over many hundreds of systems requires only minor changes. This efficiency and scalability attract programmers to MapReduce like bears to honey. According to experts, this is an era of unbelievable developments, which require large systems with ever larger storage to cope with immense storage demands; Hadoop, itself an astonishing development, plays an effective role here through its scalability and many other striking features.

    One challenge remains: moving existing data, which lives in traditional relational databases queried with Structured Query Language (SQL), onto the Hadoop infrastructure. This is where Hive comes in. Hive provides a dialect of SQL, called Hive Query Language (HiveQL), for querying data stored in a cluster of Hadoop instances. Hive does not work as a full database; it is bound by the constraints of Hadoop. The most surprising limitation is that it cannot perform record-level operations such as updates, inserts, and deletes: you can only create new tables or run queries that write results to files. Hive also does not provide transactions.
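    The mapper/reducer decomposition described above is easiest to see in the canonical word-count example. The sketch below, written against Hadoop's org.apache.hadoop.mapreduce API, shows a mapper emitting (word, 1) pairs and a reducer summing them; the job-driver boilerplate is omitted for brevity.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {

        // Mapper: for each input line, emit (word, 1) for every token.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reducer: sum the counts emitted for each word.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }
    }

    For comparison, given a suitable table of words, Hive lets you express the same computation declaratively, e.g. SELECT word, COUNT(*) FROM words GROUP BY word; that convenience is exactly what HiveQL adds on top of MapReduce.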

    A Survey of Learning-based Automated Program Repair

    Full text link
    Automated program repair (APR) aims to fix software bugs automatically and plays a crucial role in software development and maintenance. With the recent advances in deep learning (DL), an increasing number of APR techniques have been proposed that leverage neural networks to learn bug-fixing patterns from massive open-source code repositories. Such learning-based techniques usually treat APR as a neural machine translation (NMT) task, where buggy code snippets (i.e., the source language) are automatically translated into fixed code snippets (i.e., the target language). Benefiting from the powerful capability of DL to learn hidden relationships from previous bug-fixing datasets, learning-based APR techniques have achieved remarkable performance. In this paper, we provide a systematic survey summarizing the current state-of-the-art research in the learning-based APR community. We illustrate the general workflow of learning-based APR techniques and detail the crucial components, including the fault localization, patch generation, patch ranking, patch validation, and patch correctness phases. We then discuss the widely adopted datasets and evaluation metrics and outline existing empirical studies. We discuss several critical aspects of learning-based APR techniques, such as repair domains, industrial deployment, and the open science issue. We highlight several practical guidelines for applying DL techniques in future APR studies, such as exploring explainable patch generation and utilizing code features. Overall, this paper can help researchers gain a comprehensive understanding of the achievements of existing learning-based APR techniques and promote the practical application of these techniques. Our artifacts are publicly available at https://github.com/QuanjunZhang/AwesomeLearningAPR.
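    The workflow the survey describes can be pictured as a small pipeline. The Java sketch below is purely conceptual: every interface and name is hypothetical, the patch generator stands in for an NMT model whose beam-search outputs arrive already ranked, and trivial stubs are wired in only so the sketch runs end to end.

    import java.util.List;

    public class AprPipelineSketch {
        interface FaultLocalizer { List<String> suspiciousSnippets(String program); }
        interface PatchGenerator { List<String> translate(String buggySnippet); } // NMT-style: buggy -> fixed
        interface PatchValidator { boolean passesTests(String program, String patch); }

        // Walk suspicious locations, try candidate patches in ranked order,
        // and return the first patch that passes the test suite ("plausible").
        static String repair(String program, FaultLocalizer fl, PatchGenerator pg, PatchValidator pv) {
            for (String snippet : fl.suspiciousSnippets(program)) {
                for (String patch : pg.translate(snippet)) {
                    if (pv.passesTests(program, patch)) {
                        return patch;
                    }
                }
            }
            return null; // no plausible patch found
        }

        public static void main(String[] args) {
            String plausible = repair("if (x = 0) { y++; }",
                    program -> List.of("if (x = 0)"),           // stub fault localizer
                    snippet -> List.of("if (x == 0)"),          // stub "translation" model
                    (program, patch) -> patch.contains("==")); // stub validator
            System.out.println("Plausible patch: " + plausible);
        }
    }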

    Source Code Interaction on Touchscreens

    Get PDF
    Direct interaction with touchscreens has become a primary way of using a device. This work seeks to devise interaction methods for editing textual source code on touch-enabled devices. With the advent of the “Post-PC Era”, touch-centric interaction has received considerable attention in both research and development. However, various limitations have impeded widespread adoption of programming environments on modern platforms. Previous attempts have mainly been successful by simplifying or constraining conventional programming but have only insufficiently supported source code written in mainstream programming languages. This work includes the design, development, and evaluation of techniques for editing, selecting, and creating source code on touchscreens. The results contribute to text editing and entry methods by taking the syntax and structure of programming languages into account while exploiting the advantages of gesture-driven control. Furthermore, this work presents the design and software architecture of a mobile development environment incorporating touch-enabled modules for typical software development tasks.

    Programming language trends: an empirical study

    Get PDF
    Predicting the evolution of software engineering technology trends is a dubious proposition. The recent evolution of software technology is a prime example; it is fast paced and affected by many factors, which are themselves driven by a wide range of sources. This dissertation is part of a long-term project intended to analyze software engineering technology trends and how they evolve. Basically, the following question will be answered: how can one watch, predict, adapt to, and affect software engineering trends? In this dissertation, one field of software engineering, programming languages, is discussed. A review of the history of a group of programming languages shows that two kinds of factors, intrinsic and extrinsic, can affect the evolution of a programming language. Intrinsic factors are those that describe the general design criteria of programming languages. Extrinsic factors are those that are not directly related to the general attributes of programming languages but can still affect their evolution. In order to describe the relationships among these factors and how they affect programming language trends, the factors need to be quantified. A score has been assigned to each factor for every programming language. By collecting historical data, a data warehouse has been established, which stores the value of each factor for every programming language. Programming language trends are described and evaluated using these data. Empirical research attempts to capture observed behaviors in empirical laws. In this dissertation, statistical methods are used to describe historical programming language trends and predict future trends. Several statistical models are constructed to describe the relationships among the factors. Canonical correlation is used for the factor analysis, and the multivariate multiple regression method is used to construct the statistical models of programming language trends. After the models are constructed to describe the historical trends, they are extended to make tentative predictions of future trends. The models are validated by comparing the predicted data with the actual data.
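    To illustrate the modeling step, the sketch below fits a linear trend to a single quantified factor score and extrapolates it one year ahead. All data and names are hypothetical, and ordinary least squares on one factor is a deliberate simplification of the canonical correlation and multivariate multiple regression analyses the dissertation actually uses.

    public class TrendSketch {

        // Fit y = a + b*x by ordinary least squares; returns {a, b}.
        static double[] fitLine(double[] x, double[] y) {
            int n = x.length;
            double sx = 0, sy = 0, sxx = 0, sxy = 0;
            for (int i = 0; i < n; i++) {
                sx += x[i]; sy += y[i];
                sxx += x[i] * x[i]; sxy += x[i] * y[i];
            }
            double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
            double a = (sy - b * sx) / n;
            return new double[] { a, b };
        }

        public static void main(String[] args) {
            double[] years  = { 2000, 2001, 2002, 2003, 2004 }; // observation years
            double[] scores = { 0.30, 0.34, 0.41, 0.45, 0.52 }; // hypothetical factor scores
            double[] ab = fitLine(years, scores);
            double prediction = ab[0] + ab[1] * 2005;           // extrapolate one year ahead
            System.out.printf("Predicted 2005 score: %.2f%n", prediction);
        }
    }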

    The Essence of Software Engineering

    Get PDF
    Software Engineering; Software Development; Software Processes; Software Architectures; Software Management