102 research outputs found

    Overcoming Language Dichotomies: Toward Effective Program Comprehension for Mobile App Development

    Mobile devices and platforms have become an established target for modern software developers due to performant hardware and a large and growing user base numbering in the billions. Despite their popularity, the software development process for mobile apps comes with a set of unique, domain-specific challenges rooted in program comprehension. Many of these challenges stem from developer difficulties in reasoning about different representations of a program, a phenomenon we define as a "language dichotomy". In this paper, we reflect upon the various language dichotomies that contribute to open problems in program comprehension and development for mobile apps. Furthermore, to help guide the research community towards effective solutions for these problems, we provide a roadmap of directions for future work. Comment: Invited Keynote Paper for the 26th IEEE/ACM International Conference on Program Comprehension (ICPC'18)

    A Data Set of Generalizable Python Code Change Patterns

    Mining repetitive code changes from version control history is a common way of discovering unknown change patterns. Such change patterns can be used in code recommender systems or automated program repair techniques. While such tools and datasets exist for Java, there is little work on finding and recommending such changes in Python. In this paper, we present a data set of manually vetted, generalizable, repetitive Python code change patterns. We create a coding guideline to identify generalizable change patterns that can be used in automated tooling. We leverage the change patterns mined by recent work on repetitive changes in Python projects and use our coding guideline to manually review the patterns. For each change, we also record a description of the change and why it is applied, along with other characteristics such as the number of projects it occurs in. This review process allows us to identify and share 72 Python change patterns that can be used to build and advance Python developer support tools.


    Both expert and novice software developers frequently access software development resources available on the Web in order to look up or learn new APIs, tools and techniques. Software quality is affected negatively when developers fail to find high-quality information relevant to their problem. While a substantial amount of freely available resources can be accessed online, some of them contain information that suffers from error proneness, copyright infringement, security concerns, and incompatible versions. Use of such toxic information can have a strong negative effect on developers' efficacy. This dissertation focuses specifically on software tutorials, aiming to automatically evaluate the quality of such documents available on the Web. In order to achieve this goal, we present two contributions: 1) scalable detection of duplicated code snippets; 2) automatic identification of valid version ranges. Software tutorials consist of a combination of source code snippets and natural language text. The code snippets in a tutorial can originate from different sources, perhaps carrying stringent licensing requirements or known security vulnerabilities. Developers, typically unaware of this, can reuse these code snippets in their projects. First, in this thesis, we present our work on a Web-scale code clone search technique that is able to detect duplicate code snippets between large-scale document and source code corpora in order to trace toxic code snippets. As software libraries and APIs evolve over time, existing software development tutorials can become outdated. It is difficult for software developers, and especially novices, to determine the expected version of the software implicit in a specific tutorial in order to decide whether the tutorial is applicable to their software development environment. To overcome this challenge, in this thesis we present a novel technique for automatic identification of the valid version range of software development tutorials on the Web.

    Barriers and Self-Efficacy: A Large-Scale Study on the Impact of OSS Courses on Student Perceptions

    Open source software (OSS) development offers a unique opportunity for students in Software Engineering to experience and participate in large-scale software development; however, the impact of such courses on students' self-efficacy and the challenges faced by students are not well understood. This paper aims to address this gap by analyzing data from multiple instances of OSS development courses at universities in different countries and reporting on how students' self-efficacy changed as a result of taking the course, as well as the barriers and challenges faced by students.

    Assessing Word Similarity Metrics For Traceability Link Recovery

    The software development process often involves various artifacts, each describing different aspects of a software system. Traceability link recovery is a technique that supports this development process by connecting related parts of different artifacts. Artifacts expressed in natural language are difficult for machines to understand and therefore pose a particular challenge for traceability link recovery. Word similarity metrics are commonly used for this purpose, to identify different words with the same meaning as synonyms. ArDoCo is a software tool that uses word similarity metrics to recover trace links between textual software architecture documentation and formal architecture models. This thesis examines the influence of different word similarity metrics on ArDoCo. The word similarity metrics are evaluated in several case studies. As part of the evaluation, precision and recall are presented, along with the particular challenges of the individual word similarity metrics.

    Large Language Models for Software Engineering: Survey and Open Problems

    This paper provides a survey of the emerging area of Large Language Models (LLMs) for Software Engineering (SE). It also sets out open research challenges for the application of LLMs to technical problems faced by software engineers. LLMs' emergent properties bring novelty and creativity with applications right across the spectrum of Software Engineering activities including coding, design, requirements, repair, refactoring, performance improvement, documentation and analytics. However, these very same emergent properties also pose significant technical challenges; we need techniques that can reliably weed out incorrect solutions, such as hallucinations. Our survey reveals the pivotal role that hybrid techniques (traditional SE plus LLMs) have to play in the development and deployment of reliable, efficient and effective LLM-based SE.