278 research outputs found
Sensemaking Practices in the Everyday Work of AI/ML Software Engineering
This paper considers sensemaking as it relates to everyday software engineering (SE) work practices and draws on a multi-year ethnographic study of SE projects at a large, global technology company building digital services infused with artificial intelligence (AI) and machine learning (ML) capabilities. Our findings highlight the breadth of sensemaking practices in AI/ML projects, noting developers' efforts to make sense of AI/ML environments (e.g., algorithms/methods and libraries), of AI/ML model ecosystems (e.g., pre-trained models and "upstream"models), and of business-AI relations (e.g., how the AI/ML service relates to the domain context and business problem at hand). This paper builds on recent scholarship drawing attention to the integral role of sensemaking in everyday SE practices by empirically investigating how and in what ways AI/ML projects present software teams with emergent sensemaking requirements and opportunities
Overcoming Language Dichotomies: Toward Effective Program Comprehension for Mobile App Development
Mobile devices and platforms have become an established target for modern
software developers due to performant hardware and a large and growing user
base numbering in the billions. Despite their popularity, the software
development process for mobile apps comes with a set of unique, domain-specific
challenges rooted in program comprehension. Many of these challenges stem from
developer difficulties in reasoning about different representations of a
program, a phenomenon we define as a "language dichotomy". In this paper, we
reflect upon the various language dichotomies that contribute to open problems
in program comprehension and development for mobile apps. Furthermore, to help
guide the research community towards effective solutions for these problems, we
provide a roadmap of directions for future work.Comment: Invited Keynote Paper for the 26th IEEE/ACM International Conference
on Program Comprehension (ICPC'18
Towards the Use of the Readily Available Tests from the Release Pipeline as Performance Tests. Are We There Yet?
Performance is one of the important aspects of software quality. In fact, performance issues exist widely in software systems, and the process of fixing the performance issues is an essential step in the release cycle of software systems. Although performance testing is widely adopted in practice, it is still expensive and time-consuming. In particular, the performance testing is usually conducted after the system is built in a dedicated testing environment. The challenge of performance testing makes it difficult to fit into the common DevOps process in software development. On the other hand, there exists a large number of tests readily available, that are executed regularly within the release pipeline during software development. In this paper, we perform an exploratory study to determine whether such readily available tests are capable of serving as performance tests. In particular, we would like to see whether the performance of these tests can demonstrate the performance improvements obtained from fixing real-life performance issues. We collect 127 performance issues from Hadoop and Cassandra and evaluate the performance of the readily available tests from the commits before and after the performance issue fixes. We find that most of the improvements from the fixes to performance issues can be demonstrated using the readily available tests in the release pipeline. However, only a very small portion of the tests can be used for demonstrating the improvements. By manually examining the tests, we identify eight reasons that a test cannot demonstrate performance improvement even though it covers the changed source code of the issue fix. Finally, we build random classifiers determining the important metrics influencing the readily available tests (not) being able to demonstrate performance improvements from issue fixes. We find that the test code itself and the source code covered by the test are important factors, while the factors related to the code changes in the performance issues fixes have low importance. Practitioners should focus on designing and improving the tests, instead of fine-tuning tests for different performance issues fixes. Our findings can be used as a guideline for practitioners to reduce the amount of effort spent on leveraging and designing tests that run in the release pipeline for performance assurance activities
Recognizing Developers' Emotions while Programming
Developers experience a wide range of emotions during programming tasks,
which may have an impact on job performance. In this paper, we present an
empirical study aimed at (i) investigating the link between emotion and
progress, (ii) understanding the triggers for developers' emotions and the
strategies to deal with negative ones, (iii) identifying the minimal set of
non-invasive biometric sensors for emotion recognition during programming task.
Results confirm previous findings about the relation between emotions and
perceived productivity. Furthermore, we show that developers' emotions can be
reliably recognized using only a wristband capturing the electrodermal activity
and heart-related metrics.Comment: Accepted for publication at ICSE2020 Technical Trac
Efficiently Manifesting Asynchronous Programming Errors in Android Apps
Android, the #1 mobile app framework, enforces the single-GUI-thread model,
in which a single UI thread manages GUI rendering and event dispatching. Due to
this model, it is vital to avoid blocking the UI thread for responsiveness. One
common practice is to offload long-running tasks into async threads. To achieve
this, Android provides various async programming constructs, and leaves
developers themselves to obey the rules implied by the model. However, as our
study reveals, more than 25% apps violate these rules and introduce
hard-to-detect, fail-stop errors, which we term as aysnc programming errors
(APEs). To this end, this paper introduces APEChecker, a technique to
automatically and efficiently manifest APEs. The key idea is to characterize
APEs as specific fault patterns, and synergistically combine static analysis
and dynamic UI exploration to detect and verify such errors. Among the 40
real-world Android apps, APEChecker unveils and processes 61 APEs, of which 51
are confirmed (83.6% hit rate). Specifically, APEChecker detects 3X more APEs
than the state-of-art testing tools (Monkey, Sapienz and Stoat), and reduces
testing time from half an hour to a few minutes. On a specific type of APEs,
APEChecker confirms 5X more errors than the data race detection tool,
EventRacer, with very few false alarms
- …