Techniques for Efficient and Effective Mobile Testing
The booming mobile app market attracts a large number of developers, and the resulting competition is fierce. This fierce competition leads to high standards for mobile apps, which mandate efficient and effective testing. Efficient testing requires little effort to use, while effective testing checks that the app under test behaves as expected. Manual testing is highly effective, but it is costly. Automatic testing should come to the rescue, but current automatic methods are either ineffective or inefficient. Methods using implicit specifications – for instance, "an app should not crash" for catching fail-stop errors – are ineffective because they cannot find semantic problems. Methods using explicit specifications such as test scripts are inefficient because they require huge developer effort to create and maintain the specifications. In this thesis, we present two approaches to these challenges. We first built the AppDoctor system, which efficiently tests mobile apps. It quickly explores an app, then slowly but accurately verifies the potential problems to identify bugs without introducing false positives. It uses dependencies discovered between actions to simplify its reports. Our second approach, implemented in the AppFlow system, leverages the ample opportunity to reuse test cases between apps to gain efficiency without losing effectiveness. It allows common UI elements to be used in test scripts, then recognizes these UI elements in real apps using a machine learning approach. The system also allows tests to be specified in reusable pieces and provides a mechanism to synthesize complete test cases from these pieces, enabling robust tests to be created and reused across apps in the same category.
The combination of these two approaches enables a developer to quickly test an app on a great number of combinations of actions for fail-stop problems, and to effortlessly and efficiently test the app on the most common scenarios for semantic problems. This combination covers most of her test requirements and greatly reduces her burden in testing the app.
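The AppFlow idea of recognizing canonical UI elements and assembling tests from reusable pieces can be sketched as follows. The keyword-based recognizer and step format below are illustrative stand-ins for the system's actual machine-learned classifier and test language:

```python
# Hypothetical sketch of AppFlow-style test synthesis. The real system
# recognizes UI elements with a machine learning classifier; the keyword
# table below is only a stand-in for illustration.

def recognize(widget_text):
    """Map an app-specific widget caption to a canonical UI element label."""
    canonical = {
        "sign in": "signin_button",
        "log in": "signin_button",
        "add to bag": "addtocart_button",
        "add to cart": "addtocart_button",
    }
    return canonical.get(widget_text.strip().lower())

def synthesize(steps, screen):
    """Resolve reusable steps like ('tap', 'signin_button') against a screen."""
    resolved = []
    for action, label in steps:
        for widget in screen:
            if recognize(widget) == label:
                resolved.append((action, widget))
                break
        else:
            return None  # this reusable step does not apply to the app
    return resolved

# The same reusable flow resolves against this app's concrete button captions.
flow = [("tap", "signin_button")]
result = synthesize(flow, ["Add to Bag", "Log In"])
```

Because the steps reference canonical labels rather than concrete widgets, the same flow can be reapplied to another app in the category whose login button reads "Sign In" instead.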
Automatically Discovering, Reporting and Reproducing Android Application Crashes
Mobile developers face unique challenges when detecting and reporting crashes
in apps due to their prevailing GUI event-driven nature and additional sources
of inputs (e.g., sensor readings). To support developers in these tasks, we
introduce a novel, automated approach called CRASHSCOPE. This tool explores a
given Android app using systematic input generation, according to several
strategies informed by static and dynamic analyses, with the intrinsic goal of
triggering crashes. When a crash is detected, CRASHSCOPE generates an augmented
crash report containing screenshots, detailed crash reproduction steps, the
captured exception stack trace, and a fully replayable script that
automatically reproduces the crash on target device(s). We evaluated
CRASHSCOPE's effectiveness in discovering crashes as compared to five
state-of-the-art Android input generation tools on 61 applications. The results
demonstrate that CRASHSCOPE performs about as well as current tools for
detecting crashes and provides more detailed fault information. Additionally,
in a study analyzing eight real-world Android app crashes, we found that
CRASHSCOPE's reports are easily readable and allow for reliable reproduction of
crashes by presenting more explicit information than human-written reports.

Comment: 12 pages, in Proceedings of the 9th IEEE International Conference on
Software Testing, Verification and Validation (ICST'16), Chicago, IL, April
10-15, 2016, pp. 33-4
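An augmented crash report of the kind CRASHSCOPE produces can be sketched by pairing a parsed exception trace with reproduction metadata. The report fields and the logcat parsing below are a minimal illustration, not the tool's actual schema:

```python
# Hedged sketch: assembling a CRASHSCOPE-style augmented crash report from
# a captured logcat excerpt. Field names are illustrative.
import re

def parse_crash(logcat, steps, screenshots):
    """Extract the exception and crashing frame, bundle with repro data."""
    exc = re.search(r"FATAL EXCEPTION[^\n]*\n([\w.]+Exception[^\n]*)", logcat)
    frame = re.search(r"\tat ([\w.$]+)\(([\w.]+:\d+)\)", logcat)
    return {
        "exception": exc.group(1) if exc else None,
        "top_frame": frame.group(1) if frame else None,
        "reproduction_steps": steps,
        "screenshots": screenshots,
    }

log = ("FATAL EXCEPTION: main\n"
       "java.lang.NullPointerException: null view\n"
       "\tat com.example.MainActivity.onClick(MainActivity.java:42)\n")
report = parse_crash(log, ["1. Launch app", "2. Tap 'Save'"], ["step2.png"])
```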
Continuous, Evolutionary and Large-Scale: A New Perspective for Automated Mobile App Testing
Mobile app development involves a unique set of challenges including device
fragmentation and rapidly evolving platforms, making testing a difficult task.
The design space for a comprehensive mobile testing strategy includes features,
inputs, potential contextual app states, and large combinations of devices and
underlying platforms. Therefore, automated testing is an essential activity of
the development process. However, the current state of the art in automated
testing tools for mobile apps has limitations that have driven a preference for
manual testing in practice. As of today, there is no comprehensive automated solution
for mobile testing that overcomes fundamental issues such as automated oracles,
history awareness in test cases, or automated evolution of test cases.
In this perspective paper we survey the current state of the art in terms of
the frameworks, tools, and services available to developers to aid in mobile
testing, highlighting present shortcomings. Next, we provide commentary on
current key challenges that restrict the possibility of a comprehensive,
effective, and practical automated testing solution. Finally, we offer our
vision of a comprehensive mobile app testing framework, complete with research
agenda, that is succinctly summarized along three principles: Continuous,
Evolutionary and Large-scale (CEL).

Comment: 12 pages, accepted to the Proceedings of the 33rd IEEE International
Conference on Software Maintenance and Evolution (ICSME'17)
Efficiency Matters: Speeding Up Automated Testing with GUI Rendering Inference
Due to the importance of Android app quality assurance, many automated GUI
testing tools have been developed. Although the test algorithms have been
improved, the impact of GUI rendering has been overlooked. On the one hand,
setting a long waiting time to execute events on fully rendered GUIs slows down
the testing process. On the other hand, setting a short waiting time will cause
the events to execute on partially rendered GUIs, which negatively affects the
testing effectiveness. An optimal waiting time should strike a balance between
effectiveness and efficiency. We propose AdaT, a lightweight image-based
approach to dynamically adjust the inter-event time based on GUI rendering
state. Given a real-time stream of GUI frames, AdaT uses a deep learning
model to infer the rendering state and synchronizes with the testing tool to
schedule the next event once the GUI is fully rendered. The evaluations
demonstrate the accuracy, efficiency, and effectiveness of our approach. We
also integrate our approach with the existing automated testing tool to
demonstrate the usefulness of AdaT in covering more activities and executing
more events on fully rendered GUIs.

Comment: Proceedings of the 45th International Conference on Software
Engineering
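The scheduling idea behind AdaT can be illustrated with a toy rendering check. Where AdaT infers the rendering state with a deep learning model over screenshots, this sketch simply waits for consecutive frames to stop changing:

```python
# Illustrative sketch of the scheduling idea behind AdaT: delay the next
# test event until the GUI appears fully rendered. AdaT infers rendering
# state with a learned model; frame stability (identical consecutive
# frames) is only a stand-in signal here.

def fully_rendered(frames, stable_needed=2):
    """Return the index at which the frame stream becomes stable."""
    stable = 0
    for i in range(1, len(frames)):
        stable = stable + 1 if frames[i] == frames[i - 1] else 0
        if stable >= stable_needed:
            return i
    return None  # never stabilized within the observed window

# Frames abstracted as pixel tuples; this stream is judged rendered at index 4.
stream = [(0, 0), (1, 0), (1, 1), (1, 1), (1, 1)]
```

A testing tool would dispatch its next event at the returned index instead of after a fixed sleep, avoiding both wasted waiting and premature input.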
Towards Efficient Record and Replay: A Case Study in WeChat
WeChat, a widely-used messenger app boasting over 1 billion monthly active
users, requires effective app quality assurance for its complex features.
Record-and-replay tools are crucial in achieving this goal. Despite the
extensive development of these tools, the impact of waiting time between replay
events has been largely overlooked. On one hand, a long waiting time for
executing replay events on fully-rendered GUIs slows down the process. On the
other hand, a short waiting time can lead to events executing on
partially-rendered GUIs, negatively affecting replay effectiveness. An optimal
waiting time should strike a balance between effectiveness and efficiency. We
introduce WeReplay, a lightweight image-based approach that dynamically adjusts
inter-event time based on the GUI rendering state. Given a real-time stream
of GUI frames, WeReplay employs a deep learning model to infer the
rendering state and synchronizes with the replaying tool, scheduling the next
event when the GUI is fully rendered. Our evaluation shows that our model
achieves 92.1% precision and 93.3% recall in discerning GUI rendering states in
the WeChat app. Through assessing the performance in replaying 23 common WeChat
usage scenarios, WeReplay successfully replays all scenarios on the same and
different devices more efficiently than the state-of-the-practice baselines.
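The adaptive-waiting replay loop described above can be sketched as follows; `is_rendered` stands in for WeReplay's learned rendering-state model, and all names are illustrative:

```python
# Sketch of WeReplay-style adaptive waiting during replay: each recorded
# event is dispatched only once a rendering check passes, instead of
# after a fixed sleep. Names here are assumptions for illustration.

def replay(events, is_rendered, dispatch, max_polls=50):
    """Dispatch each event once the GUI reports fully rendered."""
    for event in events:
        polls = 0
        while not is_rendered() and polls < max_polls:
            polls += 1              # in practice: sleep a short interval
        dispatch(event)             # fall back to dispatching on timeout

log = []
ticks = iter([False, False, True, True])  # GUI finishes rendering on tick 3
replay(["tap login", "type password"],
       is_rendered=lambda: next(ticks, True),
       dispatch=log.append)
```

The `max_polls` bound keeps replay from hanging if the rendering signal never settles, trading a possible premature event for forward progress.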
Large-Scale Analysis of Framework-Specific Exceptions in Android Apps
Mobile apps have become ubiquitous. For app developers, it is a key priority
to ensure their apps' correctness and reliability. However, many apps still
suffer from occasional to frequent crashes, weakening their competitive edge.
Large-scale, deep analyses of the characteristics of real-world app crashes can
provide useful insights to guide developers, or help improve testing and
analysis tools. However, such studies do not exist -- this paper fills this
gap. Over a four-month long effort, we have collected 16,245 unique exception
traces from 2,486 open-source Android apps, and observed that
framework-specific exceptions account for the majority of these crashes. We
then extensively investigated the 8,243 framework-specific exceptions (which
took six person-months): (1) identifying their characteristics (e.g.,
manifestation locations, common fault categories), (2) evaluating their
manifestation via state-of-the-art bug detection techniques, and (3) reviewing
their fixes. Besides the insights they provide, these findings motivate and
enable follow-up research on mobile apps, such as bug detection, fault
localization and patch generation. In addition, to demonstrate the utility of
our findings, we have optimized Stoat, a dynamic testing tool, and implemented
ExLocator, an exception localization tool, for Android apps. Stoat is able to
quickly uncover three previously-unknown, confirmed/fixed crashes in Gmail and
Google+; ExLocator is capable of precisely locating the root causes of
identified exceptions in real-world apps. Our substantial dataset is made
publicly available to share with and benefit the community.

Comment: ICSE'18: the 40th International Conference on Software Engineering
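The framework-specific versus app-specific distinction can be illustrated with a toy classifier that checks whether the innermost stack frame lies in a framework package. The heuristic and package list are assumptions for illustration, not the study's exact methodology:

```python
# Toy classifier for framework-specific crashes: a crash counts as
# framework-specific here when its innermost stack frame lies in an
# Android framework package. Prefix list is illustrative only.

FRAMEWORK_PREFIXES = ("android.", "com.android.", "androidx.", "java.")

def is_framework_specific(trace):
    """trace: list of 'package.Class.method' frames, innermost first."""
    return trace[0].startswith(FRAMEWORK_PREFIXES) if trace else False

trace = ["android.view.ViewGroup.addView",
         "com.example.app.MainActivity.onCreate"]
```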
Overcoming Language Dichotomies: Toward Effective Program Comprehension for Mobile App Development
Mobile devices and platforms have become an established target for modern
software developers due to performant hardware and a large and growing user
base numbering in the billions. Despite their popularity, the software
development process for mobile apps comes with a set of unique, domain-specific
challenges rooted in program comprehension. Many of these challenges stem from
developer difficulties in reasoning about different representations of a
program, a phenomenon we define as a "language dichotomy". In this paper, we
reflect upon the various language dichotomies that contribute to open problems
in program comprehension and development for mobile apps. Furthermore, to help
guide the research community towards effective solutions for these problems, we
provide a roadmap of directions for future work.

Comment: Invited Keynote Paper for the 26th IEEE/ACM International Conference
on Program Comprehension (ICPC'18)
Translating Video Recordings of Mobile App Usages into Replayable Scenarios
Screen recordings of mobile applications are easy to obtain and capture a
wealth of information pertinent to software developers (e.g., bugs or feature
requests), making them a popular mechanism for crowdsourced app feedback. Thus,
these videos are becoming a common artifact that developers must manage. In
light of unique mobile development constraints, including swift release cycles
and rapidly evolving platforms, automated techniques for analyzing all types of
rich software artifacts provide benefit to mobile developers. Unfortunately,
automatically analyzing screen recordings presents serious challenges, due to
their graphical nature, compared to other types of (textual) artifacts. To
address these challenges, this paper introduces V2S, a lightweight, automated
approach for translating video recordings of Android app usages into replayable
scenarios. V2S is based primarily on computer vision techniques and adapts
recent solutions for object detection and image classification to detect and
classify user actions captured in a video, and convert these into a replayable
test scenario. We performed an extensive evaluation of V2S involving 175 videos
depicting 3,534 GUI-based actions collected from users exercising features and
reproducing bugs from over 80 popular Android apps. Our results illustrate that
V2S can accurately replay scenarios from screen recordings, and is capable of
reproducing 89% of our collected videos with minimal overhead. A case
study with three industrial partners illustrates the potential usefulness of
V2S from the viewpoint of developers.

Comment: In Proceedings of the 42nd International Conference on Software
Engineering (ICSE'20), 13 pages
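The final stage of V2S, emitting a replayable script from classified actions, can be sketched as below. The action tuple format is an assumption, though the generated adb `input` commands are standard:

```python
# Sketch of V2S's last stage: turning classified, per-frame user actions
# into a replayable script. Action detection itself (object detection on
# video frames) is omitted; the action tuple format is an assumption.

def to_adb_script(actions):
    """actions: tuples like ('tap', x, y) or ('swipe', x1, y1, x2, y2, ms)."""
    lines = []
    for act in actions:
        if act[0] == "tap":
            lines.append("adb shell input tap %d %d" % act[1:3])
        elif act[0] == "swipe":
            lines.append("adb shell input swipe %d %d %d %d %d" % act[1:6])
    return "\n".join(lines)

script = to_adb_script([("tap", 540, 960),
                        ("swipe", 540, 1600, 540, 400, 300)])
```

Replaying the scenario then amounts to executing the generated commands against a connected device or emulator.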
App Review Driven Collaborative Bug Finding
Software development teams generally welcome any effort to expose bugs in
their code base. In this work, we build on the hypothesis that mobile apps from
the same category (e.g., two web browser apps) may be affected by similar bugs
in their evolution process. It is therefore possible to transfer the experience
of one historical app to quickly find bugs in its new counterparts. This has
been referred to as collaborative bug finding in the literature. Our novelty is
that we guide the bug finding process by considering that existing bugs have
been hinted within app reviews. Concretely, we design the BugRMSys approach to
recommend bug reports for a target app by matching historical bug reports from
apps in the same category with user app reviews of the target app. We
experimentally show that this approach enables us to quickly expose and report
dozens of bugs for targeted apps such as Brave (web browser app). BugRMSys's
implementation relies on DistilBERT to produce natural language text
embeddings. Our pipeline considers similarities between bug reports and app
reviews to identify relevant bugs. We then focus on the app review as well as
potential reproduction steps in the historical bug report (from a same-category
app) to reproduce the bugs.
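The matching step at the heart of BugRMSys can be approximated with a toy similarity ranking. The paper uses DistilBERT text embeddings; plain bag-of-words cosine similarity stands in for them here:

```python
# Toy version of BugRMSys's matching step: rank historical bug reports by
# textual similarity to a target app's review. Bag-of-words cosine
# similarity is a stand-in for the paper's DistilBERT embeddings.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity over lowercase word counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def recommend(review, reports):
    """Return historical bug reports ordered by similarity to the review."""
    return sorted(reports, key=lambda r: cosine(review, r), reverse=True)

review = "browser crashes when opening a new tab"
reports = ["crash on opening new tab in browser",
           "dark mode colors are wrong"]
ranked = recommend(review, reports)
```

The top-ranked historical report then supplies candidate reproduction steps to try against the target app.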
Overall, after applying BugRMSys to six popular apps, we were able to
identify, reproduce, and report 20 new bugs: among these, 9 reports have
already been triaged, 6 have been confirmed, and 4 have been fixed by the
official development teams.