12 research outputs found
Event trace reduction for effective bug replay of Android apps via differential GUI state analysis
Š 2019 ACM. Existing Android testing tools, such as Monkey, generate a large quantity and a wide variety of user events to expose latent GUI bugs in Android apps. However, even if a bug is found, a majority of the events thus generated are often redundant and bug-irrelevant. In addition, it is also time-consuming for developers to localize and replay the bug given a long and tedious event sequence (trace). This paper presents ECHO, an event trace reduction tool for effective bug replay by using a new differential GUI state analysis. Given a sequence of events (trace), ECHO aims at removing bug-irrelevant events by exploiting the differential behavior between the GUI states collected when their corresponding events are triggered. During dynamic testing, ECHO injects at most one lightweight inspection event after every event to collect its corresponding GUI state. A new adaptive model is proposed to selectively inject inspection events based on sliding windows to differentiate the GUI states on-the-fly in a single testing process. The experimental results show that ECHO improves the effectiveness of bug replay by removing 85.11% redundant events on average while also revealing the same bugs as those detected when full event sequences are used
Automated, Cost-effective, and Update-driven App Testing
Apps' pervasive role in our society led to the definition of test automation
approaches to ensure their dependability. However, state-of-the-art approaches
tend to generate large numbers of test inputs and are unlikely to achieve more
than 50% method coverage. In this paper, we propose a strategy to achieve
significantly higher coverage of the code affected by updates with a much
smaller number of test inputs, thus alleviating the test oracle problem. More
specifically, we present ATUA, a model-based approach that synthesizes App
models with static analysis, integrates a dynamically-refined state abstraction
function and combines complementary testing strategies, including (1) coverage
of the model structure, (2) coverage of the App code, (3) random exploration,
and (4) coverage of dependencies identified through information retrieval. Its
model-based strategy enables ATUA to generate a small set of inputs that
exercise only the code affected by the updates. In turn, this makes common test
oracle solutions more cost-effective as they tend to involve human effort. A
large empirical evaluation, conducted with 72 App versions belonging to nine
popular Android Apps, has shown that ATUA is more effective and less effort
intensive than state-of-the-art approaches when testing App updates
âAnd all the pieces matter...â Hybrid Testing Methods for Android App's Privacy Analysis
Smartphones have become inherent to the every day life of billions of people worldwide, and they
are used to perform activities such as gaming, interacting with our peers or working. While extremely
useful, smartphone apps also have drawbacks, as they can affect the security and privacy of users.
Android devices hold a lot of personal data from users, including their social circles (e.g., contacts),
usage patterns (e.g., app usage and visited websites) and their physical location. Like in most software
products, Android apps often include third-party code (Software Development Kits or SDKs) to
include functionality in the app without the need to develop it in-house. Android apps and third-party
components embedded in them are often interested in accessing such data, as the online ecosystem
is dominated by data-driven business models and revenue streams like advertising.
The research community has developed many methods and techniques for analyzing the privacy
and security risks of mobile apps, mostly relying on two techniques: static code analysis and dynamic
runtime analysis. Static analysis analyzes the code and other resources of an app to detect potential
app behaviors. While this makes static analysis easier to scale, it has other drawbacks such as
missing app behaviors when developers obfuscate the appâs code to avoid scrutiny. Furthermore,
since static analysis only shows potential app behavior, this needs to be confirmed as it can also
report false positives due to dead or legacy code. Dynamic analysis analyzes the apps at runtime to
provide actual evidence of their behavior. However, these techniques are harder to scale as they need
to be run on an instrumented device to collect runtime data. Similarly, there is a need to stimulate
the app, simulating real inputs to examine as many code-paths as possible. While there are some
automatic techniques to generate synthetic inputs, they have been shown to be insufficient.
In this thesis, we explore the benefits of combining static and dynamic analysis techniques to
complement each other and reduce their limitations. While most previous work has often relied on
using these techniques in isolation, we combine their strengths in different and novel ways that allow
us to further study different privacy issues on the Android ecosystem. Namely, we demonstrate the
potential of combining these complementary methods to study three inter-related issues:
⢠A regulatory analysis of parental control apps. We use a novel methodology that relies on
easy-to-scale static analysis techniques to pin-point potential privacy issues and violations of
current legislation by Android apps and their embedded SDKs. We rely on the results from our
static analysis to inform the way in which we manually exercise the apps, maximizing our ability
to obtain real evidence of these misbehaviors. We study 46 publicly available apps and find
instances of data collection and sharing without consent and insecure network transmissions
containing personal data. We also see that these apps fail to properly disclose these practices
in their privacy policy.
⢠A security analysis of the unauthorized access to permission-protected data without user consent.
We use a novel technique that combines the strengths of static and dynamic analysis, by
first comparing the data sent by applications at runtime with the permissions granted to each
app in order to find instances of potential unauthorized access to permission protected data.
Once we have discovered the apps that are accessing personal data without permission, we
statically analyze their code in order to discover covert- and side-channels used by apps and SDKs to circumvent the permission system. This methodology allows us to discover apps using
the MAC address as a surrogate for location data, two SDKs using the external storage as a
covert-channel to share unique identifiers and an app using picture metadata to gain unauthorized
access to location data.
⢠A novel SDK detection methodology that relies on obtaining signals observed both in the appâs
code and static resources and during its runtime behavior. Then, we rely on a tree structure
together with a confidence based system to accurately detect SDK presence without the need
of any a priory knowledge and with the ability to discern whether a given SDK is part of legacy
or dead code. We prove that this novel methodology can discover third-party SDKs with more
accuracy than state-of-the-art tools both on a set of purpose-built ground-truth apps and on a
dataset of 5k publicly available apps.
With these three case studies, we are able to highlight the benefits of combining static and dynamic
analysis techniques for the study of the privacy and security guarantees and risks of Android
apps and third-party SDKs. The use of these techniques in isolation would not have allowed us to
deeply investigate these privacy issues, as we would lack the ability to provide real evidence of potential
breaches of legislation, to pin-point the specific way in which apps are leveraging cover and side
channels to break Androidâs permission system or we would be unable to adapt to an ever-changing
ecosystem of Android third-party companies.The works presented in this thesis were partially funded within the framework of the following projects
and grants:
⢠European Unionâs Horizon 2020 Innovation Action program (Grant Agreement No. 786741,
SMOOTH Project and Grant Agreement No. 101021377, TRUST AWARE Project).
⢠Spanish Government ODIO NºPID2019-111429RB-C21/PID2019-111429RBC22.
⢠The Spanish Data Protection Agency (AEPD)
⢠AppCensus Inc.This work has been supported by IMDEA Networks InstitutePrograma de Doctorado en IngenierĂa TelemĂĄtica por la Universidad Carlos III de MadridPresidente: Srdjan Matic.- Secretario: Guillermo SuĂĄrez-Tangil.- Vocal: Ben Stoc
The construction and applications of callback control flow graphs for event-driven and framework-based mobile apps
Mobile devices have become ubiquitous over the last years. Android, as the leading platform in the mobile ecosystem, have over 2.5 million apps published in Google Play Market. This enormous ecosystem creates a fierce competition between apps with similar functionality in which the low quality of apps has been shown to increase the churn rate considerably. Additionally, the complex event-driven, framework-based architecture that developers use to implement apps imposes several challenges and led to new varieties of code smells and bugs. There is a need for tools that assure the quality of apps such as program analysis and testing tools. One of the foundational challenges for developing these tools is the sequencing or ordering of callback methods invoked from external events (e.g. GUI events) and framework calls. Even for a small subset of callbacks, it has been shown that the current state-of-the-art tools fail to generate sequences of callbacks that match the runtime behavior of Android apps.
This thesis explores the construction and applications of new representations and program analyses for event-driven, framework-based mobile applications, specifically Android apps. In Android, we observe that the changes of control flow between entry points are mostly handled by the framework using callbacks. These callbacks can be executed synchronously and asynchronously when an external event happens (e.g. a click event) or a framework call is made. In framework-based systems, method calls to the framework can invoke sequences of callbacks. With the high overhead introduced by libraries such as the Android framework, most current tools for the analysis of Android apps have opted to skip the analysis of these libraries. Thus, these analyses missed the correct order of callbacks for each callback invoked in framework calls. This thesis presents a new specification called Predicate Callback Summary (PCS) to summarize how library or API methods invoke callbacks. PCSs enable inter-procedural analysis for Android apps without the overhead of analyzing the whole framework and help developers understand how their code (callback methods) is executed in the framework. We show that our static analysis techniques to summarize PCSs is accurate and scalable, considering the complexity of the millions of lines of code in the Android framework.
With PCSs summaries, we have information about the control flow of callbacks invoked in framework calls but lack information about how external events can execute callbacks. To integrate event-driven control flow behavior with control behavior generated from framework calls, we designed a novel program representation, namely Callback Control Flow Automata (CCFA). The design of CCFA is based on the Extended Finite State Machine (EFSM) model, which extends the Finite State Machine (FSM) by labeling transitions using information such as guards. In a CCFA, a state represents whether the execution path enters or exits a callback. The transition from one state to another represents the transfer of control flow between callbacks. We present an analysis to automatically construct CCFAs by combining two callback control flow representations developed from the previous research, namely, Window Transition Graphs (WTGs) and PCSs. To demonstrate the usefulness of our representation, we integrated CCFAs into two client analyses: a taint analysis using FLOWDROID, and a value-flow analysis that computes source and sink pairs of a program. Our evaluation shows that we can compute CCFAs efficiently and that CCFAs improved the callback coverages over WTGs. As a result of using CCFAs, we obtained 33 more true positive security leaks than FLOWDROID over a total of 55 apps we have run. With a low false positive rate, we found that 22.76\% of source-sink pairs we computed are located in different callbacks and that 31 out of 55 apps contain source-sink pairs spreading across components.
In the last part of this thesis, we use the CCFAs to develop a new family of coverage criteria based on callback sequences for more effective testing Android apps. We present 2 studies to help us identify what types of callbacks are important when detecting bugs. With the help of the empirical results, we defined 3 coverage criteria based on callback sequences. Our evaluation shows that our coverage criteria are a more effective metric than statement and GUI-based event coverage to guide test input generation