Keeping Context In Mind: Automating Mobile App Access Control with User Interface Inspection
Recent studies observe that the app foreground is the most influential factor in access control decisions on mobile platforms, as users tend to deny permission requests that lack visible evidence. However, none of the existing permission models provides a systematic approach that can automatically answer the question: Is the resource access indicated by the app foreground? In this work, we present the design, implementation, and evaluation of COSMOS, a context-aware mediation system that bridges the semantic gap between foreground interaction and background access in order to protect system integrity and user privacy. Specifically, COSMOS learns from a large set of apps with similar functionalities and user interfaces to construct generic models that detect outliers at runtime. It can be further customized to satisfy specific user privacy preferences by continuously evolving with user decisions. Experiments show that COSMOS achieves both high precision and high recall in detecting malicious requests. We also demonstrate the effectiveness of COSMOS in capturing specific user preferences using decisions collected from 24 users, and illustrate that COSMOS can be easily deployed on smartphones as a real-time guard with very low performance overhead.
Comment: Accepted for publication in IEEE INFOCOM'201
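A minimal sketch of the outlier-detection idea, not COSMOS's actual model: train a one-class detector on UI-context feature vectors observed when similar apps access a resource with visible foreground evidence, then flag accesses whose context looks anomalous. The feature encoding and the choice of IsolationForest are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical binary features at the moment of a sensitive API call, e.g.
# [resource-related button clicked, related widget visible, related text on screen, dialog shown]
foreground_justified = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
] * 10)  # toy training set of foreground-indicated accesses

detector = IsolationForest(contamination=0.05, random_state=0).fit(foreground_justified)

def is_foreground_indicated(ui_context):
    """True if the access is consistent with what the user currently sees on screen."""
    return detector.predict(np.array([ui_context]))[0] == 1

# A background access with no visible cue should be flagged as an outlier.
print(is_foreground_indicated([0, 0, 0, 0]))  # likely False -> deny or prompt the user
```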
Translating Video Recordings of Mobile App Usages into Replayable Scenarios
Screen recordings of mobile applications are easy to obtain and capture a
wealth of information pertinent to software developers (e.g., bugs or feature
requests), making them a popular mechanism for crowdsourced app feedback. Thus,
these videos are becoming a common artifact that developers must manage. In
light of unique mobile development constraints, including swift release cycles
and rapidly evolving platforms, automated techniques for analyzing all types of
rich software artifacts can benefit mobile developers. Unfortunately,
automatically analyzing screen recordings presents serious challenges, due to
their graphical nature, compared to other types of (textual) artifacts. To
address these challenges, this paper introduces V2S, a lightweight, automated
approach for translating video recordings of Android app usages into replayable
scenarios. V2S is based primarily on computer vision techniques and adapts
recent solutions for object detection and image classification to detect and
classify user actions captured in a video, and convert these into a replayable
test scenario. We performed an extensive evaluation of V2S involving 175 videos
depicting 3,534 GUI-based actions collected from users exercising features and
reproducing bugs from over 80 popular Android apps. Our results illustrate that
V2S can accurately replay scenarios from screen recordings, and is capable of
reproducing 89% of our collected videos with minimal overhead. A case
study with three industrial partners illustrates the potential usefulness of
V2S from the viewpoint of developers.
Comment: In proceedings of the 42nd International Conference on Software Engineering (ICSE'20), 13 pages
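To make the "replayable scenario" output concrete, here is a minimal sketch of only the final step: serializing already-detected GUI actions into adb input commands that can be replayed on a device. The upstream object-detection and action-classification stages are assumed, and the event fields are illustrative rather than V2S's actual format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TouchEvent:
    kind: str          # "tap" or "swipe" (hypothetical action labels)
    x: int
    y: int
    x2: int = 0        # end coordinates, only used for swipes
    y2: int = 0
    duration_ms: int = 300

def to_replay_script(events: List[TouchEvent]) -> str:
    """Serialize detected GUI actions into shell commands replayable on a device."""
    lines = []
    for e in events:
        if e.kind == "tap":
            lines.append(f"adb shell input tap {e.x} {e.y}")
        elif e.kind == "swipe":
            lines.append(f"adb shell input swipe {e.x} {e.y} {e.x2} {e.y2} {e.duration_ms}")
    return "\n".join(lines)

print(to_replay_script([TouchEvent("tap", 540, 1200),
                        TouchEvent("swipe", 540, 1600, 540, 400, duration_ms=500)]))
```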
Topic driven testing
Modern interactive applications offer so many interaction opportunities that automated exploration and testing become practically impossible without some domain-specific guidance towards relevant functionality. In this dissertation, we present a novel fundamental graphical user interface testing method called topic-driven testing. We mine the semantic meaning of interactive elements, guide testing, and identify the core functionality of applications. The semantic interpretation is close to human understanding and allows us to learn specifications and transfer knowledge across multiple applications independent of the underlying device, platform, programming language, or technology stack; to the best of our knowledge, this is a unique feature of our technique. Our tool ATTABOY is able to take an existing web application test suite, say from Amazon, execute it on eBay, and thus guide testing to relevant core functionality. Tested on different application domains such as eCommerce, news pages, and mail clients, it can transfer on average sixty percent of the tested application behavior to new apps without any human intervention. On top of that, topic-driven testing can work with even vaguer instructions such as how-to descriptions or use-case descriptions. Given an instruction, say "add item to shopping cart", it tests the specified behavior in an application, both in a browser and in mobile apps. It thus improves state-of-the-art UI testing frameworks, creates change-resilient UI tests, and lays the foundation for learning, transferring, and enforcing common application behavior. The prototype is up to five times faster than existing random testing frameworks and tests functions that are hard to cover by non-trained approaches.
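The core transfer step, matching a natural-language instruction to the semantically closest UI element in a new application, can be sketched as follows. ATTABOY's actual semantic model is richer; the TF-IDF cosine similarity below is a stand-in assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def match_step_to_element(step, element_labels):
    """Pick the UI element label closest in meaning to a natural-language test step."""
    vec = TfidfVectorizer().fit(element_labels + [step])
    sims = cosine_similarity(vec.transform([step]), vec.transform(element_labels))[0]
    return element_labels[sims.argmax()]

labels = ["Search", "Add to cart", "Checkout", "Sign in", "Wish list"]
print(match_step_to_element("add item to shopping cart", labels))  # -> "Add to cart"
```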
Learning Fast and Slow: PROPEDEUTICA for Real-time Malware Detection
In this paper, we introduce and evaluate PROPEDEUTICA, a novel methodology
and framework for efficient and effective real-time malware detection,
leveraging the best of conventional machine learning (ML) and deep learning
(DL) algorithms. In PROPEDEUTICA, all software processes in the system start
execution subjected to a conventional ML detector for fast classification. If a
piece of software receives a borderline classification, it is subjected to
further analysis via more computationally expensive but more accurate DL methods,
via our newly proposed DL algorithm DEEPMALWARE. Further, we introduce delays
to the execution of software subjected to deep learning analysis as a way to
"buy time" for DL analysis and to rate-limit the impact of possible malware in
the system. We evaluated PROPEDEUTICA with a set of 9,115 malware samples and
877 commonly used benign software samples from various categories for the
Windows OS. Our results show that the false positive rate for conventional ML
methods can reach 20%, and for modern DL methods it is usually below 6%.
However, the classification time for DL can be 100X longer than conventional ML
methods. PROPEDEUTICA improved the detection F1-score from 77.54% (conventional
ML method) to 90.25%, and reduced the detection time by 54.86%. The percentage of software subjected to DL analysis was approximately 40% on average, and the application of delays to software subjected to ML reduced the detection time by approximately 10%. Finally, we found and discussed a
discrepancy between the detection accuracy offline (analysis after all traces
are collected) and on-the-fly (analysis in tandem with trace collection). Our
insights show that conventional ML and modern DL-based malware detectors in
isolation cannot meet the needs of efficient and effective malware detection:
high accuracy, low false positive rate, and short classification time.
Comment: 17 pages, 7 figures
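A minimal sketch of the fast/slow triage described above; off-the-shelf scikit-learn models stand in for the conventional ML stage and for DEEPMALWARE, and the borderline thresholds are assumptions rather than the paper's tuned values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                    # toy per-process feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # toy labels: 1 = "malware"
fast = LogisticRegression().fit(X, y)            # cheap screening stage
slow = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)  # stand-in for the slow, accurate stage

def classify_process(features, low=0.3, high=0.7):
    """Return (label, stage); escalate to the expensive detector only when unsure."""
    p = fast.predict_proba([features])[0][1]
    if p < low:
        return "benign", "fast"
    if p > high:
        return "malware", "fast"
    # borderline score: throttle the process to "buy time" and run the slower model
    p_slow = slow.predict_proba([features])[0][1]
    return ("malware" if p_slow >= 0.5 else "benign"), "slow"

print(classify_process(X[0]))
```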
Search-Based Test Generation Targeting Non-Functional Quality Attributes of Android Apps
Mobile apps form a major proportion of the software marketplace, and it is crucial to ensure that they meet both functional and non-functional quality thresholds. Automated test input generation can reduce the cost of the testing process. However, existing Android test generation approaches are focused on code coverage and cannot be customized to a tester's diverse goals, in particular quality attributes such as resource use. We propose a flexible multi-objective search-based test generation framework for interface testing of Android apps, STGFA-SMOG. This framework allows testers to target a variety of fitness functions, corresponding to different software quality attributes, code coverage, and other test case properties. We find that STGFA-SMOG outperforms random test generation in exposing potential quality issues and triggering crashes. Our study also offers insights on how different combinations of fitness functions can affect test generation for Android apps.
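As a rough illustration of the multi-objective formulation (a simplified version, not the framework's actual fitness functions), each candidate test can be scored on several objectives and the search keeps the Pareto-nondominated candidates; the objective names below are assumptions.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b on every objective (higher is better)
    and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """candidates: dict of test name -> (coverage, -memory_mb, crashes_triggered)."""
    return [n for n, s in candidates.items()
            if not any(dominates(o, s) for m, o in candidates.items() if m != n)]

tests = {"t1": (0.62, -120, 0), "t2": (0.55, -80, 1), "t3": (0.40, -150, 0)}
print(pareto_front(tests))  # t3 is dominated by t1; t1 and t2 trade coverage against memory and crashes
```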
Owl Eyes: Spotting UI Display Issues via Visual Understanding
The graphical user interface (GUI) provides a visual bridge between a software application and its end users, through which they can interact with each other. With the development of technology and aesthetics, the visual effects of GUIs have become increasingly appealing. However, this GUI complexity poses a great challenge to GUI implementation. According to our pilot study of crowdtesting bug reports, display issues such as text overlap, blurred screens, and missing images often occur during GUI rendering on different devices due to software or hardware compatibility problems. They negatively affect app usability, resulting in poor user experience. To detect these issues, we propose a novel approach, OwlEye, based on deep learning for modelling the visual information of GUI screenshots. OwlEye can detect GUIs with display issues and also locate the detailed region of the issue in a given GUI, guiding developers to fix the bug. We manually construct a large-scale labelled dataset of 4,470 GUI screenshots with UI display issues and develop a heuristics-based data augmentation method to boost OwlEye's performance. The evaluation demonstrates that OwlEye achieves 85% precision and 84% recall in detecting UI display issues, and 90% accuracy in localizing these issues. We also evaluate OwlEye with popular Android apps on Google Play and F-droid, and successfully uncover 57 previously undetected UI display issues, 26 of which have been confirmed or fixed so far.
Comment: Accepted to the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE '20)
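A minimal sketch of the two tasks evaluated above, screenshot-level issue detection and coarse localization, using a toy convolutional classifier; the architecture, input size, and crop-based localization are assumptions and not OwlEye's actual design.

```python
import torch
import torch.nn as nn

class DisplayIssueNet(nn.Module):
    """Toy binary classifier: does this image region show a display issue?"""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        return torch.sigmoid(self.head(self.features(x).flatten(1)))

model = DisplayIssueNet().eval()            # untrained here; weights would come from the labelled dataset
screenshot = torch.rand(1, 3, 256, 256)     # placeholder screenshot tensor
with torch.no_grad():
    whole = model(screenshot).item()        # screenshot-level detection score
    # coarse localization: score the four quadrants and report the most suspicious one
    crops = {(r, c): model(screenshot[:, :, r:r+128, c:c+128]).item()
             for r in (0, 128) for c in (0, 128)}
print("screen score:", round(whole, 3), "most suspicious quadrant:", max(crops, key=crops.get))
```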
How do Developers Test Android Applications?
Enabling fully automated testing of mobile applications has recently become
an important topic of study for both researchers and practitioners. A plethora
of tools and approaches have been proposed to aid mobile developers both by
augmenting manual testing practices and by automating various parts of the
testing process. However, current approaches for automated testing fall short
in convincing developers about their benefits, leading to a majority of mobile
testing being performed manually. With the goal of helping researchers and
practitioners - who design approaches supporting mobile testing - to understand
developers' needs, we analyzed survey responses from 102 open source
contributors to Android projects about their practices when performing testing.
The survey focused on questions regarding practices and preferences of
developers/testers in-the-wild for (i) designing and generating test cases,
(ii) automated testing practices, and (iii) perceptions of quality metrics such
as code coverage for determining test quality. Analyzing the information
gleaned from this survey, we compile a body of knowledge to help guide
researchers and professionals toward tailoring new automated testing approaches
to the needs of a diverse set of open source developers.
Comment: 11 pages, accepted to the Proceedings of the 33rd IEEE International Conference on Software Maintenance and Evolution (ICSME'17)
Android Based Behavioral Biometric Authentication via Multi-Modal Fusion
Because mobile devices are easily lost or stolen, continuous authentication is extremely desirable for them. Behavioral biometrics provides non-intrusive continuous authentication that has much less impact on usability than active authentication. However, single-modality behavioral biometrics has proven less accurate than standard active authentication. This thesis presents a behavioral biometric system that uses multi-modal fusion with user data from touch, keyboard, and orientation sensors. Testing with five users shows that fusion of modalities provides more accurate authentication than each individual modality by itself. Using the BayesNet classification algorithm, fusion achieves False Acceptance Rate (FAR) and False Rejection Rate (FRR) values of 9.65% and 2% respectively, each of which is 8% lower than that of the closest individual modality.
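A toy sketch of the score-level fusion idea: each modality's classifier emits a probability that the current user is the device owner, and the fused score drives the accept/reject decision. The thesis fuses modalities with a BayesNet classifier; the weighted-average rule, weights, and threshold below are stand-in assumptions.

```python
def fuse_scores(scores, weights, threshold=0.6):
    """scores/weights keyed by modality, e.g. 'touch', 'keyboard', 'orientation'."""
    total = sum(weights.values())
    fused = sum(scores[m] * weights[m] for m in scores) / total
    return fused >= threshold   # True -> keep the session authenticated

print(fuse_scores({"touch": 0.82, "keyboard": 0.55, "orientation": 0.48},
                  {"touch": 0.5, "keyboard": 0.3, "orientation": 0.2}))  # fused ~0.67 -> accept
```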