Smartphone apps usually have access to sensitive user data such as contacts,
geo-location, and account credentials, and they might share such data with
external entities through the Internet or with other apps. Confidentiality of
user data could be breached if a vulnerable or malicious app handles sensitive
data in an anomalous way. Existing approaches that
detect anomalous sensitive data flows have limited accuracy because what counts
as an anomalous flow differs across apps with different functionalities; for
example, it is normal for "Health" apps to share heart rate information through
the Internet, but anomalous for "Travel" apps to do so.
In this paper, we propose a novel approach to detect anomalous sensitive data
flows in Android apps, with improved accuracy. To achieve this objective, we
first group trusted apps according to the topics inferred from their functional
descriptions. We then learn sensitive information flows with respect to each
group of trusted apps. For a given app under analysis, anomalies are identified
by comparing sensitive information flows in the app against those flows learned
from trusted apps grouped under the same topic. In the evaluation, information
flows were learned from 11,796 trusted apps. We then checked for anomalies in
596 new (benign) apps and identified 2 previously unknown vulnerable apps
related to anomalous flows. We also analyzed 18 malware apps and found
anomalies in 6 of them.
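
The comparison step described above can be illustrated with a minimal sketch.
It assumes that each app has already been assigned a topic and reduced to a set
of (source, sink) flows; the function names, the example data, and the
frequency threshold `min_share` are hypothetical and chosen only for
illustration. The sketch does not reproduce the topic inference or the flow
model used in the paper: it simply flags a flow as anomalous when it is rare
among trusted apps of the same topic.

```python
from collections import Counter, defaultdict


def learn_flows(trusted_apps):
    """Count, per topic, how many trusted apps exhibit each (source, sink) flow.

    trusted_apps: iterable of (topic, flows) pairs, where flows is a set of
    (source, sink) tuples. This input shape is an assumption of the sketch.
    """
    counts = defaultdict(Counter)  # topic -> flow -> number of apps with it
    sizes = Counter()              # topic -> number of trusted apps
    for topic, flows in trusted_apps:
        sizes[topic] += 1
        counts[topic].update(set(flows))
    return counts, sizes


def find_anomalies(topic, flows, counts, sizes, min_share=0.05):
    """Return the flows seen in fewer than min_share of same-topic trusted apps.

    min_share is a hypothetical threshold, not a value taken from the paper.
    """
    if sizes[topic] == 0:
        return []  # no trusted baseline for this topic, so nothing to compare
    return sorted(
        flow for flow in set(flows)
        if counts[topic][flow] / sizes[topic] < min_share
    )


# Toy data, invented for illustration only.
trusted = [
    ("Health", {("heart_rate", "internet")}),
    ("Health", {("heart_rate", "internet"), ("location", "internet")}),
    ("Travel", {("location", "internet")}),
    ("Travel", {("location", "internet")}),
]
counts, sizes = learn_flows(trusted)

travel_result = find_anomalies(
    "Travel", {("heart_rate", "internet"), ("location", "internet")},
    counts, sizes,
)
health_result = find_anomalies(
    "Health", {("heart_rate", "internet")}, counts, sizes,
)
```

With this toy data, the heart rate flow is flagged for the "Travel" app but not
for the "Health" app, mirroring the example given earlier. A simple frequency
threshold is only one possible way to decide what is anomalous within a topic.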