2 research outputs found
Detecting Data Leakage from Databases on Android Apps with Concept Drift
Mobile databases are the backbone of many applications on
smartphones, and they store a lot of sensitive information. However,
vulnerabilities in the operating system or the app logic can lead to sensitive
data leakage by giving the adversaries unauthorized access to the app's
database. In this paper, we study such vulnerabilities to define a threat
model, and we propose an OS-version independent protection mechanism that app
developers can utilize to detect such attacks. To do so, we model the user
behavior with the database query workload created by the original apps. Here,
we model the drift in behavior by comparing probability distributions of the
query workload features over time. We then use this model to determine if the
app behavior drift is anomalous. We evaluate our framework on real-world
workloads of three different popular Android apps, and we show that our system
was able to detect more than 90% of such attacks.
Comment: This paper is accepted to be published in the proceedings of IEEE
TrustCom 201
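
The drift comparison described in this abstract can be illustrated with a minimal sketch. The abstract does not say which query features or which distance measure the paper uses, so everything here is an assumption made for illustration: the feature is a hypothetical "statement type plus table" label, the distance is the Jensen-Shannon divergence, and the threshold of 0.3 is arbitrary.

```python
import math
from collections import Counter


def distribution(values):
    """Turn a list of categorical feature values into a probability distribution."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions.

    The result lies in [0, 1]: 0 for identical distributions, 1 for disjoint ones.
    """
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mix(a):
        # KL divergence from `a` to the mixture; zero-probability terms contribute 0.
        return sum(a[k] * math.log2(a[k] / mix[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)


def is_anomalous(baseline_window, current_window, threshold=0.3):
    """Flag the current window if its feature distribution drifts past the threshold.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    drift = js_divergence(distribution(baseline_window), distribution(current_window))
    return drift > threshold


# Hypothetical feature values: one label per observed query.
baseline = ["SELECT messages"] * 8 + ["INSERT messages"] * 2
suspicious = ["SELECT contacts"] * 9 + ["SELECT messages"] * 1

print(is_anomalous(baseline, baseline))    # same workload, no drift
print(is_anomalous(baseline, suspicious))  # workload shifted to another table
```

In practice the threshold would have to be calibrated against the normal drift of each app's workload, which is the part of the problem the abstract says the paper models.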
Query Log Compression for Workload Analytics
Analyzing database access logs is a key part of performance tuning, intrusion
detection, benchmark development, and many other database administration tasks.
Unfortunately, it is common for production databases to deal with millions or
even more queries each day, so these logs must be summarized before they can be
used. Designing an appropriate summary encoding requires trading off between
conciseness and information content. For example, simple workload sampling may
miss rare but high-impact queries. In this paper, we present LogR, a lossy log
compression scheme suitable for use by many automated log analytics tools, as well
as for human inspection. We formalize and analyze the space/fidelity trade-off
in the context of a broader family of "pattern" and "pattern mixture" log
encodings to which LogR belongs. We show through a series of experiments that
LogR compressed encodings can be created efficiently, come with provable
information-theoretic bounds on their accuracy, and outperform state-of-the-art log
summarization strategies.
Comment: Typos fixed, some irrelevant figures and paragraphs are trimme
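
To make the idea of a lossy, pattern-based log summary concrete, here is a minimal sketch. It is not LogR and does not reproduce the paper's encoding; it only assumes a simple notion of "pattern" (a query with its literals replaced by `?`) and counts how often each pattern occurs. The function names and the sample queries are hypothetical.

```python
import re
from collections import Counter


def to_pattern(query):
    """Collapse a query to a pattern by replacing literals with '?'.

    Quoted strings are replaced first, then standalone integers, and
    whitespace and letter case are normalized.
    """
    q = re.sub(r"'[^']*'", "?", query)
    q = re.sub(r"\b\d+\b", "?", q)
    return re.sub(r"\s+", " ", q).strip().upper()


def summarize(log):
    """Return a pattern -> occurrence count summary of a query log."""
    return Counter(to_pattern(q) for q in log)


log = [
    "SELECT * FROM users WHERE id = 1",
    "select * from users  where id = 42",
    "DELETE FROM audit WHERE ts < '2020-01-01'",
]

for pattern, count in summarize(log).items():
    print(count, pattern)
```

Unlike a small random sample, a summary of this kind keeps one entry for every distinct pattern, so a rare query such as the `DELETE` above still appears with its count. The cost is that the original constants are lost, which is one form of the space/fidelity trade-off the abstract refers to.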