Data retrieval systems such as online search engines and online social
networks must comply with the privacy policies of personal and selectively
shared data items, regulatory policies regarding data retention and censorship,
and the provider's own policies regarding data use. Enforcing these policies is
difficult and error-prone. Systematic techniques to enforce policies are either
limited to type-based policies that apply uniformly to all data of the same
type, or incur significant runtime overhead.
This paper presents Shai, the first system that systematically enforces
data-specific policies with near-zero overhead in the common case. Shai's key
idea is to push as many policy checks as possible to an offline, ahead-of-time
analysis phase, often relying on predicted values of runtime parameters such as
the state of access control lists or connected users' attributes. Runtime
interception is used sparingly, only to verify these predictions and to make
any remaining policy checks. Our prototype implementation relies on efficient,
modern OS primitives for sandboxing and isolation. We present the design of
Shai and quantify its overheads on an experimental data indexing and search
pipeline based on the popular search engine Apache Lucene