As cloud services become central to an increasing number of applications,
they process and store more personal and business-critical data. At the same
time, privacy and compliance regulations such as GDPR, the EU ePrivacy
regulation, PCI, and the upcoming EU Cybersecurity Act raise the bar for secure
processing and traceability of critical data. In particular, the obligations to
disclose which data records exist about an individual and to delete them on
demand are central to privacy regulations. Common to these
requirements is that cloud providers must be able to track data as it flows
across the different services to ensure that it never leaves the legitimate
realm and that the location of every copy of a record belonging to a specific
individual or business process is known at all times.
However, current cloud architectures provide neither the means to track data
flows holistically across different services nor the means to enforce policies
on those flows. In this paper, we point out the deficits in the data flow
tracking functionalities of major cloud providers by means of a set of
practical experiments. We then generalize from these experiments by introducing a
generic architecture that aims at solving the problem of cloud-wide data flow
tracking and show how it can be built in a Kubernetes-based prototype
implementation.

Comment: 11 pages, 5 figures, 2020 IEEE 13th International Conference on Cloud
Computing (CLOUD)