Cascades are a common type of machine learning system in which a large,
remote model is queried when a local model cannot accurately label a
user's data by itself. Serving stacks for large language models (LLMs)
increasingly use cascades due to their ability to preserve task performance
while dramatically reducing inference costs. However, applying cascade systems
in situations where the local model has access to sensitive data poses a
significant privacy risk for users, since such data could be forwarded to the
remote model. In this work, we show the feasibility of applying cascade systems
in such setups by equipping the local model with privacy-preserving techniques
that reduce the risk of leaking private information when querying the remote
model. To quantify information leakage, we introduce two privacy
measures. We then propose a system that leverages the recently introduced
social learning paradigm in which LLMs collaboratively learn from each other by
exchanging natural language. Using this paradigm, we demonstrate on several
datasets that our methods minimize the privacy loss while at the same time
improving task performance compared to a non-cascade baseline.
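
As a rough illustration of the cascade setup described above, the sketch below shows a local model deferring to a remote model only when its confidence is low, and sanitizing the query before it leaves the device. All names (`local_predict`, `sanitize`, `remote_predict`, the confidence threshold) are hypothetical stand-ins for exposition; this is not the method proposed in this work.

```python
# Minimal sketch of a privacy-aware cascade (all names hypothetical;
# an illustration of the general setup, not the paper's method).

from dataclasses import dataclass


@dataclass
class Prediction:
    label: str
    confidence: float  # local model's self-reported confidence in [0, 1]


def local_predict(text: str) -> Prediction:
    # Stand-in for an on-device model; a real system would run an LLM here.
    return Prediction(label="unknown", confidence=0.2)


def sanitize(text: str) -> str:
    # Stand-in for a privacy-preserving transformation, e.g. replacing the
    # user's text with generated content that conveys the task but not the
    # private details (in the spirit of the social learning setting).
    return "<privacy-preserving description of the task>"


def remote_predict(query: str) -> str:
    # Stand-in for a call to the large remote model.
    return "label-from-remote-model"


def cascade(text: str, threshold: float = 0.8) -> str:
    """Answer locally when confident; otherwise defer a sanitized query."""
    pred = local_predict(text)
    if pred.confidence >= threshold:
        return pred.label  # private data never leaves the device
    return remote_predict(sanitize(text))  # only sanitized content is sent


if __name__ == "__main__":
    print(cascade("some sensitive user text"))
```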