Protecting Time Series Data with Minimal Forecast Loss
Forecasting could be negatively impacted due to anonymization requirements in
data protection legislation. To measure the potential severity of this problem,
we derive theoretical bounds for the loss to forecasts from additive
exponential smoothing models using protected data. Following the guidelines of
anonymization from the General Data Protection Regulation (GDPR) and California
Consumer Privacy Act (CCPA), we develop the k-nearest Time Series (k-nTS)
Swapping and k-means Time Series (k-mTS) Shuffling methods to create
protected time series data that minimizes the loss to forecasts while
preventing a data intruder from detecting privacy issues. For efficient and
effective decision making, we formally model an integer programming problem for
a perfect matching for simultaneous data swapping in each cluster. We call it a
two-party data privacy framework since our optimization model includes the
utilities of a data provider and data intruder. We apply our data protection
methods to thousands of time series and find that they maintain the forecasts
and patterns (level, trend, and seasonality) of time series well compared to
standard data protection methods suggested in legislation. Substantively, our
paper addresses the challenge of protecting time series data when used for
forecasting. Our findings suggest the managerial importance of incorporating
the concerns of forecasters into the data protection itself.
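
The idea of swapping values among nearby series and then measuring the loss to forecasts can be illustrated with a small sketch. This is not the authors' algorithm: it assumes Euclidean distance between whole series, a uniformly random choice among the k nearest neighbours for each time period, and simple (level-only) exponential smoothing as the forecasting model. The names `ses_forecast`, `knn_swap`, and the toy data are hypothetical.

```python
import math
import random


def ses_forecast(series, alpha=0.3):
    """One-step-ahead forecast from simple (level-only) exponential smoothing."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_swap(data, k=2, seed=0):
    """Illustrative swap (an assumption, not the paper's method): each value is
    replaced by the same-period value of a series drawn at random from the
    k nearest neighbours of the original series."""
    rng = random.Random(seed)
    protected = []
    for i, series in enumerate(data):
        others = [j for j in range(len(data)) if j != i]
        others.sort(key=lambda j: euclidean(series, data[j]))
        neighbours = others[:k]
        protected.append(
            [data[rng.choice(neighbours)][t] for t in range(len(series))]
        )
    return protected


# Toy data: two low-level series and two high-level series.
data = [
    [10.0, 11.0, 12.0, 13.0],
    [10.5, 11.5, 12.5, 13.5],
    [100.0, 101.0, 102.0, 103.0],
    [100.5, 101.5, 102.5, 103.5],
]

protected = knn_swap(data, k=1)

# Forecast loss per series: absolute change in the one-step-ahead forecast.
losses = [
    abs(ses_forecast(p) - ses_forecast(o)) for p, o in zip(protected, data)
]
```

With k = 1, each series is replaced by its single nearest neighbour, so the protected values stay within the same level and the forecast shifts only by the gap between neighbouring series. Larger k draws from more distant series, which in this sketch trades forecast accuracy for more uncertainty about which series a value came from.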