With increasingly volatile market conditions and rapid product innovations,
operational decision-making for large-scale systems entails solving thousands
of problems with limited data. Data aggregation has been proposed to pool data
across problems and thereby improve on the decisions obtained by solving each
problem individually. We propose a novel cluster-based Shrunken-SAA approach
that exploits the cluster structure among problems when aggregating data. We
prove that, as the number of problems grows,
leveraging the given cluster structure among problems yields additional
benefits over data aggregation approaches that neglect such structure. When
the cluster structure is unknown, we show that uncovering it,
even at the cost of a few data points, can be beneficial, especially when the
distance between clusters of problems is substantial. Our proposed approach can
be extended to general cost functions under mild conditions. When the number of
problems grows large, the optimality gap of our proposed approach decreases
exponentially in the distance between the clusters. We evaluate the proposed
approach through numerical experiments on managing newsvendor systems. Using
synthetic data, we investigate how the distance metric between problem
instances affects the performance of the cluster-based Shrunken-SAA
approach. We further validate our proposed approach with
real data and highlight the advantages of cluster-based data aggregation,
especially in the small-data, large-scale regime, compared to existing
approaches.