Abstract. The prevalence of cloud computing environments and the ever-increasing reliance of large organisations on computational resources mean that service providers must operate at unprecedented scales and levels of efficiency. Dynamic resource allocation (DRA) policies have been shown to improve resource utilisation and operational efficiency in the presence of unpredictable demand, thereby maximising profitability. However, practical considerations such as power and space have led service providers to adopt rack-based approaches to application servicing. This co-location of computational resources, and the shared provision of utilities it encourages, has immediate implications for system dependability. Specifically, rack crash failures can render all the servers within a rack unavailable, so resource allocation policies need to be cognisant of such failures. In this paper, we address this issue and make the following specific contributions: (i) we present a modular architecture for failure-aware resource allocation, in which a performance-oriented DRA policy is composed with a failure-aware resource allocator; (ii) we propose a metric, called Capacity Loss, to capture the exposure of an application to a rack failure; (iii) we develop an algorithm for reducing the proposed metric across all applications in a system operating under a DRA policy; and (iv) we evaluate the effectiveness of the proposed architecture on a large-scale DRA policy in the context of rack failures, concluding that our approach reduces the number of failed requests compared with a single random allocation. The main benefit of our approach is that the resulting failure-aware resource allocation framework can work in tandem with any DRA policy.