With the increasing collection of medical data in digital format, the use and reuse of this data is also increasing. This introduces new challenges in the selection, de-identification, storage and handling of imaging data. When building large data collections for training and validating machine learning models, merely collecting a lot of data is not enough. The quality of the data must be sufficient for the intended application in order to obtain valid results. This chapter discusses the issue of data quality by looking at the curation of medical images and related data, and at the different aspects involved in this process as we move forward in the era of AI.