In this paper we present a hybrid three-step mechanism that combines automated and human media analysis to select a small number of representative and diverse images from a noisy image set. The first step consists of automatically retrieving a large database of candidate images from the web. In the second step, a proposed image analysis method automatically selects a set of potentially relevant and diverse images, with the goal of reducing the time, payment, and cognitive load involved and, implicitly, the amount of human work required. Because of the semantic gap between low-level features and high-level semantics in images, a final step is necessary, in which the selected images are annotated and assessed by the crowd. The aim of this step is to evaluate how representative and diverse the selected set is and to provide images of the highest quality. The method was validated on the retrieval of images of monuments, using more than 30,000 images retrieved from various social image search platforms.