The aim of this paper is to show how citizen science could potentially be used to improve cloud detection algorithms for satellite imagery in the future. This work is being undertaken within the framework of the H2020-funded LandSense Citizen Observatory on land cover and land use. One of the key areas of interest in LandSense is making the best use of Earth Observation data. A number of different cloud detection systems and algorithms are available, including Sentinel Hub’s cloud detector (s2cloudless) developed by Sinergise. This system has high cloud detection rates and lower rates of misclassifying land and snow as cloud compared to other popular cloud detection algorithms. However, the cloud detector could still be improved, e.g., in detection over bare areas. Hence, augmenting the training data set with additional data, including misclassified pixels from certain scenes, is one way to improve the classifier performance.
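As a minimal sketch of this idea of training-set augmentation (and not the actual s2cloudless pipeline), the example below appends newly labelled samples, such as previously misclassified bare-area pixels relabelled by volunteers, to an existing feature/label set and refits a gradient-boosted classifier. The arrays and the use of scikit-learn's GradientBoostingClassifier are illustrative stand-ins for the production model and its Sentinel-2 band features.

```python
# Hypothetical sketch: augment training data with newly labelled pixels and retrain.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def retrain_with_augmented_data(X_train, y_train, X_new, y_new):
    """Append newly collected samples (e.g. relabelled misclassified pixels)
    to the original training set and refit a classifier."""
    X_aug = np.concatenate([X_train, X_new], axis=0)
    y_aug = np.concatenate([y_train, y_new], axis=0)
    clf = GradientBoostingClassifier()  # stand-in for the production model
    clf.fit(X_aug, y_aug)
    return clf

# Placeholder data: 10 spectral features per pixel, label 1 = cloud, 0 = clear.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((1000, 10)), rng.integers(0, 2, 1000)
X_new, y_new = rng.random((50, 10)), rng.integers(0, 2, 50)
model = retrain_with_augmented_data(X_train, y_train, X_new, y_new)
```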
Although Sinergise already has a sophisticated system for collecting training and validation data for the cloud detection system, a simpler approach to data collection could run in parallel using a citizen-science-based application such as Picture Pile. This is a mobile and online tool for rapid image assessment that is part of the LandSense Citizen Observatory. Picture Pile (shown in Figure 1) works in a very simple way. Users are shown a satellite image and asked a single question; in this example: is more than half of the image cloudy? If the answer is yes, the user swipes the image to the right on mobile devices (or uses the cursor keys in the browser version). If the answer is no, the image is swiped to the left, and if users are unsure, they can swipe the image downwards. In this way it is possible to classify images very rapidly. At present the application does not ask users about cloud shadows, but such questions could be added in the future, and users could be trained to distinguish between clouds and cloud shadows.

Picture Pile has some similarities to other applications, e.g., Cerberus, an online game for land cover mapping of continuous areas that includes clouds as one feature type. However, Cerberus is an online game with missions rather than a rapid image assessment tool. Another similar application is Missing Maps' MapSwipe, which shows users a continuous area rather than a sample of images. The user swipes through this continuous landscape looking for features of interest, tapping once when a feature is found, twice if unsure, and three times for cloud-covered images. Hence the data from both Cerberus and MapSwipe could potentially be used as additional inputs to the cloud detection algorithm. The Google CAPTCHA tool for image annotation could also be used to collect data on clouds if Google chose this as one of its image annotation tasks, but it is highly unlikely that we could obtain these data from Google.
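The swipe-based classification described above can be thought of as a simple mapping from gesture to answer. The sketch below illustrates this with hypothetical field names and gesture labels; it is not the actual Picture Pile data model.

```python
# Illustrative sketch of a swipe-to-answer mapping in a Picture Pile style workflow.
from dataclasses import dataclass
from datetime import datetime, timezone

SWIPE_TO_ANSWER = {
    "right": "yes",   # more than half of the image is cloudy
    "left": "no",     # less than half of the image is cloudy
    "down": "maybe",  # volunteer is unsure
}

@dataclass
class Classification:
    image_id: str
    volunteer_id: str
    answer: str
    timestamp: str

def record_swipe(image_id: str, volunteer_id: str, swipe: str) -> Classification:
    answer = SWIPE_TO_ANSWER[swipe]  # raises KeyError for an unknown gesture
    return Classification(image_id, volunteer_id, answer,
                          datetime.now(timezone.utc).isoformat())

# e.g. record_swipe("S2_tile_0042", "vol_17", "right") -> answer "yes"
```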
To date, the first pile of 27,021 unique images has been classified 271,523 times in Picture Pile by 82 volunteers. The campaign was advertised through Twitter, via emails to previous campaign participants, and via the wider Geo-Wiki and LandSense networks. In this instance, there was no need to advertise further, as we have an established volunteer group interested in using the application. The quality control procedure in Picture Pile has two elements. The first is to give the same images to more than one volunteer. In this case, 92.6% of the images were classified by more than one person, with some images classified up to nine times by different volunteers. Of the images sorted, 61% had complete agreement, 35.5% had some disagreement, in which case we apply a majority rule, and 3.5% had some answers in the ‘maybe’ category, indicating that they were difficult to classify. These images will either be checked manually by experts or deemed unusable for classification purposes. The second element of quality control is control points, i.e., a subset of images classified by experts, which are randomly shown to the volunteers. When volunteers make mistakes on control points, they lose points, which encourages them to interpret the images carefully. These types of gamification elements have proven successful in previous Picture Pile campaigns.
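A minimal sketch of the aggregation step described above is given below: repeated classifications of the same image are reduced by majority rule, and images that received any ‘maybe’ answers are flagged for expert review. The input format and function names are illustrative, not the actual Picture Pile backend.

```python
# Illustrative aggregation of repeated volunteer classifications per image.
from collections import Counter

def aggregate(classifications):
    """classifications: dict mapping image_id -> list of answers
    ('yes' / 'no' / 'maybe') from different volunteers."""
    results = {}
    for image_id, answers in classifications.items():
        if "maybe" in answers:
            results[image_id] = "expert_review"  # difficult image
            continue
        counts = Counter(answers)
        top, n = counts.most_common(1)[0]
        if n * 2 > len(answers):                 # strict majority
            results[image_id] = top
        else:
            results[image_id] = "tie"            # e.g. one 'yes' vs one 'no'
    return results

votes = {"img_001": ["yes", "yes", "no"],
         "img_002": ["no"],
         "img_003": ["yes", "maybe"]}
print(aggregate(votes))
# {'img_001': 'yes', 'img_002': 'no', 'img_003': 'expert_review'}
```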
Having managed to mobilize a crowd to complete this task, the next stage of the process would be to improve the training data set needed by Sinergise’s s2cloudless system. One approach would be to modify the task in Picture Pile by asking participants to classify areas as cloud, non-cloud, or partial cloud, e.g., a block of 9 Sentinel-2 pixels shown as a 30 m block. Another approach would be to feed the results from Picture Pile into another application called Picture Paint, in which users would delineate areas of cloud on the images. This would make the data more useful as training data for the s2cloudless system. These efforts are ongoing.
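As a purely illustrative sketch of the first approach, the example below expands a block-level answer (cloud / non-cloud / partial) into a per-pixel label mask for a 3 × 3 block of 10 m pixels, i.e., a 30 m block; the actual s2cloudless training pipeline is more involved, and ‘partial’ blocks would realistically need the polygon delineation from Picture Paint instead.

```python
# Hypothetical expansion of a block-level label into per-pixel training labels.
import numpy as np

def block_label_to_pixel_mask(label, block_size=3):
    """Return a block_size x block_size label mask (1 = cloud, 0 = clear),
    or None when the block is only partially cloudy and needs delineation."""
    if label == "cloud":
        return np.ones((block_size, block_size), dtype=np.uint8)
    if label == "non-cloud":
        return np.zeros((block_size, block_size), dtype=np.uint8)
    return None  # 'partial' -> defer to polygon-based labelling

print(block_label_to_pixel_mask("cloud"))
```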
Picture Pile is also being employed in other use cases within the LandSense Citizen Observatory, e.g., validating night-time lights imagery, identifying oil palm, and rapidly assessing damage after natural disasters. The browser version of the Picture Pile application can be found at https://geo-wiki.org/games/picturepile/, while the mobile application can be downloaded from the Google Play Store and Apple App Store.