
Water Across Synthetic Aperture Radar Data (WASARD): SAR Water Body Classification for the Open Data Cube

Abstract

The detection of inland water bodies from Synthetic Aperture Radar (SAR) data provides a great advantage over water detection with optical data, since SAR imaging is not impeded by cloud cover. Traditional methods of detecting water from SAR data involve thresholding, which can be labor intensive and imprecise. This paper describes Water Across Synthetic Aperture Radar Data (WASARD): a method of water detection from SAR data which automates and simplifies the thresholding process using machine learning on training data created from Geoscience Australia's WOFS algorithm. Of the machine learning models tested, the Linear Support Vector Machine was determined to be optimal, with the option of training using solely the VH polarization or a combination of the VH and VV polarizations. WASARD was able to identify water in the target area with a correlation of 97% with WOFS.

Keywords: Sentinel-1, Open Data Cube, Earth Observations, Machine Learning, Water Detection

1. INTRODUCTION

Water classification is an important function of Earth imaging satellites, as accurate remote classification of land and water can assist in land use analysis, flood prediction, and climate change research, as well as a variety of agricultural applications [2]. Identifying bodies of water remotely via satellite is far cheaper than contracting surveys of the areas in question, so an application that can accurately use satellite data for this purpose can make valuable information available to nations which would not otherwise be able to afford it.

Highly reliable applications for the remote detection of water currently exist for use with optical satellite data such as that provided by Landsat. One such application, Geoscience Australia's Water Observations from Space (WOFS), has already been ported for use with the Open Data Cube [6].
However, water detection using optical data from Landsat is constrained by its relatively long revisit cycle of 16 days [5], and water detection using any optical data is constrained by its inability to make accurate classifications through cloud cover [2]. An alternative that addresses both problems is water detection using SAR data, which images the Earth using cloud-penetrating microwaves.

Because of its advantages over optical data, much research has been done into water detection using SAR data. Traditionally, this has been done using the thresholding method, which involves picking a polarization band and labeling all pixels for which this band's value is below a certain threshold as containing water. The thresholding method works because water tends to return a much lower backscatter value to the satellite than land [1]. However, this method can be flawed, since estimating the proper threshold is often imprecise, complicated, and labor intensive for the end user. Thresholding also tends to use data from only one SAR polarization, when a combination of polarizations can provide insight into whether water is present [2].

In order to alleviate these problems, this paper presents an application for the Open Data Cube to detect water from SAR data using support vector machine (SVM) classification.

2. PLATFORM

WASARD is an application for the Open Data Cube, a mechanism which provides a simple yet efficient means of ingesting, storing, and retrieving remote sensing data. Data can be ingested and made analysis ready according to whatever specifications the researcher chooses, and easily resampled to artificially alter a scene's resolution. Currently WASARD supports water detection on scenes from ESA's Sentinel-1 and JAXA's ALOS. When testing WASARD, Sentinel-1 was most commonly used due to its relatively high spatial resolution and its rapid 6-day revisit cycle [5].
With minor alterations to the application's code, however, it could support data from other satellites.

3. METHODOLOGY

Using supervised classification, WASARD compares SAR data to a dataset pre-classified by WOFS in order to train an SVM classifier. This classifier is then used to detect water in other SAR scenes outside the training set. Accuracy was measured according to the following metrics:

Precision: the percentage of the points WASARD labels as water that are truly water.
Recall: the percentage of the total water cover that WASARD was able to identify.
F1 Score: the harmonic mean of the precision and recall scores.

Both precision and recall are calculated at the end of the training phase, when the trained classifier is compared to a testing dataset. Because the WOFS algorithm's classifications are used as the truth values when training a WASARD classifier, precision and recall in this paper are always stated with respect to the values produced by WOFS on a similar scene of Landsat data, which themselves have a classification accuracy of 97% [6]. Visual representations of water identified by WASARD in this paper were produced using the function wasard_plot(), which is included in WASARD.

3.1 Algorithm Selection

The machine learning model used by WASARD is the Linear Support Vector Machine (SVM). This model uses a supervised learning algorithm to develop a classifier, meaning it creates a vector which can be multiplied by the vector formed by the relevant data bands to determine whether a pixel in a SAR scene contains water. This classifier is trained by comparing data points from selected bands in a SAR scene to their respective labels, which in this case are water or not water as given by the WOFS algorithm.
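
As a rough illustration of how a linear classifier of this kind could label pixels, the sketch below scores each pixel with a dot product and marks positive scores as water. It is not WASARD's code: the function name `classify_pixels`, the coefficient and intercept values, and the tiny 2x2 scene are all hypothetical, and the use of an intercept term is an assumption (scikit-learn's LinearSVC fits one by default).

```python
import numpy as np


def classify_pixels(bands, coefficient, intercept=0.0):
    """Return a boolean water mask for an array of shape (..., n_bands).

    A pixel is labeled water when the dot product of its band values with
    the coefficient vector, plus the intercept, is positive.
    """
    scores = bands @ coefficient + intercept
    return scores > 0


# Hypothetical classifier: water has low backscatter, so both weights are negative.
coefficient = np.array([-1.0, -1.0])  # weights for (VV, VH), illustrative only
intercept = -37.0                     # illustrative only

# Hypothetical 2x2 scene of (VV, VH) backscatter values in dB.
scene = np.array([
    [[-9.0, -16.0], [-21.0, -28.0]],
    [[-19.0, -26.0], [-11.0, -18.0]],
])

mask = classify_pixels(scene, coefficient, intercept)
print(mask)
```

With a single band the same rule collapses to comparing one value against a fixed cutoff, which is the sense in which a trained linear classifier resembles thresholding.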
The SVM was selected over the Random Forest model, which outperformed the SVM in training speed but had a greater classification time and lower accuracy, and over the Multilayer Perceptron Artificial Neural Network, which had a slightly higher average accuracy than the SVM but much greater training and classification times.

Figure 1: Visual representation of the SVM Classifier. Each white point represents a pixel in a SAR scene.

In Figure 1, the diagonal line separating pixels determined to be water from those determined not to be water represents the actual classification vector produced by the SVM. It is worth noting that once the model has been trained, classification of pixels is done in a manner similar to the thresholding method. This is especially true if only one band was used to train the model.

3.1 Feature Selection

Sentinel-1 collects data from two bands: the Vertical/Vertical polarization (VV) and the Vertical/Horizontal polarization (VH). When 100 SVM classifiers were created for each polarization individually, and for the combination of the two, the following results were achieved:

Figure 2: Accuracy of classifiers trained using different polarization bands. Precision and Recall were measured with respect to the values produced by WOFS.

Figure 2 demonstrates that using both the VV and VH bands trades slightly lower recall for significantly greater precision when compared with the VH band alone, and that using the VV band alone is inferior in both metrics. WASARD therefore defaults to using both the VV and VH bands, and includes the option to use solely the VH band. The VV polarization's lower precision compared to the VH polarization is in contrast to results from previous research and may merit further analysis [4].

3.2 Training a Classifier

The steps in training a classifier with WASARD are:

1. Selecting two scenes (one SAR, one optical) with the same spatial extents, acquired close to each other in time, preferably on the same day.
2. Using the WOFS algorithm to produce an array of the detected water in the scene of optical data, to be used as the labels during supervised learning.
3. Bundling data points from the selected bands of the SAR acquisition into an array with the corresponding labels gathered from WOFS. A random sample with an equal number of points labeled Water and Not Water is selected and partitioned into a training and a testing dataset.
4. Using scikit-learn's LinearSVC object to produce a classifier from the training dataset, which is then tested against the testing dataset to determine its precision and recall.

The result is a wasard_classifier object, which has the following attributes:

1. f1, recall, and precision: three metrics used to determine the classifier's accuracy.
2. Coefficient: the vector which the SVM uses to make its predictions. The classifier detects water when the dot product of the coefficient and the vector formed by the SAR bands is positive.
3. Save(): allows a user to save a classifier to the disk in order to use it without retraining.
4. wasard_classify(): classifies an entire xarray of SAR data using the SVM classifier.

All of the above steps are performed automatically when the user creates a wasard_classifier object.

3.3 Classifying a Dataset

Once the classifier has been created, it can be used to detect water in an xarray of SAR data using wasard_classify(). By taking the dot product of the classifier's coefficients and the vector formed by the selected bands of SAR data, an array of predictions is constructed. A classifier can effectively be used on the same spatial extents as the ones where it was trained, or on any area with a similar landscape. Whil
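
The training and classification steps described in Sections 3.2 and 3.3 can be approximated with a short scikit-learn sketch. This is not the WASARD implementation: synthetic (VV, VH) values stand in for a real SAR scene, synthetic labels stand in for WOFS output, and the helper `balanced_sample`, the class means, and the 25% test split are assumptions chosen for illustration. Only `LinearSVC`, `train_test_split`, and the metric functions are real scikit-learn calls.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for a SAR scene: (VV, VH) backscatter in dB per pixel.
# Water is given lower backscatter than land; the numbers are illustrative.
n_land, n_water = 9000, 1000
land = rng.normal(loc=[-10.0, -17.0], scale=2.0, size=(n_land, 2))
water = rng.normal(loc=[-20.0, -27.0], scale=2.0, size=(n_water, 2))
features = np.vstack([land, water])
# Stand-in for the WOFS labels: 1 = water, 0 = not water.
labels = np.concatenate([np.zeros(n_land, dtype=int), np.ones(n_water, dtype=int)])


def balanced_sample(features, labels, rng):
    """Draw an equal number of water and not-water points at random."""
    water_idx = np.flatnonzero(labels == 1)
    land_idx = np.flatnonzero(labels == 0)
    n = min(len(water_idx), len(land_idx))
    chosen = np.concatenate([
        rng.choice(water_idx, size=n, replace=False),
        rng.choice(land_idx, size=n, replace=False),
    ])
    return features[chosen], labels[chosen]


# Step 3: balanced random sample, partitioned into training and testing sets.
X, y = balanced_sample(features, labels, rng)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 4: fit a linear SVM and score it against the held-out test set.
clf = LinearSVC(dual=False)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Classification as a dot product: a positive score means water.
scores = X_test @ clf.coef_.ravel() + clf.intercept_[0]
manual_pred = (scores > 0).astype(int)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Sampling equal numbers of water and not-water points keeps the classifier from simply favoring the majority class, which matters in scenes where water covers only a small fraction of the pixels.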
