Water Across Synthetic Aperture Radar Data (WASARD): SAR Water Body Classification for the Open Data Cube
Abstract
The detection of inland water bodies from Synthetic Aperture Radar (SAR) data offers a major advantage over water detection with optical data, since SAR imaging is not impeded by cloud cover. Traditional methods of detecting water from SAR data rely on thresholding, which can be labor intensive and imprecise. This paper describes Water Across Synthetic Aperture Radar Data (WASARD), a method of water detection from SAR data that automates and simplifies the thresholding process by applying machine learning to training data created with Geoscience Australia's WOFS algorithm. Of the machine learning models tested, the Linear Support Vector Machine was determined to be optimal, with the option of training on the VH polarization alone or on a combination of the VH and VV polarizations. WASARD was able to identify water in the target area with a correlation of 97% with WOFS.
Keywords: Sentinel-1, Open Data Cube, Earth Observations, Machine Learning, Water Detection
1. INTRODUCTION
Water classification is an important function of Earth imaging satellites, since accurate remote classification of land and water can assist in land use analysis, flood prediction, and climate change research, as well as a variety of agricultural applications [2]. Identifying bodies of water remotely via satellite is far cheaper than contracting surveys of the areas in question, so an application that can accurately use satellite data for this purpose can make valuable information available to nations that could not otherwise afford it.
Highly reliable applications for the remote detection of water already exist for optical satellite data such as that provided by Landsat. One such application, Geoscience Australia's Water Observations from Space (WOFS), has already been ported to the Open Data Cube [6]. However, water detection using optical data from Landsat is constrained by Landsat's relatively long 16-day revisit cycle [5], and water detection using any optical data is constrained by its inability to make accurate classifications through cloud cover [2]. Water detection using SAR data, which images the Earth with cloud-penetrating microwaves, is the alternative that solves these problems.
Because of these advantages over optical data, much research has been done into water detection using SAR data. Traditionally, this has been done with the thresholding method, which involves picking a polarization band and labeling every pixel whose value in that band falls below a certain threshold as water. The thresholding method works because water tends to return a much lower backscatter value to the satellite than land [1]. However, the method can be flawed, since estimating the proper threshold is often imprecise, complicated, and labor intensive for the end user. Thresholding also tends to use data from only one SAR polarization, even though a combination of polarizations can provide additional insight into whether water is present [2].
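The thresholding method described above amounts to a single comparison per pixel. The sketch below assumes the band arrives as linear-scale backscatter in a NumPy array; the helper name `threshold_water` and the -18 dB cutoff are hypothetical choices for illustration, not values from this paper. Picking that cutoff for each scene is exactly the manual step that WASARD aims to replace.

```python
import numpy as np

def threshold_water(backscatter_linear: np.ndarray, threshold_db: float) -> np.ndarray:
    """Label pixels as water (True) where backscatter falls below a dB threshold.

    Hypothetical helper for illustration; not part of WASARD.
    """
    # Convert linear backscatter to decibels; clip first so log10(0) is avoided.
    backscatter_db = 10.0 * np.log10(np.clip(backscatter_linear, 1e-10, None))
    # Calm water reflects radar energy away from the sensor, so it appears dark.
    return backscatter_db < threshold_db

# A 2x2 VH scene in linear units, classified with an assumed -18 dB cutoff.
vh = np.array([[0.005, 0.05],
               [0.001, 0.2]])
mask = threshold_water(vh, threshold_db=-18.0)
print(mask)
```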
To alleviate these problems, this paper presents an application for the Open Data Cube that detects water from SAR data using support vector machine (SVM) classification.
2. PLATFORM
WASARD is an application for the Open Data Cube, which provides a simple yet efficient means of ingesting, storing, and retrieving remote sensing data. Data can be ingested and made analysis ready according to whatever specifications the researcher chooses, and can easily be resampled to artificially alter a scene's resolution. Currently WASARD supports water detection on scenes from ESA's Sentinel-1 and JAXA's ALOS. Sentinel-1 was used most often when testing WASARD because of its relatively high spatial resolution and its rapid 6-day revisit cycle [5]. With minor alterations to the application's code, however, it could support data from other satellites.
3. METHODOLOGY
Using supervised classification, WASARD compares SAR data to a dataset pre-classified by WOFS in order to train an SVM classifier. This classifier is then used to detect water in other SAR scenes outside the training set. Accuracy was measured according to the following metrics:
Precision: the percentage of the points WASARD labels as water that are truly water
Recall: the percentage of the total water cover that WASARD was able to identify
F1 Score: the harmonic mean of the precision and recall scores
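The three metrics can be computed directly from two boolean water masks. In this sketch the predicted mask plays the role of WASARD's output and the reference mask the role of the WOFS labels; the function name `water_metrics` is hypothetical.

```python
import numpy as np

def water_metrics(predicted: np.ndarray, reference: np.ndarray) -> dict:
    """Precision, recall and F1 of a predicted water mask against a reference
    mask (for example WOFS labels). True means water in both inputs."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    true_pos = np.count_nonzero(predicted & reference)    # water found as water
    false_pos = np.count_nonzero(predicted & ~reference)  # land called water
    false_neg = np.count_nonzero(~predicted & reference)  # water that was missed
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

pred = np.array([True, True, True, False, False, False])
ref = np.array([True, True, False, True, True, False])
print(water_metrics(pred, ref))
```

Here two of the three predicted water pixels are correct (precision 2/3), but only two of the four reference water pixels were found (recall 1/2), giving an F1 score of 4/7.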
Both precision and recall are calculated at the end of the training phase, when the trained classifier is compared against a testing dataset. Because the WOFS algorithm's classifications are used as the truth values when training a WASARD classifier, precision and recall in this paper are always measured with respect to the values produced by WOFS on a similar scene of Landsat data, which themselves have a classification accuracy of 97% [6]. Visual representations of water identified by WASARD in this paper were produced using the function wasard_plot(), which is included in WASARD.
3.1 Algorithm Selection
The machine learning model used by WASARD is the Linear Support Vector Machine (SVM). This model uses a supervised learning algorithm to develop a classifier: it learns a coefficient vector whose dot product with the vector formed by the relevant data bands determines whether a pixel in a SAR scene contains water. The classifier is trained by comparing data points from selected bands in a SAR scene to their respective labels, which in this case are "water" or "not water" as given by the WOFS algorithm. The SVM was selected over two alternatives. The Random Forest model trained faster than the SVM, but had a longer classification time and lower accuracy. The Multilayer Perceptron Artificial Neural Network had a slightly higher average accuracy than the SVM, but much longer training and classification times.
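The linear decision rule can be made concrete. A linear SVM scores each pixel as w · x + b, where w is the learned coefficient vector, x the pixel's band values, and b an intercept (scikit-learn's LinearSVC fits an intercept by default), and it predicts water when the score is positive. The sketch below applies such a rule to a small feature matrix; the helper name and the weights are made up for illustration and are not trained WASARD values.

```python
import numpy as np

def linear_decision(pixels: np.ndarray, coef: np.ndarray, intercept: float) -> np.ndarray:
    """Classify pixels with a linear decision rule (hypothetical helper).

    pixels: (n_pixels, n_bands) matrix, one row per pixel.
    coef: (n_bands,) weight vector; intercept: scalar bias.
    Returns True (water) where coef . x + intercept > 0.
    """
    scores = pixels @ coef + intercept
    return scores > 0

# Made-up weights for (VV, VH) in dB. Both are negative, so darker pixels
# (lower backscatter) receive higher scores.
coef = np.array([-0.2, -0.3])
intercept = -8.0
pixels = np.array([[-20.0, -28.0],   # dark in both bands, water-like
                   [-8.0, -15.0]])   # brighter, land-like
print(linear_decision(pixels, coef, intercept))
```

The first pixel scores 4 + 8.4 - 8 = 4.4 (water) and the second scores 1.6 + 4.5 - 8 = -1.9 (not water).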
Figure 1: Visual representation of the SVM classifier. Each white point represents a pixel in a SAR scene.
In Figure 1, the diagonal line separating pixels determined to be water from those determined not to be water is the decision boundary defined by the classification vector produced by the SVM. It is worth noting that once the model has been trained, pixels are classified in much the same way as in the thresholding method. This is especially true if only one band was used to train the model, since the learned boundary then reduces to a single cutoff value on that band.
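The reduction to thresholding can be written out. Assuming a linear decision function with an intercept (as scikit-learn's LinearSVC fits by default), a single band x has one weight w and an intercept b. Because water returns lower backscatter, the trained weight is expected to be negative, and dividing by a negative w flips the inequality:

```latex
\text{water} \iff w\,x + b > 0 \iff x < -\frac{b}{w}, \qquad w < 0
```

A one-band classifier is therefore equivalent to a threshold t = -b/w; for example, w = -0.5 and b = -9 give t = -18 in the band's units. With two bands the boundary becomes the line w_1 x_1 + w_2 x_2 + b = 0, which is the diagonal shown in Figure 1.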
3.2 Feature Selection
Sentinel-1 collects data in two polarization bands: Vertical/Vertical (VV) and Vertical/Horizontal (VH). When 100 SVM classifiers were created for each polarization individually, and for the combination of the two, the following results were achieved:
Figure 2: Accuracy of classifiers trained using different polarization bands. Precision and recall were measured with respect to the values produced by WOFS.
Figure 2 demonstrates that, compared with the VH band alone, using both the VV and VH bands trades slightly lower recall for significantly greater precision, and that using the VV band alone is inferior in both metrics. WASARD therefore defaults to using both the VV and VH bands, and includes the option to use the VH band alone. The VV polarization's lower precision relative to the VH polarization contrasts with results from previous research [4] and may merit further analysis.
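The band choice above determines the columns of the feature matrix the SVM sees. This sketch assumes co-registered VV and VH rasters already loaded as NumPy arrays; the helper `build_features` and its `use_vv` flag are hypothetical names that mirror WASARD's default (VV and VH) and its VH-only option.

```python
import numpy as np

def build_features(vv: np.ndarray, vh: np.ndarray, use_vv: bool = True) -> np.ndarray:
    """Flatten co-registered band rasters into an (n_pixels, n_bands) matrix.

    Hypothetical helper: use_vv=True stacks VV and VH; use_vv=False keeps only VH.
    """
    if vv.shape != vh.shape:
        raise ValueError("VV and VH rasters must have the same shape")
    bands = [vv, vh] if use_vv else [vh]
    # Each column is one band; each row is one pixel (row-major order).
    return np.stack([band.ravel() for band in bands], axis=1)

vv = np.array([[-10.0, -21.0], [-9.5, -20.0]])
vh = np.array([[-17.0, -29.0], [-16.0, -27.5]])
print(build_features(vv, vh).shape)                # (4, 2): both bands
print(build_features(vv, vh, use_vv=False).shape)  # (4, 1): VH only
```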
3.3 Training a Classifier
The steps in training a classifier with WASARD are:
1. Select two scenes (one SAR, one optical) with the same spatial extents, acquired close to each other in time, preferably on the same day.
2. Use the WOFS algorithm to produce an array of the water detected in the optical scene, to be used as the labels during supervised learning.
3. Bundle data points from the selected bands of the SAR acquisition into an array with the corresponding labels from WOFS. Select a random sample with an equal number of points labeled Water and Not Water, and partition it into a training dataset and a testing dataset.
4. Using scikit-learn's LinearSVC object, produce a classifier from the training dataset, then test it against the testing dataset to determine its precision and recall.
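Step 3 can be sketched with NumPy alone. The helper name `balanced_split`, the per-class sample size, and the 20% test fraction below are assumptions for illustration; the paper does not state WASARD's sample sizes or split ratio. The returned training arrays are what step 4 would pass to LinearSVC's `fit()`.

```python
import numpy as np

def balanced_split(features, labels, n_per_class, test_fraction=0.2, seed=0):
    """Draw an equal number of water and not-water pixels, then split them
    into training and testing sets (hypothetical helper).

    features: (n_pixels, n_bands) array; labels: (n_pixels,) with True = water.
    """
    features = np.asarray(features)
    labels = np.asarray(labels, dtype=bool)
    rng = np.random.default_rng(seed)
    water_idx = np.flatnonzero(labels)
    land_idx = np.flatnonzero(~labels)
    if min(len(water_idx), len(land_idx)) < n_per_class:
        raise ValueError("not enough pixels of one class for a balanced sample")
    # Sample each class without replacement so the two classes are equally represented.
    chosen = np.concatenate([
        rng.choice(water_idx, n_per_class, replace=False),
        rng.choice(land_idx, n_per_class, replace=False),
    ])
    rng.shuffle(chosen)
    n_test = int(round(len(chosen) * test_fraction))
    test, train = chosen[:n_test], chosen[n_test:]
    return features[train], labels[train], features[test], labels[test]

# Synthetic stand-in data: 100 pixels with 2 bands, 30 of them labeled water.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = np.zeros(100, dtype=bool)
y[:30] = True
X_tr, y_tr, X_te, y_te = balanced_split(X, y, n_per_class=20)
print(len(X_tr), len(X_te), int(y_tr.sum() + y_te.sum()))
```

Balancing the sample keeps the classifier from simply learning that most pixels in a scene are land.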
The result is a wasard_classifier object, which has the following attributes:
1. f1, recall, and precision: three metrics used to determine the classifier's accuracy
2. Coefficient: the vector the SVM uses to make its predictions. The classifier detects water when the dot product of the coefficient and the vector formed by the SAR bands is positive
3. Save(): allows a user to save a classifier to disk in order to reuse it without retraining
4. wasard_classify(): classifies an entire xarray of SAR data using the SVM classifier
All of the above steps are performed automatically when the user creates a wasard_classifier object.
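A minimal sketch of what an object like this might look like when built directly on scikit-learn. The class name `SketchWaterClassifier`, its method names, and the use of pickle for saving are assumptions for illustration, not WASARD's actual code, and a NumPy array of shape (rows, cols, bands) stands in for the xarray that wasard_classify() accepts. Because LinearSVC learns an intercept by default (exposed as `intercept_` alongside `coef_`), the sketch adds it to the dot product.

```python
import pickle

import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.svm import LinearSVC

class SketchWaterClassifier:
    """Hypothetical stand-in for wasard_classifier: trains and evaluates on construction."""

    def __init__(self, X_train, y_train, X_test, y_test):
        # Integer labels: 1 = water, 0 = not water.
        y_train = np.asarray(y_train, dtype=int)
        y_test = np.asarray(y_test, dtype=int)
        self.model = LinearSVC(dual=False)
        self.model.fit(X_train, y_train)
        predicted = self.model.predict(X_test)
        self.precision = precision_score(y_test, predicted)
        self.recall = recall_score(y_test, predicted)
        self.f1 = f1_score(y_test, predicted)
        # For a binary problem coef_ has shape (1, n_bands) and intercept_ shape (1,).
        self.coefficient = self.model.coef_[0]
        self.intercept = self.model.intercept_[0]

    def save(self, path):
        """Pickle the whole object so it can be reused without retraining."""
        with open(path, "wb") as f:
            pickle.dump(self, f)

    def classify(self, bands):
        """bands: (rows, cols, n_bands) array -> (rows, cols) boolean water mask."""
        scores = bands @ self.coefficient + self.intercept
        return scores > 0
```

The constructor takes arrays such as those produced in step 3; `classify` then applies the same linear rule pixel by pixel across a whole scene.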
3.4 Classifying a Dataset
Once the classifier has been created, it can be used to detect water in an xarray of SAR data using wasard_classify(). Taking the dot product of the classifier's coefficients and the vector formed by the selected bands of SAR data yields an array of predictions. A classifier can be used effectively on the same spatial extents as those where it was trained, or on any area with a similar landscape. Whil