Automatic Identification of Algae using Low-cost Multispectral Fluorescence Digital Microscopy, Hierarchical Classification & Deep Learning

Abstract

Harmful algal blooms (HABs) can produce lethal toxins and are a rising global concern. In response to this threat, many organizations monitor algae populations to determine whether a water body might be contaminated. However, identifying the algae in a water sample requires a human expert, a taxonomist, to identify organisms manually with an optical microscope. This is a tedious, time-consuming process that is prone to human error and bias. Because many facilities lack on-site taxonomists, they must ship their water samples off site, which further lengthens the analysis time. Given the urgency of this problem, this thesis hypothesizes that multispectral fluorescence microscopy combined with a hierarchical deep learning classification structure is the optimal method for automatically identifying algae in water on site. To test this hypothesis, a low-cost system was designed and built that generates one brightfield image and four fluorescence images. Each of the four fluorescence images was designed to target a different pigment in algae, producing a unique autofluorescence spectral fingerprint for each phylum. To complement this hardware system, a software framework was designed and developed. This framework uses the known taxonomic structure of algae to build a hierarchical classifier that divides the classification task into three steps: phylum-, genus-, and species-level classification. Deep learning models were used at each branch of this hierarchical classifier, allowing the optimal set of features to be learned implicitly from the input data. To test the efficacy of the proposed hardware system and software framework, a dataset of nine types of algae from four different phyla was created. The data were prepared for analysis with three preprocessing steps: flat-field correction, thresholding, and cropping.
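
The abstract names the three preprocessing steps but not their details. The following is therefore only a minimal sketch under stated assumptions. It uses the classic flat-field formula C = (R - D) * mean(F - D) / (F - D), with a hypothetical dark frame `dark` and per-channel flat frames. It applies a simple global threshold `thresh` to the summed fluorescence channels, which is an assumed choice and not necessarily the thesis's method. Finally, it crops all five channels (index 0 = brightfield, 1..4 = fluorescence, also an assumption) to one padded bounding box. All function names are illustrative.

```python
import numpy as np


def flat_field_correct(raw, flat, dark):
    """Classic flat-field correction: C = (R - D) * mean(F - D) / (F - D)."""
    raw = raw.astype(np.float64)
    gain = flat.astype(np.float64) - dark
    gain = np.clip(gain, 1e-6, None)  # guard against division by zero at dead pixels
    return (raw - dark) * gain.mean() / gain


def crop_to_foreground(stack, mask, pad=2):
    """Crop every channel of a (C, H, W) stack to the padded bounding box of `mask`."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None  # nothing exceeded the threshold
    r0 = max(rows[0] - pad, 0)
    r1 = min(rows[-1] + pad + 1, mask.shape[0])
    c0 = max(cols[0] - pad, 0)
    c1 = min(cols[-1] + pad + 1, mask.shape[1])
    return stack[:, r0:r1, c0:c1]


def preprocess(raw_stack, flat_stack, dark, thresh):
    """Flat-field correct each channel, threshold, then crop all channels together."""
    corrected = np.stack(
        [flat_field_correct(r, f, dark) for r, f in zip(raw_stack, flat_stack)]
    )
    # Hypothetical choice: segment on the summed fluorescence channels (1..4).
    mask = corrected[1:].sum(axis=0) > thresh
    return crop_to_foreground(corrected, mask)
```

Real fields of view usually contain several cells. In that case, a connected-component labeling step (for example `scipy.ndimage.label`) would yield one crop per object instead of a single bounding box.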
From these multispectral images, a number of spatial and spectral features were extracted for use in the feature-extraction-based models. The dataset was used to compare the performance of 12 different model architectures, and the proposed multispectral hierarchical deep learning approach achieved the highest classification accuracy: 97% at the species level. Further inspection revealed that a traditional feature extraction method achieved 95% accuracy at the phylum level using only the multispectral fluorescence data. These observations strongly support the conclusion that the combination of (1) the proposed low-cost multispectral fluorescence imaging system, (2) the proposed hierarchical structure based on the taxonomic prior, and (3) deep learning methods for feature learning is an effective method for automatically classifying algae.
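
The following sketch makes the hierarchical idea concrete. A phylum classifier runs first. A genus classifier trained only on that phylum's samples runs next, followed by a species classifier trained only on that genus. The input is a simple spectral fingerprint: the per-channel mean fluorescence inside a cell mask, normalized to sum to 1. This is a hypothetical feature, not necessarily one used in the thesis. A nearest-centroid classifier stands in for the deep models at each node purely to keep the example small. All class and function names are illustrative assumptions.

```python
import numpy as np


def spectral_fingerprint(stack, mask):
    """Per-channel mean fluorescence inside the cell mask, normalized to sum to 1."""
    # Assumes channel 0 is brightfield and channels 1..4 are the fluorescence images.
    means = np.array([channel[mask].mean() for channel in stack[1:]])
    return means / means.sum()


class NearestCentroid:
    """Stand-in for a per-node deep model: predicts the label of the closest class centroid."""

    def fit(self, X, y):
        self.labels = sorted(set(y))
        y = np.asarray(y)
        self.centroids = np.array([X[y == label].mean(axis=0) for label in self.labels])
        return self

    def predict_one(self, x):
        distances = np.linalg.norm(self.centroids - x, axis=1)
        return self.labels[int(np.argmin(distances))]


class HierarchicalClassifier:
    """Phylum -> genus -> species; each node's classifier is trained only on its subtree."""

    def fit(self, X, taxa):
        # taxa: one (phylum, genus, species) tuple per sample in X
        X = np.asarray(X, dtype=float)
        self.phylum_clf = NearestCentroid().fit(X, [t[0] for t in taxa])
        self.genus_clf, self.species_clf = {}, {}
        for phylum in {t[0] for t in taxa}:
            idx = [i for i, t in enumerate(taxa) if t[0] == phylum]
            self.genus_clf[phylum] = NearestCentroid().fit(X[idx], [taxa[i][1] for i in idx])
        for phylum, genus in {t[:2] for t in taxa}:
            idx = [i for i, t in enumerate(taxa) if t[:2] == (phylum, genus)]
            self.species_clf[(phylum, genus)] = NearestCentroid().fit(
                X[idx], [taxa[i][2] for i in idx]
            )
        return self

    def predict_one(self, x):
        # Each stage chooses only among the children of the previous prediction.
        phylum = self.phylum_clf.predict_one(x)
        genus = self.genus_clf[phylum].predict_one(x)
        species = self.species_clf[(phylum, genus)].predict_one(x)
        return phylum, genus, species
```

One trade-off of this design is that each node solves a smaller, easier problem. However, a wrong phylum decision cannot be corrected at the genus or species stage.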
