Sequence Information Channel Concatenation for Improving Camera Trap Image Burst Classification
Camera traps are extensively used to observe wildlife in their natural
habitat without disturbing the ecosystem. They can aid the early detection of
natural or human threats to animals and thereby support ecological
conservation. A massive number of such camera traps have been deployed at
ecological conservation areas around the world, collecting data for decades,
so automation is required to detect which images contain animals. Existing
systems perform classification on a single image to detect whether it
contains an animal. However, in challenging scenes where animals are
camouflaged in their natural habitat, it is sometimes difficult to identify
the presence of an animal from a single image alone. We hypothesize that a
short burst of images, rather than a single image, makes it much easier for
both a human and a machine to detect the presence of an animal, assuming the
animal moves. In this work, we explore a variety of approaches and
measure the impact of using short image sequences (bursts of 3 images) on
camera trap image classification. We show that concatenating masks containing
sequence information with the images from the 3-image burst across channels
improves the ROC AUC by 20% on a test set from unseen camera sites, compared
to an equivalent model that learns from a single image.
Comment: 8 pages, 4 figures, 2 tables. Git repository can be found at:
https://github.com/bhuvi3/camera_trap_animal_classificatio
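As a rough illustration of the channel-concatenation idea, the sketch below stacks a 3-image burst and a mask along the channel axis in NumPy. The function name `stack_burst_with_mask` and the naive frame-difference mask are placeholders for illustration only; they are not the learned sequence-information masks the paper evaluates.

```python
import numpy as np

def stack_burst_with_mask(burst):
    """Concatenate a 3-image burst and a motion mask across channels.

    burst: list of three H x W x 3 uint8 frames from the same camera site.
    Returns an H x W x 10 float32 array: 9 image channels plus 1 mask channel.
    The mask here is a simple frame-difference proxy, standing in for the
    sequence-information masks described in the abstract.
    """
    frames = [f.astype(np.float32) / 255.0 for f in burst]
    # Naive motion mask: mean absolute difference between consecutive frames,
    # averaged over the two frame pairs and clipped to [0, 1].
    diffs = [np.abs(frames[i + 1] - frames[i]).mean(axis=-1) for i in range(2)]
    mask = np.clip(sum(diffs) / 2.0, 0.0, 1.0)[..., None]  # H x W x 1
    # Channel-wise concatenation: the model sees all three frames and the
    # motion cue as one multi-channel input.
    return np.concatenate(frames + [mask], axis=-1)  # H x W x 10
```

A classifier can then consume this 10-channel tensor exactly as it would a single RGB image, with only the first convolution's input-channel count changed.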
Improving robustness of image recognition through artificial image augmentation
Deep-learning-based computer vision technologies can offer a number of advantages over manual-labour inspection methods, such as reduced operational costs and improved efficiency. However, they are known to be unreliable in certain situations, especially when input images contain augmentations, such as occlusion or distortion, that computer vision models have not been trained on. While some augmentations can be mitigated by controlling the environment, this is not always possible, especially outdoors.
To address this issue, one common approach is supplemental robustness training using augmented training data, which involves training models on images containing the expected augmentations to improve performance. However, this approach requires collecting a substantial volume of augmented images for each expected augmentation, making it time-consuming and costly, depending on the difficulty of reproducing each augmentation.
This thesis explores the viability of using artificially rendered augmentations on unaugmented images as a substitute for the manual collection and preparation of naturally augmented data for image recognition and object detection models. Specifically, this thesis recreates nine environmental augmentations that commonly occur outdoors and evaluates their impact on model performance across three datasets.
The findings of this thesis indicate the potential of using artificially generated augmentations as substitutes for naturally occurring ones. Further research in this area is anticipated to enable more reliable image recognition and object detection in less controllable environments, thereby improving the results of these technologies in uncertain situations.
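To make the idea of artificially rendering an augmentation concrete, here is a minimal NumPy sketch of one plausible example, occlusion by a flat rectangle. The function name `render_occlusion` and its parameters are hypothetical; the thesis's nine specific augmentations are not reproduced here.

```python
import numpy as np

def render_occlusion(image, top, left, height, width, fill=0):
    """Artificially render an occlusion augmentation on an unaugmented image.

    image: H x W x C array. A copy is returned with a (height x width)
    rectangle at (top, left) overwritten by `fill`; the original is untouched,
    so one clean image can yield many augmented training variants.
    """
    out = image.copy()
    out[top:top + height, left:left + width, :] = fill
    return out
```

Applying such renderers to an existing unaugmented dataset stands in for collecting naturally occluded photographs, which is the substitution the thesis evaluates.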