Video foreground segmentation with deep learning
This thesis tackles the problem of foreground segmentation in videos, even under extremely challenging conditions. This task comes with a plethora of hurdles, as the model needs to distinguish moving objects from irrelevant background motion, which can be caused by the weather, illumination, camera movement, etc. As foreground segmentation is often the first step of various highly important applications (video surveillance for security, patient/infant monitoring, etc.), it is crucial to develop a model capable of producing excellent results in all kinds of conditions.
In order to tackle this problem, we follow the recent trend in other computer vision areas and harness the power of deep learning. We design convolutional neural network architectures specifically targeted at countering the aforementioned challenges. We first propose a 3D CNN that models the spatial and temporal information of the scene simultaneously. The network is deep enough to successfully cover more than 50 different scenes of various conditions with no need for any fine-tuning. These conditions include illumination (day or night), weather (sunny, rainy or snowy), background movements (trees moving in the wind, fountains, etc.) and others. Next, we propose a data augmentation method specifically targeted at illumination changes. We show that artificially augmenting the data set with this method significantly improves the segmentation results, even when tested under sudden illumination changes. We also present a post-processing method that exploits the temporal information of the input video. Finally, we propose a deep learning model which learns the illumination of the scene and performs foreground segmentation simultaneously.
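To make the spatio-temporal modelling concrete, the sketch below shows what a minimal 3D CNN for clip-based foreground segmentation might look like in PyTorch. The layer sizes, temporal pooling strategy and network depth are illustrative assumptions and do not reproduce the thesis's actual architecture.

```python
# Minimal sketch (not the thesis's architecture): a small 3D CNN that consumes
# a short clip (B, C, T, H, W) and predicts a per-pixel foreground probability.
import torch
import torch.nn as nn

class SpatioTemporalFGNet(nn.Module):
    def __init__(self, in_channels=3, base=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(in_channels, base, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(base, base * 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Collapse the temporal dimension, then predict a single-channel mask.
        self.temporal_pool = nn.AdaptiveAvgPool3d((1, None, None))
        self.head = nn.Conv2d(base * 2, 1, kernel_size=1)

    def forward(self, clip):                    # clip: (B, C, T, H, W)
        feats = self.encoder(clip)              # (B, 2*base, T, H, W)
        feats = self.temporal_pool(feats)       # (B, 2*base, 1, H, W)
        feats = feats.squeeze(2)                # (B, 2*base, H, W)
        return torch.sigmoid(self.head(feats))  # foreground probability map

# Example: a 10-frame RGB clip at 240x320 resolution.
model = SpatioTemporalFGNet()
mask = model(torch.randn(1, 3, 10, 240, 320))   # (1, 1, 240, 320)
```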
Unifying Person and Vehicle Re-Identification
Person and vehicle re-identification (re-ID) are important challenges for the analysis of the burgeoning collection of urban surveillance videos. To efficiently evaluate such videos, which are populated with both vehicles and pedestrians, it would be preferable to have one unified framework with effective performance across both domains. Unfortunately, due to the contrasting composition of humans and vehicles, no architecture has yet been established that can adequately perform both tasks. We release a Person and Vehicle Unified Data Set (PVUD), comprising both pedestrians and vehicles drawn from popular existing re-ID data sets, in order to better model the data that we would expect to find in the real world. We exploit the generalisation ability of metric learning to propose a re-ID framework that can learn to re-identify humans and vehicles simultaneously. We design our network, MidTriNet, to harness the power of mid-level features to develop better representations for the re-ID tasks. We help the system handle mixed data by appending unification terms, with additional hard negative and hard positive mining, to MidTriNet. Training on PVUD, we attain accuracy comparable to training on its constituent data sets separately, supporting the system's generalisation power. To further demonstrate the effectiveness of our framework, we also obtain results better than, or competitive with, the state of the art on each of the Market-1501, CUHK03, VehicleID and VeRi data sets.
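The hard positive/negative mining mentioned above is commonly realised as batch-hard triplet mining; the sketch below illustrates that standard metric-learning formulation in PyTorch. It is an assumed baseline component only and does not reproduce MidTriNet, its mid-level features or its unification terms.

```python
# Batch-hard triplet mining: for each anchor, pick the hardest (farthest)
# positive and hardest (closest) negative within the mini-batch.
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    # embeddings: (B, D) feature vectors; labels: (B,) identity ids
    dist = torch.cdist(embeddings, embeddings, p=2)          # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)         # same-identity mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    # Hardest positive: farthest sample sharing the anchor's identity.
    pos_dist = dist.masked_fill(~same | eye, float('-inf')).max(dim=1).values
    # Hardest negative: closest sample with a different identity.
    neg_dist = dist.masked_fill(same, float('inf')).min(dim=1).values

    return F.relu(pos_dist - neg_dist + margin).mean()

# Usage: embeddings from any backbone; labels may mix person and vehicle ids.
emb = F.normalize(torch.randn(8, 128), dim=1)
ids = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
loss = batch_hard_triplet_loss(emb, ids)
```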
Towards Explainable Abnormal Infant Movements Identification: A Body-part Based Prediction and Visualisation Framework
Providing an early diagnosis of cerebral palsy (CP) is key to enhancing the developmental outcomes for those affected. Diagnostic tools such as the General Movements Assessment (GMA) have produced promising results for early diagnosis; however, these manual methods can be laborious.
In this paper, we propose a new framework for the automated classification of infant body movements, based upon the GMA, which, unlike previous methods, also incorporates a visualization framework to aid interpretability. Our proposed framework spatiotemporally segments extracted features to detect the presence of Fidgety Movements (FMs), which are associated with the GMA. These features are then used to identify the body parts with the greatest contribution towards a classification decision and to highlight the related body-part segment, providing visual feedback to the user.
We quantitatively compare the proposed framework's classification performance with several other methods from the literature and qualitatively evaluate the veracity of the visualizations. Our experimental results show that the proposed method performs more robustly than comparable techniques in this setting, whilst simultaneously providing relevant visual interpretability.
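One simple, generic way to attribute a classification decision to individual body parts is feature ablation; the sketch below illustrates that idea only and is not the paper's actual prediction or visualisation pipeline. The body-part grouping, the `predict_fn` classifier and the scoring rule are hypothetical placeholders.

```python
# Illustrative only: estimate each body part's contribution to a movement
# classification by zeroing out that part's pose features and measuring the
# drop in the predicted score (e.g. probability of fidgety movements).
import numpy as np

BODY_PARTS = {                      # hypothetical joint grouping
    "head": [0, 1, 2],
    "left_arm": [3, 4, 5],
    "right_arm": [6, 7, 8],
    "left_leg": [9, 10, 11],
    "right_leg": [12, 13, 14],
}

def part_contributions(features, predict_fn):
    # features: (T, J, D) per-frame, per-joint features; predict_fn returns a scalar score.
    baseline = predict_fn(features)
    scores = {}
    for part, joints in BODY_PARTS.items():
        ablated = features.copy()
        ablated[:, joints, :] = 0.0           # remove this part's features
        scores[part] = baseline - predict_fn(ablated)
    return scores                              # higher = larger contribution

# The highest-scoring part would be the one highlighted in the visualisation.
```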
Image Editing based Data Augmentation for Illumination-insensitive Background Subtraction
A core challenge in background subtraction (BGS) is handling videos with sudden illumination changes between consecutive frames. While the use of data augmentation has been shown to increase robustness, the modelling of realistic illumination changes remains less explored and is usually limited to global, static brightness adjustments. In this paper, we focus on tackling the problem of background subtraction using augmented training data, and propose an augmentation method which vastly improves the model's performance under challenging illumination conditions. In particular, our framework consists of a local component that considers direct light/shadow and lighting angles, and a global component that considers the overall contrast, sharpness and color saturation of the image. It generates realistic, structured training data with different illumination conditions, enabling our deep learning system to be trained effectively for background subtraction even when significant illumination changes take place. We further propose a post-processing method that removes noise from the output binary segmentation map, resulting in a cleaner, more accurate segmentation map that generalises to multiple scenes of different conditions. Experimental results demonstrate that the proposed system outperforms existing work, with the highest F-measure score of 81.27% obtained by the full system. To facilitate research in the field, we make the source code of this project available at: https://github.com/dksakkos/illumination_augmentatio
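For intuition, the sketch below shows one possible implementation of the global component (brightness, contrast, sharpness and color saturation jitter) using Pillow. The parameter ranges are illustrative assumptions, and the paper's local light/shadow component and post-processing step are not reproduced here.

```python
# Global illumination jitter applied to a training frame; the paired
# ground-truth foreground mask is left untouched.
import random
from PIL import Image, ImageEnhance

def global_illumination_jitter(img: Image.Image) -> Image.Image:
    img = ImageEnhance.Brightness(img).enhance(random.uniform(0.4, 1.6))
    img = ImageEnhance.Contrast(img).enhance(random.uniform(0.6, 1.4))
    img = ImageEnhance.Sharpness(img).enhance(random.uniform(0.5, 1.5))
    img = ImageEnhance.Color(img).enhance(random.uniform(0.6, 1.4))   # saturation
    return img

# Applying this to each training frame exposes the segmentation network to a
# wider range of lighting than the original videos contain.
```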