An improved Gaussian Mixture Model with post-processing for multiple object detection in surveillance video analytics
Gaussian Mixture Model (GMM) is an effective method for extracting foreground objects from video sequences. However, GMM fails to detect objects in challenging scenarios such as shadow, occlusion and complex backgrounds. To handle these challenges, both intrinsic and extrinsic enhancements to traditional GMM are required. This paper presents a novel framework that combines an improved GMM with post-processing for multiple object detection. In the proposed system, GMM with parameter initialization is considered an intrinsic improvement, while video pre-processing and post-processing are considered extrinsic improvements. Integrating morphological operations with GMM yields better segmentation than traditional GMM and increases detection performance by reducing false positives. Video pre-processing removes noise and prepares the input video for further processing. In the final step, the gradient of morphological operations is used for post-processing. The proposed approach was tested on challenging surveillance video sequences from benchmark datasets such as PETS 2009 and CD 2014 (Change Detection). The experimental results are compared using ground truth and performance evaluation metrics. The results show that the proposed approach performs better than GMM and can detect objects effectively even under illumination variation and partial occlusion.
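The per-pixel mixture at the core of such methods can be sketched as a Stauffer–Grimson-style update for a single pixel's intensity stream. This is an illustrative baseline only; the paper's contribution (parameter initialization, pre-processing, morphological post-processing) sits on top of a model like this, and all parameter values below are illustrative assumptions.

```python
# Minimal sketch of a Stauffer-Grimson-style GMM for one pixel's
# intensity stream. Parameter values are illustrative assumptions.
import math

class PixelGMM:
    def __init__(self, k=3, alpha=0.05, var0=36.0, t_bg=0.7):
        self.k, self.alpha, self.var0, self.t_bg = k, alpha, var0, t_bg
        self.w = [1.0 / k] * k          # component weights
        self.mu = [0.0, 128.0, 255.0]   # component means, spread over [0, 255]
        self.var = [var0] * k           # component variances

    def update(self, x):
        """Update the mixture with intensity x; return True if x is background."""
        # Find the first component that x matches (within 2.5 sigma).
        match = -1
        for i in range(self.k):
            if abs(x - self.mu[i]) <= 2.5 * math.sqrt(self.var[i]):
                match = i
                break
        # Decay all weights; reinforce the matched component.
        for i in range(self.k):
            self.w[i] = (1 - self.alpha) * self.w[i] + self.alpha * (i == match)
        if match >= 0:
            rho = self.alpha / max(self.w[match], 1e-6)
            d = x - self.mu[match]
            self.mu[match] += rho * d
            self.var[match] += rho * (d * d - self.var[match])
        else:  # no match: replace the weakest component with one centred on x
            j = min(range(self.k), key=lambda i: self.w[i])
            self.mu[j], self.var[j], self.w[j] = x, self.var0, self.alpha
        s = sum(self.w)
        self.w = [wi / s for wi in self.w]
        # Components with high weight and low variance model the background.
        order = sorted(range(self.k),
                       key=lambda i: -self.w[i] / math.sqrt(self.var[i]))
        cum, bg = 0.0, set()
        for i in order:
            bg.add(i)
            cum += self.w[i]
            if cum > self.t_bg:
                break
        return match in bg
```

After enough frames of a stable intensity, the pixel is classified as background, while a sudden outlier intensity is flagged as foreground.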
Hierarchical improvement of foreground segmentation masks in background subtraction
A plethora of algorithms have been defined for foreground segmentation, a fundamental stage for many computer vision applications. In this work, we propose a post-processing framework to improve the foreground segmentation performance of background subtraction algorithms. We define a hierarchical framework for extending segmented foreground pixels to undetected foreground object areas and for removing erroneously segmented foreground. Firstly, we create a motion-aware hierarchical image segmentation of each frame that prevents merging foreground and background image regions. Then, we estimate the quality of the foreground mask through the fitness of the binary regions in the mask and the hierarchy of segmented regions. Finally, the improved foreground mask is obtained as an optimal labeling by jointly exploiting foreground quality and spatial color relations in a pixel-wise fully-connected Conditional Random Field. Experiments are conducted over four large and heterogeneous datasets with varied challenges (CDNET2014, LASIESTA, SABS and BMC), demonstrating the capability of the proposed framework to improve background subtraction results. This work was partially supported by the Spanish Government (HAVideo, TEC2014-53176-R).
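The region-fitness idea can be sketched in a highly simplified form: score each pre-segmented image region by how much of it the foreground mask covers, then relabel whole regions accordingly. Note that the paper uses a motion-aware segmentation hierarchy and a fully-connected CRF for the final labeling; the per-region majority vote below is only an illustrative stand-in, and the function name and threshold are assumptions.

```python
# Simplified sketch of region-fitness refinement: extend or suppress
# whole segmented regions based on foreground coverage. A stand-in for
# the paper's hierarchy + fully-connected CRF, not its actual method.
import numpy as np

def refine_mask(mask, regions, t_fg=0.5):
    """mask: bool HxW foreground mask; regions: int HxW label map."""
    refined = np.zeros_like(mask)
    for label in np.unique(regions):
        sel = regions == label
        fitness = mask[sel].mean()      # fraction of region marked foreground
        refined[sel] = fitness >= t_fg  # relabel the whole region at once
    return refined
```

A region that is mostly detected gets completed to its full extent, while a region with only scattered false positives is cleaned out, which is the intuition behind extending and removing foreground at region level.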
Quality-Driven video analysis for the improvement of foreground segmentation
Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Defense date: 15-06-2018. It was partially supported by the Spanish Government (TEC2014-53176-R, HAVideo).
Video foreground segmentation with deep learning
This thesis tackles the problem of foreground segmentation in videos, even under extremely challenging conditions. This task comes with a plethora of hurdles, as the model needs to distinguish moving objects from irrelevant background motion, which can be caused by the weather, illumination, camera movement, etc. As foreground segmentation is often the first step of various highly important applications (video surveillance for security, patient/infant monitoring, etc.), it is crucial to develop a model capable of producing excellent results in all kinds of conditions.
In order to tackle this problem, we follow the recent trend in other computer vision areas and harness the power of deep learning. We design architectures of convolutional neural networks specifically targeted to counter the aforementioned challenges. We first propose a 3D CNN that models the spatial and temporal information of the scene simultaneously. The network is deep enough to successfully cover more than 50 different scenes of various conditions with no need for any fine-tuning. These conditions include illumination (day or night), weather (sunny, rainy or snowing), background movements (trees moving in the wind, fountains, etc.) and others. Next, we propose a data augmentation method specifically targeted at illumination changes. We show that artificially augmenting the dataset with this method significantly improves the segmentation results, even when tested under sudden illumination changes. We also present a post-processing method that exploits the temporal information of the input video. Finally, we propose a complex deep learning model which learns the illumination of the scene and performs foreground segmentation simultaneously.
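Illumination-targeted augmentation of the kind described above can be sketched as a random photometric perturbation applied to training frames. The thesis's exact scheme is not detailed in this abstract, so the global gain + gamma jitter below, including its parameter ranges, is an assumption for illustration.

```python
# Illustrative illumination-change augmentation: random global gain and
# gamma perturbation per frame. An assumed scheme, not the thesis's own.
import numpy as np

def augment_illumination(frame, rng, gain_range=(0.5, 1.5), gamma_range=(0.7, 1.4)):
    """frame: float array with values in [0, 1]; returns a perturbed copy."""
    gain = rng.uniform(*gain_range)    # simulates brighter/darker scenes
    gamma = rng.uniform(*gamma_range)  # simulates nonlinear lighting shifts
    return np.clip(gain * frame ** gamma, 0.0, 1.0)
```

Because the foreground mask is unchanged by a photometric perturbation, each augmented frame reuses the original ground-truth label, which is what makes this kind of augmentation cheap.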
Gesture tracking and neural activity segmentation in head-fixed behaving mice by deep learning methods
The typical approach used by neuroscientists is to study the response of laboratory animals to a stimulus while recording their neural activity at the same time. With the advent of calcium imaging technology, researchers can now study neural activity at sub-cellular resolution in vivo. Similarly, recording the behaviour of laboratory animals is also becoming more affordable. Although it is now easier to record behavioural and neural data, this data comes with its own set of challenges. The biggest challenge, given the sheer volume of the data, is annotation. A traditional approach is to annotate the data manually, frame by frame. With behavioural data, manual annotation is done by looking at each frame and tracing the animals; with neural data, this is carried out by a trained neuroscientist. In this research, we propose automated tools based on deep learning that can aid in the processing of behavioural and neural data. These tools will help neuroscientists annotate and analyse the data they acquire in an automated and reliable way.
MAC-REALM: A video content feature extraction and modelling framework
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. A consequence of the "data deluge" is the exponential increase in digital video footage, while the ability to find relevant video clips diminishes. Traditional text-based search engines are no longer optimal for searching, as they cannot provide a granular search of the content inside video footage. To search video in a content-based manner, the content features of the video need to be extracted and modelled into a content model, which can then act as a searchable proxy for the video content. This thesis focuses on the extraction of syntactic and semantic content features and on content modelling, using machine-driven processes with little or no user interaction. Our abstract framework design extracts syntactic and semantic content features and compiles them into an integrated content model. The framework integrates a four-plane strategy consisting of: a pre-processing plane that removes redundant data and filters the media to improve its feature extraction properties; a syntactic feature extraction plane that extracts low-level syntactic features and mid-level syntactic features that have semantic attributes; a semantic relationship analysis and linkage plane, where the spatial and temporal relationships of all the content features are defined; and finally a content modelling stage where the syntactic and semantic content features are integrated into a content model. Each of the four planes can be split into three layers, namely: the content layer, where the content to be processed is stored; the application layer, where the content is converted into content descriptions; and the MPEG-7 layer, where content descriptions are serialised. Using MPEG-7 standards to produce the content model provides wide-ranging interoperability while facilitating granular multi-content-type searches.
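The four-plane strategy can be sketched as a chain of transformations, each plane consuming the previous plane's output. Everything below (function names, toy feature choices, the dictionary content model) is an illustrative assumption and not MAC-REALM's actual API or MPEG-7 serialisation.

```python
# Sketch of a four-plane pipeline: preprocess -> syntactic features ->
# relationship linkage -> content model. Names and shapes are assumed.
def preprocess(media):
    """Pre-processing plane: toy redundancy removal (keep every 2nd frame)."""
    return [f for i, f in enumerate(media) if i % 2 == 0]

def extract_syntactic(frames):
    """Syntactic plane: toy low-level feature, the mean value per frame."""
    return [{"frame": i, "mean": sum(f) / len(f)} for i, f in enumerate(frames)]

def link_semantics(features):
    """Relationship plane: toy temporal links between consecutive features."""
    return [(a["frame"], b["frame"]) for a, b in zip(features, features[1:])]

def build_content_model(features, links):
    """Modelling plane: integrate features and relations into one model."""
    return {"features": features, "temporal_links": links}

def run_pipeline(media):
    frames = preprocess(media)
    feats = extract_syntactic(frames)
    links = link_semantics(feats)
    return build_content_model(feats, links)
```

The value of the plane separation is that each stage can be swapped (e.g. a different feature extractor) without touching the others, and the final model is the single searchable proxy for the video.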
The framework aims to "bridge" the semantic gap by integrating the syntactic and semantic content features from extraction through to modelling. The design of the framework has been implemented in a prototype called MAC-REALM, which has been tested and evaluated for its effectiveness in extracting and modelling content features. Conclusions are drawn about the research output as a whole and whether it has met the objectives. Finally, future work is presented on how concept detection and crowdsourcing can be used with MAC-REALM.
A Methodology for Extracting Human Bodies from Still Images
Monitoring and surveillance of humans is one of the most prominent applications today, and it is expected to be part of many future aspects of our life, for safety reasons, assisted living and many others. Many efforts have been made towards automatic and robust solutions, but the general problem is very challenging and still remains open. In this PhD dissertation we examine the problem from many perspectives. First, we study the performance of a hardware architecture designed for large-scale surveillance systems. Then, we focus on the general problem of human activity recognition, present an extensive survey of methodologies that deal with this subject and propose a maturity metric to evaluate them.
Image segmentation is one of the most popular algorithms in image processing, and we propose a blind metric to evaluate segmentation results with respect to the activity in local regions. Finally, we propose a fully automatic system for segmenting and extracting human bodies from challenging single images, which is the main contribution of the dissertation. Our methodology is a novel bottom-up approach relying mostly on anthropometric constraints and is facilitated by our research in the fields of face, skin and hand detection. Experimental results and comparison with state-of-the-art methodologies demonstrate the success of our approach.
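A bottom-up use of anthropometric constraints can be sketched as proposing a body region from a detected face box. The ratios below (body roughly 7.5 face-heights tall and 3 face-widths wide) are rough textbook figures assumed for illustration, not the dissertation's actual constraints, and the function name is hypothetical.

```python
# Illustrative anthropometric proposal: derive a body search region from
# a face box. Ratios are assumed textbook values, not the thesis's own.
def body_region_from_face(face, img_w, img_h, heights=7.5, widths=3.0):
    """face: (x, y, w, h) box; returns a clipped (x, y, w, h) body proposal."""
    x, y, w, h = face
    bw = widths * w
    bx = x + w / 2 - bw / 2  # body horizontally centred under the face
    by = y                   # body region starts at the top of the face
    bh = heights * h
    # Clip the proposal to the image bounds.
    bx, by = max(0.0, bx), max(0.0, by)
    bw = min(bw, img_w - bx)
    bh = min(bh, img_h - by)
    return (bx, by, bw, bh)
```

Constraining the search to such a region is what lets face, skin and hand detections anchor the segmentation instead of scanning the whole image.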