75 research outputs found

    Foreground detection by probabilistic modeling of the features discovered by stacked denoising autoencoders in noisy video sequences

    CC BY-NC-ND license. The definitive version is available via the indicated DOI. García-González, J., Ortiz-de-Lazcano-Lobato, J. M., Luque-Baena, R. M., Molina-Cabello, M. A., & López-Rubio, E. (2019). Foreground detection by probabilistic modeling of the features discovered by stacked denoising autoencoders in noisy video sequences. Pattern Recognition Letters, 125, 481-487.

    A robust foreground detection system is presented, which is resilient to noise in video sequences. The proposed model divides each video frame into patches that are fed to a stacked denoising autoencoder, which is responsible for extracting significant features from each image patch. After that, a probabilistic model composed of a mixture of Gaussian distributions decides whether the given feature vector describes a patch belonging to the background or the foreground. To test the model's robustness, several trials with noise of different types and intensities have been carried out, and a comparison with ten other state-of-the-art foreground detection algorithms has been drawn. The algorithms have been ranked according to the obtained results; our proposal appears among the first three positions in most cases and is the one that performs best on average.

    This work is partially supported by the Ministry of Science, Innovation and Universities of Spain [grant number RTI2018-094645-B-I00], project name "Automated detection with low cost hardware of unusual activities in video sequences". It is also partially supported by the Autonomous Government of Andalusia (Spain) [grant number TIC-657], project name "Self-organizing systems and robust estimators for video surveillance". Both include funds from the European Regional Development Fund (ERDF). The authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the SCBI (Supercomputing and Bioinformatics) center of the University of Málaga. They have also been supported by the Biomedical Research Institute of Málaga (IBIMA). They also gratefully acknowledge the support of NVIDIA Corporation with the donation of two Titan X GPUs.
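
    As a hedged illustration of this pipeline (a minimal sketch, not the authors' code), the snippet below classifies patches with a Gaussian mixture fitted on autoencoder features. `encode_patch` is a stand-in random projection in place of the trained stacked denoising autoencoder, and the 1%-quantile threshold is an illustrative choice:

```python
# Sketch: per-patch background/foreground decision from autoencoder features.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
PATCH, FEAT = 8, 16
W = rng.normal(size=(PATCH * PATCH, FEAT))  # placeholder "encoder" weights

def encode_patch(patch):
    """Stand-in for the stacked denoising autoencoder's feature extractor."""
    return np.tanh(patch.reshape(-1) @ W)

def patches(frame, size=PATCH):
    """Split a grayscale frame into non-overlapping size x size patches."""
    h, w = frame.shape
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            yield (y, x), frame[y:y + size, x:x + size]

# 1) Background model: a Gaussian mixture over features of patches taken
#    from frames assumed to contain only background.
bg_frames = [rng.normal(0.5, 0.05, size=(32, 32)) for _ in range(20)]
bg_feats = np.array([encode_patch(p) for f in bg_frames for _, p in patches(f)])
bg_model = GaussianMixture(n_components=3, covariance_type="diag").fit(bg_feats)

# 2) Classify each patch of a new frame: low likelihood under the
#    background mixture => foreground.
frame = rng.normal(0.5, 0.05, size=(32, 32))
frame[8:16, 8:16] += 1.0  # synthetic moving object
threshold = np.quantile(bg_model.score_samples(bg_feats), 0.01)
for (y, x), p in patches(frame):
    if bg_model.score_samples(encode_patch(p)[None])[0] < threshold:
        print(f"foreground patch at ({y}, {x})")
```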

    Background modeling by shifted tilings of stacked denoising autoencoders

    The effective processing of visual data without interruption is currently of supreme importance. For that purpose, the analysis system must adapt to events that may affect the data quality and maintain its performance level over time. A methodology for background modeling and foreground detection, whose main characteristic is its robustness against stationary noise, is presented in the paper. The system is based on a stacked denoising autoencoder which extracts a set of significant features for each patch of several shifted tilings of the video frame. A probabilistic model is learned for each patch, and the distinct patches which include a particular pixel are all considered for that pixel's classification. The experiments show that classical methods in the literature suffer drastic performance drops when noise is present in the video sequences, whereas the proposed one is only slightly affected. This corroborates the robustness of our proposal, in addition to its usefulness for the processing and analysis of continuous data during uninterrupted periods of time. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
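
    The shifted-tiling idea can be sketched as follows (a minimal illustration, not the paper's implementation): each tiling offsets the patch grid, so every pixel is covered by several patches, and its label is obtained by aggregating the per-patch decisions. `patch_is_foreground` is a placeholder for the SDAE-features-plus-probabilistic-model decision, and the shift set and majority rule are assumptions:

```python
# Sketch: pixel classification by voting over shifted tilings of patches.
import numpy as np

rng = np.random.default_rng(1)

def patch_is_foreground(patch):
    """Placeholder per-patch decision; the paper uses SDAE features plus a
    per-patch probabilistic model here."""
    return patch.mean() > 0.8

def classify_pixels(frame, size=8, shifts=((0, 0), (4, 0), (0, 4), (4, 4))):
    h, w = frame.shape
    votes = np.zeros((h, w))
    counts = np.zeros((h, w))
    for dy, dx in shifts:  # one tiling per shift
        for y in range(dy, h - size + 1, size):
            for x in range(dx, w - size + 1, size):
                fg = patch_is_foreground(frame[y:y + size, x:x + size])
                votes[y:y + size, x:x + size] += float(fg)
                counts[y:y + size, x:x + size] += 1.0
    # A pixel is foreground when most of the patches covering it say so.
    return votes / np.maximum(counts, 1.0) > 0.5

frame = rng.normal(0.5, 0.05, size=(32, 32))
frame[10:20, 10:20] += 1.0  # synthetic moving object
mask = classify_pixels(frame)
print(mask.sum(), "pixels flagged as foreground")
```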

    Moving object detection in noisy video sequences using deep convolutional disentangled representations.

    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. Noise robustness is crucial when approaching a moving detection problem, since image noise is easily mistaken for movement. To deal with the noise, deep denoising autoencoders are commonly applied to image patches, with an inherent disadvantage with respect to the segmentation resolution. In this work, a fully convolutional autoencoder-based moving detection model is proposed in order to deal with noise with no patch extraction required. Different autoencoder structures and training strategies are also tested to gain insight into the best network design approach.
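
    A minimal sketch of a fully convolutional denoising autoencoder follows; the layer sizes, depth, and toy training step are assumptions, not the architectures evaluated in the paper. Because every layer is convolutional, the network processes whole frames directly, so no patch extraction is needed:

```python
# Sketch: patch-free denoising with a fully convolutional autoencoder.
import torch
import torch.nn as nn

class ConvDAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvDAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
clean = torch.rand(4, 1, 64, 64)          # toy stand-in for video frames
noisy = clean + 0.1 * torch.randn_like(clean)

# Denoising training step: reconstruct the clean frame from its noisy version.
opt.zero_grad()
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()
opt.step()
```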

    Hierarchical feature extraction from spatiotemporal data for cyber-physical system analytics

    With the advent of ubiquitous sensing, robust communication and advanced computation, data-driven modeling is becoming increasingly popular for many engineering problems. Eliminating the difficulties of physics-based modeling and avoiding simplifying assumptions and ad hoc empirical models are significant advantages of data-driven approaches, especially for large-scale complex systems. While classical statistics and signal processing algorithms have been widely used by the engineering community, advanced machine learning techniques have not been sufficiently explored in this regard. This study summarizes various categories of machine learning tools that have been applied, or may be candidates, for addressing engineering problems. While there is an increasing number of machine learning algorithms, the main steps in applying such techniques consist of: data collection and pre-processing, feature extraction, model training, and inference for decision-making. To support decision-making processes in many applications, hierarchical feature extraction is key. Among various feature extraction principles, recent studies emphasize hierarchical approaches that extract salient features at multiple abstraction levels from the data. In this context, the focus of the dissertation is on developing hierarchical feature extraction algorithms within the framework of machine learning in order to solve challenging cyber-physical problems in various domains, such as electromechanical systems and agricultural systems. Furthermore, the feature extraction techniques are described using the spatial, temporal and spatiotemporal data types collected from the systems. The wide applicability of such features in solving selected real-life domain problems is demonstrated throughout this study.
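
    The generic workflow named above (collection and pre-processing, hierarchical feature extraction, training, inference) can be sketched as follows, under the assumption that a small multi-layer network stands in for the learned multi-level features; the dissertation's actual extractors are domain-specific:

```python
# Sketch: pre-processing -> hierarchical features -> training -> inference.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 64))               # toy pre-processed sensor data
y = (X[:, :8].sum(axis=1) > 0).astype(int)   # toy decision labels

model = Pipeline([
    ("scale", StandardScaler()),             # pre-processing
    # Each hidden layer plays the role of one abstraction level of features.
    ("hierarchical", MLPClassifier(hidden_layer_sizes=(32, 8), max_iter=500)),
])
model.fit(X[:150], y[:150])                  # model training
print("held-out accuracy:", model.score(X[150:], y[150:]))  # inference
```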

    Moving Target Detection Based on an Adaptive Low-Rank Sparse Decomposition

    For the exact detection of moving targets in video processing, an adaptive low-rank sparse decomposition algorithm is proposed in this paper. In the proposed algorithm, the background model and the frame vector to be solved are first used to construct an augmented matrix; then robust principal component analysis (RPCA) is used to perform a low-rank sparse decomposition on this augmented matrix. The separated low-rank part and sparse part correspond to the background and the moving foreground of the video frame, respectively, and the incremental singular value decomposition method together with the current background vector is used to update the background model. The experimental results show that the algorithm handles complex scenes, such as illumination changes and background motion, better than its competitors, and that its delay and memory consumption can be effectively reduced.
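
    The core low-rank sparse decomposition step can be illustrated with a standard robust PCA solver (principal component pursuit via an inexact augmented Lagrangian); this sketch is a generic baseline, not the paper's adaptive variant, and it omits the incremental-SVD background update:

```python
# Sketch: M ~ L + S with L low-rank (background) and S sparse (foreground).
# Columns of M are vectorized frames, e.g. the current frame stacked next to
# background-model vectors as in the augmented matrix described above.
import numpy as np

def shrink(X, tau):
    """Soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def rpca(M, lam=None, mu=None, n_iter=100):
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = mu or m * n / (4.0 * np.abs(M).sum())
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)       # low-rank (background) update
        S = shrink(M - L + Y / mu, lam / mu)    # sparse (foreground) update
        Y = Y + mu * (M - L - S)                # Lagrange multiplier update
    return L, S

rng = np.random.default_rng(3)
frames = rng.normal(0.5, 0.01, size=(256, 30))   # 30 vectorized frames
frames[100:120, 15] += 1.0                       # moving object in one frame
L, S = rpca(frames)
print("foreground energy per frame:", np.abs(S).sum(axis=0).round(2))
```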

    SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models

    Object-centric learning aims to represent visual data with a set of object entities (a.k.a. slots), providing structured representations that enable systematic generalization. Leveraging advanced architectures like Transformers, recent approaches have made significant progress in unsupervised object discovery. In addition, slot-based representations hold great potential for generative modeling, such as controllable image generation and object manipulation in image editing. However, current slot-based methods often produce blurry images and distorted objects, exhibiting poor generative modeling capabilities. In this paper, we focus on improving slot-to-image decoding, a crucial aspect for high-quality visual generation. We introduce SlotDiffusion -- an object-centric Latent Diffusion Model (LDM) designed for both image and video data. Thanks to the powerful modeling capacity of LDMs, SlotDiffusion surpasses previous slot models in unsupervised object segmentation and visual generation across six datasets. Furthermore, our learned object features can be utilized by existing object-centric dynamics models, improving video prediction quality and downstream temporal reasoning tasks. Finally, we demonstrate the scalability of SlotDiffusion to unconstrained real-world datasets such as PASCAL VOC and COCO, when integrated with self-supervised pre-trained image encoders.Comment: Project page: https://slotdiffusion.github.io/ . An earlier version of this work appeared at the ICLR 2023 Workshop on Neurosymbolic Generative Models: https://nesygems.github.io/assets/pdf/papers/SlotDiffusion.pd
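
    As a conceptual sketch only (the actual SlotDiffusion architecture is more involved), slot-conditioned latent diffusion can be pictured as a denoiser whose latent tokens cross-attend to the slot vectors at every denoising step; all module names and sizes below are illustrative assumptions:

```python
# Sketch: a latent-diffusion denoiser conditioned on object slots.
import torch
import torch.nn as nn

class SlotConditionedDenoiser(nn.Module):
    def __init__(self, latent_dim=64, slot_dim=64, n_heads=4):
        super().__init__()
        self.time_embed = nn.Linear(1, latent_dim)
        self.cross_attn = nn.MultiheadAttention(latent_dim, n_heads,
                                                kdim=slot_dim, vdim=slot_dim,
                                                batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(latent_dim, 4 * latent_dim),
                                 nn.GELU(),
                                 nn.Linear(4 * latent_dim, latent_dim))

    def forward(self, z_t, t, slots):
        # z_t: (B, N, latent_dim) noisy image latents; slots: (B, K, slot_dim)
        h = z_t + self.time_embed(t[:, None, None].float())
        attended, _ = self.cross_attn(h, slots, slots)  # latents read slots
        return self.mlp(h + attended)                   # predicted noise

model = SlotConditionedDenoiser()
z_t = torch.randn(2, 16, 64)       # 16 latent tokens per image
slots = torch.randn(2, 5, 64)      # 5 object slots
t = torch.randint(0, 1000, (2,))
eps_hat = model(z_t, t, slots)     # trained with the usual MSE-to-noise loss
```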

    Conditional generative modeling for images, 3D animations, and video

    Generative modeling for computer vision has shown immense progress in the last few years, revolutionizing the way we perceive, understand, and manipulate visual data. This rapidly evolving field has witnessed advancements in image generation, 3D animation, and video prediction that unlock diverse applications across multiple fields including entertainment, design, healthcare, and education. As the demand for sophisticated computer vision systems continues to grow, this dissertation attempts to drive innovation in the field by exploring novel formulations of conditional generative models, and innovative applications in images, 3D animations, and video. Our research focuses on architectures that offer reversible transformations of noise and visual data, and on the application of encoder-decoder architectures for generative tasks and 3D content manipulation. In all instances, we incorporate conditional information to enhance the synthesis of visual data, improving the efficiency of the generation process as well as the generated content. Previously successful generative techniques that are reversible between noise and data include normalizing flows and denoising diffusion models. The continuous variant of normalizing flows is powered by Neural Ordinary Differential Equations (Neural ODEs) and has shown some success in modeling the real image distribution; however, it often involves a huge number of parameters and high training time. Denoising diffusion models have recently gained huge popularity for their generalization capabilities, especially in text-to-image applications. In this dissertation, we introduce the use of Neural ODEs to model video dynamics using an encoder-decoder architecture, demonstrating their ability to predict future video frames despite being trained solely to reconstruct current frames. In our next contribution, we propose a conditional variant of continuous normalizing flows that enables higher-resolution image generation based on lower-resolution input. This allows us to achieve image quality comparable to regular normalizing flows while significantly reducing the number of parameters and training time. Our next contribution focuses on a flexible encoder-decoder architecture for accurate estimation and editing of full 3D human pose. We present a comprehensive pipeline that takes human images as input, automatically aligns a user-specified 3D human/non-human character with the pose of the human, and facilitates pose editing based on partial input information. We then proceed to use denoising diffusion models for image and video generation. Regular diffusion models use a Gaussian process to add noise to clean images. In our next contribution, we derive the relevant mathematical details for denoising diffusion models that use non-isotropic Gaussian processes, present non-isotropic noise, and show that the quality of generated images is comparable with the original formulation. In our final contribution, we devise a novel framework building on denoising diffusion models that is capable of solving all three video tasks of prediction, generation, and interpolation. We perform ablation studies using this framework and show state-of-the-art results on multiple datasets. Our contributions are published articles at peer-reviewed venues.
Overall, our research aims to make a meaningful contribution to the pursuit of more efficient and flexible generative models, with the potential to shape the future of computer vision.
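
    As a hedged illustration of the non-isotropic generalization mentioned above (the dissertation's exact parameterization may differ), the standard DDPM forward marginal with identity covariance can be extended by a fixed covariance matrix Sigma:

```latex
% Isotropic DDPM forward marginal:
%   q(x_t | x_0) = N( sqrt(\bar\alpha_t) x_0, (1 - \bar\alpha_t) I ).
% A non-isotropic variant (illustrative assumption) replaces I with \Sigma:
q(\mathbf{x}_t \mid \mathbf{x}_0)
  = \mathcal{N}\!\bigl(\sqrt{\bar\alpha_t}\,\mathbf{x}_0,\;
                       (1-\bar\alpha_t)\,\Sigma\bigr),
\qquad
\mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0
             + \sqrt{1-\bar\alpha_t}\,\Sigma^{1/2}\,\boldsymbol{\epsilon},
\quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0},\, I).
```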

    Deep Learning of Representations: Looking Forward

    Deep learning research aims at discovering learning algorithms that discover multiple levels of distributed representations, with higher levels representing more abstract concepts. Although the study of deep learning has already led to impressive theoretical results, learning algorithms and breakthrough experiments, several challenges lie ahead. This paper proposes to examine some of these challenges, centering on the questions of scaling deep learning algorithms to much larger models and datasets, reducing optimization difficulties due to ill-conditioning or local minima, designing more efficient and powerful inference and sampling procedures, and learning to disentangle the factors of variation underlying the observed data. It also proposes a few forward-looking research directions aimed at overcoming these challenges
    • 

    corecore