75 research outputs found

    Foreground detection by probabilistic modeling of the features discovered by stacked denoising autoencoders in noisy video sequences

    CC BY-NC-ND license. The definitive version is available via the indicated DOI. García-González, J., Ortiz-de-Lazcano-Lobato, J. M., Luque-Baena, R. M., Molina-Cabello, M. A., & López-Rubio, E. (2019). Foreground detection by probabilistic modeling of the features discovered by stacked denoising autoencoders in noisy video sequences. Pattern Recognition Letters, 125, 481-487.

    A robust foreground detection system is presented, which is resilient to noise in video sequences. The proposed model divides each video frame into patches that are fed to a stacked denoising autoencoder, which is responsible for extracting significant features from each image patch. After that, a probabilistic model composed of a mixture of Gaussian distributions decides whether the given feature vector describes a patch belonging to the background or the foreground. To test the model's robustness, several trials with noise of different types and intensities have been carried out, and a comparison with ten other state-of-the-art foreground detection algorithms has been drawn. The algorithms have been ranked according to the obtained results; our proposal appears among the first three positions in most cases and is the one that performs best on average.

    This work is partially supported by the Ministry of Science, Innovation and Universities of Spain [grant number RTI2018-094645-B-I00], project name "Automated detection with low cost hardware of unusual activities in video sequences". It is also partially supported by the Autonomous Government of Andalusia (Spain) [grant number TIC-657], project name "Self-organizing systems and robust estimators for video surveillance". Both include funds from the European Regional Development Fund (ERDF). The authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the SCBI (Supercomputing and Bioinformatics) center of the University of Málaga. They have also been supported by the Biomedical Research Institute of Málaga (IBIMA). They also gratefully acknowledge the support of NVIDIA Corporation with the donation of two Titan X GPUs.
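
    As a hedged illustration of this pipeline (a minimal sketch, not the authors' code), the snippet below classifies patches with a Gaussian mixture fitted on autoencoder features. `encode_patch` is a stand-in random projection in place of the trained stacked denoising autoencoder, and the 1%-quantile threshold is an illustrative choice:

```python
# Sketch: per-patch background/foreground decision from autoencoder features.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
PATCH, FEAT = 8, 16
W = rng.normal(size=(PATCH * PATCH, FEAT))  # placeholder "encoder" weights

def encode_patch(patch):
    """Stand-in for the stacked denoising autoencoder's feature extractor."""
    return np.tanh(patch.reshape(-1) @ W)

def patches(frame, size=PATCH):
    """Split a grayscale frame into non-overlapping size x size patches."""
    h, w = frame.shape
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            yield (y, x), frame[y:y + size, x:x + size]

# 1) Background model: a Gaussian mixture over features of patches taken
#    from frames assumed to contain only background.
bg_frames = [rng.normal(0.5, 0.05, size=(32, 32)) for _ in range(20)]
bg_feats = np.array([encode_patch(p) for f in bg_frames for _, p in patches(f)])
bg_model = GaussianMixture(n_components=3, covariance_type="diag").fit(bg_feats)

# 2) Classify each patch of a new frame: low likelihood under the
#    background mixture => foreground.
frame = rng.normal(0.5, 0.05, size=(32, 32))
frame[8:16, 8:16] += 1.0  # synthetic moving object
threshold = np.quantile(bg_model.score_samples(bg_feats), 0.01)
for (y, x), p in patches(frame):
    if bg_model.score_samples(encode_patch(p)[None])[0] < threshold:
        print(f"foreground patch at ({y}, {x})")
```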

    Background modeling by shifted tilings of stacked denoising autoencoders

    The effective processing of visual data without interruption is currently of supreme importance. For that purpose, the analysis system must adapt to events that may affect the data quality and maintain its performance level over time. A methodology for background modeling and foreground detection, whose main characteristic is its robustness against stationary noise, is presented in the paper. The system is based on a stacked denoising autoencoder which extracts a set of significant features for each patch of several shifted tilings of the video frame. A probabilistic model is learned for each patch, and the distinct patches which include a particular pixel are all considered for that pixel's classification. The experiments show that classical methods in the literature suffer drastic performance drops when noise is present in the video sequences, whereas the proposed one is only slightly affected. This corroborates the robustness of our proposal, in addition to its usefulness for the processing and analysis of continuous data during uninterrupted periods of time. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
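
    The shifted-tiling idea can be sketched as follows (a minimal illustration, not the paper's implementation): each tiling offsets the patch grid, so every pixel is covered by several patches, and its label is obtained by aggregating the per-patch decisions. `patch_is_foreground` is a placeholder for the SDAE-features-plus-probabilistic-model decision, and the shift set and majority rule are assumptions:

```python
# Sketch: pixel classification by voting over shifted tilings of patches.
import numpy as np

rng = np.random.default_rng(1)

def patch_is_foreground(patch):
    """Placeholder per-patch decision; the paper uses SDAE features plus a
    per-patch probabilistic model here."""
    return patch.mean() > 0.8

def classify_pixels(frame, size=8, shifts=((0, 0), (4, 0), (0, 4), (4, 4))):
    h, w = frame.shape
    votes = np.zeros((h, w))
    counts = np.zeros((h, w))
    for dy, dx in shifts:  # one tiling per shift
        for y in range(dy, h - size + 1, size):
            for x in range(dx, w - size + 1, size):
                fg = patch_is_foreground(frame[y:y + size, x:x + size])
                votes[y:y + size, x:x + size] += float(fg)
                counts[y:y + size, x:x + size] += 1.0
    # A pixel is foreground when most of the patches covering it say so.
    return votes / np.maximum(counts, 1.0) > 0.5

frame = rng.normal(0.5, 0.05, size=(32, 32))
frame[10:20, 10:20] += 1.0  # synthetic moving object
mask = classify_pixels(frame)
print(mask.sum(), "pixels flagged as foreground")
```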

    Moving object detection in noisy video sequences using deep convolutional disentangled representations.

    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. Noise robustness is crucial when approaching a moving detection problem, since image noise is easily mistaken for movement. To deal with the noise, deep denoising autoencoders are commonly applied to image patches, with an inherent disadvantage with respect to the segmentation resolution. In this work, a fully convolutional autoencoder-based moving detection model is proposed in order to deal with noise with no patch extraction required. Different autoencoder structures and training strategies are also tested to gain insight into the best network design approach.
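
    A minimal sketch of a fully convolutional denoising autoencoder follows; the layer sizes, depth, and toy training step are assumptions, not the architectures evaluated in the paper. Because every layer is convolutional, the network processes whole frames directly, so no patch extraction is needed:

```python
# Sketch: patch-free denoising with a fully convolutional autoencoder.
import torch
import torch.nn as nn

class ConvDAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvDAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
clean = torch.rand(4, 1, 64, 64)          # toy stand-in for video frames
noisy = clean + 0.1 * torch.randn_like(clean)

# Denoising training step: reconstruct the clean frame from its noisy version.
opt.zero_grad()
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()
opt.step()
```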

    Hierarchical feature extraction from spatiotemporal data for cyber-physical system analytics

    With the advent of ubiquitous sensing, robust communication and advanced computation, data-driven modeling is becoming increasingly popular for many engineering problems. Eliminating the difficulties of physics-based modeling and avoiding simplifying assumptions and ad hoc empirical models are significant advantages of data-driven approaches, especially for large-scale complex systems. While classical statistics and signal processing algorithms have been widely used by the engineering community, advanced machine learning techniques have not been sufficiently explored in this regard. This study summarizes various categories of machine learning tools that have been applied, or may be candidates, for addressing engineering problems. While there is an increasing number of machine learning algorithms, the main steps in applying such techniques consist of: data collection and pre-processing, feature extraction, model training, and inference for decision-making. To support decision-making processes in many applications, hierarchical feature extraction is key. Among various feature extraction principles, recent studies emphasize hierarchical approaches that extract salient features at multiple abstraction levels from the data. In this context, the focus of the dissertation is on developing hierarchical feature extraction algorithms within the framework of machine learning in order to solve challenging cyber-physical problems in various domains, such as electromechanical systems and agricultural systems. Furthermore, the feature extraction techniques are described using the spatial, temporal and spatiotemporal data types collected from the systems. The wide applicability of such features in solving selected real-life domain problems is demonstrated throughout this study.
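
    The generic workflow named above (collection and pre-processing, hierarchical feature extraction, training, inference) can be sketched as follows, under the assumption that a small multi-layer network stands in for the learned multi-level features; the dissertation's actual extractors are domain-specific:

```python
# Sketch: pre-processing -> hierarchical features -> training -> inference.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 64))               # toy pre-processed sensor data
y = (X[:, :8].sum(axis=1) > 0).astype(int)   # toy decision labels

model = Pipeline([
    ("scale", StandardScaler()),             # pre-processing
    # Each hidden layer plays the role of one abstraction level of features.
    ("hierarchical", MLPClassifier(hidden_layer_sizes=(32, 8), max_iter=500)),
])
model.fit(X[:150], y[:150])                  # model training
print("held-out accuracy:", model.score(X[150:], y[150:]))  # inference
```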

    Moving Target Detection Based on an Adaptive Low-Rank Sparse Decomposition

    For the exact detection of moving targets in video processing, an adaptive low-rank sparse decomposition algorithm is proposed in this paper. In the proposed algorithm, the background model and the frame vector to be solved are first used to construct an augmented matrix; then robust principal component analysis (RPCA) is used to perform a low-rank sparse decomposition on this augmented matrix. The separated low-rank part and sparse part correspond to the background and the moving foreground of the video frame, respectively, and the incremental singular value decomposition method together with the current background vector is used to update the background model. The experimental results show that the algorithm handles complex scenes, such as illumination changes and background motion, better than its competitors, and that its delay and memory consumption can be effectively reduced.
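
    The core low-rank sparse decomposition step can be illustrated with a standard robust PCA solver (principal component pursuit via an inexact augmented Lagrangian); this sketch is a generic baseline, not the paper's adaptive variant, and it omits the incremental-SVD background update:

```python
# Sketch: M ~ L + S with L low-rank (background) and S sparse (foreground).
# Columns of M are vectorized frames, e.g. the current frame stacked next to
# background-model vectors as in the augmented matrix described above.
import numpy as np

def shrink(X, tau):
    """Soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def rpca(M, lam=None, mu=None, n_iter=100):
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = mu or m * n / (4.0 * np.abs(M).sum())
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)       # low-rank (background) update
        S = shrink(M - L + Y / mu, lam / mu)    # sparse (foreground) update
        Y = Y + mu * (M - L - S)                # Lagrange multiplier update
    return L, S

rng = np.random.default_rng(3)
frames = rng.normal(0.5, 0.01, size=(256, 30))   # 30 vectorized frames
frames[100:120, 15] += 1.0                       # moving object in one frame
L, S = rpca(frames)
print("foreground energy per frame:", np.abs(S).sum(axis=0).round(2))
```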

    SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models

    Object-centric learning aims to represent visual data with a set of object entities (a.k.a. slots), providing structured representations that enable systematic generalization. Leveraging advanced architectures like Transformers, recent approaches have made significant progress in unsupervised object discovery. In addition, slot-based representations hold great potential for generative modeling, such as controllable image generation and object manipulation in image editing. However, current slot-based methods often produce blurry images and distorted objects, exhibiting poor generative modeling capabilities. In this paper, we focus on improving slot-to-image decoding, a crucial aspect for high-quality visual generation. We introduce SlotDiffusion -- an object-centric Latent Diffusion Model (LDM) designed for both image and video data. Thanks to the powerful modeling capacity of LDMs, SlotDiffusion surpasses previous slot models in unsupervised object segmentation and visual generation across six datasets. Furthermore, our learned object features can be utilized by existing object-centric dynamics models, improving video prediction quality and downstream temporal reasoning tasks. Finally, we demonstrate the scalability of SlotDiffusion to unconstrained real-world datasets such as PASCAL VOC and COCO, when integrated with self-supervised pre-trained image encoders.Comment: Project page: https://slotdiffusion.github.io/ . An earlier version of this work appeared at the ICLR 2023 Workshop on Neurosymbolic Generative Models: https://nesygems.github.io/assets/pdf/papers/SlotDiffusion.pd
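
    As a conceptual sketch only (the actual SlotDiffusion architecture is more involved), slot-conditioned latent diffusion can be pictured as a denoiser whose latent tokens cross-attend to the slot vectors at every denoising step; all module names and sizes below are illustrative assumptions:

```python
# Sketch: a latent-diffusion denoiser conditioned on object slots.
import torch
import torch.nn as nn

class SlotConditionedDenoiser(nn.Module):
    def __init__(self, latent_dim=64, slot_dim=64, n_heads=4):
        super().__init__()
        self.time_embed = nn.Linear(1, latent_dim)
        self.cross_attn = nn.MultiheadAttention(latent_dim, n_heads,
                                                kdim=slot_dim, vdim=slot_dim,
                                                batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(latent_dim, 4 * latent_dim),
                                 nn.GELU(),
                                 nn.Linear(4 * latent_dim, latent_dim))

    def forward(self, z_t, t, slots):
        # z_t: (B, N, latent_dim) noisy image latents; slots: (B, K, slot_dim)
        h = z_t + self.time_embed(t[:, None, None].float())
        attended, _ = self.cross_attn(h, slots, slots)  # latents read slots
        return self.mlp(h + attended)                   # predicted noise

model = SlotConditionedDenoiser()
z_t = torch.randn(2, 16, 64)       # 16 latent tokens per image
slots = torch.randn(2, 5, 64)      # 5 object slots
t = torch.randint(0, 1000, (2,))
eps_hat = model(z_t, t, slots)     # trained with the usual MSE-to-noise loss
```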

    Conditional generative modeling for images, 3D animations, and video

    Generative modeling for computer vision has shown immense progress in the last few years, revolutionizing the way we perceive, understand, and manipulate visual data. This rapidly evolving field has witnessed advancements in image generation, 3D animation, and video prediction that unlock diverse applications across multiple fields including entertainment, design, healthcare, and education. As the demand for sophisticated computer vision systems continues to grow, this dissertation attempts to drive innovation in the field by exploring novel formulations of conditional generative models, and innovative applications in images, 3D animations, and video. Our research focuses on architectures that offer reversible transformations of noise and visual data, and on the application of encoder-decoder architectures for generative tasks and 3D content manipulation. In all instances, we incorporate conditional information to enhance the synthesis of visual data, improving the efficiency of the generation process as well as the generated content. Previously successful generative techniques that are reversible between noise and data include normalizing flows and denoising diffusion models. The continuous variant of normalizing flows is powered by Neural Ordinary Differential Equations (Neural ODEs) and has shown some success in modeling the real image distribution; however, it often involves a huge number of parameters and high training time. Denoising diffusion models have recently gained huge popularity for their generalization capabilities, especially in text-to-image applications. In this dissertation, we introduce the use of Neural ODEs to model video dynamics using an encoder-decoder architecture, demonstrating their ability to predict future video frames despite being trained solely to reconstruct current frames. In our next contribution, we propose a conditional variant of continuous normalizing flows that enables higher-resolution image generation based on lower-resolution input. This allows us to achieve image quality comparable to regular normalizing flows while significantly reducing the number of parameters and training time. Our next contribution focuses on a flexible encoder-decoder architecture for accurate estimation and editing of full 3D human pose. We present a comprehensive pipeline that takes human images as input, automatically aligns a user-specified 3D human/non-human character with the pose of the human, and facilitates pose editing based on partial input information. We then proceed to use denoising diffusion models for image and video generation. Regular diffusion models use a Gaussian process to add noise to clean images. In our next contribution, we derive the relevant mathematical details for denoising diffusion models that use non-isotropic Gaussian processes, present non-isotropic noise, and show that the quality of generated images is comparable with the original formulation. In our final contribution, we devise a novel framework building on denoising diffusion models that is capable of solving all three video tasks of prediction, generation, and interpolation. We perform ablation studies using this framework and show state-of-the-art results on multiple datasets. Our contributions are published articles at peer-reviewed venues.
Overall, our research aims to make a meaningful contribution to the pursuit of more efficient and flexible generative models, with the potential to shape the future of computer vision.
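
    As a hedged illustration of the non-isotropic generalization mentioned above (the dissertation's exact parameterization may differ), the standard DDPM forward marginal with identity covariance can be extended by a fixed covariance matrix Sigma:

```latex
% Isotropic DDPM forward marginal:
%   q(x_t | x_0) = N( sqrt(\bar\alpha_t) x_0, (1 - \bar\alpha_t) I ).
% A non-isotropic variant (illustrative assumption) replaces I with \Sigma:
q(\mathbf{x}_t \mid \mathbf{x}_0)
  = \mathcal{N}\!\bigl(\sqrt{\bar\alpha_t}\,\mathbf{x}_0,\;
                       (1-\bar\alpha_t)\,\Sigma\bigr),
\qquad
\mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0
             + \sqrt{1-\bar\alpha_t}\,\Sigma^{1/2}\,\boldsymbol{\epsilon},
\quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0},\, I).
```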

    Deep Learning of Representations: Looking Forward

    Deep learning research aims at discovering learning algorithms that discover multiple levels of distributed representations, with higher levels representing more abstract concepts. Although the study of deep learning has already led to impressive theoretical results, learning algorithms and breakthrough experiments, several challenges lie ahead. This paper proposes to examine some of these challenges, centering on the questions of scaling deep learning algorithms to much larger models and datasets, reducing optimization difficulties due to ill-conditioning or local minima, designing more efficient and powerful inference and sampling procedures, and learning to disentangle the factors of variation underlying the observed data. It also proposes a few forward-looking research directions aimed at overcoming these challenges
    • 

    corecore