13 research outputs found

    PredNet and Predictive Coding: A Critical Review

    Full text link
    PredNet, a deep predictive coding network developed by Lotter et al., combines a biologically inspired architecture based on the propagation of prediction error with self-supervised representation learning in video. While the architecture has drawn a lot of attention and various extensions of the model exist, there is a lack of a critical analysis. We fill in the gap by evaluating PredNet both as an implementation of the predictive coding theory and as a self-supervised video prediction model using a challenging video action classification dataset. We design an extended model to test if conditioning future frame predictions on the action class of the video improves the model performance. We show that PredNet does not yet completely follow the principles of predictive coding. The proposed top-down conditioning leads to a performance gain on synthetic data, but does not scale up to the more complex real-world action classification dataset. Our analysis is aimed at guiding future research on similar architectures based on the predictive coding theory

    A Review on Deep Learning Techniques for Video Prediction

    Get PDF
    The ability to predict, anticipate and reason about future outcomes is a key component of intelligent decision-making systems. In light of the success of deep learning in computer vision, deep-learning-based video prediction emerged as a promising research direction. Defined as a self-supervised learning task, video prediction represents a suitable framework for representation learning, as it demonstrated potential capabilities for extracting meaningful representations of the underlying patterns in natural videos. Motivated by the increasing interest in this task, we provide a review on the deep learning methods for prediction in video sequences. We firstly define the video prediction fundamentals, as well as mandatory background concepts and the most used datasets. Next, we carefully analyze existing video prediction models organized according to a proposed taxonomy, highlighting their contributions and their significance in the field. The summary of the datasets and methods is accompanied with experimental results that facilitate the assessment of the state of the art on a quantitative basis. The paper is summarized by drawing some general conclusions, identifying open research challenges and by pointing out future research directions.This work has been funded by the Spanish Government PID2019-104818RB-I00 grant for the MoDeaAS project, supported with Feder funds. This work has also been supported by two Spanish national grants for PhD studies, FPU17/00166, and ACIF/2018/197 respectively

    Brain-Inspired Computational Intelligence via Predictive Coding

    Full text link
    Artificial intelligence (AI) is rapidly becoming one of the key technologies of this century. The majority of results in AI thus far have been achieved using deep neural networks trained with the error backpropagation learning algorithm. However, the ubiquitous adoption of this approach has highlighted some important limitations such as substantial computational cost, difficulty in quantifying uncertainty, lack of robustness, unreliability, and biological implausibility. It is possible that addressing these limitations may require schemes that are inspired and guided by neuroscience theories. One such theory, called predictive coding (PC), has shown promising performance in machine intelligence tasks, exhibiting exciting properties that make it potentially valuable for the machine learning community: PC can model information processing in different brain areas, can be used in cognitive control and robotics, and has a solid mathematical grounding in variational inference, offering a powerful inversion scheme for a specific class of continuous-state generative models. With the hope of foregrounding research in this direction, we survey the literature that has contributed to this perspective, highlighting the many ways that PC might play a role in the future of machine learning and computational intelligence at large.Comment: 37 Pages, 9 Figure

    A Comprehensive survey on deep future frame video prediction

    Get PDF
    El present projecte planteja l'estudi comprensiu i extens per a la tasca de predicció de fotogrames donada una seqüència de vídeo. Mitjançant l'anàlisi de l'estat de l'art en generació d'imatges, xarxes convolucionals i adversàries l'objectiu és establir les forces i utilitats d'aquesta tasca

    Feature Papers of Forecasting

    Get PDF
    Nowadays, forecast applications are receiving unprecedent attention thanks to their capability to improve the decision-making processes by providing useful indications. A large number of forecast approaches related to different forecast horizons and to the specific problem that have to be predicted have been proposed in recent scientific literature, from physical models to data-driven statistic and machine learning approaches. In this Special Issue, the most recent and high-quality researches about forecast are collected. A total of nine papers have been selected to represent a wide range of applications, from weather and environmental predictions to economic and management forecasts. Finally, some applications related to the forecasting of the different phases of COVID in Spain and the photovoltaic power production have been presented

    Feature Papers of Forecasting

    Get PDF

    Advances in generative models for dynamic scenes

    Full text link
    Les réseaux de neurones sont un type de modèle d'apprentissage automatique (ML) qui résolvent des tâches complexes d'intelligence artificielle (AI) sans nécessiter de représentations de données élaborées manuellement. Bien qu'ils aient obtenu des résultats impressionnants dans des tâches nécessitant un traitement de la parole, d’image, et du langage, les réseaux de neurones ont encore de la difficulté à résoudre des tâches de compréhension de scènes dynamiques. De plus, l’entraînement de réseaux de neurones nécessite généralement de nombreuses données annotées manuellement, ce qui peut être un processus long et coûteux. Cette thèse est composée de quatre articles proposant des modèles génératifs pour des scènes dynamiques. La modélisation générative est un domaine du ML qui étudie comment apprendre les mécanismes par lesquels les données sont produites. La principale motivation derrière les modèles génératifs est de pouvoir, sans utiliser d’étiquettes, apprendre des représentations de données utiles; c’est un sous-produit de l'approximation du processus de génération de données. De plus, les modèles génératifs sont utiles pour un large éventail d'applications telles que la super-résolution d'images, la synthèse vocale ou le résumé de texte. Le premier article se concentre sur l'amélioration de la performance des précédents auto-encodeurs variationnels (VAE) pour la prédiction vidéo. Il s’agit d’une tâche qui consiste à générer les images futures d'une scène dynamique, compte tenu de certaines observations antérieures. Les VAE sont une famille de modèles à variables latentes qui peuvent être utilisés pour échantillonner des points de données. Comparés à d'autres modèles génératifs, les VAE sont faciles à entraîner et ont tendance à couvrir tous les modes des données, mais produisent souvent des résultats de moindre qualité. En prédiction vidéo, les VAE ont été les premiers modèles capables de produire des images futures plausibles à partir d’un contexte donné, un progrès marquant par rapport aux modèles précédents car, pour la plupart des scènes dynamiques, le futur n'est pas une fonction déterministe du passé. Cependant, les premiers VAE pour la prédiction vidéo produisaient des résultats avec des artefacts visuels visibles et ne fonctionnaient pas sur des ensembles de données réalistes complexes. Dans cet article, nous identifions certains des facteurs limitants de ces modèles, et nous proposons pour chacun d’eux une solution pour en atténuer l'impact. Grâce à ces modifications, nous montrons que les VAE pour la prédiction vidéo peuvent obtenir des résultats de qualité nettement supérieurs par rapport aux références précédentes, et qu'ils peuvent être utilisés pour modéliser des scènes de conduite autonome. Dans le deuxième article, nous proposons un nouveau modèle en cascade pour la génération vidéo basé sur les réseaux antagonistes génératifs (GAN). Après le succès des VAE pour prédiction vidéo, il a été démontré que les GAN produisaient des échantillons vidéo de meilleure qualité pour la génération vidéo conditionnelle à des classes. Cependant, les GAN nécessitent de très grandes tailles de lots ainsi que des modèles de grande capacité, ce qui rend l’entraînement des GAN pour la génération vidéo coûteux computationnellement, à la fois en termes de mémoire et en temps de calcul. Nous proposons de scinder le processus génératif en une cascade de sous-modèles, chacun d'eux résolvant un problème plus simple. Cette division nous permet de réduire considérablement le coût computationnel tout en conservant la qualité de l'échantillon, et nous démontrons que ce modèle peut s'adapter à de très grands ensembles de données ainsi qu’à des vidéos de haute résolution. Dans le troisième article, nous concevons un modèle basé sur le principe qu'une scène est composée de différents objets, mais que les transitions de trame (également appelées règles dynamiques) sont partagées entre les objets. Pour mettre en œuvre cette hypothèse de modélisation, nous concevons un modèle qui extrait d'abord les différentes entités d'une image. Ensuite, le modèle apprend à mettre à jour la représentation de l'objet d'une image à l'autre en choisissant parmi différentes transitions possibles qui sont toutes partagées entre les différents objets. Nous montrons que, lors de l'apprentissage d'un tel modèle, les règles de transition sont fondées sémantiquement, et peuvent être appliquées à des objets non vus lors de l'apprentissage. De plus, nous pouvons utiliser ce modèle pour prédire les observations multimodales futures d'une scène dynamique en choisissant différentes transitions. Dans le dernier article nous proposons un modèle génératif basé sur des techniques de rendu 3D qui permet de générer des scènes avec plusieurs objets. Nous concevons un mécanisme d'inférence pour apprendre les représentations qui peuvent être rendues avec notre modèle et nous optimisons simultanément ce mécanisme d'inférence et le moteur de rendu. Nous montrons que ce modèle possède une représentation interprétable dans laquelle des changements sémantiques appliqués à la représentation de la scène sont rendus dans la scène générée. De plus, nous montrons que, suite au processus d’entraînement, notre modèle apprend à segmenter les objets dans une scène sans annotations et que la représentation apprise peut être utilisée pour résoudre des tâches de compréhension de scène dynamique en déduisant la représentation de chaque observation.Neural networks are a type of Machine Learning (ML) models that solve complex Artificial Intelligence (AI) tasks without requiring handcrafted data representations. Although they have achieved impressive results in tasks requiring speech, image and language processing, neural networks still struggle to solve dynamic scene understanding tasks. Furthermore, training neural networks usually demands lots data that is annotated manually, which can be an expensive and time-consuming process. This thesis is comprised of four articles proposing generative models for dynamic scenes. Generative modelling is an area of ML that investigates how to learn the mechanisms by which data is produced. The main motivation for generative models is to learn useful data representations without labels as a by-product of approximating the data generation process. Furthermore, generative models are useful for a wide range of applications such as image super-resolution, voice synthesis or text summarization. The first article focuses on improving the performance of previous Variational AutoEncoders (VAEs) for video prediction, which is the task of generating future frames of a dynamic scene given some previous occurred observations. VAEs are a family of latent variable models that can be used to sample data points. Compared to other generative models, VAEs are easy to train and tend to cover all data modes, but often produce lower quality results. In video prediction VAEs were the first models that were able to produce multiple plausible future outcomes given a context, marking an advancement over previous models as for most dynamic scenes the future is not a deterministic function of the past. However, the first VAEs for video prediction produced results with visible visual artifacts and could not operate on complex realistic datasets. In this article we identify some of the limiting factors for these models, and for each of them we propose a solution to ease its impact. With our proposed modifications, we show that VAEs for video prediction can obtain significant higher quality results over previous baselines and that they can be used to model autonomous driving scenes. In the second article we propose a new cascaded model for video generation based on Generative Adversarial Networks (GANs). After the success of VAEs in video prediction, GANs were shown to produce higher quality video samples for class-conditional video generation. However, GANs require very large batch sizes and high capacity models, which makes training GANs for video generation computationally expensive, both in terms of memory and training time. We propose to split the generative process into a cascade of submodels, each of them solving a smaller generative problem. This split allows us to significantly reduce the computational requirements while retaining sample quality, and we show that this model can scale to very large datasets and video resolutions. In the third article we design a model based on the premise that a scene is comprised of different objects but that frame transitions (also known as dynamic rules) are shared among objects. To implement this modeling assumption we design a model that first extracts the different entities in a frame, and then learns to update the object representation from one frame to another by choosing among different possible transitions, all shared among objects. We show that, when learning such a model, the transition rules are semantically grounded and can be applied to objects not seen during training. Further, we can use this model for predicting multimodal future observations of a dynamic scene by choosing different transitions. In the last article we propose a generative model based on 3D rendering techniques that can generate scenes with multiple objects. We design an inference mechanism to learn representations that can be rendered with our model and we simultaneously optimize this inference mechanism and the renderer. We show that this model has an interpretable representation in which semantic changes to the scene representation are shown in the output. Furthermore, we show that, as a by product of the training process, our model learns to segment the objects in a scene without annotations and that the learned representation can be used to solve dynamic scene understanding tasks by inferring the representation of each observation

    Basic statistical estimation outperforms machine learning in monthly prediction of seasonal climatic parameters

    Get PDF
    Machine learning (ML) has been utilized to predict climatic parameters, and many successes have been reported in the literature. In this paper, we scrutinize the effectiveness of five widely used ML algorithms in the monthly prediction of seasonal climatic parameters using monthly image data. Specifically, we quantify the predictive performance of these algorithms applied to five climatic parameters using various combinations of features. We compare the predictive accuracy of the resulting trained ML models to that of basic statistical estimators that are computed directly from the training data. Our results show that ML never significantly outperforms the statistical baseline, and underperforms for most feature sets. Unlike previous similar studies, we provide error bars for the relative performance of different predictors based on jackknife estimates applied to differences in predictive error magnitudes. We also show that the practice of shuffling data sequences which was employed in some previous references leads to data leakage, resulting in over-estimated performance. Ultimately, the paper demonstrates the importance of using well-grounded statistical techniques when producing and analyzing the results of ML predictive models

    Geo-physical parameter forecasting on imagery{based data sets using machine learning techniques

    Get PDF
    >Magister Scientiae - MScThis research objectively investigates the e ectiveness of machine learning (ML) tools towards predicting several geo-physical parameters. This is based on a large number of studies that have reported high levels of prediction success using ML in the eld. Therefore, several widely used ML tools coupled with a number of di erent feature sets are used to predict six geophysical parameters namely rainfall, groundwater, evapora- tion, humidity, temperature, and wind. The results of the research indicate that: a) a large number of related studies in the eld are prone to speci c pitfalls that lead to over-estimated results in favour of ML tools; b) the use of gaussian mixture models as global features can provide a higher accuracy compared to other local feature sets; c) ML never outperform simple statistically-based estimators on highly-seasonal parame- ters, and providing error bars is key to objectively evaluating the relative performance of the ML tools used; and d) ML tools can be e ective for parameters that are slow- changing such as groundwater
    corecore