
    Abnormal Event Detection in Videos using Spatiotemporal Autoencoder

    We present an efficient method for detecting anomalies in videos. Recent applications of convolutional neural networks have shown the promise of convolutional layers for object detection and recognition, especially in images. However, convolutional neural networks are supervised and require labels as learning signals. We propose a spatiotemporal architecture for anomaly detection in videos, including crowded scenes. Our architecture has two main components: one for spatial feature representation and one for learning the temporal evolution of those spatial features. Experimental results on the Avenue, Subway and UCSD benchmarks confirm that the detection accuracy of our method is comparable to state-of-the-art methods at a considerable speed of up to 140 fps.
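
    As a rough illustration of the two-component idea, the hedged sketch below builds a reconstruction-based spatiotemporal autoencoder and scores frames by their reconstruction error; it approximates the paper's convolutional + ConvLSTM design with 3D convolutions, and every layer size is an illustrative assumption rather than the authors' configuration.

        # Hedged sketch: spatiotemporal reconstruction autoencoder for anomaly scoring.
        # 3D convolutions stand in for the paper's Conv + ConvLSTM layers; sizes are assumptions.
        import torch
        import torch.nn as nn

        class SpatioTemporalAE(nn.Module):
            def __init__(self):
                super().__init__()
                # encoder compresses a clip of T grayscale frames, shape (N, 1, T, H, W)
                self.encoder = nn.Sequential(
                    nn.Conv3d(1, 32, kernel_size=3, stride=(1, 2, 2), padding=1),
                    nn.ReLU(),
                    nn.Conv3d(32, 64, kernel_size=3, stride=(1, 2, 2), padding=1),
                    nn.ReLU(),
                )
                # decoder reconstructs the input clip at full resolution
                self.decoder = nn.Sequential(
                    nn.ConvTranspose3d(64, 32, kernel_size=3, stride=(1, 2, 2),
                                       padding=1, output_padding=(0, 1, 1)),
                    nn.ReLU(),
                    nn.ConvTranspose3d(32, 1, kernel_size=3, stride=(1, 2, 2),
                                       padding=1, output_padding=(0, 1, 1)),
                    nn.Sigmoid(),
                )

            def forward(self, clip):
                return self.decoder(self.encoder(clip))

        def anomaly_score(model, clip):
            # frames with high reconstruction error are flagged as abnormal
            with torch.no_grad():
                recon = model(clip)
            return ((recon - clip) ** 2).mean(dim=(1, 3, 4))  # per-frame MSE, shape (N, T)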

    Intelligent calibration of static FEA computations based on terrestrial laser scanning reference

    The demand for efficient and accurate finite element analysis (FEA) is becoming more prevalent with the increase in advanced calibration technologies and sensor-based monitoring methods. The current research explores a deep learning-based methodology to calibrate FEA results. Monitoring reference results from measurements, e.g., terrestrial laser scanning, can help to capture the actual features of the static loading process. We use the long short-term memory method to learn the sequences of deviations between standard FEA computations on simplified geometry and the refined reference values. The complex changing patterns in the different deviations are captured effectively during training. We then generate the FEA sequence results for the next adjacent loading steps, and the final FEA computations are calibrated by threshold control. The calibration significantly reduces the mean square errors of the future FEA sequence results and strengthens the calibration depth. Consequently, calibrating FEA computations with deep learning can play a helpful role in prediction and monitoring problems regarding future structural behavior. © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
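
    A minimal sketch of the deviation-sequence idea, assuming a per-step deviation vector between simplified-geometry FEA results and the laser-scan reference: an LSTM predicts the deviation at the next loading step, and the FEA result is corrected only where the predicted deviation exceeds a control threshold. Layer sizes, the threshold value and the helper names are illustrative assumptions, not the authors' configuration.

        # Hedged sketch: LSTM over deviation sequences, used to calibrate the next FEA step.
        import torch
        import torch.nn as nn

        class DeviationLSTM(nn.Module):
            def __init__(self, n_points, hidden=64):
                super().__init__()
                self.lstm = nn.LSTM(input_size=n_points, hidden_size=hidden, batch_first=True)
                self.head = nn.Linear(hidden, n_points)

            def forward(self, deviations):         # (batch, loading_steps, n_points)
                out, _ = self.lstm(deviations)
                return self.head(out[:, -1])       # predicted deviation at the next step

        def calibrate(fea_next, model, past_deviations, threshold=1e-3):
            # apply the predicted correction only where it exceeds the control threshold
            with torch.no_grad():
                pred_dev = model(past_deviations)
            mask = pred_dev.abs() > threshold
            return fea_next + pred_dev * mask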

    Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for Video Prediction

    We present a multi-scale predictive coding model for future video frame prediction. Drawing inspiration from the "Predictive Coding" theories in cognitive science, the model is updated by a combination of bottom-up and top-down information flows, which enhances the interaction between different network levels. However, traditional predictive coding models only predict what is happening hierarchically rather than predicting the future. To address this, our model employs a multi-scale (coarse-to-fine) approach in which higher-level neurons generate coarser predictions (lower resolution), while lower-level neurons generate finer predictions (higher resolution). In terms of network architecture, we directly incorporate the encoder-decoder network within the LSTM module and share the final encoded high-level semantic information across network levels. Compared with the traditional Encoder-LSTM-Decoder architecture, this enables more comprehensive interaction between the current input and the historical states of the LSTM, and thus learns more believable temporal and spatial dependencies. Furthermore, to tackle instability in adversarial training and mitigate the accumulation of prediction errors in long-term prediction, we propose several improvements to the training strategy. Our approach achieves good performance on datasets such as KTH, Moving MNIST and Caltech Pedestrian. Code is available at https://github.com/Ling-CF/MSPN
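
    The hedged sketch below shows only the basic encoder-LSTM-decoder recurrent step for frame prediction that such models build on: each frame is encoded, an LSTMCell updates the temporal state, and a decoder produces the predicted next frame. It omits the paper's multi-scale levels and shared high-level features, and all sizes are illustrative assumptions.

        # Hedged sketch: one recurrent encode -> LSTM -> decode step for next-frame prediction.
        import torch
        import torch.nn as nn

        class EncDecLSTM(nn.Module):
            def __init__(self, ch=1, h=64, w=64, feat=512):
                super().__init__()
                self.h, self.w = h, w
                self.enc = nn.Sequential(nn.Flatten(), nn.Linear(ch * h * w, feat), nn.ReLU())
                self.cell = nn.LSTMCell(feat, feat)
                self.dec = nn.Sequential(nn.Linear(feat, ch * h * w), nn.Sigmoid())

            def forward(self, frames):              # (batch, time, ch, h, w)
                b, t = frames.shape[:2]
                state, preds = None, []
                for i in range(t):
                    z = self.enc(frames[:, i])       # encode the current frame
                    state = self.cell(z, state)      # update the temporal state
                    preds.append(self.dec(state[0]).view(b, -1, self.h, self.w))
                return torch.stack(preds, dim=1)     # prediction for frame i + 1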

    PredNet and Predictive Coding: A Critical Review

    PredNet, a deep predictive coding network developed by Lotter et al., combines a biologically inspired architecture based on the propagation of prediction error with self-supervised representation learning in video. While the architecture has drawn a lot of attention and various extensions of the model exist, a critical analysis has been lacking. We fill this gap by evaluating PredNet both as an implementation of predictive coding theory and as a self-supervised video prediction model, using a challenging video action classification dataset. We design an extended model to test whether conditioning future frame predictions on the action class of the video improves performance. We show that PredNet does not yet completely follow the principles of predictive coding. The proposed top-down conditioning leads to a performance gain on synthetic data, but does not scale up to the more complex real-world action classification dataset. Our analysis is aimed at guiding future research on similar architectures based on the predictive coding theory.
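
    The top-down conditioning described above could be sketched roughly as follows: an embedding of the video's action class is mixed into the top-level representation before predictions are generated. The names and shapes here are assumptions for illustration, not PredNet's actual interface.

        # Hedged sketch: injecting an action-class embedding into top-level features.
        import torch
        import torch.nn as nn

        class ClassConditionedTop(nn.Module):
            def __init__(self, n_classes, feat_dim):
                super().__init__()
                self.embed = nn.Embedding(n_classes, feat_dim)
                self.mix = nn.Linear(2 * feat_dim, feat_dim)

            def forward(self, top_features, action_class):
                # concatenate the class embedding with the top-level features and remix
                cond = self.embed(action_class)
                return self.mix(torch.cat([top_features, cond], dim=-1))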

    Revisiting the Temporal Modeling in Spatio-Temporal Predictive Learning under A Unified View

    Spatio-temporal predictive learning plays a crucial role in self-supervised learning, with wide-ranging applications across a diverse range of fields. Previous approaches to temporal modeling fall into two categories: recurrent-based and recurrent-free methods. The former, while meticulously processing frames one by one, neglect short-term spatio-temporal information redundancies, leading to inefficiencies. The latter naively stack frames sequentially, overlooking the inherent temporal dependencies. In this paper, we re-examine the two dominant temporal modeling approaches within the realm of spatio-temporal predictive learning, offering a unified perspective. Building upon this analysis, we introduce USTEP (Unified Spatio-TEmporal Predictive learning), a framework that reconciles recurrent-based and recurrent-free methods by integrating both micro-temporal and macro-temporal scales. Extensive experiments on a wide range of spatio-temporal predictive learning tasks demonstrate that USTEP achieves significant improvements over existing temporal modeling approaches, establishing it as a robust solution for a wide range of spatio-temporal applications.
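
    One possible, hedged reading of combining the two temporal scales is sketched below: frames inside a short window are stacked and processed recurrent-free, while a GRU carries state across windows. This is illustrative only and is not the USTEP design; all names and sizes are assumptions.

        # Hedged sketch: recurrent-free processing within windows, recurrence across windows.
        import torch
        import torch.nn as nn

        class TwoScalePredictor(nn.Module):
            def __init__(self, ch=1, window=4, hidden=256, h=64, w=64):
                super().__init__()
                self.window, self.h, self.w = window, h, w
                self.window_enc = nn.Sequential(
                    nn.Conv2d(ch * window, 64, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(8), nn.Flatten(),
                    nn.Linear(64 * 8 * 8, hidden))
                self.gru = nn.GRUCell(hidden, hidden)
                self.dec = nn.Linear(hidden, ch * h * w)

            def forward(self, frames):                       # (batch, time, ch, h, w)
                b, t, ch = frames.shape[:3]
                state = frames.new_zeros(b, self.gru.hidden_size)
                preds = []
                for s in range(0, t, self.window):
                    chunk = frames[:, s:s + self.window]
                    # pad the last chunk so the stacked channel count stays constant
                    if chunk.shape[1] < self.window:
                        pad = chunk[:, -1:].repeat(1, self.window - chunk.shape[1], 1, 1, 1)
                        chunk = torch.cat([chunk, pad], dim=1)
                    z = self.window_enc(chunk.flatten(1, 2))  # recurrent-free within the window
                    state = self.gru(z, state)                # recurrent step across windows
                    preds.append(self.dec(state).view(b, ch, self.h, self.w))
                return torch.stack(preds, dim=1)              # one prediction per window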

    Unraveling neural coding of dynamic natural visual scenes via convolutional recurrent neural networks

    Traditional models of retinal system identification analyze the neural response to artificial stimuli using models consisting of predefined components. The model design is limited by prior knowledge, and the artificial stimuli are too simple to be compared with the stimuli actually processed by the retina. To fill this gap with an explainable model that reveals how a population of neurons works together to encode a larger field of natural scenes, we used a deep-learning model to identify the computational elements of the retinal circuit that contribute to learning the dynamics of natural scenes. Experimental results verify that the recurrent connection plays a key role in encoding complex dynamic visual scenes while learning the biological computational underpinnings of the retinal circuit. In addition, the proposed models reveal both the shapes and the locations of the spatiotemporal receptive fields of ganglion cells.
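
    A minimal sketch of a convolutional recurrent encoding model along these lines: a shared convolutional front end extracts spatial features from each stimulus frame, a recurrent layer integrates them over time, and a readout predicts firing rates for a population of ganglion cells. Layer sizes and names are assumptions for illustration, not the authors' architecture.

        # Hedged sketch: CNN front end + GRU + population readout for firing-rate prediction.
        import torch
        import torch.nn as nn

        class ConvRecurrentEncoder(nn.Module):
            def __init__(self, n_cells, hidden=128):
                super().__init__()
                self.conv = nn.Sequential(
                    nn.Conv2d(1, 16, 7, stride=2, padding=3), nn.ReLU(),
                    nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(4), nn.Flatten())
                self.rnn = nn.GRU(32 * 4 * 4, hidden, batch_first=True)
                self.readout = nn.Sequential(nn.Linear(hidden, n_cells), nn.Softplus())

            def forward(self, movie):                # (batch, time, 1, h, w)
                b, t = movie.shape[:2]
                feats = self.conv(movie.reshape(b * t, *movie.shape[2:])).view(b, t, -1)
                out, _ = self.rnn(feats)             # integrate spatial features over time
                return self.readout(out)             # predicted rates, (batch, time, n_cells)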