328 research outputs found
Abnormal Event Detection in Videos using Spatiotemporal Autoencoder
We present an efficient method for detecting anomalies in videos. Recent
applications of convolutional neural networks have shown promises of
convolutional layers for object detection and recognition, especially in
images. However, convolutional neural networks are supervised and require
labels as learning signals. We propose a spatiotemporal architecture for
anomaly detection in videos including crowded scenes. Our architecture includes
two main components, one for spatial feature representation, and one for
learning the temporal evolution of the spatial features. Experimental results
on Avenue, Subway and UCSD benchmarks confirm that the detection accuracy of
our method is comparable to state-of-the-art methods at a considerable speed of
up to 140 fps
Intelligent calibration of static FEA computations based on terrestrial laser scanning reference
The demand for efficient and accurate finite element analysis (FEA) is becoming more prevalent with the increase in advanced calibration technologies and sensor-based monitoring methods. The current research explores a deep learning-based methodology to calibrate FEA results. The utilization of monitoring reference results from measurements, e.g., terrestrial laser scanning, can help to capture the actual features in the static loading process. We learn the deviation sequence results between the standard FEA computations with the simplified geometry and refined reference values by the long short-term memory method. The complex changing principles in different deviations are trained and captured effectively in the training process of deep learning. Hence, we generate the FEA sequence results corresponding to next adjacent loading steps. The final FEA computations are calibrated by the threshold control. The calibration reduces the mean square errors of the FEA future sequence results significantly. This strengthens the calibration depth. Consequently, the calibration of FEA computations with deep learning can play a helpful role in the prediction and monitoring problems regarding the future structural behaviors. © 2020 by the authors. Licensee MDPI, Basel, Switzerland
Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for Video Prediction
We present a multi-scale predictive coding model for future video frames
prediction. Drawing inspiration on the ``Predictive Coding" theories in
cognitive science, it is updated by a combination of bottom-up and top-down
information flows, which can enhance the interaction between different network
levels. However, traditional predictive coding models only predict what is
happening hierarchically rather than predicting the future. To address the
problem, our model employs a multi-scale approach (Coarse to Fine), where the
higher level neurons generate coarser predictions (lower resolution), while the
lower level generate finer predictions (higher resolution). In terms of network
architecture, we directly incorporate the encoder-decoder network within the
LSTM module and share the final encoded high-level semantic information across
different network levels. This enables comprehensive interaction between the
current input and the historical states of LSTM compared with the traditional
Encoder-LSTM-Decoder architecture, thus learning more believable temporal and
spatial dependencies. Furthermore, to tackle the instability in adversarial
training and mitigate the accumulation of prediction errors in long-term
prediction, we propose several improvements to the training strategy. Our
approach achieves good performance on datasets such as KTH, Moving MNIST and
Caltech Pedestrian. Code is available at https://github.com/Ling-CF/MSPN
PredNet and Predictive Coding: A Critical Review
PredNet, a deep predictive coding network developed by Lotter et al.,
combines a biologically inspired architecture based on the propagation of
prediction error with self-supervised representation learning in video. While
the architecture has drawn a lot of attention and various extensions of the
model exist, there is a lack of a critical analysis. We fill in the gap by
evaluating PredNet both as an implementation of the predictive coding theory
and as a self-supervised video prediction model using a challenging video
action classification dataset. We design an extended model to test if
conditioning future frame predictions on the action class of the video improves
the model performance. We show that PredNet does not yet completely follow the
principles of predictive coding. The proposed top-down conditioning leads to a
performance gain on synthetic data, but does not scale up to the more complex
real-world action classification dataset. Our analysis is aimed at guiding
future research on similar architectures based on the predictive coding theory
Revisiting the Temporal Modeling in Spatio-Temporal Predictive Learning under A Unified View
Spatio-temporal predictive learning plays a crucial role in self-supervised
learning, with wide-ranging applications across a diverse range of fields.
Previous approaches for temporal modeling fall into two categories:
recurrent-based and recurrent-free methods. The former, while meticulously
processing frames one by one, neglect short-term spatio-temporal information
redundancies, leading to inefficiencies. The latter naively stack frames
sequentially, overlooking the inherent temporal dependencies. In this paper, we
re-examine the two dominant temporal modeling approaches within the realm of
spatio-temporal predictive learning, offering a unified perspective. Building
upon this analysis, we introduce USTEP (Unified Spatio-TEmporal Predictive
learning), an innovative framework that reconciles the recurrent-based and
recurrent-free methods by integrating both micro-temporal and macro-temporal
scales. Extensive experiments on a wide range of spatio-temporal predictive
learning demonstrate that USTEP achieves significant improvements over existing
temporal modeling approaches, thereby establishing it as a robust solution for
a wide range of spatio-temporal applications.Comment: Under revie
Unraveling neural coding of dynamic natural visual scenes via convolutional recurrent neural networks
Traditional models of retinal system identification analyze the neural response to artificial stimuli using models consisting of predefined components. The model design is limited to prior knowledge, and the artificial stimuli are too simple to be compared with stimuli processed by the retina. To fill in this gap with an explainable model that reveals how a population of neurons work together to encode the larger field of natural scenes, here we used a deep-learning model for identifying the computational elements of the retinal circuit that contribute to learning the dynamics of natural scenes. Experimental results verify that the recurrent connection plays a key role in encoding complex dynamic visual scenes while learning biological computational underpinnings of the retinal circuit. In addition, the proposed models reveal both the shapes and the locations of the spatiotemporal receptive fields of ganglion cells
- …