CARPe Posterum: A Convolutional Approach for Real-time Pedestrian Path Prediction
Pedestrian path prediction is an essential topic in computer vision and video
understanding. Having insight into the movement of pedestrians is crucial for
ensuring safe operation in a variety of applications including autonomous
vehicles, social robots, and environmental monitoring. Current works in this
area utilize complex generative or recurrent methods to capture many possible
futures. However, despite the inherent real-time nature of predicting future
paths, little work has been done to explore accurate and computationally
efficient approaches for this task. To this end, we propose a convolutional
approach for real-time pedestrian path prediction, CARPe. It utilizes a
variation of Graph Isomorphism Networks in combination with an agile
convolutional neural network design to form a fast and accurate path prediction
approach. Notable results in both inference speed and prediction accuracy are
achieved, improving FPS considerably in comparison to current state-of-the-art
methods while delivering competitive accuracy on well-known path prediction
datasets.
Comment: AAAI-21 Camera Read
Precipitation nowcasting with generative diffusion models
In recent years traditional numerical methods for accurate weather prediction
have been increasingly challenged by deep learning methods. Numerous historical
datasets used for short and medium-range weather forecasts are typically
organized into a regular spatial grid structure. This arrangement closely
resembles images: each weather variable can be visualized as a map or, when
considering the temporal axis, as a video. Several classes of generative
models, including Generative Adversarial Networks, Variational Autoencoders,
and the recent Denoising Diffusion Models, have largely proved their
applicability to the next-frame prediction problem, so it is natural to test
their performance on weather prediction benchmarks. Diffusion models are
particularly appealing in this context, due to the intrinsically probabilistic
nature of weather forecasting: what we are really interested in modeling is the
probability distribution of weather indicators, whose expected value is the
most likely prediction.
In our study, we focus on a specific subset of the ERA-5 dataset, which
includes hourly data pertaining to Central Europe from the years 2016 to 2021.
Within this context, we examine the efficacy of diffusion models in handling
the task of precipitation nowcasting, comparing them against the
well-established U-Net models documented in the existing literature. Our
proposed approach, Generative Ensemble Diffusion (GED), uses a diffusion model
to generate a set of possible weather scenarios, which are then combined into
a probable prediction by a post-processing network. This approach
substantially outperformed recent deep learning models in overall performance.
Comment: 21 pages, 6 figure
Action perception as hypothesis testing
We present a novel computational model that describes action perception as an active inferential process that combines motor prediction (the reuse of our own motor system to predict perceived movements) and hypothesis testing (the use of eye movements to disambiguate amongst hypotheses). The system uses a generative model of how (arm and hand) actions are performed to generate hypothesis-specific visual predictions, and directs saccades to the most informative places of the visual scene to test these predictions and their underlying hypotheses. We test the model using eye movement data from a human action observation study. In both the human study and our model, saccades are proactive whenever context affords accurate action prediction, but uncertainty induces a more reactive gaze strategy, via tracking the observed movements. Our model offers a novel perspective on action observation that highlights its active nature based on prediction dynamics and hypothesis testing.
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model
Expressive human speech generally abounds with rich and flexible speech
prosody variations. The speech prosody predictors in existing expressive speech
synthesis methods mostly produce deterministic predictions, which are learned
by directly minimizing the norm of prosody prediction error. Their unimodal
nature leads to a mismatch with the ground-truth distribution and harms the
model's ability to make diverse predictions. Thus, we propose a novel prosody
predictor based on the denoising diffusion probabilistic model to take
advantage of its high-quality generative modeling and training stability.
Experimental results confirm that the proposed prosody predictor outperforms the
deterministic baseline on both the expressiveness and diversity of prediction
results with even fewer network parameters.
Comment: Proceedings of Interspeech 2023 (doi: 10.21437/Interspeech.2023-715),
demo site at https://thuhcsi.github.io/interspeech2023-DiffVar