Multi-Task Spatiotemporal Neural Networks for Structured Surface Reconstruction
Deep learning methods have surpassed the performance of traditional
techniques on a wide range of problems in computer vision, but nearly all of
this work has studied consumer photos, where precisely correct output is often
not critical. It is less clear how well these techniques may apply on
structured prediction problems where fine-grained output with high precision is
required, such as in scientific imaging domains. Here we consider the problem
of segmenting echogram radar data collected from the polar ice sheets, which is
challenging because segmentation boundaries are often very weak and there is a
high degree of noise. We propose a multi-task spatiotemporal neural network
that combines 3D ConvNets and Recurrent Neural Networks (RNNs) to estimate ice
surface boundaries from sequences of tomographic radar images. We show that our
model outperforms the state-of-the-art on this problem by (1) avoiding the need
for hand-tuned parameters, (2) extracting multiple surfaces (ice-air and
ice-bed) simultaneously, (3) requiring less non-visual metadata, and (4) being
about 6 times faster.
Comment: 10 pages, 7 figures, published in WACV 201
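The simultaneous extraction of the ice-air and ice-bed boundaries can be made concrete with a toy per-column decoding step. This is a plain-Python sketch over hypothetical per-pixel probability maps, not the paper's actual network; it only illustrates the constraint that the air surface must lie above the bed surface:

```python
def extract_surfaces(p_air, p_bed):
    """Pick one ice-air and one ice-bed row per column from two
    per-pixel probability maps (lists of columns; each column is a list
    of depth-row scores), enforcing that air lies above bed."""
    air, bed = [], []
    for col_air, col_bed in zip(p_air, p_bed):
        a = max(range(len(col_air)), key=col_air.__getitem__)
        # restrict the bed search to rows strictly below the air surface
        b = max(range(a + 1, len(col_bed)), key=col_bed.__getitem__)
        air.append(a)
        bed.append(b)
    return air, bed

# two columns, five depth rows each (made-up scores)
p_air = [[0.1, 0.8, 0.05, 0.03, 0.02], [0.7, 0.2, 0.05, 0.03, 0.02]]
p_bed = [[0.0, 0.1, 0.1, 0.7, 0.1], [0.1, 0.0, 0.2, 0.2, 0.5]]
print(extract_surfaces(p_air, p_bed))  # ([1, 0], [3, 4])
```

In the paper the per-pixel scores would come from the 3D ConvNet features and the RNN would smooth the boundary across the image sequence; this sketch shows only the final multi-surface readout.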
Dynamic Decomposition of Spatiotemporal Neural Signals
Neural signals are characterized by rich temporal and spatiotemporal dynamics
that reflect the organization of cortical networks. Theoretical research has
shown how neural networks can operate at different dynamic ranges that
correspond to specific types of information processing. Here we present a data
analysis framework that uses a linearized model of these dynamic states in
order to decompose the measured neural signal into a series of components that
capture both rhythmic and non-rhythmic neural activity. The method is based on
stochastic differential equations and Gaussian process regression. Through
computer simulations and analysis of magnetoencephalographic data, we
demonstrate the efficacy of the method in identifying meaningful modulations of
oscillatory signals corrupted by structured temporal and spatiotemporal noise.
These results suggest that the method is particularly suitable for the analysis
and interpretation of complex temporal and spatiotemporal neural signals.
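A linearized dynamic state of the kind such a decomposition fits is a damped stochastic oscillator. The sketch below (plain Python; the parameter names and the exact-decay discretization are illustrative choices, not the paper's estimator) propagates one such component:

```python
import math, random

def oscillator_step(x, y, freq_hz, damping, dt, noise_std, rng):
    """One step of a linearized damped stochastic oscillator: exact
    exponential decay/rotation for the deterministic part, plus additive
    Gaussian noise (Euler-Maruyama treatment of the stochastic term)."""
    ang = 2.0 * math.pi * freq_hz * dt
    decay = math.exp(-damping * dt)
    xn = decay * (math.cos(ang) * x - math.sin(ang) * y)
    yn = decay * (math.sin(ang) * x + math.cos(ang) * y)
    s = noise_std * math.sqrt(dt)
    return xn + s * rng.gauss(0, 1), yn + s * rng.gauss(0, 1)

rng = random.Random(0)
x, y = 1.0, 0.0
for _ in range(1000):
    x, y = oscillator_step(x, y, freq_hz=10.0, damping=5.0, dt=1e-3,
                           noise_std=0.0, rng=rng)
# with noise_std=0 the envelope decays exactly as exp(-damping * t)
print(round(math.hypot(x, y), 5))  # 0.00674
```

Gaussian process regression then enters when estimating such latent components from the measured signal; here only the forward dynamics are shown.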
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis evolved from earlier schemes that are often limited to controlled
environments to nowadays advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications from video surveillance to human-computer interaction, scientific
milestones in action recognition are reached ever more rapidly, and
once-dominant methods become obsolete within a short time. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then, navigate into the realm of deep
learning based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable setbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader.
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
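The working principle described above, in which each pixel asynchronously reports signed log-brightness changes that exceed a contrast threshold, can be sketched for a single pixel. This is a minimal illustrative model (the threshold value and function name are hypothetical), not any specific sensor's specification:

```python
import math

CONTRAST_THRESHOLD = 0.2  # hypothetical per-pixel contrast threshold

def events_from_brightness(samples, t0=0.0, dt=1.0):
    """Emit (time, sign) events for one pixel whenever log-brightness
    has changed by at least the contrast threshold since the last event,
    mimicking how an event camera reports per-pixel changes."""
    events = []
    ref = math.log(samples[0])  # brightness at the last emitted event
    for i, b in enumerate(samples[1:], start=1):
        delta = math.log(b) - ref
        while abs(delta) >= CONTRAST_THRESHOLD:
            sign = 1 if delta > 0 else -1
            events.append((t0 + i * dt, sign))
            ref += sign * CONTRAST_THRESHOLD
            delta = math.log(b) - ref
    return events

# a brightness step up then back down yields ON then OFF events
print(events_from_brightness([1.0, 1.5, 1.5, 1.0]))
```

Note that the constant middle sample produces no events at all, which is the source of the sparsity and low power consumption the abstract mentions; a real sensor would also report the pixel's (x, y) location.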
Physics-Informed Computer Vision: A Review and Perspectives
The incorporation of physical information into machine learning frameworks is
opening up and transforming many application domains. Here the learning process is
augmented through the induction of fundamental knowledge and governing physical
laws. In this work we explore their utility for computer vision tasks in
interpreting and understanding visual data. We present a systematic literature
review of formulation and approaches to computer vision tasks guided by
physical laws. We begin by decomposing the popular computer vision pipeline
into a taxonomy of stages and investigate approaches to incorporate governing
physical equations in each stage. Existing approaches in each task are analyzed
with regard to which governing physical processes are modeled, how they are
formulated, and how they are incorporated, i.e. by modifying data (observation
bias), networks (inductive bias), or losses (learning bias). The taxonomy offers a
unified view of the application of the physics-informed capability,
highlighting where physics-informed learning has been conducted and where the
gaps and opportunities are. Finally, we highlight open problems and challenges
to inform future research. While still in its early days, the study of
physics-informed computer vision has the promise to develop better computer
vision models that can improve physical plausibility, accuracy, data efficiency
and generalization in increasingly realistic applications.
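Of the three incorporation routes named above, the learning bias is the simplest to sketch: a penalty on the residual of a governing equation is added to the data-fit loss. The function and the weight `lam` below are hypothetical illustrations, not taken from any surveyed method:

```python
def total_loss(pred, target, physics_residual, lam=0.1):
    """Learning bias: augment a mean-squared data-fit loss with a
    penalty on how far the prediction violates a governing physical
    equation, given the per-point residuals of that equation."""
    data = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    physics = sum(r ** 2 for r in physics_residual) / len(physics_residual)
    return data + lam * physics

# a perfect data fit with a nonzero physics violation is still penalized
print(total_loss([1.0, 2.0], [1.0, 2.0], [1.0, 1.0], lam=0.5))  # 0.5
```

Observation bias would instead alter the training data (e.g. physics-based augmentation), and inductive bias would build the constraint into the network architecture itself.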
CaSPR: Learning Canonical Spatiotemporal Point Cloud Representations
We propose CaSPR, a method to learn object-centric Canonical Spatiotemporal
Point Cloud Representations of dynamically moving or evolving objects. Our goal
is to enable information aggregation over time and the interrogation of object
state at any spatiotemporal neighborhood in the past, observed or not.
Different from previous work, CaSPR learns representations that support
spacetime continuity, are robust to variable and irregularly spacetime-sampled
point clouds, and generalize to unseen object instances. Our approach divides
the problem into two subtasks. First, we explicitly encode time by mapping an
input point cloud sequence to a spatiotemporally-canonicalized object space. We
then leverage this canonicalization to learn a spatiotemporal latent
representation using neural ordinary differential equations and a generative
model of dynamically evolving shapes using continuous normalizing flows. We
demonstrate the effectiveness of our method on several applications including
shape reconstruction, camera pose estimation, continuous spatiotemporal
sequence reconstruction, and correspondence estimation from irregularly or
intermittently sampled observations.
Comment: NeurIPS 202
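The neural-ODE component mentioned above is what lets the latent state be queried at any continuous time rather than only at observed frames. A minimal sketch with a fixed-step Euler integrator and a toy linear dynamics function standing in for a learned network:

```python
def integrate_latent(z0, dynamics, t0, t1, steps=1000):
    """Euler-integrate a latent state z from t0 to t1 under dz/dt =
    dynamics(z, t); in a neural ODE, `dynamics` would be a learned
    network and the state could be read out at any intermediate time."""
    dt = (t1 - t0) / steps
    z, t = list(z0), t0
    for _ in range(steps):
        dz = dynamics(z, t)
        z = [zi + dt * dzi for zi, dzi in zip(z, dz)]
        t += dt
    return z

# toy linear dynamics dz/dt = -z; the exact solution is z(1) = z(0) * e^{-1}
z1 = integrate_latent([1.0], lambda z, t: [-zi for zi in z], 0.0, 1.0)
print(round(z1[0], 3))  # ≈ 0.368
```

CaSPR additionally pairs this continuous latent trajectory with continuous normalizing flows to decode a dynamically evolving shape; only the time-continuity mechanism is shown here.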