18,047 research outputs found
An Effect of Relative Motion on Trajectory Discrimination
Psychophysical studies point to the existence of specialized mechanisms sensitive to the relative motion between an object and its background. Such mechanisms would seem ideal for the motion-based segmentation of objects; however, their properties and role in processing the visual scene remain unclear. Here we examine the contribution of relative motion mechanisms to the processing of object trajectory. In a series of four psychophysical experiments we examine systematically the effects of relative direction and speed differences on the perceived trajectory of an object against a moving background. We show that background motion systematically influences the discrimination of object direction. Subjects’ ability to discriminate direction was consistently better for objects moving opposite a translating background than for objects moving in the same direction as the background. This effect was limited to the case of a translating background and did not affect perceived trajectory for more complex background motions associated with self-motion. We interpret these differences as providing support for the role of relative motion mechanisms in the segmentation and representation of object motions that do not occlude the path of an observer’s self-motion
Not Using the Car to See the Sidewalk: Quantifying and Controlling the Effects of Context in Classification and Segmentation
The importance of visual context in scene understanding tasks is well recognized
in the computer vision community. However, it is unclear to what extent computer
vision models for image classification and semantic segmentation depend on
context to make their predictions. A model that relies too heavily on context
will fail when it encounters objects in context distributions different from
the training data, so it is important to identify these dependencies before
we can deploy the models in the real world. We propose a method to quantify the
sensitivity of black-box vision models to visual context by editing images to
remove selected objects and measuring the response of the target models. We
apply this methodology on two tasks, image classification and semantic
segmentation, and discover undesirable dependencies between objects and context,
for example that "sidewalk" segmentation relies heavily on "cars" being present
in the image. We propose an object removal based data augmentation solution to
mitigate this dependency and increase the robustness of classification and
segmentation models to contextual variations. Our experiments show that the
proposed data augmentation helps these models improve the performance in
out-of-context scenarios, while preserving the performance on regular data.
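The measurement this abstract describes can be illustrated with a minimal sketch: edit the image to remove an object, rerun the black-box model, and compare its scores on a target region. Everything here (the `toy_model`, the mean-color fill used as the editing step, the mask names) is a hypothetical stand-in, not the authors' actual models or image-editing method.

```python
import numpy as np

def remove_object(image, object_mask, fill_value=None):
    """Ablate the masked object by filling it with the image's mean color
    (a crude stand-in for the paper's image-editing step)."""
    edited = image.copy()
    fill = image.mean(axis=(0, 1)) if fill_value is None else fill_value
    edited[object_mask] = fill
    return edited

def context_sensitivity(model, image, object_mask, target_mask):
    """How much the model's average score on `target_mask` pixels drops
    when the object in `object_mask` is removed from the image."""
    before = model(image)[target_mask].mean()
    after = model(remove_object(image, object_mask))[target_mask].mean()
    return before - after

def toy_model(image):
    """Toy 'sidewalk' segmenter with an undesirable context dependence:
    it predicts sidewalk in the lower rows only if a bright 'car' patch
    exists anywhere in the image."""
    car_present = (image.max(axis=-1) > 0.9).any()
    pred = np.zeros(image.shape[:2])
    if car_present:
        pred[6:, :] = 1.0
    return pred

img = np.zeros((8, 8, 3))
img[2:4, 2:4] = 1.0                      # bright "car" patch
car = np.zeros((8, 8), dtype=bool)
car[2:4, 2:4] = True
sidewalk = np.zeros((8, 8), dtype=bool)  # target region: lower rows
sidewalk[6:, :] = True

print(context_sensitivity(toy_model, img, car, sidewalk))  # 1.0: the sidewalk
# prediction vanishes entirely once the car is removed
```

A sensitivity of zero would mean the prediction on the target region is unaffected by the object; large values flag exactly the kind of "car drives sidewalk" dependency the abstract reports.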
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have large potential for robotics
and computer vision in scenarios that are challenging for traditional cameras,
such as those demanding low latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the sensors that are
available, and the tasks they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world
Real-World Repetition Estimation by Div, Grad and Curl
We consider the problem of estimating repetition in video, such as performing
push-ups, cutting a melon or playing violin. Existing work shows good results
under the assumption of static and stationary periodicity. As realistic video
is rarely perfectly static and stationary, the often-preferred Fourier-based
measurement is inapt. Instead, we adopt the wavelet transform to better handle
non-static and non-stationary video dynamics. From the flow field and its
differentials, we derive three fundamental motion types and three motion
continuities of intrinsic periodicity in 3D. On top of this, the 2D perception
of 3D periodicity considers two extreme viewpoints. What follows are 18
fundamental cases of recurrent perception in 2D. In practice, to deal with the
variety of repetitive appearance, our theory implies measuring time-varying
flow and its differentials (gradient, divergence and curl) over segmented
foreground motion. For experiments, we introduce the new QUVA Repetition
dataset, reflecting reality by including non-static and non-stationary videos.
On the task of counting repetitions in video, we obtain favorable results
compared to a deep learning alternative
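The flow differentials the method is built on — gradient, divergence, and curl — can be computed from a dense 2D flow field with finite differences. A minimal sketch (not the authors' implementation, which also applies wavelets over segmented foreground motion):

```python
import numpy as np

def flow_differentials(u, v):
    """Finite-difference divergence and curl of a 2D flow field (u, v):
    divergence captures expansion/contraction, curl captures rotation."""
    du_dy, du_dx = np.gradient(u)   # np.gradient returns axis-0 (y), axis-1 (x)
    dv_dy, dv_dx = np.gradient(v)
    divergence = du_dx + dv_dy
    curl = dv_dx - du_dy
    return divergence, curl

# A purely rotating flow: u = -y, v = x  ->  divergence 0, curl 2 everywhere
y, x = np.mgrid[0:5, 0:5].astype(float)
div, curl = flow_differentials(-y, x)
print(div.mean(), curl.mean())  # 0.0 2.0
```

Tracking how such quantities oscillate over time (e.g. the alternating divergence of a hand opening and closing) is what makes repetition measurable even when the appearance of each cycle varies.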
Accelerated hardware video object segmentation: From foreground detection to connected components labelling
This is the preprint version of the article - Copyright © 2010 Elsevier. This paper demonstrates the use of a single-chip FPGA for the segmentation of moving objects in a video sequence. The system maintains highly accurate background models and integrates the detection of foreground pixels with the labelling of objects using a connected-components algorithm. The background models are based on 24-bit RGB values and 8-bit grayscale intensity values. A multimodal background-differencing algorithm is presented, using a single FPGA chip and four blocks of RAM. The real-time connected-component labelling algorithm, also designed for FPGA implementation, run-length encodes the output of the background subtraction and performs connected-component analysis on this representation. The run-length encoding, together with other parts of the algorithm, is performed in parallel; sequential operations are minimized, as the number of run-lengths is typically smaller than the number of pixels. The two algorithms are pipelined together for maximum efficiency
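The run-length encoding step can be sketched in software (the paper's version runs in FPGA hardware; this Python sketch only illustrates the representation the labeller operates on):

```python
def run_length_encode(row):
    """Encode one binary row of a foreground mask as (start, length) runs of
    foreground pixels -- the compressed form the labelling stage works on
    instead of touching every pixel."""
    runs, start = [], None
    for i, px in enumerate(row):
        if px and start is None:
            start = i                        # a run of foreground begins
        elif not px and start is not None:
            runs.append((start, i - start))  # the run just ended
            start = None
    if start is not None:                    # run reaches the end of the row
        runs.append((start, len(row) - start))
    return runs

print(run_length_encode([0, 1, 1, 0, 0, 1, 1, 1]))  # [(1, 2), (5, 3)]
```

Connected-component analysis then only needs to test whether runs on adjacent rows overlap, which is why the sequential cost scales with the number of runs rather than the number of pixels.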