Background subtraction on depth videos with convolutional neural networks
Background subtraction is a significant component of computer vision systems.
It is widely used in video surveillance, object tracking, anomaly detection,
etc. The emergence of low-cost depth sensors such as the Microsoft Kinect and
Asus Xtion PRO has provided a new data source for background subtraction. In this
paper, we propose a background subtraction approach on depth videos, which is
based on convolutional neural networks (CNNs), called BGSNet-D (BackGround
Subtraction neural Networks for Depth videos). The method can be used in color
unavailable scenarios such as poor lighting, and can also be combined with
existing RGB background subtraction methods. A preprocessing
strategy is designed to reduce the influence of noise from depth
sensors. The experimental results on the SBM-RGBD dataset show that the
proposed method outperforms existing methods on depth data.
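To make the pipeline shape concrete, here is a minimal sketch of a depth background-subtraction network of this kind. It is not the authors' BGSNet-D architecture: the median-filter hole filling is an assumed stand-in for the paper's preprocessing strategy, and all layer sizes and the name `DepthBGSNet` are illustrative.

```python
# Illustrative sketch only: a tiny CNN mapping a (background, current) pair
# of depth frames to a per-pixel foreground mask. Not the authors' BGSNet-D.
import cv2
import numpy as np
import torch
import torch.nn as nn

def preprocess_depth(depth_mm: np.ndarray) -> np.ndarray:
    """Median-filter a depth frame and fill zero (invalid) pixels with the
    filtered values to suppress typical depth-sensor noise (an assumed
    proxy for the paper's preprocessing strategy)."""
    d = depth_mm.astype(np.float32)
    filtered = cv2.medianBlur(d, 5)
    d[d == 0] = filtered[d == 0]          # fill sensor holes
    return d / 10000.0                    # rough normalization to [0, 1]

class DepthBGSNet(nn.Module):
    """Per-pixel foreground probability from stacked depth inputs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),
        )

    def forward(self, background, frame):  # each (N, 1, H, W)
        x = torch.cat([background, frame], dim=1)
        return torch.sigmoid(self.net(x))
```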
Learning to Detect Instantaneous Changes with Retrospective Convolution and Static Sample Synthesis
Change detection has been a challenging visual task due to the dynamic nature
of real-world scenes. Good performance of existing methods depends largely on
prior background images or a long-term observation. These methods, however,
suffer severe degradation when applied to detecting changes that occur
instantaneously with only a few preceding frames provided. In this paper, we
exploit spatio-temporal convolutional networks to address this challenge, and
propose a novel retrospective convolution, which efficiently extracts change
information between the current frame and frames from the historical
observation. To address the problem of foreground-specific over-fitting in
learning-based methods, we further propose a data augmentation method, named
static sample synthesis, to guide the network to focus on learning change-cued
information rather than specific spatial features of foreground. Trained
end-to-end on complex scenarios, our framework proves to be accurate in
detecting instantaneous changes and robust against diverse noise.
Extensive experiments demonstrate that our proposed method significantly
outperforms existing methods. Comment: 10 pages, 9 figures.
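One plausible reading of the two ideas above can be sketched as follows; this is an assumed simplification, not the paper's retrospective convolution operator, and both function and class names are chosen here for illustration.

```python
# Sketch under stated assumptions: contrast current-frame features against
# features pooled from a few preceding frames, and synthesize "static"
# training samples whose correct output is all-background.
import torch
import torch.nn as nn

class RetrospectiveBlock(nn.Module):
    """Extract change evidence between the current frame and pooled
    historical features (an assumed stand-in for the paper's operator)."""
    def __init__(self, channels: int):
        super().__init__()
        self.embed = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, current, history):
        # current: (N, C, H, W); history: (N, T, C, H, W) preceding frames
        ref = self.embed(history.mean(dim=1))   # pooled historical context
        cur = self.embed(current)
        change = cur - ref                      # change-cued evidence
        return torch.relu(self.fuse(torch.cat([cur, change], dim=1)))

def static_sample(frame: torch.Tensor, t: int = 4):
    """Static sample synthesis (assumed form): replicate one frame as its
    own history, so the network should predict 'no change' everywhere."""
    history = frame.unsqueeze(1).repeat(1, t, 1, 1, 1)  # (N, T, C, H, W)
    return frame, history
```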
Machine Vision for Natural Gas Methane Emissions Detection Using an Infrared Camera
It is crucial to reduce natural gas methane emissions, which can potentially
offset the climate benefits of replacing coal with gas. Optical gas imaging
(OGI) is a widely-used method to detect methane leaks, but is labor-intensive
and cannot provide leak detection results without operators' judgment. In this
paper, we develop a computer vision approach to OGI-based leak detection using
convolutional neural networks (CNN) trained on methane leak images to enable
automatic detection. First, we collect ~1M frames of labeled video of methane
leaks from different leaking equipment to build the CNN model, covering a wide
range of leak sizes (5.3-2051.6 gCH4/h) and imaging distances (4.6-15.6 m).
Second, we examine different background subtraction methods to extract the
methane plume in the foreground. Third, we test three CNN model variants,
collectively called GasNet, to detect plumes in videos taken at other pieces of
leaking equipment. We assess the ability of GasNet to perform leak detection by
comparing it to a baseline method that uses an optical-flow-based change
detection algorithm. We explore the sensitivity of results to the CNN structure, with a
moderate-complexity variant performing best across distances. We find that the
detection accuracy can reach as high as 99%, and that the overall detection
accuracy can exceed 95% across all leak sizes and imaging distances. Binary
detection accuracy exceeds 97% for large leaks (~710 gCH4/h) imaged closely
(~5-7 m). At closer imaging distances (~5-10 m), CNN-based models have greater
than 94% accuracy across all leak sizes. At the farthest distances (~13-16 m),
performance degrades rapidly, but the approach can still achieve above 95%
accuracy for large leaks (>950 gCH4/h). The GasNet-based computer vision approach could be
deployed in OGI surveys to enable automatic methane leak detection with high
accuracy in the real world. Comment: This paper was submitted to Applied Energy.
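A hedged sketch of this pipeline shape, not the released GasNet models: OpenCV's MOG2 background subtractor isolates the moving plume, and a small binary CNN (all layer sizes assumed here) scores the masked frame for a leak.

```python
# Sketch: background subtraction to extract the plume, then a tiny binary
# CNN as a stand-in for one of the GasNet variants described above.
import cv2
import numpy as np
import torch
import torch.nn as nn

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

def plume_foreground(ir_frame: np.ndarray) -> np.ndarray:
    """Apply MOG2 background subtraction to one grayscale IR frame and
    keep only the moving (plume) pixels."""
    mask = subtractor.apply(ir_frame)
    return cv2.bitwise_and(ir_frame, ir_frame, mask=mask)

class LeakClassifier(nn.Module):
    """Masked IR frame -> leak/no-leak logit (illustrative architecture)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):                   # x: (N, 1, H, W)
        return self.head(self.features(x).flatten(1))
```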
Unsupervised RGBD Video Object Segmentation Using GANs
Video object segmentation is a fundamental step in many advanced vision
applications. Most existing algorithms are based on handcrafted features such
as HOG, superpixel segmentation, or texture-based techniques, while deep
features have recently been found to be more efficient. Existing algorithms exhibit
performance degradation in the presence of challenges such as illumination
variations, shadows, and color camouflage. To handle these challenges we
propose a fusion-based moving object segmentation algorithm that exploits
color as well as depth information using a GAN to achieve higher accuracy. Our goal
is to segment moving objects in the presence of challenging background scenes,
in real environments. To address this problem, the GAN is trained in an
unsupervised manner on color and depth information independently with
challenging video sequences. During testing, the trained GAN generates
backgrounds similar to those in the test sample. The generated background
samples are then compared with the test sample to segment moving objects. The
final result is computed by fusion of object boundaries in both modalities, RGB
and depth. Comparison of our proposed algorithm with five state-of-the-art
methods on a publicly available dataset shows the strength of our algorithm
for moving object segmentation in videos in the presence of challenging real
scenarios. Comment: 15 pages, 3 figures, ACCV workshop on RGB-D-sensing and understanding
via combined colour and depth.
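The test-time step can be sketched as below under stated assumptions: pretrained generators (hypothetical names `gen_rgb` and `gen_depth`) are assumed to map a test frame to a plausible background for its scene, and the pixelwise OR is a coarse stand-in for the paper's boundary-level fusion.

```python
# Sketch of GAN-based segmentation at test time: compare each frame with
# its generated background, then fuse the RGB and depth masks.
import numpy as np

def segment_modality(frame: np.ndarray, background: np.ndarray,
                     thresh: float = 0.12) -> np.ndarray:
    """Foreground = pixels where the generated background differs
    substantially from the test frame."""
    diff = np.abs(frame.astype(np.float32) - background.astype(np.float32))
    if diff.ndim == 3:
        diff = diff.mean(axis=2)            # average over color channels
    return diff > thresh * diff.max()

def fuse_masks(mask_rgb: np.ndarray, mask_depth: np.ndarray) -> np.ndarray:
    """Coarse fusion of the two modality masks (the paper fuses object
    boundaries; pixelwise OR is an assumed simplification)."""
    return np.logical_or(mask_rgb, mask_depth)

# Usage (gen_rgb / gen_depth are assumed pretrained generators):
#   mask = fuse_masks(segment_modality(rgb, gen_rgb(rgb)),
#                     segment_modality(depth, gen_depth(depth)))
```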
Fingertip Detection and Tracking for Recognition of Air-Writing in Videos
Air-writing is the process of writing characters or words in free space using
finger or hand movements without the aid of any hand-held device. In this work,
we address the problem of mid-air finger writing using web-cam video as input.
In spite of recent advances in object detection and tracking, accurate and
robust detection and tracking of the fingertip remains a challenging task,
primarily due to the small size of the fingertip. Moreover, the initialization
and termination of mid-air finger writing is also challenging due to the
absence of any standard delimiting criterion. To solve these problems, we
propose a new writing hand pose detection algorithm for initialization of
air-writing using the Faster R-CNN framework for accurate hand detection
followed by hand segmentation and finally counting the number of raised fingers
based on geometrical properties of the hand. Further, we propose a robust
fingertip detection and tracking approach using a new signature function called
distance-weighted curvature entropy. Finally, a fingertip velocity-based
termination criterion is used as a delimiter to mark the completion of the
air-writing gesture. Experiments show the superiority of the proposed fingertip
detection and tracking algorithm over state-of-the-art approaches, giving a
mean precision of 73.1% while achieving real-time performance at 18.5 fps,
which is of vital importance for air-writing. Character recognition
experiments give a mean accuracy of 96.11% using the proposed air-writing
system, a result comparable to that of existing handwritten character
recognition systems. Comment: 32 pages, 10 figures, 2 tables. Submitted to
Expert Systems with Applications.
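The distance-weighted curvature idea can be illustrated with a discrete sketch like the one below; this is a plausible reading of the abstract, not the paper's exact signature function, and the k-step turning-angle estimate and multiplicative weighting are assumptions.

```python
# Sketch: a fingertip is a contour point that is both sharply curved and
# far from the palm centroid, so weight curvature by centroid distance.
import numpy as np

def fingertip_candidate(contour: np.ndarray, k: int = 5) -> int:
    """contour: (N, 2) ordered hand-boundary points. Returns the index of
    the point with the largest distance-weighted curvature response."""
    v1 = contour - np.roll(contour, k, axis=0)      # incoming direction
    v2 = np.roll(contour, -k, axis=0) - contour     # outgoing direction
    a1 = np.arctan2(v1[:, 1], v1[:, 0])
    a2 = np.arctan2(v2[:, 1], v2[:, 0])
    # turning angle wrapped to [-pi, pi] as a discrete curvature estimate
    curvature = np.abs((a2 - a1 + np.pi) % (2 * np.pi) - np.pi)
    dist = np.linalg.norm(contour - contour.mean(axis=0), axis=1)
    return int(np.argmax(curvature * dist))         # distance-weighted peak
```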
Differentiating Objects by Motion: Joint Detection and Tracking of Small Flying Objects
While generic object detection has achieved large improvements with rich
feature hierarchies from deep nets, detecting small objects with poor visual
cues remains challenging. Motion cues from multiple frames may be more
informative for detecting such hard-to-distinguish objects in each frame.
However, how to encode discriminative motion patterns, such as deformations and
pose changes that characterize objects, remains an open question. To learn
such patterns and thereby enable small-object detection, we present a neural model
called the Recurrent Correlational Network, where detection and tracking are
jointly performed over a multi-frame representation learned through a single,
trainable, and end-to-end network. A convolutional long short-term memory
network is utilized for learning informative appearance change for detection,
while the learned representation is shared with the tracker to enhance its
performance. In experiments with datasets containing images of scenes with
small flying objects, such as birds and unmanned aerial vehicles, the proposed
method yielded consistent improvements in detection performance over deep
single-frame detectors and existing motion-based detectors. Furthermore, our
network performs as well as state-of-the-art generic object trackers when
evaluated as a tracker on the bird dataset. Comment: 10 pages, 8 figures.
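The two ingredients named above can be sketched generically as follows: a convolutional LSTM cell for multi-frame appearance, and cross-correlation of a target template over a search feature map. Shapes and names are assumptions, not the authors' Recurrent Correlational Network.

```python
# Minimal ConvLSTM cell plus template cross-correlation (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch: int, hid_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, 3, padding=1)

    def forward(self, x, state):
        # state = (h, c); initialize with zeros of shape (N, hid_ch, H, W)
        h, c = state
        gates = self.conv(torch.cat([x, h], dim=1))
        i, f, o, g = gates.chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

def correlate(template: torch.Tensor, search: torch.Tensor) -> torch.Tensor:
    """Cross-correlate a (C, h, w) template over a (1, C, H, W) search map;
    the response peak gives the tracked location."""
    return F.conv2d(search, template.unsqueeze(0))  # (1, 1, H-h+1, W-w+1)
```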
Review on Computer Vision Techniques in Emergency Situation
In emergency situations, actions that save lives and limit the impact of
hazards are crucial. In order to act, situational awareness is needed to decide
what to do. Geolocalized photos and video of the situations as they evolve can
be crucial in better understanding them and making decisions faster. Cameras
are almost everywhere these days, either in terms of smartphones, installed
CCTV cameras, UAVs, or others. However, this poses challenges of big data and
information overload. Moreover, most of the time there is no disaster at any
given location, so human observers aiming to detect sudden situations may not
remain as alert as needed. Consequently, computer vision tools can
be an excellent decision-support tool. The range of emergencies in which
computer vision tools have been considered or used is very wide, and there is
considerable overlap across related emergency research. Researchers tend to
focus on state-of-the-art systems that cover the same emergency they are
studying, overlooking important research in other fields. To reveal this overlap,
the survey is divided along four main axes: the types of emergencies that have
been studied in computer vision, the objectives the algorithms can address,
the type of hardware needed, and the algorithms used. This review therefore
provides a broad overview of the progress of computer vision covering all sorts
of emergencies. Comment: 25 pages.
Using Deep Convolutional Networks for Gesture Recognition in American Sign Language
In the realm of multimodal communication, sign language is, and continues to
be, one of the most understudied areas. In line with recent advances in the
field of deep learning, there are far-reaching implications and applications
that neural networks can have for sign language interpretation. In this paper,
we present a method for using deep convolutional networks to classify images of
both the letters and digits in American Sign Language. Comment: 12 figures.
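For concreteness, a generic classifier of this kind might look like the sketch below; the layer sizes and the 36-class letter-plus-digit label space are assumptions, not the paper's architecture.

```python
# Illustrative CNN for classifying cropped hand images into ASL
# letter/digit classes (a generic sketch, not the paper's model).
import torch.nn as nn

def make_asl_classifier(num_classes: int = 36) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, num_classes),     # one logit per letter/digit class
    )
```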
Action4D: Real-time Action Recognition in the Crowd and Clutter
Recognizing every person's action in a crowded and cluttered environment is a
challenging task. In this paper, we propose a real-time action recognition
method, Action4D, which gives reliable and accurate results in real-world
settings. We propose to tackle the action recognition problem using a holistic
4D "scan" of a cluttered scene to include every detail about the people and
environment. Recognizing multiple people's actions in the cluttered 4D
representation is a new problem. In this paper, we propose novel methods to
solve this problem. We propose a new method to track people in 4D, which can
reliably detect and follow each person in real time. We propose a new deep
neural network, the Action4D-Net, to recognize the action of each tracked
person. Action4D-Net's novel structure uses both a global feature and focused
attention to achieve state-of-the-art results. Our real-time method is
invariant to camera view angles, resistant to clutter, and able to handle crowds.
The experimental results show that the proposed method is fast, reliable and
accurate. Our method paves the way for action recognition in real-world
applications and is ready to be deployed to enable smart homes, smart
factories, and smart stores.
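The combination of a global feature with attention-pooled local evidence can be sketched as below; this is an illustrative reading of the abstract, not the published Action4D-Net, and the voxel-volume input shape and layer sizes are assumptions.

```python
# Sketch: encode a voxelized 4D "scan" around one tracked person, then
# classify from a global average feature concatenated with an
# attention-pooled feature.
import torch
import torch.nn as nn

class GlobalPlusAttention(nn.Module):
    def __init__(self, in_ch: int, feat: int, num_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(in_ch, feat, 3, padding=1), nn.ReLU(),
        )
        self.attn = nn.Conv3d(feat, 1, 1)       # per-voxel attention logit
        self.head = nn.Linear(2 * feat, num_actions)

    def forward(self, vol):                      # vol: (N, C, D, H, W)
        f = self.encoder(vol)                    # (N, F, D, H, W)
        g = f.mean(dim=(2, 3, 4))                # global average feature
        w = torch.softmax(self.attn(f).flatten(2), dim=2)      # (N, 1, DHW)
        a = torch.bmm(w, f.flatten(2).transpose(1, 2)).squeeze(1)  # (N, F)
        return self.head(torch.cat([g, a], dim=1))
```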
Skeleton-based Action Recognition of People Handling Objects
In visual surveillance systems, it is necessary to recognize the behavior of
people handling objects such as a phone, a cup, or a plastic bag. In this
paper, to address this problem, we propose a new framework for recognizing
object-related human actions by graph convolutional networks using human and
object poses. In this framework, we construct skeletal graphs of reliable human
poses by selectively sampling informative frames in a video, namely those
containing human joints with high confidence scores obtained from pose estimation. The
skeletal graphs generated from the sampled frames represent human poses related
to the object position in both the spatial and temporal domains, and these
graphs are used as inputs to the graph convolutional networks. Through
experiments on an open benchmark and our own datasets, we verify the
validity of our framework in that our method outperforms the state-of-the-art
method for skeleton-based action recognition. Comment: Accepted at WACV 2019.
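Two steps from this abstract can be sketched as follows; this is a hedged illustration (the confidence-threshold sampling rule and the single row-normalized graph-convolution layer are assumptions, not the authors' code).

```python
# Sketch: keep frames with reliable pose estimates, then run one graph
# convolution over the human-plus-object keypoint graph.
import numpy as np
import torch
import torch.nn as nn

def sample_reliable_frames(confidences: np.ndarray, thresh: float = 0.5):
    """confidences: (T, J) per-frame joint scores; keep frames whose mean
    joint confidence clears the threshold (assumed sampling rule)."""
    return np.nonzero(confidences.mean(axis=1) > thresh)[0]

class GraphConv(nn.Module):
    """One layer of X' = ReLU(A_hat X W) over joint features."""
    def __init__(self, in_feat: int, out_feat: int, adj: torch.Tensor):
        super().__init__()
        # adj: (J, J) binary adjacency including self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        self.register_buffer("a_hat", adj / deg)  # row-normalized adjacency
        self.lin = nn.Linear(in_feat, out_feat)

    def forward(self, x):                          # x: (N, J, in_feat)
        return torch.relu(self.lin(self.a_hat @ x))
```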