Hand Action Detection from Ego-centric Depth Sequences with Error-correcting Hough Transform
Detecting hand actions from ego-centric depth sequences is a practically
challenging problem, owing mostly to the complex and dexterous nature of hand
articulations as well as non-stationary camera motion. We address this problem
via a Hough transform based approach coupled with a discriminatively learned
error-correcting component to tackle the well known issue of incorrect votes
from the Hough transform. In this framework, local parts vote collectively for
the start and end positions of each action over time. We also construct an
in-house annotated dataset of 300 long videos, containing 3,177 single-action
subsequences over 16 action classes collected from 26 individuals. Our system
is empirically evaluated on this real-life dataset for both the action
recognition and detection tasks, and is shown to produce satisfactory results.
To facilitate reproduction, the new dataset and our implementation are also
provided online.
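The temporal voting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `vote_action_boundaries`, the per-part offset lists, and the unit vote weights are all assumptions made for illustration, and the discriminatively learned error-correcting component is omitted entirely. Each local part observed at frame `t` casts one vote for a start frame and one for an end frame, and the most-voted frames are returned.

```python
def vote_action_boundaries(part_times, start_offsets, end_offsets, num_frames):
    """Plain temporal Hough voting (hypothetical helper, illustration only).

    part_times[i]    : frame index at which local part i was observed
    start_offsets[i] : assumed number of frames between the action start and part i
    end_offsets[i]   : assumed number of frames between part i and the action end
    """
    start_votes = [0] * num_frames
    end_votes = [0] * num_frames
    for t, ds, de in zip(part_times, start_offsets, end_offsets):
        start, end = t - ds, t + de
        # Votes that fall outside the sequence are discarded.
        if 0 <= start < num_frames:
            start_votes[start] += 1
        if 0 <= end < num_frames:
            end_votes[end] += 1
    # The peak of each accumulator is taken as the boundary estimate.
    best_start = max(range(num_frames), key=start_votes.__getitem__)
    best_end = max(range(num_frames), key=end_votes.__getitem__)
    return best_start, best_end


# Three parts agree on (start=5, end=20); the fourth casts an incorrect vote.
boundaries = vote_action_boundaries(
    part_times=[10, 12, 15, 30],
    start_offsets=[5, 7, 10, 2],
    end_offsets=[10, 8, 5, 3],
    num_frames=40,
)
```

In this toy input the stray vote from the fourth part is simply outvoted; the abstract's error-correcting component targets the harder case where incorrect votes would otherwise dominate.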
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers
presented at CVPR2015, the premier annual computer vision event held in
June 2015, in order to grasp the trends in the field. Further, we propose
"DeepSurvey" as a mechanism embodying the entire process, from reading
through all the papers and generating ideas to writing a paper.
Comment: Survey Pape
A Learning-Based Visual Saliency Prediction Model for Stereoscopic 3D Video (LBVS-3D)
Over the past decade, many computational saliency prediction models have been
proposed for 2D images and videos. Considering that the human visual system has
evolved in a natural 3D environment, it is only natural to want to design
visual attention models for 3D content. Existing monocular saliency models are
not able to accurately predict the attentive regions when applied to 3D
image/video content, as they do not incorporate depth information. This paper
explores stereoscopic video saliency prediction by exploiting both low-level
attributes such as brightness, color, texture, orientation, motion, and depth,
as well as high-level cues such as face, person, vehicle, animal, text, and
horizon. Our model starts with a rough segmentation and quantifies several
intuitive observations such as the effects of visual discomfort level, depth
abruptness, motion acceleration, elements of surprise, size and compactness of
the salient regions, and emphasizing only a few salient objects in a scene. A
new fovea-based model of spatial distance between the image regions is adopted
for considering local and global feature calculations. To efficiently fuse the
conspicuity maps generated by our method into a single saliency map that is
highly correlated with the eye-fixation data, a random forest based algorithm
is utilized. The performance of the proposed saliency model is evaluated
against the results of an eye-tracking experiment, which involved 24 subjects
and an in-house database of 61 captured stereoscopic videos. Our stereo video
database as well as the eye-tracking data are publicly available along with
this paper. Experimental results show that the proposed saliency prediction
method achieves competitive performance compared to the state-of-the-art
approaches.
Monotonic Calibrated Interpolated Look-Up Tables
Real-world machine learning applications may require functions that are
fast-to-evaluate and interpretable. In particular, guaranteed monotonicity of
the learned function can be critical to user trust. We propose meeting these
goals for low-dimensional machine learning problems by learning flexible,
monotonic functions using calibrated interpolated look-up tables. We extend the
structural risk minimization framework of lattice regression to train monotonic
look-up tables by solving a convex problem with appropriate linear inequality
constraints. In addition, we propose jointly learning interpretable
calibrations of each feature to normalize continuous features and handle
categorical or missing data, at the cost of making the objective non-convex. We
address large-scale learning through parallelization and mini-batching, and
propose random sampling of additive regularizer terms. Case studies on
real-world problems with five to sixteen features and thousands to millions of
training samples demonstrate that the proposed monotonic functions can achieve
state-of-the-art accuracy on practical problems while providing greater
transparency to users.
Comment: To appear (with minor revisions), Journal Machine Learning Research 201
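To make the notion of a monotonic interpolated look-up table concrete, here is a minimal one-dimensional sketch. It is an assumption-laden illustration rather than the paper's method: the helper names `evaluate_lut` and `is_monotonic` are hypothetical, the table has a single input feature, and no training or calibration is performed. In one dimension, a linearly interpolated table is non-decreasing exactly when each stored value is no larger than the next, which is the kind of linear inequality constraint the abstract refers to.

```python
from bisect import bisect_right


def evaluate_lut(x, keypoints, values):
    """Linearly interpolate a 1-D look-up table (hypothetical helper).

    keypoints must be strictly increasing; inputs outside the keypoint
    range are clamped to the first or last stored value.
    """
    if x <= keypoints[0]:
        return values[0]
    if x >= keypoints[-1]:
        return values[-1]
    i = bisect_right(keypoints, x) - 1          # cell containing x
    t = (x - keypoints[i]) / (keypoints[i + 1] - keypoints[i])
    return (1 - t) * values[i] + t * values[i + 1]


def is_monotonic(values):
    """Check the linear inequality constraints values[i] <= values[i + 1]."""
    return all(a <= b for a, b in zip(values, values[1:]))


keypoints = [0.0, 1.0, 2.0]
values = [0.0, 0.5, 2.0]
y = evaluate_lut(1.5, keypoints, values)   # halfway between 0.5 and 2.0
ok = is_monotonic(values)
```

A learner would enforce these inequalities during optimization; the check above only verifies them after the fact.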
Advances in Human Action Recognition: A Survey
Human action recognition has been an important topic in computer vision due
to its many applications such as video surveillance, human machine interaction
and video retrieval. One core problem behind these applications is
automatically recognizing low-level actions and high-level activities of
interest. The former is usually the basis for the latter. This survey gives an
overview of the most recent advances in human action recognition during the
past several years, following a well-formed taxonomy proposed by a previous
survey. From this state-of-the-art survey, researchers can view a panorama of
progress in this area for future research.
Imaging and Classification Techniques for Seagrass Mapping and Monitoring: A Comprehensive Survey
Monitoring underwater habitats is a vital part of observing the condition of
the environment. The detection and mapping of underwater vegetation, especially
seagrass, has drawn the attention of the research community since as early as
the 1980s. Initially, this monitoring relied on in situ observation by
experts. Later, advances in remote-sensing technology, satellite-monitoring
techniques, and digital photo- and video-based techniques opened a window to
quicker, cheaper, and, potentially, more accurate seagrass-monitoring methods.
So far, for seagrass detection and mapping, digital images from airborne
cameras, spectral images from satellites, acoustic image data using underwater
sonar technology, and digital underwater photo and video images have been used
to map the seagrass meadows or monitor their condition. In this article, we
review recent approaches to seagrass detection and mapping to
identify the gaps in present approaches and determine the scope for further
research on monitoring ocean health more easily. We have identified four classes
of approach to seagrass mapping and assessment: still image-, video data-,
acoustic image-, and spectral image data-based techniques. We have critically
analysed the surveyed approaches and identified research gaps, including the need
for quick, cheap and effective imaging techniques robust to depth, turbidity,
location and weather conditions; fully automated seagrass detectors that can
work in real time; accurate techniques for estimating seagrass density; and
the availability of high-performance computing facilities for processing
large-scale data. To address these gaps, future research should focus on developing
cheaper image and video data collection techniques, deep learning based
automatic annotation and classification, and real-time percentage-cover
calculation.
Comment: 36 pages, 14 figures, 8table
Deep Learning Convolutional Networks for Multiphoton Microscopy Vasculature Segmentation
Recently there has been an increasing trend to use deep learning frameworks
for both 2D consumer images and for 3D medical images. However, there has been
little effort to use deep frameworks for volumetric vascular segmentation. We
wanted to address this by providing a freely available dataset of 12 annotated
two-photon vasculature microscopy stacks. We demonstrated the use of a deep
learning framework consisting of both 2D and 3D convolutional filters (ConvNet).
Our hybrid 2D-3D architecture produced promising segmentation results. We
derived the architectures from Lee et al., who used the ZNN framework initially
designed for electron microscope image segmentation. We hope that by sharing
our volumetric vasculature datasets, we will inspire other researchers to
experiment with vasculature datasets and improve the network architectures used.
Comment: 23 pages, 10 figure
An Iterative Spanning Forest Framework for Superpixel Segmentation
Superpixel segmentation has become an important research problem in image
processing. In this paper, we propose an Iterative Spanning Forest (ISF)
framework, based on sequences of Image Foresting Transforms, where one can
choose i) a seed sampling strategy, ii) a connectivity function, iii) an
adjacency relation, and iv) a seed pixel recomputation procedure to generate
improved sets of connected superpixels (supervoxels in 3D) per iteration. The
superpixels in ISF structurally correspond to spanning trees rooted at those
seeds. We present five ISF methods to illustrate different choices of its
components. These methods are compared with state-of-the-art approaches
in effectiveness and efficiency. The experiments involve 2D
and 3D datasets with distinct characteristics, and a high-level application,
namely sky image segmentation. The theoretical properties of ISF are
demonstrated in the supplementary material, and the results show that some of
its methods are competitive with or superior to the best baselines in
effectiveness and efficiency.
An Efficient Approach to Communication-aware Path Planning for Long-range Surveillance Missions undertaken by UAVs
When drones are used for remote surveillance missions, path planning is
mandatory because the vehicles are pilotless. Path
planning, whether offline or online, entails setting up the path as a sequence
of locations in 3D Euclidean space, whose coordinates are
latitude, longitude and altitude. For the specific application of remote
surveillance of long linear infrastructures in non-urban terrain, the
continuous 3D Euclidean shortest path (3D-ESP) problem entails two important
scalar costs. The first is the distance traveled along the planned path. Since
drones are battery operated, the path length between the fixed
start and goal locations of a mission should be as short as possible. The other
scalar cost is the cost of transmitting the video acquired during the mission
by a camera mounted in the drone's belly. Because
the surveillance target is a long linear infrastructure, the
amount of video generated is very large and generally cannot be stored in its
entirety on board. If the connectivity is poor along certain segments of a
naive path, the transmission power of the signal is kept high to boost the
video transmission rate, which in turn dissipates more battery energy. Hence a
path is desired that simultaneously reduces this communication
cost. The two costs trade off against each other, so Pareto optimization is
needed for this 3D bi-objective Euclidean shortest path problem. In this report,
we study the mono-objective offline path planning problem, based on the distance
cost, while posing the communication cost as an upper-bounded constraint. The
bi-objective path planning solution is sketched out towards the end.
Comment: 46 pages. One part of this thesis, handling the turn constrained
route planning, has been published at ECMR'1
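The mono-objective formulation described above, minimizing distance subject to an upper bound on communication cost, can be sketched on a discretized graph. This is an illustrative assumption, not the report's algorithm: the report addresses a continuous 3D problem, whereas the sketch below uses a hypothetical waypoint graph, a hypothetical function name `constrained_shortest_path`, and non-negative edge costs. It runs Dijkstra's algorithm over states of the form (waypoint, communication cost used so far) and prunes any state that exceeds the budget.

```python
import heapq


def constrained_shortest_path(graph, source, goal, comm_budget):
    """Shortest distance from source to goal with total comm cost <= comm_budget.

    graph maps a node to a list of (neighbor, distance, comm_cost) edges,
    all costs non-negative. Returns (distance, comm_used) or None if no
    path satisfies the budget. Hypothetical helper for illustration.
    """
    best = {(source, 0): 0.0}        # best known distance per (node, comm_used)
    heap = [(0.0, 0, source)]
    while heap:
        dist, used, node = heapq.heappop(heap)
        if node == goal:
            # States are popped in order of distance, so this is optimal.
            return dist, used
        if dist > best.get((node, used), float("inf")):
            continue                 # stale heap entry
        for nbr, d, c in graph.get(node, []):
            new_used = used + c
            if new_used > comm_budget:
                continue             # violates the upper-bound constraint
            new_dist = dist + d
            if new_dist < best.get((nbr, new_used), float("inf")):
                best[(nbr, new_used)] = new_dist
                heapq.heappush(heap, (new_dist, new_used, nbr))
    return None


# Toy graph: A-B-D is short but costly to transmit on; A-C-D is longer but cheap.
toy_graph = {
    "A": [("B", 1.0, 5), ("C", 2.0, 1)],
    "B": [("D", 1.0, 5)],
    "C": [("D", 2.0, 1)],
}
loose = constrained_shortest_path(toy_graph, "A", "D", comm_budget=10)
tight = constrained_shortest_path(toy_graph, "A", "D", comm_budget=5)
```

With the loose budget the shorter route is feasible; tightening the budget forces the longer route with lower communication cost, which is the trade-off the abstract describes.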
Geo-Supervised Visual Depth Prediction
We propose using global orientation from inertial measurements, and the bias
it induces on the shape of objects populating the scene, to inform visual 3D
reconstruction. We test the effect of using the resulting prior in depth
prediction from a single image, where the normal vectors to surfaces of objects
of certain classes tend to align with gravity or be orthogonal to it. Adding
such a prior to baseline methods for monocular depth prediction yields
improvements beyond the state-of-the-art and illustrates the power of gravity
as a supervisory signal.
Comment: ICRA 2019, RA-L 201