An Invariant Model of the Significance of Different Body Parts in Recognizing Different Actions
In this paper, we show that different body parts do not play equally
important roles in recognizing a human action in video data. We investigate to
what extent a body part plays a role in recognition of different actions and
hence propose a generic method of assigning weights to different body points.
The approach is inspired by the strong evidence in the applied perception
community that humans perform recognition in a foveated manner, that is, they
recognize events or objects by only focusing on visually significant aspects.
An important contribution of our method is that the computation of the weights
assigned to body parts is invariant to viewing directions and camera parameters
in the input data. We have performed extensive experiments to validate the
proposed approach and demonstrate its significance. In particular, results show
that considerable improvement in performance is gained by taking into account
the relative importance of different body parts as defined by our approach.
Comment: arXiv admin note: substantial text overlap with arXiv:1705.04641,
arXiv:1705.05741, arXiv:1705.0443
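The idea of weighting body points can be illustrated with a minimal sketch. It assumes 2D joint coordinates and hand-picked weights; the function name and the weight values are hypothetical, and the sketch does not reproduce the paper's view-invariant computation of the weights.

```python
import math


def weighted_pose_distance(pose_a, pose_b, weights):
    """Weighted mean Euclidean distance between two poses.

    pose_a, pose_b: sequences of (x, y) joint coordinates in the same order.
    weights: one non-negative weight per joint (illustrative values only,
             not the weights derived in the paper).
    """
    if not (len(pose_a) == len(pose_b) == len(weights)):
        raise ValueError("poses and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Each joint contributes its displacement scaled by its weight.
    weighted = sum(w * math.dist(p, q) for p, q, w in zip(pose_a, pose_b, weights))
    return weighted / total


# Two poses that differ only in the third joint (displacement of length 5).
pose_a = [(0, 0), (1, 0), (0, 2)]
pose_b = [(0, 0), (1, 0), (3, 6)]
uniform = weighted_pose_distance(pose_a, pose_b, [1, 1, 1])
focused = weighted_pose_distance(pose_a, pose_b, [0, 0, 1])
```

With uniform weights the difference is diluted across all joints, while putting all weight on the moving joint makes the two poses maximally distinguishable.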
Volumetric Super-Resolution of Multispectral Data
Most multispectral remote sensors (e.g. QuickBird, IKONOS, and Landsat 7
ETM+) provide low-spatial high-spectral resolution multispectral (MS) or
high-spatial low-spectral resolution panchromatic (PAN) images, separately. In
order to reconstruct a high-spatial/high-spectral resolution multispectral
image volume, either the information in MS and PAN images is fused (i.e.
pansharpening) or super-resolution reconstruction (SRR) is used with only MS
images captured on different dates. Existing methods do not utilize temporal
information of MS and high spatial resolution of PAN images together to improve
the resolution. In this paper, we propose a multiframe SRR algorithm using
pansharpened MS images, taking advantage of both temporal and spatial
information available in multispectral imagery, in order to exceed spatial
resolution of given PAN images. We first apply pansharpening to a set of
multispectral images and their corresponding PAN images captured on different
dates. Then, we use the pansharpened multispectral images as input to the
proposed wavelet-based multiframe SRR method to yield full volumetric SRR. The
proposed SRR method is obtained by deriving the subband relations between
multitemporal MS volumes. We demonstrate the results on Landsat 7 ETM+ images
comparing our method to conventional techniques.
Comment: arXiv admin note: text overlap with arXiv:1705.0125
HyperFace: A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition
We present an algorithm for simultaneous face detection, landmark
localization, pose estimation and gender recognition using deep convolutional
neural networks (CNN). The proposed method, called HyperFace, fuses the
intermediate layers of a deep CNN using a separate CNN followed by a multi-task
learning algorithm that operates on the fused features. It exploits the synergy
among the tasks, which boosts their individual performance. Additionally, we
propose two variants of HyperFace: (1) HyperFace-ResNet that builds on the
ResNet-101 model and achieves significant improvement in performance, and (2)
Fast-HyperFace that uses a high recall fast face detector for generating region
proposals to improve the speed of the algorithm. Extensive experiments show
that the proposed models are able to capture both global and local information
in faces and perform significantly better than many competitive algorithms for
each of these four tasks.
Comment: Accepted in Transactions on Pattern Analysis and Machine Intelligence
(TPAMI)
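Multi-task learning over shared features is commonly trained by combining the per-task losses into a single objective. The sketch below shows one such combination, a weighted sum. The task names, loss values, and weights are hypothetical placeholders, and the abstract does not state that HyperFace uses exactly this form.

```python
def multitask_loss(task_losses, task_weights):
    """Combine per-task losses into one scalar as a weighted sum.

    task_losses:  dict mapping task name -> loss value for the current batch.
    task_weights: dict mapping task name -> relative importance (assumed values).
    """
    missing = set(task_losses) - set(task_weights)
    if missing:
        raise KeyError(f"no weight given for tasks: {sorted(missing)}")
    return sum(task_weights[name] * loss for name, loss in task_losses.items())


# Illustrative numbers only; they are not taken from the paper.
losses = {"detection": 0.5, "landmarks": 2.0, "pose": 1.0, "gender": 0.25}
weights = {"detection": 1.0, "landmarks": 0.5, "pose": 2.0, "gender": 4.0}
total_loss = multitask_loss(losses, weights)
```

Because every task contributes to the same scalar, gradients from all four tasks flow back into the shared features.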
Identifying Object States in Cooking-Related Images
Understanding object states is as important as object recognition for robotic
task planning and manipulation. To our knowledge, this paper explicitly
introduces and addresses the state identification problem in cooking-related
images for the first time. In this paper, objects and ingredients in cooking
videos are explored and the most frequent objects are analyzed. Eleven states
from the most frequent cooking objects are examined and a dataset of images
containing those objects and their states is created. As a solution to the
state identification problem, a ResNet-based deep model is proposed. The model
is initialized with ImageNet weights and trained on the dataset of eleven
classes. The trained state identification model is evaluated on a subset of the
ImageNet dataset and state labels are provided using a combination of the model
with manual checking. Moreover, an individual model is fine-tuned for each
object in the dataset using the weights from the initially trained model and
object-specific images, where significant improvement is demonstrated.
Comment: 7 pages, 8 figures
Predictive biometrics: A review and analysis of predicting personal characteristics from biometric data
Interest in the exploitation of soft biometrics information has continued to develop over the last decade or so. In comparison with traditional biometrics, which focuses principally on person identification, the idea of soft biometrics processing is to study the utilisation of more general information regarding a system user, which is not necessarily unique. There are increasing indications that this type of data will have great value in providing complementary information for user authentication. However, the authors have also seen a growing interest in broadening the predictive capabilities of biometric data, encompassing both easily definable characteristics such as subject age and, most recently, 'higher-level' characteristics such as emotional or mental states. This study will present a selective review of the predictive capabilities, in the widest sense, of biometric data processing, providing an analysis of the key issues still to be adequately addressed if this concept of predictive biometrics is to be fully exploited in the future.
Visual Affordance and Function Understanding: A Survey
Nowadays, robots are dominating the manufacturing, entertainment and
healthcare industries. Robot vision aims to equip robots with the ability to
discover information, understand it and interact with the environment. These
capabilities require an agent to effectively understand object affordances and
functionalities in complex visual domains. In this literature survey, we first
focus on visual affordances and summarize the state of the art as well as open
problems and research gaps. Specifically, we discuss sub-problems such as
affordance detection, categorization, segmentation and high-level reasoning.
Furthermore, we cover functional scene understanding and the prevalent
functional descriptors used in the literature. The survey also provides
necessary background to the problem, sheds light on its significance and
highlights the existing challenges for affordance and functionality learning.
Comment: 26 pages, 22 images
Causes of discomfort in stereoscopic content: a review
This paper reviews the causes of discomfort in viewing stereoscopic content.
These include objective factors, such as misaligned images, as well as
subjective factors, such as excessive disparity. Different approaches to the
measurement of visual discomfort are also reviewed, in relation to the
underlying physiological and psychophysical processes. The importance of
understanding these issues, in the context of new display technologies, is
emphasized.
Multigrid Predictive Filter Flow for Unsupervised Learning on Videos
We introduce multigrid Predictive Filter Flow (mgPFF), a framework for
unsupervised learning on videos. The mgPFF takes as input a pair of frames and
outputs per-pixel filters to warp one frame to the other. Compared to optical
flow used for warping frames, mgPFF is more powerful in modeling sub-pixel
movement and dealing with corruption (e.g., motion blur). We develop a
multigrid coarse-to-fine modeling strategy that avoids the requirement of
learning large filters to capture large displacement. This allows us to train
an extremely compact model (4.6MB) which operates in a progressive way over
multiple resolutions with shared weights. We train mgPFF on unsupervised,
free-form videos and show that mgPFF is not only able to estimate long-range
flow for frame reconstruction and detect video shot transitions, but is also
readily amenable to video object segmentation and pose tracking, where it
substantially outperforms the published state-of-the-art without bells and
whistles. Moreover, owing to mgPFF's nature of per-pixel filter prediction, we
have the unique opportunity to visualize how each pixel evolves while
solving these tasks, thus gaining better interpretability.
Comment: webpage (https://www.ics.uci.edu/~skong2/mgpff.html)
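Warping a frame with per-pixel filters can be sketched as follows. This is a minimal pure-Python illustration in which the filters are supplied by hand rather than predicted by a network. The function name, the 3x3 kernel size, and the edge clamping are assumptions, not details from the paper.

```python
def apply_per_pixel_filters(frame, filters, k=3):
    """Warp `frame` by applying a separate k x k kernel at every pixel.

    frame:   H x W list of lists of numbers.
    filters: H x W list of lists; filters[y][x] is the k x k kernel used to
             produce output pixel (y, x) from its neighbourhood in `frame`.
    Neighbours outside the frame are clamped to the nearest edge pixel.
    """
    h, w = len(frame), len(frame[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += filters[y][x][dy + r][dx + r] * frame[yy][xx]
            out[y][x] = acc
    return out


# A kernel that copies the left neighbour moves content one pixel to the right.
take_left = [[0, 0, 0], [1, 0, 0], [0, 0, 0]]
frame = [[1, 2, 3], [4, 5, 6]]
filters = [[take_left for _ in row] for row in frame]
warped = apply_per_pixel_filters(frame, filters)
```

Spreading a kernel's weight over several neighbours (for example 0.5 on the centre and 0.5 on the left) blends them, which is how a filter of this kind can express sub-pixel motion that an integer displacement cannot.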
Sub-Pixel Registration of Wavelet-Encoded Images
Sub-pixel registration is a crucial step for applications such as
super-resolution in remote sensing, motion compensation in magnetic resonance
imaging, and non-destructive testing in manufacturing, to name a few. Recently,
these technologies have been trending towards wavelet encoded imaging and
sparse/compressive sensing. The former plays a crucial role in reducing imaging
artifacts, while the latter significantly increases the acquisition speed. In
view of these new emerging needs for applications of wavelet encoded imaging,
we propose a sub-pixel registration method that can achieve direct wavelet
domain registration from a sparse set of coefficients. We make the following
contributions: (i) we devise a method of decoupling scale, rotation, and
translation parameters in the Haar wavelet domain; (ii) we derive explicit
mathematical expressions that define in-band sub-pixel registration in terms of
wavelet coefficients; (iii) using the derived expressions, we propose an
approach to achieve in-band sub-pixel registration, avoiding back and forth
transformations; and (iv) our solution remains highly accurate even when a sparse
set of coefficients is used, which is due to localization of signals in a
sparse set of wavelet coefficients. We demonstrate the accuracy of our method,
and show that it outperforms the state-of-the-art on simulated and real data,
even when the data is sparse.
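For readers unfamiliar with the Haar wavelet domain, the sketch below shows one level of a 1D Haar decomposition and its inverse. It uses the unnormalized average/difference form for readability. It only illustrates the representation the abstract refers to and is not the registration method itself; the function names are hypothetical.

```python
def haar_forward(signal):
    """One level of an unnormalized 1D Haar transform.

    Returns (approx, detail): pairwise averages and pairwise half-differences.
    The signal length must be even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the original signal from one level of Haar coefficients."""
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([a + d, a - d])
    return signal


approx, detail = haar_forward([4, 6, 10, 12])
restored = haar_inverse(approx, detail)
```

The approximation band is a half-resolution copy of the signal, while the detail band holds the local differences. An in-band method works on such coefficients directly instead of transforming back to the pixel domain.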
Diversity in Machine Learning
Machine learning methods have achieved good performance and been widely
applied in various real-world applications. They can learn the model adaptively
and better fit the special requirements of different tasks. Generally, a
good machine learning system is composed of plentiful training data, a good
model training process, and accurate inference. Many factors can affect the
performance of the machine learning process, among which diversity is an
important one. Diversity helps each stage contribute to a good overall
result: diversity of the training data ensures that the data provide more
discriminative information for the model; diversity of the learned model
(diversity in parameters of each model or diversity among different base
models) makes each parameter/model capture unique or complementary
information; and diversity in inference provides multiple choices, each of
which corresponds to a specific plausible locally optimal result. Even though
diversity plays an important role in the machine learning process, there is no
systematic analysis of diversification in machine learning systems. In this
paper, we systematically summarize the methods for data diversification, model
diversification, and inference diversification in the machine learning
process. In addition, we survey typical applications where diversity
technology has improved machine learning performance, including remote sensing
imaging tasks, machine translation, camera relocalization, image segmentation,
object detection, topic modeling, and others. Finally, we discuss some
challenges of diversity technology in machine learning and point out some
directions for future work.
Comment: Accepted by IEEE Access