Multi-Label Zero-Shot Human Action Recognition via Joint Latent Ranking Embedding
Human action recognition refers to automatically recognizing human actions from a
video clip. In reality, there often exist multiple human actions in a video
stream. Such a video stream is often weakly-annotated with a set of relevant
human action labels at a global level rather than assigning each label to a
specific video episode corresponding to a single action, which leads to a
multi-label learning problem. Furthermore, there are many meaningful human
actions in reality, but it would be extremely difficult to collect/annotate
video clips for all of the various human actions, which leads to a zero-shot
learning scenario. To the best of our knowledge, there is no work that has
addressed all the above issues together in human action recognition. In this
paper, we formulate a real-world human action recognition task as a multi-label
zero-shot learning problem and propose a framework to tackle this problem in a
holistic way. Our framework holistically tackles the issue of unknown temporal
boundaries between different actions for multi-label learning and exploits the
side information regarding the semantic relationship between different human
actions for knowledge transfer. Consequently, our framework leads to a joint
latent ranking embedding for multi-label zero-shot human action recognition. A
novel neural architecture of two component models and an alternate learning
algorithm are proposed to carry out the joint latent ranking embedding
learning. Thus, multi-label zero-shot recognition is done by measuring
relatedness scores of action labels to a test video clip in the joint latent
visual and semantic embedding spaces. We evaluate our framework with different
settings, including a novel data split scheme designed especially for
evaluating multi-label zero-shot learning, on two datasets: Breakfast and
Charades. The experimental results demonstrate the effectiveness of our
framework.
Comment: 27 pages, 10 figures and 7 tables. Technical report submitted to a
journal. More experimental results/references were added and typos were
correcte
Transferable Positive/Negative Speech Emotion Recognition via Class-wise Adversarial Domain Adaptation
Speech emotion recognition plays an important role in building more
intelligent and human-like agents. Due to the difficulty of collecting speech
emotional data, an increasingly popular solution is leveraging a related and
rich source corpus to help address the target corpus. However, domain shift
between the corpora poses a serious challenge, making adaptation to the shift
difficult even for the recognition of positive/negative emotions. In
this work, we propose class-wise adversarial domain adaptation to address this
challenge by reducing the shift for all classes between different corpora.
Experiments on the well-known corpora EMODB and Aibo demonstrate that our
method is effective even when only a very limited number of target labeled
examples are provided.
Comment: 5 pages, 3 figures, accepted to ICASSP 201
A Total Fractional-Order Variation Model for Image Restoration with Non-homogeneous Boundary Conditions and its Numerical Solution
To overcome the weakness of a total variation based model for image
restoration, various high order (typically second order) regularization models
have been proposed and studied recently. In this paper we analyze and test a
fractional-order derivative based total $\alpha$-order variation model, which
can outperform the currently popular high order regularization models. There
exist several previous works using total $\alpha$-order variations for image
restoration; however, first, no analysis has been done yet and, second, all tested
formulations, differing from each other, utilize the zero Dirichlet boundary
conditions which are not realistic (while non-zero boundary conditions violate
definitions of fractional-order derivatives). This paper first reviews some
results of fractional-order derivatives and then analyzes the theoretical
properties of the proposed total $\alpha$-order variational model rigorously.
It then develops four algorithms for solving the variational problem, one based
on the variational Split-Bregman idea and three based on direct solution of the
discretise-optimization problem. Numerical experiments show that, in terms of
restoration quality and solution efficiency, the proposed model can produce
highly competitive results, for smooth images, to two established high order
models: the mean curvature and the total generalized variation.
Comment: 26 page
X-ray Diffraction Tomographic Imaging and Reconstruction
Material discrimination based on conventional or dual energy X-ray computed tomography (CT) imaging can be ambiguous. X-ray diffraction imaging (XDI) can be used to construct diffraction profiles of objects, providing new molecular signature information that can be used to characterize the presence of specific materials. Combining X-ray CT and diffraction imaging can lead to enhanced detection and identification of explosives in luggage screening. In this work we are investigating techniques for joint reconstruction of CT absorption and X-ray diffraction profile images of objects to achieve improved image quality and enhanced material classification. The initial results have been validated via simulation of X-ray absorption and coherent scattering in 2 dimensions.
U.S. Department of Homeland Security (2008-ST-061-ED0001
Exploring Language-Independent Emotional Acoustic Features via Feature Selection
We propose a novel feature selection strategy to discover
language-independent acoustic features that tend to be responsible for emotions
regardless of languages, linguistics and other factors. Experimental results
suggest that the language-independent feature subset discovered yields
performance comparable to the full feature set on various emotional speech
corpora.
Comment: 15 pages, 2 figures, 6 table