Video-based Sign Language Recognition without Temporal Segmentation
Millions of hearing-impaired people around the world routinely use variants of
sign language to communicate, so the automatic translation of a sign language
is meaningful and important. Currently, there are two
sub-problems in Sign Language Recognition (SLR), i.e., isolated SLR that
recognizes word by word and continuous SLR that translates entire sentences.
Existing continuous SLR methods typically utilize isolated SLR methods as building
blocks, with an extra layer of preprocessing (temporal segmentation) and
another layer of post-processing (sentence synthesis). Unfortunately, temporal
segmentation itself is non-trivial and inevitably propagates errors into
subsequent steps. Worse still, isolated SLR methods typically require strenuous
labeling of each word separately in a sentence, severely limiting the amount of
attainable training data. To address these challenges, we propose a novel
continuous sign recognition framework, the Hierarchical Attention Network with
Latent Space (LS-HAN), which eliminates the preprocessing of temporal
segmentation. The proposed LS-HAN consists of three components: a two-stream
Convolutional Neural Network (CNN) for video feature representation generation,
a Latent Space (LS) for semantic gap bridging, and a Hierarchical Attention
Network (HAN) for latent space based recognition. Experiments are carried out
on two large scale datasets. Experimental results demonstrate the effectiveness
of the proposed framework.
Comment: 32nd AAAI Conference on Artificial Intelligence (AAAI-18), Feb. 2-7, 2018, New Orleans, Louisiana, US
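The latent-space component of LS-HAN can be pictured as a pair of learned projections that map video features and sentence embeddings into a common space where their similarity is measured. The sketch below illustrates only this idea with random projection matrices; all dimensions, names, and the cosine-similarity choice are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 512-d video features, 300-d word embeddings,
# and a shared 128-d latent space (all illustrative).
D_VIDEO, D_WORD, D_LATENT = 512, 300, 128
W_v = rng.standard_normal((D_LATENT, D_VIDEO)) * 0.01  # video projection
W_s = rng.standard_normal((D_LATENT, D_WORD)) * 0.01   # sentence projection

def to_latent(x, W):
    """Project a feature vector into the shared latent space and l2-normalize."""
    z = W @ x
    return z / np.linalg.norm(z)

def relevance(video_feat, sentence_feat):
    """Cosine similarity between video and sentence in the latent space."""
    return float(to_latent(video_feat, W_v) @ to_latent(sentence_feat, W_s))

video = rng.standard_normal(D_VIDEO)
sentence = rng.standard_normal(D_WORD)
score = relevance(video, sentence)  # a value in [-1, 1]
```

In the actual framework both projections would be trained jointly with the two-stream CNN and the HAN, so that matching video/sentence pairs score higher than mismatched ones.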
PsyMo: A Dataset for Estimating Self-Reported Psychological Traits from Gait
Psychological trait estimation from external factors such as movement and
appearance is a challenging and long-standing problem in psychology, and is
principally based on the psychological theory of embodiment. To date, attempts
to tackle this problem have utilized private small-scale datasets with
intrusive body-attached sensors. Potential applications of an automated system
for psychological trait estimation include estimating occupational fatigue and
psychological state, as well as marketing and advertising. In this work, we propose PsyMo
(Psychological traits from Motion), a novel, multi-purpose and multi-modal
dataset for exploring psychological cues manifested in walking patterns. We
gathered walking sequences from 312 subjects in 7 different walking variations
and 6 camera angles. In conjunction with walking sequences, participants filled
in 6 psychological questionnaires, totalling 17 psychometric attributes related
to personality, self-esteem, fatigue, aggressiveness and mental health. We
propose two evaluation protocols for psychological trait estimation. Alongside
the estimation of self-reported psychological traits from gait, the dataset can
be used as a drop-in replacement to benchmark methods for gait recognition. We
anonymize all cues related to the identity of the subjects and publicly release
only silhouettes, 2D / 3D human skeletons and 3D SMPL human meshes.
Attention-based Temporal Weighted Convolutional Neural Network for Action Recognition
Research in human action recognition has accelerated significantly since the
introduction of powerful machine learning tools such as Convolutional Neural
Networks (CNNs). However, effective and efficient methods for incorporation of
temporal information into CNNs are still being actively explored in the recent
literature. Motivated by the popular recurrent attention models in the research
area of natural language processing, we propose the Attention-based Temporal
Weighted CNN (ATW), which embeds a visual attention model into a temporal
weighted multi-stream CNN. This attention model is implemented simply as
temporal weighting, yet it effectively boosts the recognition performance of
video representations. Moreover, each stream in the proposed ATW framework is
capable of end-to-end training, with both network parameters and temporal
weights optimized by stochastic gradient descent (SGD) with backpropagation.
Our experiments show that the proposed attention mechanism contributes
substantially to the performance gains by selecting more discriminative
snippets and focusing on more relevant video segments.
Comment: 14th International Conference on Artificial Intelligence Applications and Innovations (AIAI 2018), May 25-27, 2018, Rhodes, Greece
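The core temporal-weighting idea, softmax-normalized attention weights fusing per-snippet predictions into a video-level prediction, can be sketched in a few lines. This is a minimal NumPy illustration; in ATW the snippet features come from CNN streams and the weights are learned end-to-end by SGD.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def temporal_weighted_score(snippet_logits, attention_scores):
    """Fuse per-snippet class logits with temporal attention weights.

    snippet_logits:   (T, C) array, one logit vector per video snippet.
    attention_scores: (T,) unnormalized attention over the T snippets.
    Returns the (C,) weighted video-level logits.
    """
    w = softmax(attention_scores)  # temporal weights, sum to 1
    return w @ snippet_logits

rng = np.random.default_rng(0)
logits = rng.standard_normal((5, 10))        # 5 snippets, 10 action classes
attn = np.array([0.1, 2.0, 0.3, -1.0, 0.5])  # snippet 2 deemed most relevant
video_logits = temporal_weighted_score(logits, attn)
```

Because the weighting is differentiable, gradients flow from the video-level loss back into both the attention scores and the per-snippet network parameters.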
Anonymization of Sensitive Quasi-Identifiers for l-diversity and t-closeness
A number of studies on privacy-preserving data mining have been proposed. Most of them assume that quasi-identifiers (QIDs) can be separated from sensitive attributes. For instance, they assume that address, job, and age are QIDs but not sensitive attributes, and that a disease name is a sensitive attribute but not a QID. In practice, however, all of these attributes can be both sensitive attributes and QIDs. In this paper, we refer to such attributes as sensitive QIDs, and we propose novel privacy models, namely (l1, ..., lq)-diversity and (t1, ..., tq)-closeness, together with a method that can treat sensitive QIDs. Our method is composed of two algorithms: an anonymization algorithm and a reconstruction algorithm. The anonymization algorithm, conducted by data holders, is simple but effective, whereas the reconstruction algorithm, conducted by data analyzers, can be tailored to each data analyzer's objective. Our proposed method was experimentally evaluated using real data sets.
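For intuition, the classical (single-attribute) l-diversity property can be checked by grouping records into equivalence classes over their QID values and counting distinct sensitive values per class. This is a minimal sketch of the plain definition, not the paper's generalized (l1, ..., lq)-diversity; the records and attribute names are invented for illustration.

```python
from collections import defaultdict

def is_l_diverse(records, qid_keys, sensitive_key, l):
    """Return True iff every equivalence class (records sharing the same
    QID values) contains at least l distinct sensitive values."""
    classes = defaultdict(set)
    for r in records:
        qid = tuple(r[k] for k in qid_keys)
        classes[qid].add(r[sensitive_key])
    return all(len(values) >= l for values in classes.values())

# Toy anonymized table: ages and zip codes already generalized.
records = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]
ok_2 = is_l_diverse(records, ["age", "zip"], "disease", 2)  # False: second class has 1 value
ok_1 = is_l_diverse(records, ["age", "zip"], "disease", 1)  # True
```

The second equivalence class contains only "flu", so 2-diversity fails; an anonymizer would need to generalize further or merge classes.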
3D Object Instance Recognition and Pose Estimation Using Triplet Loss with Dynamic Margin
In this paper, we address the problem of 3D object instance recognition and
pose estimation of localized objects in cluttered environments using
convolutional neural networks. Inspired by the descriptor learning approach of
Wohlhart et al., we propose a method that introduces a dynamic margin into the
manifold-learning triplet loss function. Such a loss function is designed to
map images of different objects under different poses to a lower-dimensional,
similarity-preserving descriptor space on which efficient nearest neighbor
search algorithms can be applied. Introducing the dynamic margin allows for
faster training times and better accuracy of the resulting low-dimensional
manifolds. Furthermore, we contribute the following: adding in-plane rotations
(ignored by the baseline method) to the training, proposing new background
noise types that help to better mimic realistic scenarios and improve accuracy
with respect to clutter, adding surface normals as another powerful image
modality representing the object surface, which leads to better performance
than depth alone, and finally implementing an efficient online batch-generation
scheme that allows for better variability during the training phase. We perform an
exhaustive evaluation to demonstrate the effects of our contributions.
Additionally, we assess the performance of the algorithm on the large BigBIRD
dataset to demonstrate good scalability properties of the pipeline with respect
to the number of models.
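The dynamic-margin idea can be sketched as a margin that grows with the pose difference for same-object triplets and jumps to a large constant for different objects, plugged into a standard hinge-style triplet loss. This is one plausible reading of a "dynamic margin"; the paper's exact margin schedule and loss form may differ, and all values below are illustrative.

```python
import numpy as np

def dynamic_margin(pose_angle, same_object, base=0.01, scale=1.0):
    """Margin as a function of the triplet: proportional to the pose
    difference (in radians) for same-object pairs, and a large constant
    for different-object pairs (illustrative schedule)."""
    if same_object:
        return base + scale * pose_angle
    return base + scale * np.pi  # maximal inter-object margin

def triplet_loss(anchor, positive, negative, margin):
    """Hinge triplet loss on squared Euclidean descriptor distances."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])   # anchor descriptor
p = np.array([0.1, 0.0])   # same object, similar pose
n = np.array([2.0, 2.0])   # different object, already far away
m = dynamic_margin(pose_angle=0.1, same_object=False)
loss = triplet_loss(a, p, n, m)  # 0.0: the negative is beyond the margin
```

Scaling the margin with pose difference lets descriptors of the same object spread smoothly along pose, while different objects are pushed apart by the larger constant margin.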
Non-line-of-sight Imaging
Emerging single-photon-sensitive sensors combined with advanced inverse
methods to process picosecond-accurate time-stamped photon counts have given
rise to unprecedented imaging capabilities. Rather than imaging photons that
travel along direct paths from a source to an object and back to the detector,
non-line-of-sight (NLOS) imaging approaches analyse photons scattered from
multiple surfaces that travel along indirect light paths to estimate 3D images
of scenes outside the direct line of sight of a camera, hidden by a wall or
other obstacles. Here we review recent advances in the field of NLOS imaging,
discussing how to see around corners and future prospects for the field.
Learning Affinity via Spatial Propagation Networks
In this paper, we propose spatial propagation networks for learning the
affinity matrix for vision tasks. We show that by constructing a row/column
linear propagation model, the spatially varying transformation matrix exactly
constitutes an affinity matrix that models dense, global pairwise relationships
of an image. Specifically, we develop a three-way connection for the linear
propagation model, which (a) formulates a sparse transformation matrix, where
all elements can be the output from a deep CNN, but (b) results in a dense
affinity matrix that effectively models any task-specific pairwise similarity
matrix. Instead of designing the similarity kernels according to image features
of two points, we can directly output all the similarities in a purely
data-driven manner. The spatial propagation network is a generic framework that
can be applied to many affinity-related tasks, including image matting,
segmentation, and colorization. Essentially, the
model can learn semantically-aware affinity values for high-level vision tasks
due to the powerful learning capability of the deep neural network classifier.
We validate the framework on the task of refinement for image segmentation
boundaries. Experiments on the HELEN face parsing and PASCAL VOC-2012 semantic
segmentation tasks show that the spatial propagation network provides a
general, effective and efficient solution for generating high-quality
segmentation results.
Comment: A long version of NIPS 2017
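The row/column linear propagation model underlying the network is a simple gated recurrence along one scan direction: each column of the output is a convex combination of the input at that column and the propagated result from the previous column. The sketch below shows one left-to-right pass with scalar (elementwise) gates; in the paper the gates come from a deep CNN and the three-way connection uses multiple neighbors, which this sketch omits.

```python
import numpy as np

def propagate_left_to_right(x, p):
    """One-way linear propagation along rows of a 2-D map:
        h[:, j] = (1 - p[:, j]) * x[:, j] + p[:, j] * h[:, j-1]
    x: (H, W) input map; p: (H, W) gates in [0, 1] controlling how much
    information flows in from the previous column.
    """
    h = np.empty_like(x)
    h[:, 0] = x[:, 0]
    for j in range(1, x.shape[1]):
        h[:, j] = (1 - p[:, j]) * x[:, j] + p[:, j] * h[:, j - 1]
    return h

x = np.arange(12.0).reshape(3, 4)
h_copy = propagate_left_to_right(x, np.zeros_like(x))  # gates closed: h == x
h_prop = propagate_left_to_right(x, np.ones_like(x))   # gates open: column 0 floods right
```

Composing such passes in all four directions yields the dense, spatially varying affinity that the paper uses to refine segmentation boundaries.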
DeepV2D: Video to Depth with Differentiable Structure from Motion
We propose DeepV2D, an end-to-end deep learning architecture for predicting
depth from video. DeepV2D combines the representation ability of neural
networks with the geometric principles governing image formation. We compose a
collection of classical geometric algorithms, which are converted into
trainable modules and combined into an end-to-end differentiable architecture.
DeepV2D interleaves two stages: motion estimation and depth estimation. During
inference, motion and depth estimation are alternated and converge to accurate
depth. Code is available at https://github.com/princeton-vl/DeepV2D.
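The interleaved inference loop, alternate motion and depth updates until they converge, can be sketched as a generic scaffold. The toy update functions below are invented stand-ins that contract toward a fixed point; in DeepV2D both updates are learned network modules operating on video frames.

```python
def alternate(depth, motion, update_motion, update_depth, n_iters=8):
    """Interleave motion and depth estimation, as in DeepV2D's inference
    loop (control-flow scaffold only; the real updates are neural modules)."""
    for _ in range(n_iters):
        motion = update_motion(depth, motion)  # refine motion given depth
        depth = update_depth(depth, motion)    # refine depth given motion
    return depth, motion

# Toy contractive updates toward an (invented) joint fixed point.
true_depth, true_motion = 2.0, 0.5
d, m = alternate(
    depth=10.0, motion=0.0,
    update_motion=lambda d, m: m + 0.5 * (true_motion - m),
    update_depth=lambda d, m: d + 0.5 * (true_depth - d),
)
```

Each update here halves the remaining error, so a handful of iterations suffices, mirroring the paper's observation that alternation converges to accurate depth.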
Reducing Total Power Consumption Method in Cloud Computing Environments
The widespread use of cloud computing services is expected to increase the
power consumed by ICT equipment in cloud computing environments rapidly. This
paper first identifies the need for collaboration among servers, the
communication network, and the power network in order to reduce the total
power consumed by all ICT equipment in cloud computing environments. Five
fundamental policies for this collaboration are proposed, and an algorithm to
realize each policy is outlined. Next, this paper proposes
possible signaling sequences to exchange information on power consumption
between network and servers, in order to realize the proposed collaboration
policy. Then, in order to reduce the power consumption of the network, this
paper proposes a simple method of estimating the power consumed by all
network devices and assigning it to individual users.
Comment: 16 pages
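One natural way to assign an estimated total network power to individual users is in proportion to the traffic each user generates. The sketch below shows that simple proportional rule; it is an illustrative assumption, not necessarily the paper's estimation method, and the user names and figures are invented.

```python
def assign_network_power(total_power_w, user_traffic_bytes):
    """Split an estimated total network power (watts) among users in
    proportion to the traffic volume each user generates."""
    total_traffic = sum(user_traffic_bytes.values())
    return {
        user: total_power_w * traffic / total_traffic
        for user, traffic in user_traffic_bytes.items()
    }

shares = assign_network_power(
    1000.0,  # estimated power of all network devices, in watts
    {"alice": 600, "bob": 300, "carol": 100},  # bytes carried per user
)
```

By construction the per-user shares sum back to the total, so no consumed power is left unattributed.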