Mining Mid-level Features for Action Recognition Based on Effective Skeleton Representation
Recently, mid-level features have shown promising performance in computer
vision. Mid-level features learned by incorporating class-level information are
potentially more discriminative than traditional low-level local features. In
this paper, an effective method is proposed to extract mid-level features from
Kinect skeletons for 3D human action recognition. Firstly, the orientations of
limbs connected by two skeleton joints are computed and each orientation is
encoded into one of the 27 states indicating the spatial relationship of the
joints. Secondly, limbs are combined into parts and the limb states are
mapped into part states. Finally, frequent pattern mining is employed to mine
the most frequent and relevant (discriminative, representative and
non-redundant) states of parts across several consecutive frames. These parts are
referred to as Frequent Local Parts or FLPs. The FLPs allow us to build
a powerful bag-of-FLP-based action representation. This new representation yields
state-of-the-art results on MSR DailyActivity3D and MSR ActionPairs3D.
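The abstract does not say how the 27 orientation states are defined. One plausible reading is that each of the three coordinate axes contributes one of three relations (negative, roughly equal, positive), giving 3 x 3 x 3 = 27 states. The sketch below follows that assumption; the function names, the dead-band `eps`, and the state numbering are illustrative and not taken from the paper.

```python
from typing import Sequence


def sign_with_deadband(value: float, eps: float) -> int:
    """Return -1, 0 or +1; differences within +/- eps count as 'equal' (assumed)."""
    if value > eps:
        return 1
    if value < -eps:
        return -1
    return 0


def limb_state(joint_a: Sequence[float], joint_b: Sequence[float],
               eps: float = 0.05) -> int:
    """Encode the orientation of the limb joint_a -> joint_b as a state in 0..26.

    Assumption: one ternary relation per axis, packed as a base-3 number.
    """
    signs = [sign_with_deadband(b - a, eps) for a, b in zip(joint_a, joint_b)]
    sx, sy, sz = signs
    return (sx + 1) * 9 + (sy + 1) * 3 + (sz + 1)


if __name__ == "__main__":
    shoulder = (0.10, 1.40, 2.00)  # hypothetical joint positions in metres
    elbow = (0.35, 1.15, 2.02)
    print(limb_state(shoulder, elbow))
```

Under this encoding a limb whose two joints nearly coincide on every axis falls into the middle state (13), and the dead-band keeps sensor jitter from flipping states between frames.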
Large-scale Continuous Gesture Recognition Using Convolutional Neural Networks
This paper addresses the problem of continuous gesture recognition from
sequences of depth maps using convolutional neural networks (ConvNets). The
proposed method first segments individual gestures from a depth sequence based
on quantity of movement (QOM). For each segmented gesture, an Improved Depth
Motion Map (IDMM), which converts the depth sequence into one image, is
constructed and fed to a ConvNet for recognition. The IDMM effectively encodes
both spatial and temporal information and allows fine-tuning of existing
ConvNet models for classification without introducing millions of parameters to
learn. The proposed method is evaluated on the Large-scale Continuous Gesture
Recognition of the ChaLearn Looking at People (LAP) challenge 2016. It achieved
the performance of 0.2655 (Mean Jaccard Index) and ranked place in
this challenge
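A minimal sketch of segmentation by quantity of movement, under stated assumptions: QOM is taken here as the number of pixels whose depth differs from a resting reference frame (the first frame) by more than a pixel threshold, and a gesture is any run of frames whose QOM reaches a second threshold. The abstract does not give the exact definition or thresholds, so `quantity_of_movement`, `segment_gestures`, and both threshold values are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[float]]  # a depth map as rows of depth values


def quantity_of_movement(frame: Frame, reference: Frame,
                         pixel_threshold: float) -> int:
    """Count pixels whose depth differs from the reference by more than the threshold."""
    return sum(
        1
        for row_f, row_r in zip(frame, reference)
        for f, r in zip(row_f, row_r)
        if abs(f - r) > pixel_threshold
    )


def segment_gestures(frames: Sequence[Frame], pixel_threshold: float = 10,
                     qom_threshold: int = 1) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame indices of runs with enough movement."""
    reference = frames[0]  # assumed resting pose
    segments: List[Tuple[int, int]] = []
    start = None
    for index, frame in enumerate(frames):
        moving = quantity_of_movement(frame, reference, pixel_threshold) >= qom_threshold
        if moving and start is None:
            start = index
        elif not moving and start is not None:
            segments.append((start, index - 1))
            start = None
    if start is not None:
        segments.append((start, len(frames) - 1))
    return segments
```

Each returned segment would then be converted into a single image and passed to the classifier; that conversion is not sketched here.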
Large-scale Isolated Gesture Recognition Using Convolutional Neural Networks
This paper proposes three simple, compact yet effective representations of
depth sequences, referred to respectively as Dynamic Depth Images (DDI),
Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images
(DDMNI). These dynamic images are constructed from a sequence of depth maps
using bidirectional rank pooling to effectively capture the spatial-temporal
information. Such image-based representations enable us to fine-tune
existing ConvNet models trained on image data for classification of depth
sequences, without introducing a large number of parameters to learn. Based on the proposed
representations, a convolutional neural network (ConvNet) based method is
developed for gesture recognition and evaluated on the Large-scale Isolated
Gesture Recognition at the ChaLearn Looking at People (LAP) challenge 2016. The
method achieved 55.57% classification accuracy and ranked place in
this challenge but was very close to the best performance even though we only
used depth data.
Comment: arXiv admin note: text overlap with arXiv:1608.0633
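To illustrate how a depth sequence can be collapsed into one image in both temporal directions, here is a rough sketch. It does not learn a ranking function as rank pooling does; instead it substitutes a simple closed-form weighting in which frame t of T receives the weight 2t - T - 1, so later frames count positively and earlier frames negatively. The function names are hypothetical, and frames are plain nested lists to keep the example dependency-free.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[float]]
Image = List[List[float]]


def rank_pool(frames: Sequence[Frame]) -> Image:
    """Weighted sum of frames with weights 2t - T - 1 (t = 1..T).

    This is a closed-form stand-in for learned rank pooling, not the
    paper's exact procedure.
    """
    total = len(frames)
    weights = [2 * t - total - 1 for t in range(1, total + 1)]
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(w * f[r][c] for w, f in zip(weights, frames)) for c in range(width)]
        for r in range(height)
    ]


def bidirectional_rank_pool(frames: Sequence[Frame]) -> Tuple[Image, Image]:
    """Pool the sequence in forward and in reversed temporal order."""
    return rank_pool(frames), rank_pool(frames[::-1])
```

With this particular linear weighting the backward image is simply the negation of the forward one; a learned ranking function, or any non-linear preprocessing of the frames, would generally make the two directions carry different information.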
Dual Long Short-Term Memory Networks for Sub-Character Representation Learning
Characters have commonly been regarded as the minimal processing unit in
Natural Language Processing (NLP). But many non-Latin languages have
hieroglyphic writing systems, involving a big alphabet with thousands or
millions of characters. Each character is composed of even smaller parts, which
are often ignored by previous work. In this paper, we propose a novel
architecture employing two stacked Long Short-Term Memory Networks (LSTMs) to
learn sub-character level representations and capture a deeper level of semantic
meaning. To build a concrete study and substantiate the efficiency of our
neural architecture, we take Chinese Word Segmentation as a research case
example. Among those languages, Chinese is a typical case, for which every
character contains several components called radicals. Our networks employ a
shared radical level embedding to solve both Simplified and Traditional Chinese
Word Segmentation, without extra Traditional to Simplified Chinese conversion.
In such a highly end-to-end way, word segmentation can be significantly
simplified compared to previous work. Radical level embeddings can also
capture deeper semantic meaning below character level and improve the system
performance of learning. By tying radical and character embeddings together,
the parameter count is reduced whereas semantic knowledge is shared and
transferred between two levels, substantially improving performance. On 3 out of 4
Bakeoff 2005 datasets, our method surpassed state-of-the-art results by up to
0.4%. Our results are reproducible; source code and corpora are available on
GitHub.
Comment: Accepted & forthcoming at ITNG-201
On-chip spectroscopy with thermally-tuned high-Q photonic crystal cavities
Spectroscopic methods are a sensitive way to determine the chemical
composition of potentially hazardous materials. Here, we demonstrate that
thermally-tuned high-Q photonic crystal cavities can be used as a compact
high-resolution on-chip spectrometer. We have used such a chip-scale
spectrometer to measure the absorption spectra of both acetylene and hydrogen
cyanide in the 1550 nm spectral band, and show that we can discriminate between
the two chemical species even though the two materials have spectral features
in the same spectral region. Our results pave the way for the development of
chip-size chemical sensors that can detect toxic substances
Depth Pooling Based Large-scale 3D Action Recognition with Convolutional Neural Networks
This paper proposes three simple, compact yet effective representations of
depth sequences, referred to respectively as Dynamic Depth Images (DDI),
Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images
(DDMNI), for both isolated and continuous action recognition. These dynamic
images are constructed from a segmented sequence of depth maps using
hierarchical bidirectional rank pooling to effectively capture the
spatial-temporal information. Specifically, DDI exploits the dynamics of
postures over time and DDNI and DDMNI exploit the 3D structural information
captured by depth maps. Based on the proposed representations, a ConvNet based
method is developed for action recognition. The image-based representations
enable us to fine-tune the existing Convolutional Neural Network (ConvNet)
models trained on image data without training a large number of parameters from
scratch. The proposed method achieved state-of-the-art results on three large
datasets, namely, the Large-scale Continuous Gesture Recognition Dataset (mean
Jaccard index 0.4109), the Large-scale Isolated Gesture Recognition Dataset
(59.21%), and the NTU RGB+D Dataset (87.08% cross-subject and 84.22%
cross-view) even though only the depth modality was used.
Comment: arXiv admin note: text overlap with arXiv:1701.01814,
arXiv:1608.0633
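For readers unfamiliar with the continuous-recognition metric, the Jaccard index measures the overlap between the frames annotated as a gesture and the frames predicted as that gesture: intersection size divided by union size. The sketch below is a simplified version that averages over (ground truth, prediction) pairs; the challenge's own evaluation protocol may average differently, and the function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def jaccard_index(truth_frames: Iterable[int],
                  predicted_frames: Iterable[int]) -> float:
    """Intersection over union of two sets of frame indices."""
    truth, predicted = set(truth_frames), set(predicted_frames)
    union = truth | predicted
    if not union:
        return 0.0  # assumed convention when both sets are empty
    return len(truth & predicted) / len(union)


def mean_jaccard(pairs: Sequence[Tuple[Iterable[int], Iterable[int]]]) -> float:
    """Average the Jaccard index over (truth, prediction) pairs."""
    scores = [jaccard_index(truth, predicted) for truth, predicted in pairs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Gesture annotated on frames 0-9, predicted on frames 5-14.
    print(jaccard_index(range(0, 10), range(5, 15)))
```

A perfect segmentation and labelling scores 1.0, while a prediction that never overlaps the annotation scores 0.0, so the metric penalises both wrong labels and badly placed gesture boundaries.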
Reshaping the Landscape of the Future: Software-Defined Manufacturing
We describe the concept of software-defined manufacturing, which divides the manufacturing ecosystem into software definition and physical manufacturing layers. Software-defined manufacturing allows better resource sharing and collaboration, and it has the potential to transform the existing manufacturing sector