12 research outputs found
Hierarchical Multi-scale Attention Networks for action recognition
Recurrent Neural Networks (RNNs) have been widely used in natural language
processing and computer vision. Among them, the Hierarchical Multi-scale RNN
(HM-RNN), a kind of multi-scale hierarchical RNN proposed recently, can learn
the hierarchical temporal structure from data automatically. In this paper, we
extend the work to solve the computer vision task of action recognition.
However, in sequence-to-sequence models such as RNNs, it is normally very hard to
discover the relationships between inputs and outputs given static inputs. As a
solution, an attention mechanism can be applied to extract the relevant
information from the input, thus facilitating the modeling of input-output
relationships. Based on these considerations, we propose a novel attention
network, namely Hierarchical Multi-scale Attention Network (HM-AN), by
combining the HM-RNN and the attention mechanism and apply it to action
recognition. A newly proposed gradient estimation method for stochastic
neurons, namely Gumbel-softmax, is exploited to implement the temporal boundary
detectors and the stochastic hard attention mechanism. To ameliorate the
negative effect of the temperature sensitivity of the Gumbel-softmax, an adaptive
temperature training method is applied to improve system performance. The
experimental results demonstrate the improved effect of HM-AN over LSTM with
attention on the vision task. Through visualization of what has been learnt by
the networks, it can be observed that both the attention regions of images and
the hierarchical temporal structure can be captured by HM-AN.
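For readers unfamiliar with the Gumbel-softmax relaxation mentioned in this abstract, the following is a minimal NumPy sketch of the standard formulation, not the authors' implementation. The function names `sample_gumbel` and `gumbel_softmax`, the example logits, and the two temperature values are illustrative assumptions; the adaptive temperature training and the hard attention mechanism described in the abstract are not reproduced here.

```python
import numpy as np


def sample_gumbel(shape, rng, eps=1e-20):
    """Draw Gumbel(0, 1) noise via the inverse-CDF transform of uniform samples."""
    u = rng.uniform(0.0, 1.0, size=shape)
    return -np.log(-np.log(u + eps) + eps)


def gumbel_softmax(logits, temperature, noise):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    y = (logits + noise) / temperature
    y = y - y.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


# Hypothetical example: three categories with probabilities 0.1, 0.6, 0.3.
rng = np.random.default_rng(0)
logits = np.log(np.array([0.1, 0.6, 0.3]))
noise = sample_gumbel(logits.shape, rng)

# Same noise, two temperatures: a lower temperature gives a sharper sample.
soft = gumbel_softmax(logits, temperature=5.0, noise=noise)
sharp = gumbel_softmax(logits, temperature=0.1, noise=noise)
print(soft, sharp)
```

With the noise held fixed, lowering the temperature pushes the output toward a one-hot vector, while raising it flattens the output toward uniform. This is the sensitivity to temperature that the abstract refers to.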
Hierarchical long short-term memory for action recognition based on 3D skeleton joints from Kinect sensor
Action recognition has been used in a wide range of applications such as human-computer interaction, intelligent video surveillance systems, video summarization, and robotics. Recognizing actions is important for intelligent agents to understand, learn from, and interact with the environment. Recent technology that allows the acquisition of RGB+D and 3D skeleton data, together with the development of deep learning models, has significantly increased the performance of action recognition models. In this research, a hierarchical Long Short-Term Memory (LSTM) is proposed to recognize actions based on 3D skeleton joints from a Kinect sensor. The model uses the 3D axes of the skeleton joints and groups the joints in each axis into parts, namely the spine, left and right arm, left and right hand, and left and right leg. To fit the hierarchically structured LSTM layers, the parts are concatenated into spine, arms, hands, and legs, and then concatenated into the body. The model combines the body representations from each axis into a single final body, which is fed to the final layer to classify the action. Performance is measured using cross-view and cross-subject evaluation, achieving accuracies of 0.854 and 0.837, respectively, on 10 action classes from the NTU RGB+D dataset.
Triggering Dark Showers with Conditional Dual Auto-Encoders
Auto-encoders (AEs) have the potential to be effective and generic tools for
new physics searches at colliders, requiring little to no model-dependent
assumptions. New hypothetical physics signals can be considered anomalies that
deviate from the well-known background processes generally expected to describe
the whole dataset. We present a search formulated as an anomaly detection (AD)
problem, using an AE to define a criterion to decide about the physics nature
of an event. In this work, we perform an AD search for manifestations of a dark
version of the strong force using raw detector images, which are large and very
sparse, without leveraging any physics-based pre-processing or assumption on
the signals. We propose a dual-encoder design which can learn a compact latent
space through conditioning. In the context of multiple AD metrics, we present a
clear improvement over competitive baselines and prior approaches. It is the
first time that an AE is shown to exhibit excellent discrimination against
multiple dark shower models, illustrating the suitability of this method as a
performant, model-independent algorithm to deploy, e.g., in the trigger stage
of LHC experiments such as ATLAS and CMS.
Comment: 25 pages, 7 figures, and 11 tables
A Review of the Gumbel-max Trick and its Extensions for Discrete Stochasticity in Machine Learning
The Gumbel-max trick is a method to draw a sample from a categorical distribution, given by its unnormalized (log-)probabilities. Over the past years, the machine learning community has proposed several extensions of this trick to facilitate, e.g., drawing multiple samples, sampling from structured domains, or gradient estimation for error backpropagation in neural network optimization. The goal of this survey article is to present background about the Gumbel-max trick, and to provide a structured overview of its extensions to ease algorithm selection. Moreover, it presents a comprehensive outline of (machine learning) literature in which Gumbel-based algorithms have been leveraged, reviews commonly-made design choices, and sketches a future perspective.
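The basic trick summarized in this abstract can be sketched in a few lines of NumPy: add independent Gumbel(0, 1) noise to each unnormalized log-probability and take the argmax. The function name `gumbel_max_sample`, the example weights, and the sample count below are illustrative assumptions and are not taken from the survey.

```python
import numpy as np


def gumbel_max_sample(log_weights, rng, size):
    """Draw `size` category indices using the Gumbel-max trick.

    `log_weights` are unnormalized log-probabilities; no normalization is needed.
    """
    noise = rng.gumbel(loc=0.0, scale=1.0, size=(size, len(log_weights)))
    return np.argmax(log_weights + noise, axis=1)


# Hypothetical unnormalized weights; the implied probabilities are 0.1, 0.6, 0.3.
rng = np.random.default_rng(0)
weights = np.array([1.0, 6.0, 3.0])
samples = gumbel_max_sample(np.log(weights), rng, size=20000)

# Empirical frequencies should approximate weights / weights.sum().
freq = np.bincount(samples, minlength=len(weights)) / samples.size
print(freq)
```

Because the argmax is unchanged when a constant is added to every log-weight, the normalizing constant never has to be computed, which is why the trick works directly on unnormalized (log-)probabilities.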