CERN: Confidence-Energy Recurrent Network for Group Activity Recognition
This work is about recognizing human activities occurring in videos at
distinct semantic levels, including individual actions, interactions, and group
activities. The recognition is realized using a two-level hierarchy of Long
Short-Term Memory (LSTM) networks, forming a feed-forward deep architecture,
which can be trained end-to-end. In comparison with existing LSTM
architectures, we make two key contributions, which give our approach its
name: Confidence-Energy Recurrent Network (CERN). First, instead of using the common
softmax layer for prediction, we specify a novel energy layer (EL) for
estimating the energy of our predictions. Second, rather than finding the
common minimum-energy class assignment, which may be numerically unstable under
uncertainty, we specify that the EL additionally computes the p-values of the
solutions, and in this way estimates the most confident energy minimum. The
evaluation on the Collective Activity and Volleyball datasets demonstrates: (i)
advantages of our two contributions relative to the common softmax and
energy-minimization formulations and (ii) a superior performance relative to
the state-of-the-art approaches.
Comment: Accepted to IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 201
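The central idea of the abstract above, preferring a *confident* energy minimum over a blind argmin, can be sketched in a few lines. This is a toy illustration, not the paper's energy layer: the Gibbs weighting and the `temperature` knob are assumptions made here for clarity.

```python
import numpy as np

def confident_energy_minimum(energies, temperature=1.0):
    """Return the minimum-energy assignment together with a confidence
    score, so callers can detect numerically unstable minima instead of
    trusting argmin(energies) blindly. (Illustrative sketch only; the
    paper's EL uses p-values of the solutions, not this weighting.)"""
    e = np.asarray(energies, dtype=float)
    w = np.exp(-(e - e.min()) / temperature)  # shift by min for stability
    p = w / w.sum()                           # probability per assignment
    k = int(np.argmin(e))                     # minimum-energy assignment
    return k, float(p[k])                     # index and its confidence
```

A caller could, for instance, fall back to an ordinary softmax prediction whenever the returned confidence is low.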
Classification-based prediction of effective connectivity between timeseries with a realistic cortical network model
Effective connectivity measures the pattern of causal interactions between brain regions. Traditionally, these patterns of causality are inferred from brain recordings using either non-parametric, i.e., model-free, or parametric, i.e., model-based, approaches. The latter approaches, when based on biophysically plausible models, have the advantage that they may facilitate the interpretation of causality in terms of underlying neural mechanisms. Recent biophysically plausible neural network models of recurrent microcircuits have shown the ability to reproduce well the characteristics of real neural activity and can be applied to model interacting cortical circuits. Unfortunately, however, it is challenging to invert these models in order to estimate effective connectivity from observed data. Here, we propose to use a classification-based method to approximate the result of such complex model inversion. The classifier predicts the pattern of causal interactions given a multivariate timeseries as input. The classifier is trained on a large number of pairs of multivariate timeseries and the respective pattern of causal interactions, which are generated by simulation from the neural network model. In simulated experiments, we show that the proposed method is much more accurate in detecting the causal structure of timeseries than current best practice methods. Additionally, we present further results to characterize the validity of the neural network model and the ability of the classifier to adapt to the generative model of the data.
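The train-on-simulations scheme described above can be sketched in miniature. Everything here is a drastic simplification: the generative model is a two-node lagged linear system standing in for the biophysical cortical network model, and the "classifier" is a single learned threshold on one cross-correlation feature.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_pair(x_causes_y, n=500, coupling=0.8):
    # Toy stand-in for the cortical network model: y optionally
    # depends on x at lag 1 when the causal link is present.
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)
    if x_causes_y:
        y[1:] += coupling * x[:-1]
    return x, y

def feature(x, y):
    # One feature: lag-1 cross-correlation from the past of x to y.
    return float(np.corrcoef(x[:-1], y[1:])[0, 1])

# "Training": learn a decision threshold from labeled simulated pairs,
# mirroring the paper's scheme of fitting a classifier to
# (timeseries, causal pattern) pairs generated by simulation.
labeled = [(feature(*simulate_pair(lbl)), lbl)
           for lbl in [True, False] * 50]
pos = np.mean([f for f, l in labeled if l])
neg = np.mean([f for f, l in labeled if not l])
threshold = (pos + neg) / 2.0

def predict_x_causes_y(x, y):
    # Apply the trained classifier to a new, observed timeseries pair.
    return feature(x, y) > threshold
```

The appeal of the approach is visible even at this scale: the classifier never needs to invert the generative model, it only needs enough simulated examples of its output.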
Dialogue Act Recognition via CRF-Attentive Structured Network
Dialogue Act Recognition (DAR) is a challenging problem in dialogue
interpretation, which aims to attach semantic labels to utterances and
characterize the speaker's intention. Currently, many existing approaches
formulate the DAR problem ranging from multi-classification to structured
prediction, which suffer from handcrafted feature extensions and attentive
contextual structural dependencies. In this paper, we consider the problem of
DAR from the viewpoint of extending richer Conditional Random Field (CRF)
structural dependencies without abandoning end-to-end training. We incorporate
hierarchical semantic inference with memory mechanism on the utterance
modeling. We then extend structured attention network to the linear-chain
conditional random field layer which takes into account both contextual
utterances and corresponding dialogue acts. The extensive experiments on two
major benchmark datasets Switchboard Dialogue Act (SWDA) and Meeting Recorder
Dialogue Act (MRDA) datasets show that our method achieves better performance
than other state-of-the-art solutions to the problem. Remarkably, our method
comes within a 2% gap of the human annotator's performance on SWDA.
Comment: 10 pages, 4 figures
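The linear-chain CRF layer the abstract above builds on can be sketched by its decoding step. This is a generic Viterbi decoder, not the paper's attentive extension; the names and shapes (`emissions` as per-utterance dialogue-act scores, `transitions` as act-to-act scores) are illustrative assumptions.

```python
import numpy as np

def viterbi(emissions, transitions):
    """Decode the best dialogue-act sequence under a linear-chain CRF.
    emissions: (T, K) score of each of K acts for each of T utterances;
    transitions: (K, K) score of act j directly following act i.
    Returns the highest-scoring label sequence."""
    T, K = emissions.shape
    score = emissions[0].copy()            # best score ending in each act
    back = np.zeros((T, K), dtype=int)     # backpointers for recovery
    for t in range(1, T):
        # cand[i, j]: best path ending in act i, then transitioning to j.
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):          # walk backpointers in reverse
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

The transition term is what lets context override a locally ambiguous utterance, which is the motivation for placing a CRF over the utterance encoder in the first place.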
Exploring Context with Deep Structured models for Semantic Segmentation
State-of-the-art semantic image segmentation methods are mostly based on
training deep convolutional neural networks (CNNs). In this work, we propose to
improve semantic segmentation with the use of contextual information. In
particular, we explore `patch-patch' context and `patch-background' context in
deep CNNs. We formulate deep structured models by combining CNNs and
Conditional Random Fields (CRFs) for learning the patch-patch context between
image regions. Specifically, we formulate CNN-based pairwise potential
functions to capture semantic correlations between neighboring patches.
Efficient piecewise training of the proposed deep structured model is then
applied in order to avoid repeated expensive CRF inference during the course of
back propagation. For capturing the patch-background context, we show that a
network design with traditional multi-scale image inputs and sliding pyramid
pooling is very effective for improving performance. We perform a
comprehensive evaluation of the proposed method and achieve new
state-of-the-art performance on a number of challenging semantic
segmentation datasets.
Comment: 16 pages. Accepted to IEEE T. Pattern Analysis & Machine Intelligence, 2017. Extended version of arXiv:1504.0101
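The structured model described above scores a labeling with unary (per-patch) and pairwise (patch-patch) terms. The following sketch evaluates such a CRF energy on a 4-connected grid; in the paper both potentials are produced by CNNs, so the fixed `pairwise` compatibility matrix here is purely an assumption for illustration.

```python
import numpy as np

def crf_energy(unary, pairwise, labels):
    """Energy of a labeling under a grid CRF with patch-patch terms.
    unary: (H, W, K) per-patch class scores; pairwise: (K, K)
    compatibility of neighboring labels; labels: (H, W) integer
    assignment. Lower energy means a better labeling."""
    H, W = labels.shape
    e = 0.0
    for i in range(H):
        for j in range(W):
            e -= unary[i, j, labels[i, j]]            # unary score
            if i + 1 < H:                             # vertical neighbor
                e -= pairwise[labels[i, j], labels[i + 1, j]]
            if j + 1 < W:                             # horizontal neighbor
                e -= pairwise[labels[i, j], labels[i, j + 1]]
    return e
```

Minimizing this energy exactly during backpropagation is what the paper's piecewise training avoids: the pairwise terms couple every neighboring pair, so full CRF inference inside each training step would be expensive.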