Visual features are processed before navigational affordances in the human brain
To navigate through their immediate environment, humans process scene information rapidly. How does the cascade of neural processing that scene viewing elicits to facilitate navigational planning unfold over time? To investigate, we recorded human brain responses to visual scenes with electroencephalography and related those to computational models that operationalize three aspects of scene processing (2D, 3D, and semantic information), as well as to a behavioral model capturing navigational affordances. We found a temporal processing hierarchy: navigational affordance is processed later than the other scene features (2D, 3D, and semantic) investigated. This reveals the temporal order in which the human brain computes complex scene information and suggests that the brain leverages these pieces of information to plan navigation.
TemporalAugmenter: An Ensemble Recurrent Based Deep Learning Approach for Signal Classification
Ensemble modeling has been widely used to solve complex problems as it helps
to improve overall performance and generalization. In this paper, we propose a
novel TemporalAugmenter approach based on ensemble modeling that augments the
capture of long-term and short-term temporal dependencies in data by
integrating two variants of recurrent neural networks in two learning
streams to obtain the maximum possible temporal extraction. Thus, the proposed
model augments the extraction of temporal dependencies. In addition, the
proposed approach reduces the preprocessing and prior stages of feature
extraction, which reduces the required energy to process the models built upon
the proposed TemporalAugmenter approach, contributing towards green AI.
Moreover, the proposed model can be simply integrated into various domains
including industrial, medical, and human-computer interaction applications. We
empirically evaluated the proposed approach on speech emotion recognition,
electrocardiogram signal, and signal quality examination tasks, which involve
three different signals with varying complexity and different temporal
dependency features.
Comment: 9 pages, 5 figures, 9 tables, under review process
Foundations and modelling of dynamic networks using Dynamic Graph Neural Networks: A survey
Dynamic networks are used in a wide range of fields, including social network
analysis, recommender systems, and epidemiology. Representing complex networks
as structures changing over time allows network models to leverage not only
structural but also temporal patterns. However, as dynamic network literature
stems from diverse fields and makes use of inconsistent terminology, it is
challenging to navigate. Meanwhile, graph neural networks (GNNs) have gained a
lot of attention in recent years for their ability to perform well on a range
of network science tasks, such as link prediction and node classification.
Despite the popularity of graph neural networks and the proven benefits of
dynamic network models, there has been little focus on graph neural networks
for dynamic networks. To address the challenges resulting from the fact that
this research crosses diverse fields as well as to survey dynamic graph neural
networks, this work is split into two main parts. First, to address the
ambiguity of the dynamic network terminology, we establish a foundation of
dynamic networks with consistent, detailed terminology and notation. Second, we
present a comprehensive survey of dynamic graph neural network models using the
proposed terminology.
Comment: 28 pages, 9 figures, 8 tables
End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models
Speech activity detection (SAD) plays an important role in current speech
processing systems, including automatic speech recognition (ASR). SAD is
particularly difficult in environments with acoustic noise. A practical
solution is to incorporate visual information, increasing the robustness of the
SAD approach. An audiovisual system has the advantage of being robust to
different speech modes (e.g., whisper speech) or background noise. Recent
advances in audiovisual speech processing using deep learning have opened
opportunities to capture in a principled way the temporal relationships between
acoustic and visual features. This study explores this idea by proposing a
bimodal recurrent neural network (BRNN) framework for SAD. The approach
models the temporal dynamics of the sequential audiovisual data, improving the
accuracy and robustness of the proposed SAD system. Instead of estimating
hand-crafted features, the study investigates an end-to-end training approach,
where acoustic and visual features are directly learned from the raw data
during training. The experimental evaluation considers a large audiovisual
corpus with over 60.8 hours of recordings, collected from 105 speakers. The
results demonstrate that the proposed framework leads to absolute improvements
of up to 1.2% under practical scenarios over an audio-only VAD baseline
implemented with a deep neural network (DNN). The proposed approach achieves a
92.7% F1-score when it is evaluated using the sensors from a portable tablet
in a noisy acoustic environment, which is only 1.0% lower than the performance
obtained under ideal conditions (e.g., clean speech obtained with a high
definition camera and a close-talking microphone).
Comment: Submitted to Speech Communication
Patent Citation Dynamics Modeling via Multi-Attention Recurrent Networks
Modeling and forecasting forward citations to a patent is a central task for
the discovery of emerging technologies and for measuring the pulse of inventive
progress. Conventional methods for forecasting these forward citations cast the
problem as analysis of temporal point processes which rely on the conditional
intensity of previously received citations. Recent approaches model the
conditional intensity as a chain of recurrent neural networks to capture memory
dependency in hopes of reducing the restrictions of the parametric form of the
intensity function. For the problem of patent citations, we observe that
forecasting a patent's chain of citations benefits from not only the patent's
history itself but also from the historical citations of assignees and
inventors associated with that patent. In this paper, we propose a
sequence-to-sequence model which employs an attention-of-attention mechanism to
capture the dependencies of these multiple time sequences. Furthermore, the
proposed model is able to forecast both the timestamp and the category of a
patent's next citation. Extensive experiments on a large patent citation
dataset collected from USPTO demonstrate that the proposed model outperforms
state-of-the-art models at forward citation forecasting.
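The conditional intensity mentioned in this abstract can be illustrated with a small sketch. It uses an exponential-kernel Hawkes process, one common parametric form for temporal point processes; the abstract does not say which parametric form the conventional methods use, so this choice is an assumption. The function name `hawkes_intensity` and the values of `mu`, `alpha`, and `beta` are hypothetical and chosen only for illustration.

```python
import math

def hawkes_intensity(t, history, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity lambda(t) of an exponential-kernel Hawkes process.

    mu    : baseline rate (hypothetical value)
    alpha : jump added by each past event (hypothetical value)
    beta  : decay rate of each past event's influence (hypothetical value)
    """
    # Each earlier citation time ti raises the intensity by alpha,
    # and that contribution decays exponentially as time passes.
    excitation = sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)
    return mu + excitation

# Intensity shortly after two citations versus long after them.
citations = [1.0, 1.5]
print(hawkes_intensity(2.0, citations))
print(hawkes_intensity(10.0, citations))
```

The fixed exponential shape is the kind of parametric restriction that, according to the abstract, recurrent models of the intensity aim to relax.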
Temporal Attention-Gated Model for Robust Sequence Classification
Typical techniques for sequence classification are designed for
well-segmented sequences which have been edited to remove noisy or irrelevant
parts. Therefore, such methods cannot be easily applied to the noisy sequences
expected in real-world applications. In this paper, we present the Temporal
Attention-Gated Model (TAGM) which integrates ideas from attention models and
gated recurrent networks to better deal with noisy or unsegmented sequences.
Specifically, we extend the concept of an attention model to measure the relevance
of each observation (time step) of a sequence. We then use a novel gated
recurrent network to learn the hidden representation for the final prediction.
An important advantage of our approach is interpretability since the temporal
attention weights provide a meaningful value for the salience of each time step
in the sequence. We demonstrate the merits of our TAGM approach, both for
prediction accuracy and interpretability, on three different tasks: spoken
digit recognition, text-based sentiment analysis and visual event recognition.
Comment: Accepted by CVPR 201
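The idea of letting a per-time-step attention weight gate a recurrent update can be sketched as follows. The abstract does not give the update rule, so the convex-combination gate below is an assumption, as are the scalar hidden state, the function name `attention_gated_pass`, and the parameters `w`, `u`, and `b`. In the described model the attention weights are learned; here they are passed in by hand.

```python
import math

def attention_gated_pass(xs, attn, w=0.5, u=1.0, b=0.0):
    """Run a scalar recurrent pass where attention in [0, 1] gates each update.

    xs   : sequence of scalar observations
    attn : per-step relevance weights (supplied by hand in this sketch)
    w, u, b : hypothetical recurrent weight, input weight, and bias
    """
    h = 0.0
    for x, a in zip(xs, attn):
        candidate = math.tanh(w * h + u * x + b)
        # a close to 0: keep the previous state, so the step is ignored.
        # a close to 1: overwrite the state with the new candidate.
        h = (1.0 - a) * h + a * candidate
    return h

# A noisy second step with zero attention leaves the hidden state unchanged.
print(attention_gated_pass([1.0, 100.0], [1.0, 0.0]))
print(attention_gated_pass([1.0], [1.0]))
```

Because the weights directly scale how much each step changes the state, they can be read as a salience score per time step, which matches the interpretability claim in the abstract.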