Sparse Predictive Structure of Deconvolved Functional Brain Networks
The functional and structural representation of the brain as a complex
network is marked by the fact that comparing noisy, intrinsically correlated
high-dimensional structures between experimental conditions or groups defeats
typical mass-univariate methods. Furthermore, most network estimation methods
cannot distinguish real from spurious correlations arising from the convolution
due to the nodes' interactions, which introduces additional noise into the
data. We propose a machine learning pipeline aimed at identifying multivariate
differences between brain networks associated with
different experimental conditions. The pipeline (1) leverages the deconvolved
individual contribution of each edge and (2) maps the task into a sparse
classification problem in order to construct the associated "sparse deconvolved
predictive network", i.e., a graph with the same nodes as the networks being
compared but whose edge weights are defined by their relevance for
out-of-sample predictions
in classification. We present an application of the proposed method by decoding
the covert attention direction (left or right) based on the single-trial
functional connectivity matrix extracted from high-frequency
magnetoencephalography (MEG) data. Our results demonstrate how network
deconvolution matched with sparse classification methods outperforms typical
approaches for MEG decoding.
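The sparse-classification step of such a pipeline can be sketched as follows. This is not the authors' exact method (it omits the deconvolution step and uses synthetic data with hypothetical dimensions); it only illustrates how an L1 penalty turns edge-wise classification into a sparse predictive network:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for single-trial connectivity: 100 trials, each a
# vector of the upper-triangular edges of a 20-node network.
n_trials, n_nodes = 100, 20
n_edges = n_nodes * (n_nodes - 1) // 2
X = rng.normal(size=(n_trials, n_edges))
y = rng.integers(0, 2, size=n_trials)   # covert attention: left (0) / right (1)
X[y == 1, :5] += 1.0                    # plant a few informative edges

# An L1 penalty induces sparsity: most edge weights are driven exactly to
# zero, and the surviving nonzero weights define a sparse predictive network.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(X, y)
predictive_edges = np.flatnonzero(clf.coef_)
```

The nonzero coefficients can then be mapped back onto the original node pairs to draw the predictive graph.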
Compact Tensor Pooling for Visual Question Answering
Performing high-level cognitive tasks requires the integration of feature
maps with drastically different structures. In Visual Question Answering (VQA),
image descriptors have spatial structure, while lexical inputs inherently
follow a temporal sequence. The recently proposed Multimodal Compact Bilinear
pooling (MCB) forms the outer products, via count-sketch approximation, of the
visual and textual representations at each spatial location. While this
procedure preserves spatial information locally, outer-products are taken
independently for each fiber of the activation tensor, and therefore do not
include spatial context. In this work, we introduce multi-dimensional sketch
({MD-sketch}), a novel extension of count-sketch to tensors. Using this new
formulation, we propose Multimodal Compact Tensor Pooling (MCT) to fully
exploit the global spatial context during bilinear pooling operations.
In contrast to MCB, our approach preserves spatial context by directly
convolving the MD-sketch of the visual tensor features with the text vector
feature using a higher-order FFT. Furthermore, we apply MCT incrementally at
each step of the question embedding and accumulate the multi-modal vectors with
a second LSTM layer before the final answer is chosen.
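The count-sketch identity that MCB relies on, and that MD-sketch generalizes to tensors, can be illustrated for two vectors: the count sketch of an outer product equals the circular convolution of the two individual sketches, computable via FFT. A minimal numpy sketch (dimensions and seeds are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def count_sketch(x, h, s, d):
    """Project x into d bins: coordinate i goes to bin h[i] with sign s[i]."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)
    return out

n, d = 64, 256
x, y = rng.normal(size=n), rng.normal(size=n)
hx, sx = rng.integers(0, d, n), rng.choice([-1.0, 1.0], n)
hy, sy = rng.integers(0, d, n), rng.choice([-1.0, 1.0], n)

# Bilinear pooling via FFT: circular convolution of the two sketches.
cs_x = count_sketch(x, hx, sx, d)
cs_y = count_sketch(y, hy, sy, d)
pooled = np.fft.irfft(np.fft.rfft(cs_x) * np.fft.rfft(cs_y), n=d)

# Sanity check: sketch the n*n outer product directly with the combined
# hash h(i,j) = (hx[i] + hy[j]) mod d and sign s(i,j) = sx[i] * sy[j].
outer = np.outer(x, y).ravel()
h_outer = (hx[:, None] + hy[None, :]).ravel() % d
s_outer = (sx[:, None] * sy[None, :]).ravel()
direct = count_sketch(outer, h_outer, s_outer, d)
```

The pooled vector has length d regardless of n, which is why the sketch avoids materializing the full n×n outer product.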
An introduction to spectral distances in networks (extended version)
Many functions have recently been defined to assess the similarity among
networks as tools for quantitative comparison. They stem from very different
frameworks and are tuned for dealing with different situations. Here we give an
overview of the spectral distances, highlighting their behavior in some basic
cases of static and dynamic, synthetic and real networks.
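One common spectral distance compares the sorted eigenvalue spectra of the graph Laplacians; the paper surveys several variants, so the Euclidean form below is just an illustrative choice for two equal-sized graphs:

```python
import numpy as np

def laplacian_spectrum(A):
    """Eigenvalues of the graph Laplacian L = D - A, sorted ascending."""
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))

def spectral_distance(A1, A2):
    """Euclidean distance between the Laplacian spectra of two graphs."""
    return np.linalg.norm(laplacian_spectrum(A1) - laplacian_spectrum(A2))

# Two small synthetic graphs on 4 nodes: a cycle and a path.
cycle = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

d_same = spectral_distance(cycle, cycle)
d_diff = spectral_distance(cycle, path)
```

Because the spectrum is invariant under node relabeling, this distance compares graphs without requiring a node correspondence, though cospectral non-isomorphic graphs can have distance zero.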
Question Type Guided Attention in Visual Question Answering
Visual Question Answering (VQA) requires the integration of feature maps with
drastically different structures and focus on the correct regions. Image
descriptors have structures at multiple spatial scales, while lexical inputs
inherently follow a temporal sequence and naturally cluster into semantically
different question types. Many previous works use complex models to extract
feature representations but neglect high-level summary information, such as
question types, during learning. In this work, we propose Question Type-guided
Attention (QTA). It utilizes the information of question type to dynamically
balance between bottom-up and top-down visual features, respectively extracted
from ResNet and Faster R-CNN networks. We experiment with multiple VQA
architectures and extensive input ablation studies over the TDIUC dataset, and
show that QTA systematically improves performance by more than 5% across
multiple question-type categories such as "Activity Recognition", "Utility" and
"Counting". By adding QTA to the state-of-the-art model MCB, we achieve a 3%
improvement in overall accuracy. Finally, we propose a multi-task extension
that predicts question types, generalizing QTA to applications that lack
question-type labels, with minimal performance loss.
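The gating idea behind question-type-guided attention can be sketched with random, untrained weights. This is a hypothetical simplification of the architecture described above: the real model learns the gating projection and operates on ResNet and Faster R-CNN feature maps, whereas here random vectors stand in for both streams:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical dimensions: two visual feature streams and a question-type one-hot.
n_types, feat_dim = 12, 16
bottom_up = rng.normal(size=feat_dim)   # stand-in for region-proposal features
top_down = rng.normal(size=feat_dim)    # stand-in for ResNet-style features
q_type = np.eye(n_types)[3]             # one-hot question-type indicator

# Question-type-guided gate: a projection (random here, learned in practice)
# maps the question type to two mixing weights that balance the streams.
W_gate = rng.normal(size=(2, n_types))
w_bu, w_td = softmax(W_gate @ q_type)
fused = w_bu * bottom_up + w_td * top_down
```

Because the weights come from a softmax over the question type, each type induces its own fixed trade-off between the two visual streams, which is the dynamic balancing the abstract describes.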