Weakly Labelled AudioSet Tagging with Attention Neural Networks
Audio tagging is the task of predicting the presence or absence of sound
classes within an audio clip. Previous work in audio tagging focused on
relatively small datasets limited to recognising a small number of sound
classes. We investigate audio tagging on AudioSet, which is a dataset
consisting of over 2 million audio clips and 527 classes. AudioSet is weakly
labelled, in that only the presence or absence of sound classes is known for
each clip, while the onset and offset times are unknown. To address the
weakly-labelled audio tagging problem, we propose attention neural networks as
a way to attend to the most salient parts of an audio clip. We bridge the
connection between attention neural networks and multiple instance learning
(MIL) methods, and propose decision-level and feature-level attention neural
networks for audio tagging. We investigate attention neural networks modeled by
different functions, depths and widths. Experiments on AudioSet show that the
feature-level attention neural network achieves a state-of-the-art mean average
precision (mAP) of 0.369, outperforming the best multiple instance learning
(MIL) method (0.317) and Google's deep neural network baseline (0.314). In
addition, we discover that the audio tagging performance on AudioSet embedding
features has a weak correlation with the number of training samples and the
quality of labels of each sound class.
Comment: 13 pages
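The decision-level attention idea described above can be illustrated with a minimal sketch. It assumes one common formulation, in which each segment of a clip has a per-class probability and a per-class attention logit, the logits are normalised over segments with a softmax, and the clip-level probability is the attention-weighted sum. The function name `attention_pool` and the input layout are hypothetical and are not taken from the paper.

```python
import math


def attention_pool(segment_probs, attention_logits):
    """Pool segment-level class probabilities into clip-level probabilities.

    segment_probs[t][k]    -- probability of class k in segment t (assumed given)
    attention_logits[t][k] -- unnormalised attention score for class k in segment t
    Returns one probability per class for the whole clip.
    """
    num_segments = len(segment_probs)
    num_classes = len(segment_probs[0])
    clip_probs = []
    for k in range(num_classes):
        logits = [attention_logits[t][k] for t in range(num_segments)]
        # Softmax over segments; subtracting the peak keeps exp() stable.
        peak = max(logits)
        weights = [math.exp(z - peak) for z in logits]
        total = sum(weights)
        # Attention-weighted average of the segment-level predictions.
        pooled = sum((w / total) * segment_probs[t][k]
                     for t, w in enumerate(weights))
        clip_probs.append(pooled)
    return clip_probs


# Two segments, two classes; equal attention reduces to average pooling.
uniform = attention_pool([[0.2, 0.9], [0.8, 0.1]], [[0.0, 0.0], [0.0, 0.0]])
# Strong attention on the second segment makes it dominate the clip prediction.
focused = attention_pool([[0.2], [0.8]], [[-50.0], [50.0]])
```

With equal logits the pooling is a plain average, so weak clip-level labels can still train the model; the attention branch then learns which segments matter for each class.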
Coalition structure generation over graphs
We analyse the computational complexity of coalition structure generation over graphs. Given an undirected graph G = (N,E) and a valuation function v : P(N) → R over the subsets of nodes, the problem is to find a partition of N into connected subsets that maximises the sum of the components' values. This problem is generally NP-complete; in particular, it is hard for a defined class of valuation functions which are independent of disconnected members, that is, two nodes have no effect on each other's marginal contribution to their vertex separator. Nonetheless, for all such functions we provide bounds on the complexity of coalition structure generation over general and minor-free graphs. Our proof is constructive and yields algorithms for solving corresponding instances of the problem. Furthermore, we derive linear time bounds for graphs of bounded treewidth. However, as we show, the problem remains NP-complete for planar graphs, and hence for any K_k-minor-free graphs where k ≥ 5. Moreover, a 3-SAT problem with m clauses can be represented by a coalition structure generation problem over a planar graph with O(m^2) nodes. Importantly, our hardness result holds for a particular subclass of valuation functions, termed edge sum, where the value of each subset of nodes is simply determined by the sum of given weights of the edges in the induced subgraph.
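To make the problem statement concrete, the following is a minimal brute-force sketch for the edge-sum valuation on a tiny graph. It enumerates every partition of the nodes, keeps those whose blocks are connected, and returns the one with the largest total weight of intra-block edges. This is an exponential-time illustration only, not the algorithm from the paper; the function names and the example weights are hypothetical.

```python
def partitions(nodes):
    """Yield every partition of a list of nodes as a list of blocks."""
    if not nodes:
        yield []
        return
    first, rest = nodes[0], nodes[1:]
    for part in partitions(rest):
        yield [[first]] + part  # first node forms its own block
        for i in range(len(part)):  # or joins an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def is_connected(block, adjacency):
    """Check that the subgraph induced by block is connected (depth-first search)."""
    members = set(block)
    seen = {block[0]}
    stack = [block[0]]
    while stack:
        u = stack.pop()
        for v in adjacency.get(u, ()):
            if v in members and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == members


def edge_sum(block, weights):
    """Edge-sum valuation: total weight of edges with both endpoints in block."""
    members = set(block)
    return sum(w for (u, v), w in weights.items()
               if u in members and v in members)


def best_coalition_structure(nodes, weights):
    """Exhaustively search connected partitions for the maximum total value."""
    adjacency = {}
    for u, v in weights:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    best_value, best_part = None, None
    for part in partitions(list(nodes)):
        if all(is_connected(block, adjacency) for block in part):
            value = sum(edge_sum(block, weights) for block in part)
            if best_value is None or value > best_value:
                best_value, best_part = value, part
    return best_value, best_part


# Path a-b-c-d with one negative edge in the middle (illustrative weights).
example_weights = {("a", "b"): 3, ("b", "c"): -2, ("c", "d"): 4}
value, structure = best_coalition_structure(["a", "b", "c", "d"], example_weights)
```

In this example the search splits the path at the negative edge, since grouping all four nodes together would pay the cost of the b-c edge.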
Multi-scale Deep Learning Architectures for Person Re-identification
Person Re-identification (re-id) aims to match people across non-overlapping
camera views in a public space. It is a challenging problem because many people
captured in surveillance videos wear similar clothes. Consequently, the
differences in their appearance are often subtle and only detectable at the
right location and scales. Existing re-id models, particularly the recently
proposed deep-learning-based ones, match people at a single scale. In contrast,
in this paper, a novel multi-scale deep learning model is proposed. Our model
is able to learn deep discriminative feature representations at different
scales and automatically determine the most suitable scales for matching. The
importance of different spatial locations for extracting discriminative
features is also learned explicitly. Experiments are carried out to demonstrate
that the proposed model outperforms the state of the art on a number of
benchmarks.
Comment: 9 pages, 3 figures, accepted by ICCV 201