The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge
This article describes the final solution of team monkeytyping, which finished
in second place in the YouTube-8M video understanding challenge. The dataset
used in this challenge is a large-scale benchmark for multi-label video
classification. We extend the work in [1] and propose several improvements for
frame sequence modeling. We propose a network structure called Chaining that
can better capture the interactions between labels. Also, we report our
approaches in dealing with multi-scale information and attention pooling. In
addition, we find that using the output of a model ensemble as a side target in
training can boost single model performance. We report our experiments in
bagging, boosting, cascade, and stacking, and propose a stacking algorithm
called attention weighted stacking. Our final submission is an ensemble that
consists of 74 sub-models, all of which are listed in the appendix.

Comment: Submitted to the CVPR 2017 Workshop on YouTube-8M Large-Scale Video Understanding
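The abstract above names an ensemble method, "attention weighted stacking," without giving details. As a rough illustration of the general idea (not the authors' implementation; all names and the weighting scheme here are illustrative assumptions), one can combine the per-class probabilities of several sub-models using a softmax-normalized weight per model:

```python
import numpy as np

# Hypothetical sketch of attention-weighted stacking: each sub-model gets a
# learnable attention logit; softmax turns the logits into mixture weights,
# and the ensemble prediction is the weighted average of sub-model outputs.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_weighted_stack(preds, scores):
    """preds: (n_models, n_samples, n_classes) probabilities.
    scores: (n_models,) attention logits, one per sub-model."""
    w = softmax(scores)                    # mixture weight per sub-model
    return np.tensordot(w, preds, axes=1)  # weighted average over models

preds = np.stack([
    np.array([[0.9, 0.1], [0.2, 0.8]]),   # sub-model A
    np.array([[0.7, 0.3], [0.4, 0.6]]),   # sub-model B
])
scores = np.array([0.0, 0.0])             # equal logits -> uniform attention
ens = attention_weighted_stack(preds, scores)  # -> [[0.8, 0.2], [0.3, 0.7]]
```

In the paper's setting the logits would be trained (e.g. by gradient descent on a validation split) rather than fixed; this sketch only shows the combination step.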
Cops, popes, and garbage collectors: metaphor and antagonism in an atheist/Christian YouTube video thread
Using a discourse dynamics, metaphor-led analysis, this article investigates the use of metaphor in three YouTube videos made by two American YouTube users: one fundamentalist Christian and one atheist. The focus of the analysis is on how metaphor was produced dynamically in the interaction between the users as they discussed the appropriateness of user actions. Metaphorical language was of key importance to the discourse event, and was explicitly oriented to by the participants: the Christian user suggests an analogy between himself and a "cop," the atheist retaliates that the Christian believes himself to be "the Pope of YouTube," and the Christian resists this characterization, with other users leaving text comments that also directly respond to the "Pope of YouTube" metaphor. The analysis shows that YouTube users employed metaphors to describe and validate their activity on YouTube, and that although metaphor use did not differ depending on the user's ideological position, users reinterpreted and subverted the metaphor use of others to assert their own opinions about the community.
The Pope of YouTube: Metaphor and misunderstanding in Atheist-Christian YouTube dialogue
Using a discourse dynamics analysis, this article investigates the use of metaphor in three YouTube videos made by two American YouTube users: one a fundamentalist Christian and one an atheist. The focus of the analysis is on how metaphor was produced dynamically in the interaction and what this interaction may tell us about how misunderstanding occurred between the two users. Analysis shows that understanding of specific metaphors seems to differ depending on who is producing and interpreting a given metaphor.
Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data
Conventional sequential learning methods such as Recurrent Neural Networks
(RNNs) focus on interactions between consecutive inputs, i.e. first-order
Markovian dependency. However, most sequential data, such as videos, have
complex dependency structures that imply variable-length semantic flows and
their compositions, which are hard to capture with conventional methods. Here,
we propose Cut-Based Graph Learning Networks (CB-GLNs) for
learning video data by discovering these complex structures of the video. The
CB-GLNs represent video data as a graph, with nodes and edges corresponding to
frames of the video and their dependencies respectively. The CB-GLNs find
compositional dependencies of the data in multilevel graph forms via a
parameterized kernel with graph-cut and a message passing framework. We
evaluate the proposed method on two video understanding tasks: video theme
classification (YouTube-8M dataset) and video question answering (TVQA
dataset). The experimental results show that our
model efficiently learns the semantic compositional structure of video data.
Furthermore, our model achieves the highest performance in comparison to the
baseline methods.

Comment: 8 pages, 3 figures, Association for the Advancement of Artificial Intelligence (AAAI2020). arXiv admin note: substantial text overlap with arXiv:1907.0170
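The core representational idea in the abstract above is to treat a video's frames as graph nodes whose edges encode inter-frame dependencies, then aggregate information by message passing. A minimal sketch of that frame-graph construction and one aggregation step (purely illustrative; the actual CB-GLN uses a parameterized graph-cut kernel, which is not reproduced here):

```python
import numpy as np

# Illustrative sketch, not the authors' implementation: build an affinity
# graph over frame features and perform one round of message passing
# (row-normalized neighbor aggregation).

def frame_graph_message_pass(frames):
    """frames: (n_frames, dim) per-frame feature matrix."""
    sim = frames @ frames.T                     # pairwise dot-product affinity
    np.fill_diagonal(sim, 0.0)                  # no self-loops
    adj = sim / sim.sum(axis=1, keepdims=True)  # row-normalized adjacency
    return adj @ frames                         # aggregate neighbor features

frames = np.array([[1.0, 0.0],   # frame 0
                   [1.0, 0.1],   # frame 1 (similar to frame 0)
                   [0.0, 1.0]])  # frame 2 (dissimilar)
smoothed = frame_graph_message_pass(frames)
```

In the full model, the adjacency would be learned and repeatedly partitioned via graph cuts to expose multilevel compositional structure; this sketch only shows the graph view of a frame sequence.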