Two-Stream Transformer Architecture for Long Video Understanding
Pure vision transformer architectures are highly effective for short video
classification and action recognition tasks. However, due to the quadratic
complexity of self-attention and a lack of inductive bias, transformers are
resource intensive and data inefficient. Long-form video understanding tasks
amplify these data and memory efficiency problems, making current approaches
infeasible in data- or memory-restricted domains. This paper introduces an
efficient Spatio-Temporal Attention Network (STAN), which uses a two-stream
transformer architecture to model dependencies between static image features
and temporal contextual features. Our proposed approach can classify videos up
to two minutes in length on a single GPU, is data efficient, and achieves
state-of-the-art (SOTA) performance on several long video understanding tasks.
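
The quadratic cost of self-attention mentioned in the abstract can be illustrated with a rough count of attention-matrix entries. This is only a sketch: the figure of 196 tokens per frame (a 224x224 frame split into 16x16 patches) and the function name are assumptions made for illustration, not details taken from the paper or from STAN.

```python
def attention_matrix_entries(num_frames: int, tokens_per_frame: int = 196) -> int:
    """Entries in one full self-attention matrix over all frame tokens.

    tokens_per_frame=196 is an assumed value (224x224 frame, 16x16 patches).
    """
    n = num_frames * tokens_per_frame  # total sequence length
    return n * n                       # every token attends to every token


# Doubling the number of frames quadruples the size of the attention matrix.
for frames in (8, 16, 32):
    entries = attention_matrix_entries(frames)
    print(f"{frames} frames: {entries:,} attention entries per head per layer")
```

Under these assumptions, the count grows with the square of the video length, which is why full attention over all frames of a long video quickly exceeds the memory of a single GPU.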
New Screen Media : Cinema / Art / Narrative
Editors Rieser and Zapp argue that the new media has brought with it innovations in screen narrative form that raise issues about the body, identity, authorship, and temporal and spatial construction. Texts by cultural theorists are juxtaposed with artists’ analyses of their own work. Providing an overview of the history and theory of narrative and the media, the book documents the unique forms new media narrative practices have taken. Includes an interactive DVD-ROM featuring works by 36 artists. Biographical notes on contributors. Glossary (5 p.), bibliography (6 p.) and index (8 p.). Circa 250 bibl. ref