Appearance-and-Relation Networks for Video Classification

Author: Li Wei
Li Wen
Van Gool Luc
Wang Limin
Publication date: 06/05/2018

Spatiotemporal feature learning in videos is a fundamental problem in computer vision. This paper presents a new architecture, termed Appearance-and-Relation Network (ARTNet), to learn video representations in an end-to-end manner. ARTNets are constructed by stacking multiple generic building blocks, called SMART, whose goal is to model appearance and relation from RGB input simultaneously, but in a separate and explicit manner. Specifically, SMART blocks decouple the spatiotemporal learning module into an appearance branch for spatial modeling and a relation branch for temporal modeling. The appearance branch is implemented as a linear combination of pixels or filter responses in each frame, while the relation branch is based on multiplicative interactions between pixels or filter responses across multiple frames. We perform experiments on three action recognition benchmarks (Kinetics, UCF101, and HMDB51), demonstrating that SMART blocks obtain an evident improvement over 3D convolutions for spatiotemporal feature learning. Under the same training setting, ARTNets outperform the existing state-of-the-art methods on all three datasets.

Comment: CVPR18 camera-ready version. Code & models available at https://github.com/wanglimin/ARTNe
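The contrast between the two branches described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names are hypothetical, and squaring a pooled linear response is only one simple way to obtain multiplicative cross-frame terms, assumed here for illustration.

```python
import numpy as np

def appearance_response(frame, weights):
    """Linear combination of the pixels of a single frame."""
    return float(np.sum(frame * weights))

def relation_response(frames, weights):
    """Square of a linear response summed over several frames.

    Expanding the square produces products of pixels taken from
    different frames, i.e. multiplicative cross-frame interactions.
    """
    linear = sum(np.sum(f * w) for f, w in zip(frames, weights))
    return float(linear ** 2)

# Two random 3x3 "frames" and one filter per frame (illustrative data only).
rng = np.random.default_rng(0)
frame_a, frame_b = rng.normal(size=(2, 3, 3))
w_a, w_b = rng.normal(size=(2, 3, 3))

a = appearance_response(frame_a, w_a)  # depends on frame_a only
b = appearance_response(frame_b, w_b)  # depends on frame_b only
relation = relation_response([frame_a, frame_b], [w_a, w_b])
cross_term = 2 * a * b                 # the part that couples the two frames
```

The relation response equals `a**2 + b**2 + cross_term`, so it cannot be written as a sum of per-frame terms, whereas each appearance response depends on one frame alone.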

arXiv.org e-Print Archive

Crossref

Learning Deep Representations of Appearance and Motion for Anomalous Event Detection

Author: Ricci Elisa
Sebe Nicu
Song Jingkuan
Xu Dan
Yan Yan
Publication date: 01/01/2015

We present a novel unsupervised deep learning framework for anomalous event detection in complex video scenes. While most existing works merely use hand-crafted appearance and motion features, we propose Appearance and Motion DeepNet (AMDN), which utilizes deep neural networks to automatically learn feature representations. To exploit the complementary information of both appearance and motion patterns, we introduce a novel double fusion framework that combines the benefits of traditional early fusion and late fusion strategies. Specifically, stacked denoising autoencoders are proposed to separately learn both appearance and motion features as well as a joint representation (early fusion). Based on the learned representations, multiple one-class SVM models are used to predict the anomaly scores of each input, which are then integrated with a late fusion strategy for final anomaly detection. We evaluate the proposed method on two publicly available video surveillance datasets, showing competitive performance with respect to state-of-the-art approaches.

Comment: Oral paper in BMVC 201
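The late fusion step mentioned in the abstract can be sketched as a weighted sum of per-model anomaly scores. This is a minimal illustration under stated assumptions: the scores, the weights, and the 0.5 threshold are all made up, and the abstract does not say how the actual method chooses its fusion weights.

```python
def late_fusion(scores_per_model, weights):
    """Combine per-model anomaly scores into one score per input (weighted sum)."""
    if len(scores_per_model) != len(weights):
        raise ValueError("need exactly one weight per model")
    n_inputs = len(scores_per_model[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, scores_per_model))
        for i in range(n_inputs)
    ]

# Hypothetical anomaly scores for three inputs from three one-class models
# (appearance, motion, and joint representation).
appearance = [0.1, 0.9, 0.2]
motion = [0.2, 0.8, 0.1]
joint = [0.1, 0.7, 0.3]

fused = late_fusion([appearance, motion, joint], [0.3, 0.3, 0.4])
flags = [score > 0.5 for score in fused]  # 0.5 is an arbitrary threshold
```

With these example numbers only the second input is flagged, since all three models score it highly.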

arXiv.org e-Print Archive

Crossref

On the role of injection in kinetic approaches to nonlinear particle acceleration at non-relativistic shock waves

Author: Axford
Bell
Berezhko
Blandford
Blasi
Drury
Duffy
Eichler
Ellison
G. Vannoni
Gieseler
Jones
Kang
Lagage
Lucek
Malkov
Mond
P. Blasi
Ptuskin
S. Gabici
Toptygin
Publication venue: 'Wiley'
Publication date: 01/01/2005

The dynamical reaction of the particles accelerated at a shock front by the first-order Fermi process can be determined within kinetic models that account for both the hydrodynamics of the shocked fluid and the transport of the accelerated particles. These models predict the appearance of multiple solutions, all physically allowed. We discuss here the role of injection in selecting the real solution, in the framework of a simple phenomenological recipe, which is a variation of what is sometimes referred to as thermal leakage. In this context we show that multiple solutions basically disappear, and when they are present they are limited to rather peculiar values of the parameters. We also provide a quantitative calculation of the efficiency of particle acceleration at cosmic-ray-modified shocks, and we identify the fraction of energy that is advected downstream and the fraction carried by particles escaping the system from upstream infinity at the maximum momentum. The consequences of efficient particle acceleration for shock heating are also discussed.

arXiv.org e-Print Archive

Query generation from multiple media examples

Author: Jose J.M.
Ren R.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2009

This paper exploits a unified media document representation called feature terms for query generation from multiple media examples, e.g. images. A feature term refers to a value interval of a media feature. A media document is therefore represented by a frequency vector of feature term occurrences. This approach (1) facilitates feature accumulation from multiple examples and (2) enables the exploration of text-based retrieval models for multimedia retrieval. Three statistical criteria (minimised chi-squared, minimised AC/DC rate, and maximised entropy) are proposed to extract feature terms from a given media document collection. Two textual ranking functions, KL divergence and a BM25-like retrieval model, are adapted to estimate media document relevance. Experiments on the Corel photo collection and the TRECVid 2006 collection show the effectiveness of feature-term-based queries in image and video retrieval.
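The feature term idea (a value interval of a media feature, counted like a word) can be sketched as follows. The interval boundaries, the feature name `hue`, and the helper names are hypothetical; the abstract derives the intervals from statistical criteria, which are not reproduced here.

```python
from bisect import bisect_right
from collections import Counter

def feature_terms(values, boundaries, feature_name):
    """Map each feature value to the term naming the interval it falls into."""
    return [f"{feature_name}_{bisect_right(boundaries, v)}" for v in values]

def term_frequencies(documents_terms):
    """Accumulate feature-term counts over several example documents."""
    total = Counter()
    for terms in documents_terms:
        total.update(terms)
    return total

# Hypothetical interval boundaries for a feature normalised to [0, 1].
boundaries = [0.25, 0.5, 0.75]

example_1 = feature_terms([0.1, 0.3, 0.9], boundaries, "hue")
example_2 = feature_terms([0.8, 0.35], boundaries, "hue")

# The query is the accumulated frequency vector of both examples.
query = term_frequencies([example_1, example_2])
```

Because the query is a bag of terms, it can be passed to a text-style ranking function in the same way as a keyword query.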

CiteSeerX

Crossref

Enlighten

Adaptive tracking via multiple appearance models and multiple linear searches

Author: Nguyen Tuan
Pridmore Tony
Publication date: 01/01/2014

We introduce a unified tracker (FMCMC-MM) which adapts to changes in target appearance by combining two popular generative models, templates and histograms, maintaining multiple instances of each in an appearance pool, and which enhances prediction by utilising multiple linear searches. These search directions are sparse estimates of motion direction derived from local features stored in a feature pool. Given only an initial template representation of the target, the proposed tracker can learn appearance changes in a supervised manner and generate appropriate target motions without knowing the target movement in advance. During tracking, it automatically switches between models in response to variations in target appearance, exploiting the strengths of each model component. New models are added automatically as necessary. The effectiveness of the approach is demonstrated using a variety of challenging video sequences. Results show that this framework outperforms existing appearance-based tracking frameworks.

BEAR (Buckingham E-Archive of Research)