
    Simple and Complex Human Action Recognition in Constrained and Unconstrained Videos

    Human action recognition plays a crucial role in visual learning applications such as video understanding and surveillance, video retrieval, human-computer interaction, and autonomous driving systems. A variety of methodologies have been proposed for human action recognition based on low-level features combined with bag-of-visual-words models. However, much less research has addressed the combination of the pre-processing, encoding, and classification stages. This dissertation focuses on enhancing action recognition performance via ensemble learning, a hybrid classifier, hierarchical feature representation, and key-action perception methodologies. Action variation is one of the crucial challenges in video analysis and action recognition. We address this problem by proposing a hybrid classifier (HC) to discriminate actions that share similar motion characteristics, such as walking, running, and jogging. In addition, we show and prove that the fusion of various appearance-based and motion features can boost both simple and complex action recognition performance. The next part of the dissertation introduces the pooled-feature representation (PFR), which is derived from a double-phase encoding framework (DPE). Considering that a given unconstrained video is composed of a sequence of simple frames, the first phase of DPE generates temporal sub-volumes from the video and represents each of them individually with the proposed improved rank pooling (IRP) method. The second phase constructs a pool of features by fusing the vectors produced in the first phase. The pool is compressed and then encoded to produce a video-parts vector (VPV). The DPE framework allows the video representation to be distilled and new information to be extracted hierarchically. Compared with recent video encoding approaches, VPV preserves higher-level information through standard encoding of low-level features in two phases. Furthermore, the encoded vectors from both phases of DPE are fused, together with a compression stage, to develop the PFR.
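The IRP method above builds on the rank pooling idea: a temporal sub-volume is summarised by the parameters of a linear function fitted to rank its frames in time. A minimal numpy sketch of plain rank pooling follows; the dissertation's improved variant is not specified in the abstract, so `rank_pool` and the least-squares relaxation here are illustrative assumptions, not the author's exact method:

```python
import numpy as np

def rank_pool(frames):
    """Plain rank pooling: summarise a sequence of frame feature
    vectors by the parameters w of a linear function whose response
    w @ v_t increases with time t (least-squares relaxation)."""
    T, _ = frames.shape
    # time-varying mean of the features, a common smoothing step
    # applied before fitting the ranker
    V = np.cumsum(frames, axis=0) / np.arange(1, T + 1)[:, None]
    t = np.arange(1, T + 1, dtype=float)
    # fit V @ w ~ t; w encodes the temporal evolution of the sub-volume
    w, *_ = np.linalg.lstsq(V, t, rcond=None)
    return w

rng = np.random.default_rng(0)
sub_volume = rng.normal(size=(30, 16))   # 30 frames, 16-D features each
descriptor = rank_pool(sub_volume)
print(descriptor.shape)                  # (16,)
```

In a DPE-style pipeline, one such fixed-length descriptor per sub-volume would then be pooled and encoded in the second phase.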

    Compressed Video Action Recognition

    Training robust deep video representations has proven to be much more challenging than learning deep image representations. This is in part due to the enormous size of raw video streams and their high temporal redundancy; the true and interesting signal is often drowned in too much irrelevant data. Motivated by the fact that video compression (using H.264, HEVC, etc.) can reduce this superfluous information by up to two orders of magnitude, we propose to train a deep network directly on the compressed video. This representation has a higher information density, and we found the training to be easier. In addition, the signals in a compressed video provide free, albeit noisy, motion information, and we propose novel techniques to use them effectively. Our approach is about 4.6 times faster than Res3D and 2.7 times faster than ResNet-152. On the task of action recognition, our approach outperforms all the other methods on the UCF-101, HMDB-51, and Charades datasets. Comment: CVPR 2018 (selected for spotlight presentation).
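The "free" motion information comes from the motion vectors already stored in the bitstream. Because P-frames reference their predecessors, such vectors are typically traced back (accumulated) through the group of pictures so that each frame's vector points into the reference frame before being fed to a network. A rough numpy sketch of that accumulation; the dense per-pixel vector field and the `accumulate_mvs` helper are illustrative assumptions, and real codec motion vectors are block-based and sub-pixel:

```python
import numpy as np

def accumulate_mvs(mv_seq):
    """Trace per-frame motion vectors back through the group of
    pictures, so every frame's vector points into the reference
    (I-)frame rather than just the previous frame."""
    T, H, W, _ = mv_seq.shape
    acc = np.zeros_like(mv_seq)
    acc[0] = mv_seq[0]
    for t in range(1, T):
        for y in range(H):
            for x in range(W):
                dy, dx = mv_seq[t, y, x]
                # position this block was predicted from in frame t-1
                py = int(np.clip(y + dy, 0, H - 1))
                px = int(np.clip(x + dx, 0, W - 1))
                acc[t, y, x] = mv_seq[t, y, x] + acc[t - 1, py, px]
    return acc

mv = np.zeros((3, 4, 4, 2))
mv[..., 0] = 1.0              # every block moves one row per frame
acc = accumulate_mvs(mv)
print(acc[2, 0, 0])           # motion compounds across the three frames
```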

    Video Classification With CNNs: Using The Codec As A Spatio-Temporal Activity Sensor

    We investigate video classification via a two-stream convolutional neural network (CNN) design that directly ingests information extracted from compressed video bitstreams. Our approach begins with the observation that all modern video codecs divide the input frames into macroblocks (MBs). We demonstrate that selective access to MB motion vector (MV) information within compressed video bitstreams can also provide for selective, motion-adaptive, MB pixel decoding (a.k.a. MB texture decoding). This in turn allows for the derivation of spatio-temporal video activity regions at extremely high speed in comparison to conventional full-frame decoding followed by optical flow estimation. In order to evaluate the accuracy of a video classification framework based on such activity data, we independently train two CNN architectures on MB texture and MV correspondences and then fuse their scores to derive the final classification of each test video. Evaluation on two standard datasets shows that the proposed approach is competitive with the best two-stream video classification approaches found in the literature. At the same time: (i) a CPU-based realization of our MV extraction is over 977 times faster than GPU-based optical flow methods; (ii) selective decoding is up to 12 times faster than full-frame decoding; (iii) our proposed spatial and temporal CNNs perform inference at 5 to 49 times lower cloud computing cost than the fastest methods from the literature. Comment: Accepted in IEEE Transactions on Circuits and Systems for Video Technology. Extension of ICIP 2017 conference paper.
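Late fusion of the two independently trained streams can be as simple as averaging softmax-normalised class scores. A minimal sketch under that assumption; the function names and the equal fusion weight `w=0.5` are illustrative choices, and the paper's exact fusion scheme may differ:

```python
import numpy as np

def softmax(s):
    """Numerically stable softmax over the last axis."""
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_scores(texture_scores, mv_scores, w=0.5):
    """Late fusion: normalise each stream's class scores with a
    softmax, then take their weighted average."""
    return w * softmax(texture_scores) + (1 - w) * softmax(mv_scores)

tex = np.array([2.0, 0.5, 0.1])   # MB-texture (spatial) CNN logits
mot = np.array([0.3, 2.5, 0.2])   # motion-vector (temporal) CNN logits
probs = fuse_scores(tex, mot)
print(int(probs.argmax()))        # class chosen after fusion
```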

    Stereoscopic video quality assessment using binocular energy

    Stereoscopic imaging is becoming increasingly popular. However, to ensure the best quality of experience, there is a need to develop more robust and accurate objective metrics for stereoscopic content quality assessment. Existing stereoscopic image and video metrics are either extensions of conventional 2D metrics (with added depth or disparity information) or are based on relatively simple perceptual models. Consequently, they tend to lack the accuracy and robustness required for stereoscopic content quality assessment. This paper introduces full-reference stereoscopic image and video quality metrics based on a Human Visual System (HVS) model incorporating important physiological findings on binocular vision. The proposed approach is based on the following three contributions. First, it introduces a novel HVS model extending previous models to include the phenomena of binocular suppression and recurrent excitation. Second, an image quality metric based on the novel HVS model is proposed. Finally, an optimised temporal pooling strategy is introduced to extend the metric to the video domain. Both image and video quality metrics are obtained via a training procedure that establishes a relationship between subjective scores and objective measures of the HVS model. The metrics are evaluated using publicly available stereoscopic image/video databases as well as a new stereoscopic video database. An extensive experimental evaluation demonstrates the robustness of the proposed quality metrics, indicating a considerable improvement over the state of the art, with average correlations with subjective scores of 0.86 for the proposed stereoscopic image metric and 0.89 and 0.91 for the proposed stereoscopic video metrics.
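Temporal pooling turns per-frame scores into a single video-level score. The paper's optimised strategy is trained against subjective data; as a generic baseline, a Minkowski mean over per-frame distortion scores illustrates why pooling matters (the `minkowski_pool` name and the example scores are assumptions):

```python
import numpy as np

def minkowski_pool(frame_distortions, p=2.0):
    """Minkowski temporal pooling of per-frame distortion scores;
    p > 1 weights the worst (most distorted) frames more heavily
    than a plain average, while p = 1 recovers the mean."""
    d = np.asarray(frame_distortions, dtype=float)
    return float(np.mean(d ** p) ** (1.0 / p))

scores = [0.1, 0.1, 0.9, 0.1]                   # one badly distorted frame
print(round(minkowski_pool(scores, p=1.0), 3))  # plain mean: 0.3
print(round(minkowski_pool(scores, p=4.0), 3))  # pulled toward the worst frame
```

Because viewers tend to remember the worst moments of a video, a pooled score that over-weights them usually correlates better with subjective ratings than a flat average.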

    Video coding for compression and content-based functionality

    The lifetime of this research project has seen two dramatic developments in the area of digital video coding. The first has been the progress of compression research, leading to a factor-of-two improvement over existing standards, much wider deployment possibilities, and the development of the new international ITU-T Recommendation H.263. The second has been a radical change in the approach to video content production, with the introduction of the content-based coding concept and the addition of scene composition information to the encoded bit-stream. Content-based coding is central to the latest international standards efforts from the ISO/IEC MPEG working group. This thesis reports on extensions to existing compression techniques exploiting a priori knowledge about scene content. Existing, standardised, block-based compression coding techniques were extended with work on arithmetic entropy coding and intra-block prediction; these now form part of the H.263 and MPEG-4 specifications, respectively. Object-based coding techniques were developed within a collaborative simulation model, known as SIMOC, then extended with ideas on grid motion vector modelling and vector accuracy confidence estimation. An improved confidence measure for encouraging motion smoothness is proposed. Object-based coding ideas, together with those from other model- and layer-based coding approaches, influenced the development of content-based coding within MPEG-4. This standard made considerable progress in the newly adopted content-based video coding field, defining normative techniques for arbitrary shape and texture coding. The means to generate this information for the content to be coded, the analysis problem, was intentionally not specified. Further research work in this area concentrated on video segmentation and analysis techniques to exploit the benefits of content-based coding for generic frame-based video. The work reported here introduces the use of a clustering algorithm on raw data features to provide an initial segmentation of video data and subsequent tracking of those image regions through video sequences. Collaborative video analysis frameworks from COST 211quat and MPEG-4, combining results from many other segmentation schemes, are also introduced.
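The clustering-based initial segmentation mentioned above can be sketched with a minimal k-means over per-pixel feature vectors (e.g. colour plus position); this is a generic illustration under those assumptions, not the thesis's exact algorithm:

```python
import numpy as np

def kmeans_segment(features, k=3, iters=10):
    """Minimal k-means over per-pixel feature vectors, producing an
    initial region labelling that later stages could refine and track."""
    features = np.asarray(features, dtype=float)
    # deterministic spread of initial centres along the sample axis
    centres = features[np.linspace(0, len(features) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # assign each pixel to its nearest centre, then refit centres
        d = np.linalg.norm(features[:, None] - centres[None], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = features[labels == j].mean(axis=0)
    return labels

pixels = np.vstack([np.zeros((5, 2)), 10 * np.ones((5, 2))])  # two flat regions
labels = kmeans_segment(pixels, k=2)
print(labels)   # the two regions receive distinct labels
```

Region labels produced this way per frame can then be matched across frames to track the segmented regions through a sequence.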

    Novelty detection and context dependent processing of sky-compass cues in the brain of the desert locust Schistocerca gregaria

    NERVOUS SYSTEMS facilitate purposeful interactions between animals and their environment, based on perceptual powers, cognition and higher motor control. Through goal-directed behavior, the animal aims to increase its advantage and minimize risk. For instance, the migratory desert locust should profit from being fast in finding a fresh habitat, thus minimizing the investment of bodily resources in locomotion as well as the risk of starvation or capture by a predator en route. Efficient solutions to this and similar tasks – be it finding your way to work, the daily foraging of worker bees or the seasonal long-range migration of monarch butterflies – strongly depend on spatial orientation in local or global frames of reference. Local settings may include visual landmarks at stable positions that can be mapped onto egocentric space and learned for orientation, e.g. to remember a short route to a source of benefit (e.g. food) that is distant or visually less salient than the landmarks. Compass signals can mediate orientation in a global frame of reference (allothetic orientation), e.g. for locomotion in a particular compass direction or merely to ensure motion along a straight line. Whilst spatial orientation is a prerequisite for carrying out the plan in such tasks, animal survival in general depends on the ability to respond adequately to the unexpected, i.e. to unpredicted events such as the approach of a predator or mate. The process of identifying relevant events in the outside world that are not predictable from preceding events is termed novelty detection. Yet the definition of 'novelty' is highly contextual: depending on the current situation and goal, some changes may be irrelevant and remain 'undetected'. The present thesis describes neuronal representations of a compass stimulus, correlates of novelty detection, and interactions between the two in the minute brain of an insect, the migratory desert locust Schistocerca gregaria.
    Experiments were carried out on tethered locusts with legs and wings removed; more precisely, the subjects were adult males in the gregarious phase (see phase theory, Uvarov 1966), which migrates in swarms across territories in North Africa and the Middle East. The author performed electrophysiological recordings from single neurons in the locust brain while either the compass stimulus (Chapter I), events in the visual scenery (Chapter II), or combinations of both (Chapter III) were presented to the animal. Injections of a tracer through the recording electrode, visualized by means of fluorescent-dye coupling, allowed the allocation of cellular morphologies to previously described types of neuron or the characterization of novel cell types, respectively. Recordings focused on cells of the central complex, a higher integration area in the insect brain that has been shown to be involved in the visually mediated control of goal-directed locomotion. The experiments delivered insights into how representations of the compass cue are modulated in a manner suited to their integration in the control of goal-directed locomotion. In particular, an interaction between compass signaling and novelty detection was found, corresponding to a process in which input in one sensory domain (object vision) modulates the processing of concurrent input to a different exteroceptive sensory system (the compass sense). In addition to deepening the understanding of the compass network in the locust brain, the results reveal fundamental parallels to higher context-dependent processing of sensory information by the vertebrate cortex, both with respect to spatial cues and novelty detection.

    No-reference image and video quality assessment: a classification and review of recent approaches

