Search CORE

2,293 research outputs found

Differential Recurrent Neural Networks for Action Recognition

Author: Qi Guo-Jun
Veeriah Vivek
Zhuang Naifan
Publication venue
Publication date: 17/02/2015
Field of study

The long short-term memory (LSTM) neural network is capable of processing complex sequential information since it utilizes special gating schemes for learning representations from long input sequences. It has the potential to model any sequential time-series data, where the current hidden state has to be considered in the context of the past hidden states. This property makes LSTM an ideal choice to learn the complex dynamics of various actions. Unfortunately, the conventional LSTMs do not consider the impact of spatio-temporal dynamics corresponding to the given salient motion patterns, when they gate the information that ought to be memorized through time. To address this problem, we propose a differential gating scheme for the LSTM neural network, which emphasizes on the change in information gain caused by the salient motions between the successive frames. This change in information gain is quantified by Derivative of States (DoS), and thus the proposed LSTM model is termed as differential Recurrent Neural Network (dRNN). We demonstrate the effectiveness of the proposed model by automatically recognizing actions from the real-world 2D and 3D human action datasets. Our study is one of the first works towards demonstrating the potential of learning complex time-series representations via high-order derivatives of states

arXiv.org e-Print Archive

Crossref

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Recommended from our members

Computational cytometer based on magnetically modulated coherent imaging and deep learning.

Author: Bai Bijie
Ceylan Koydemir Hatice
Cheung Chloe
Di Carlo Dino
Duan Zhuoran
Feizi Alborz
Garner Omai B
Guziak Alexander
Kim Danny
Kim Donghyuk
Kong Janay
Liu Tairan
Liu Xuewei
Luo Yi
Ouyang Mengxing
Ozcan Aydogan
Ray Aniruddha
Tsai Katherine
Yalcin Sener
Zhang Yibo
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Detecting rare cells within blood has numerous applications in disease diagnostics. Existing rare cell detection techniques are typically hindered by their high cost and low throughput. Here, we present a computational cytometer based on magnetically modulated lensless speckle imaging, which introduces oscillatory motion to the magnetic-bead-conjugated rare cells of interest through a periodic magnetic force and uses lensless time-resolved holographic speckle imaging to rapidly detect the target cells in three dimensions (3D). In addition to using cell-specific antibodies to magnetically label target cells, detection specificity is further enhanced through a deep-learning-based classifier that is based on a densely connected pseudo-3D convolutional neural network (P3D CNN), which automatically detects rare cells of interest based on their spatio-temporal features under a controlled magnetic force. To demonstrate the performance of this technique, we built a high-throughput, compact and cost-effective prototype for detecting MCF7 cancer cells spiked in whole blood samples. Through serial dilution experiments, we quantified the limit of detection (LoD) as 10 cells per millilitre of whole blood, which could be further improved through multiplexing parallel imaging channels within the same instrument. This compact, cost-effective and high-throughput computational cytometer can potentially be used for rare cell detection and quantification in bodily fluids for a variety of biomedical applications

eScholarship - University of California

A survey on passive digital video forgery detection techniques

Author: Jegaveerapandian Liba Manopriya
Periyaswamy Prakash
Rani Arockia Jansi
Velusamy Sakthivel
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/12/2023
Field of study

Digital media devices such as smartphones, cameras, and notebooks are becoming increasingly popular. Through digital platforms such as Facebook, WhatsApp, Twitter, and others, people share digital images, videos, and audio in large quantities. Especially in a crime scene investigation, digital evidence plays a crucial role in a courtroom. Manipulating video content with high-quality software tools is easier, which helps fabricate video content more efficiently. It is therefore necessary to develop an authenticating method for detecting and verifying manipulated videos. The objective of this paper is to provide a comprehensive review of the passive methods for detecting video forgeries. This survey has the primary goal of studying and analyzing the existing passive techniques for detecting video forgeries. First, an overview of the basic information needed to understand video forgery detection is presented. Later, it provides an in-depth understanding of the techniques used in the spatial, temporal, and spatio-temporal domain analysis of videos, datasets used, and their limitations are reviewed. In the following sections, standard benchmark video forgery datasets and the generalized architecture for passive video forgery detection techniques are discussed in more depth. Finally, identifying loopholes in existing surveys so detecting forged videos much more effectively in the future are discussed

Institute of Advanced Engineering and Science

REPRESENTATION LEARNING FOR ACTION RECOGNITION

Author: C Krishna Mohan
Roy Debaditya
Publication venue
Publication date: 01/01/2018
Field of study

The objective of this research work is to develop discriminative representations for human actions. The motivation stems from the fact that there are many issues encountered while capturing actions in videos like intra-action variations (due to actors, viewpoints, and duration), inter-action similarity, background motion, and occlusion of actors. Hence, obtaining a representation which can address all the variations in the same action while maintaining discrimination with other actions is a challenging task. In literature, actions have been represented either using either low-level or high-level features. Low-level features describe the motion and appearance in small spatio-temporal volumes extracted from a video. Due to the limited space-time volume used for extracting low-level features, they are not able to account for viewpoint and actor variations or variable length actions. On the other hand, high-level features handle variations in actors, viewpoints, and duration but the resulting representation is often high-dimensional which introduces the curse of dimensionality. In this thesis, we propose new representations for describing actions by combining the advantages of both low-level and high-level features. Specifically, we investigate various linear and non-linear decomposition techniques to extract meaningful attributes in both high-level and low-level features. In the first approach, the sparsity of high-level feature descriptors is leveraged to build action-specific dictionaries. Each dictionary retains only the discriminative information for a particular action and hence reduces inter-action similarity. Then, a sparsity-based classification method is proposed to classify the low-rank representation of clips obtained using these dictionaries. We show that this representation based on dictionary learning improves the classification performance across actions. Also, a few of the actions consist of rapid body deformations that hinder the extraction of local features from body movements. Hence, we propose to use a dictionary which is trained on convolutional neural network (CNN) features of the human body in various poses to reliably identify actors from the background. Particularly, we demonstrate the efficacy of sparse representation in the identification of the human body under rapid and substantial deformation. In the first two approaches, sparsity-based representation is developed to improve discriminability using class-specific dictionaries that utilize action labels. However, developing an unsupervised representation of actions is more beneficial as it can be used to both recognize similar actions and localize actions. We propose to exploit inter-action similarity to train a universal attribute model (UAM) in order to learn action attributes (common and distinct) implicitly across all the actions. Using maximum aposteriori (MAP) adaptation, a high-dimensional super action-vector (SAV) for each clip is extracted. As this SAV contains redundant attributes of all other actions, we use factor analysis to extract a novel lowvi dimensional action-vector representation for each clip. Action-vectors are shown to suppress background motion and highlight actions of interest in both trimmed and untrimmed clips that contributes to action recognition without the help of any classifiers. It is observed during our experiments that action-vector cannot effectively discriminate between actions which are visually similar to each other. Hence, we subject action-vectors to supervised linear embedding using linear discriminant analysis (LDA) and probabilistic LDA (PLDA) to enforce discrimination. Particularly, we show that leveraging complimentary information across action-vectors using different local features followed by discriminative embedding provides the best classification performance. Further, we explore non-linear embedding of action-vectors using Siamese networks especially for fine-grained action recognition. A visualization of the hidden layer output in Siamese networks shows its ability to effectively separate visually similar actions. This leads to better classification performance than linear embedding on fine-grained action recognition. All of the above approaches are presented on large unconstrained datasets with hundreds of examples per action. However, actions in surveillance videos like snatch thefts are difficult to model because of the diverse variety of scenarios in which they occur and very few labeled examples. Hence, we propose to utilize the universal attribute model (UAM) trained on large action datasets to represent such actions. Specifically, we show that there are similarities between certain actions in the large datasets with snatch thefts which help in extracting a representation for snatch thefts using the attributes from the UAM. This representation is shown to be effective in distinguishing snatch thefts from regular actions with high accuracy.In summary, this thesis proposes both supervised and unsupervised approaches for representing actions which provide better discrimination than existing representations. The first approach presents a dictionary learning based sparse representation for effective discrimination of actions. Also, we propose a sparse representation for the human body based on dictionaries in order to recognize actions with rapid body deformations. In the next approach, a low-dimensional representation called action-vector for unsupervised action recognition is presented. Further, linear and non-linear embedding of action-vectors is proposed for addressing inter-action similarity and fine-grained action recognition, respectively. Finally, we propose a representation for locating snatch thefts among thousands of regular interactions in surveillance videos

Research Archive of Indian Institute of Technology Hyderabad