Visual Human Tracking and Group Activity Analysis: A Video Mining System for Retail Marketing
Thesis (PhD) - Indiana University, Computer Sciences, 2007. In this thesis we present a system for automatic human tracking and activity recognition from
video sequences. The problem of automated analysis of visual information in order to derive descriptors
of high-level human activities has intrigued the computer vision community for decades and is
considered to be largely unsolved. A part of this interest is derived from the vast range of applications
in which such a solution may be useful. We attempt to find efficient formulations of these tasks
as applied to extracting customer behavior information in a retail marketing context. Based on
these formulations, we present a system that visually tracks customers in a retail store and performs
a number of activity analysis tasks based on the output from the tracker.
In tracking we introduce new techniques for pedestrian detection, initialization of the body
model and a formulation of the temporal tracking as a global trans-dimensional optimization problem.
Initial human detection is addressed by a novel method for head detection, which incorporates
the knowledge of the camera projection model. The initialization of the human body model is addressed
by newly developed shape and appearance descriptors. Temporal tracking of customer
trajectories is performed by employing a human body tracking system designed as a Bayesian
jump-diffusion filter. This approach demonstrates the ability to overcome model dimensionality
ambiguities as people are leaving and entering the scene.
Following the tracking, we developed a two-stage group activity formulation based upon the
ideas from swarming research. For modeling purposes, all moving actors in the scene are viewed here as simplistic agents in the swarm. This allows us to effectively define a set of inter-agent interactions,
which combine to derive a distance metric used in further swarm clustering. This way, in the
first stage the shoppers that belong to the same group are identified by deterministically clustering
bodies to detect short-term events, and in the second stage events are post-processed to form clusters
of group activities with fuzzy memberships.
Quantitative analysis of the tracking subsystem shows an improvement over
state-of-the-art methods when used under similar conditions. Finally, based on the output from the tracker, the
activity recognition procedure achieves over 80% correct shopper group detection, as validated against
human-generated ground truth.
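The first stage of the group formulation described above (inter-agent interactions combined into a distance metric, followed by deterministic clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the `pair_distance` metric mixing position and velocity gaps, the `velocity_weight` and `threshold` values, and the use of single-link grouping via union-find. None of it is taken from the thesis.

```python
from math import hypot

def pair_distance(a, b, velocity_weight=0.5):
    """Hypothetical inter-agent distance: position gap plus weighted velocity gap."""
    (ax, ay, avx, avy), (bx, by, bvx, bvy) = a, b
    return hypot(ax - bx, ay - by) + velocity_weight * hypot(avx - bvx, avy - bvy)

def group_agents(agents, threshold=1.5):
    """Single-link grouping: agents closer than `threshold` end up in one group."""
    parent = list(range(len(agents)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(agents)):
        for j in range(i + 1, len(agents)):
            if pair_distance(agents[i], agents[j]) < threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(agents)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Each agent is (x, y, vx, vy) in arbitrary ground-plane units (made-up data).
agents = [
    (0.0, 0.0, 1.0, 0.0),
    (0.5, 0.2, 1.0, 0.1),
    (5.0, 5.0, -1.0, 0.0),
    (5.4, 5.1, -1.0, 0.0),
    (9.0, 0.0, 0.0, 1.0),
]
print(group_agents(agents))
```

With this made-up data, agents that are close together and moving in similar directions fall into the same group, while the isolated agent forms a group of its own.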
Human Activity Recognition System Based-on Sequential Logic Circuits and Statistical Models
This research proposes a human activity recognition system and describes the complete flow of processes from the lowest level (dealing with images) to the highest level (recognizing human activity). We propose a human action recognition method that processes an image sequence and then recognizes human actions with a simple human model using a model-based recognition technique. The experimental results show good accuracy, with up to 93% of actions correctly recognized. We propose three methods for human activity recognition, each improving on the previous one. All of these methods can use the results of action recognition as inputs. The first method is the Finite State Machine (FSM) recognizer. The human model in the FSM recognizer can be described by rational conditions, which makes it easy to understand and cheap to compute, but complex activity conditions are hard to define, so the method is unsuitable for complex activities. The second recognizer applies a Hidden Markov Model (HMM) for activity modeling. The HMM recognizer can deal with much more complex activities and gives a fair recognition rate. However, the HMM recognizer does not take feature priority into account, which should affect accuracy, so we propose a third recognizer that uses graph similarity measurement for activity modeling and activity classification. This Graph Similarity Measurement (GSM) recognizer incorporates feature priority into the recognition method and shows better results than the HMM in most measurements. The GSM recognizer has ~84% accuracy on average. The FSM recognizer is suitable for simple activities with low computation cost, while the HMM is suitable for much more complex activities and uses a single feature for the recognition process. However, the HMM method may not give the best result for activities that use multiple features. GSM is also suitable for complex activities and, furthermore, gives better results than the HMM for activities trained from multiple features.
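To make the FSM recognizer idea concrete, the following minimal sketch consumes a sequence of already-recognized action labels and walks a hand-written transition table. The states, the action labels, and the "sit down on a chair" activity are hypothetical illustrations and are not the model used in the paper.

```python
# Hypothetical transition table: (current_state, action) -> next_state.
TRANSITIONS = {
    ("start", "walk"): "approaching",
    ("approaching", "walk"): "approaching",
    ("approaching", "stand"): "at_chair",
    ("at_chair", "stand"): "at_chair",
    ("at_chair", "sit"): "seated",
}
ACCEPTING = {"seated"}

def recognize(actions, transitions=TRANSITIONS, accepting=ACCEPTING):
    """Return True if the action sequence drives the FSM into an accepting state."""
    state = "start"
    for action in actions:
        key = (state, action)
        if key not in transitions:
            return False  # no rule for this action in this state: reject
        state = transitions[key]
    return state in accepting

print(recognize(["walk", "walk", "stand", "sit"]))  # True
print(recognize(["walk", "sit"]))                   # False
```

Each rule is easy to read and costs a single dictionary lookup, but every admissible ordering of actions has to be enumerated by hand, which is consistent with the abstract's remark that FSMs become unwieldy for complex activities.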
A review on intelligent monitoring and activity interpretation
This survey paper provides a tour of the various monitoring and activity interpretation frameworks found in the literature. The needs of monitoring and interpretation systems are presented in relation to the area where they have been developed or applied. Their evolution is studied to better understand the characteristics of current systems. After this, the main features of monitoring and activity interpretation systems are defined. This work was partially supported by the Spanish Ministerio de Economía y Competitividad / FEDER under grant DPI2016-80894-R.
Enhanced fuzzy finite state machine for human activity modelling and recognition
A key challenge in modelling and recognising human activity is designing a model that can deal with the uncertainty in human behaviour. Several machine learning and deep learning techniques are employed to model the Activities of Daily Living (ADL) that represent human activity. This paper proposes an enhanced Fuzzy Finite State Machine (FFSM) model that combines the classical FFSM with a Long Short-Term Memory (LSTM) neural network and a Convolutional Neural Network (CNN). The learning capability of the LSTM and CNN allows the system to learn the relationships in the temporal human activity data and to identify the parameters of the rule-based system as building blocks of the FFSM through time steps in the learning mode. The learned parameters are then used to generate the fuzzy rules that govern the transitions between the system's states representing activities. The proposed enhanced FFSMs were tested and evaluated using two different datasets: a real dataset collected by our research group and a public dataset collected from the CASAS smart home project. Using LSTM-FFSM, the experimental results achieved 95.7% and 97.6% for the first dataset and the second dataset, respectively. Once CNN-FFSM was applied to both datasets, the obtained results were 94.2% and 99.3%, respectively.
Deep Learning for Semantic Video Understanding
The field of computer vision has long strived to extract understanding from images and video sequences. The recent flood of video data, along with massive increases in computing power, has provided the perfect environment for advanced research on extracting intelligence from video data. Video data is ubiquitous, occurring in numerous everyday activities such as surveillance, traffic, movies, sports, etc. This massive amount of video needs to be analyzed and processed efficiently to extract semantic features towards video understanding. Such capabilities could benefit surveillance, video analytics and visually challenged people. While watching a long video, humans have the uncanny ability to bypass unnecessary information and concentrate on the important events. These key events can be used as a higher-level description or summary of a long video. Inspired by the human visual cortex, this research affords such abilities in computers using neural networks. Useful or interesting events are first extracted from a video and then deep learning methodologies are used to extract natural language summaries for each video sequence. Previous approaches to video description either have been domain specific or use a template-based approach to fill detected objects such as verbs or actions to constitute a grammatically correct sentence. This work involves exploiting temporal contextual information for sentence generation while working on wide domain datasets. Current state-of-the-art video description methodologies are well suited for small video clips whereas this research can also be applied to long sequences of video.
This work proposes methods to generate visual summaries of long videos, and in addition proposes techniques to annotate and generate textual summaries of the videos using recurrent networks. End-to-end video summarization depends heavily on abstractive summarization of video descriptions. State-of-the-art joint neural language and attention models have been used to generate textual summaries. Interesting segments of long video are extracted based on image quality as well as cinematographic and consumer preference. This novel approach will be a stepping stone for a variety of innovative applications such as video retrieval, automatic summarization for visually impaired persons, automatic movie review generation, and video question answering systems.
Challenges of Big Data Analysis
Big Data bring new opportunities to modern society and challenges to data
scientists. On one hand, Big Data hold great promises for discovering subtle
population patterns and heterogeneities that are not possible with small-scale
data. On the other hand, the massive sample size and high dimensionality of Big
Data introduce unique computational and statistical challenges, including
scalability and storage bottleneck, noise accumulation, spurious correlation,
incidental endogeneity, and measurement errors. These challenges are
distinctive and require a new computational and statistical paradigm. This
article gives an overview of the salient features of Big Data and how these
features drive a paradigm change in statistical and computational methods as
well as computing architectures. We also provide various new perspectives on
Big Data analysis and computation. In particular, we emphasize the
viability of the sparsest solution in a high-confidence set and point out that
the exogeneity assumptions in most statistical methods for Big Data cannot be
validated due to incidental endogeneity. They can lead to wrong statistical
inferences and consequently wrong scientific conclusions.
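The spurious correlation mentioned in this abstract can be illustrated with a small simulation. It is a minimal sketch under assumed settings (sample size `n=30`, Gaussian noise, a fixed seed, and the listed dimensions are all arbitrary choices, not values from the article). It generates predictors that are independent of the response and tracks the largest absolute sample correlation seen as the number of predictors grows.

```python
import random
from math import sqrt

def correlation(xs, ys):
    """Pearson sample correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def max_abs_correlation_by_dimension(n=30, dims=(1, 10, 100, 1000), seed=0):
    """Largest |corr| between the response and the first p noise predictors."""
    rng = random.Random(seed)
    response = [rng.gauss(0, 1) for _ in range(n)]
    results, running_max = {}, 0.0
    for p in range(1, max(dims) + 1):
        # Each predictor is pure noise, drawn independently of the response.
        predictor = [rng.gauss(0, 1) for _ in range(n)]
        running_max = max(running_max, abs(correlation(predictor, response)))
        if p in dims:
            results[p] = running_max
    return results

results = max_abs_correlation_by_dimension()
for p, r in results.items():
    print(f"p={p:5d}  max |corr| = {r:.3f}")
```

Because the predictor sets are nested, the reported maximum can only stay the same or grow with the dimension. Any large value it reaches is an artifact of searching over many unrelated variables, not evidence of a real relationship.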
Articulated human tracking and behavioural analysis in video sequences
Recently, there has been a dramatic growth of interest in the observation and tracking
of human subjects through video sequences. Arguably, the principal impetus has come
from the perceived demand for technological surveillance; however, applications in entertainment,
intelligent domiciles and medicine are also increasing. This thesis examines
human articulated tracking and the classification of human movement, first separately
and then as a sequential process.
First, this thesis considers the development and training of a 3D model of human body
structure and dynamics. To process video sequences, an observation model is also designed
with a multi-component likelihood based on edge, silhouette and colour. This is defined on
the articulated limbs, and visible from a single or multiple cameras, each of which may be
calibrated from that sequence. Second, for behavioural analysis, we develop a methodology
in which actions and activities are described by semantic labels generated from a Movement
Cluster Model (MCM). Third, a Hierarchical Partitioned Particle Filter (HPPF) was
developed for human tracking that allows multi-level parameter search consistent with the
body structure. This tracker relies on the articulated motion prediction provided by the
MCM at pose or limb level. Fourth, tracking and movement analysis are integrated to
generate a probabilistic activity description with action labels.
The implemented algorithms for tracking and behavioural analysis are tested extensively
and independently against ground truth on human tracking and surveillance
datasets. Dynamic models are shown to predict and generate synthetic motion, while
MCM recovers both periodic and non-periodic activities, defined either on the whole body
or at the limb level. Tracking results are comparable with the state of the art; however,
the integrated behaviour analysis adds to the value of the approach. Overseas Research Students Awards Scheme (ORSAS)