Multi-Modality Human Action Recognition
Human action recognition is useful in many application areas, e.g. video surveillance, human-computer interaction (HCI), video retrieval, gaming and security. Recently it has become an active research topic in computer vision and pattern recognition, and a number of action recognition approaches have been proposed. Most of these, however, are designed for RGB image sequences collected by RGB/intensity cameras, so recognition performance is sensitive to occlusion, background clutter and the lighting conditions of the image sequences. If additional information is available alongside the image sequences, i.e. data sources beyond RGB video, human actions can be better represented and recognized by the designed computer vision system. In this dissertation, multi-modality human action recognition is studied. On one hand, we introduce the study of multi-spectral action recognition, which involves information from spectra beyond the visible, e.g. infrared and near infrared. Action recognition in individual spectra is explored and new methods are proposed; cross-spectral action recognition is then investigated and novel approaches are presented. On the other hand, since depth imaging technology has made significant progress recently, with depth information captured simultaneously with RGB video, depth-based human action recognition is also investigated. I first propose a method combining different types of depth data to recognize human actions. Then a thorough evaluation is conducted on spatiotemporal interest point (STIP) based features for depth-based action recognition. Finally, I advocate fusing different features for depth-based action analysis. Moreover, human depression recognition is studied by combining a facial appearance model with a facial dynamic model
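The feature fusion the dissertation advocates for depth-based action analysis can be illustrated with a minimal late-fusion sketch; the function names and feature dimensions below are illustrative assumptions, not the dissertation's actual pipeline:

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    # Scale a feature vector to unit L2 norm so that neither
    # modality dominates the fused representation.
    return v / (np.linalg.norm(v) + eps)

def fuse_features(rgb_feat, depth_feat):
    # Simple late fusion: normalize each modality separately,
    # then concatenate into a single action representation.
    return np.concatenate([l2_normalize(rgb_feat), l2_normalize(depth_feat)])

# Stand-in per-video feature vectors for two modalities.
rgb = np.random.rand(128)
depth = np.random.rand(64)
fused = fuse_features(rgb, depth)
print(fused.shape)  # (192,)
```

A classifier (e.g. a linear SVM) would then be trained on the fused vectors; weighted concatenation or kernel-level fusion are common variants.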
Robust density modelling using the student's t-distribution for human action recognition
The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model, since it is highly sensitive to them. The Gaussian distribution is also often used as the base component of graphical models for recognising human actions in videos (hidden Markov models and others), and the presence of outliers can significantly affect recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM that uses mixtures of t-distributions as observation probabilities, and experiments over two well-known datasets (Weizmann, MuHAVi) show a remarkable improvement in classification accuracy. © 2011 IEEE
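The robustness the paper exploits can be seen in a small sketch: estimating the location of contaminated data with an EM-style iteration for a univariate Student's t. This is a simplified stand-in for the paper's mixture-of-t observation model; the degrees of freedom `nu` and the iteration count are illustrative choices:

```python
import numpy as np

def t_location_em(x, nu=3.0, iters=50):
    # EM-style estimate of the location of a univariate Student's t:
    # samples far from the current estimate get small weights w,
    # so gross outliers barely move the result.
    mu, sigma2 = np.median(x), np.var(x)
    for _ in range(iters):
        w = (nu + 1.0) / (nu + (x - mu) ** 2 / sigma2)  # E-step weights
        mu = np.sum(w * x) / np.sum(w)                  # M-step location
        sigma2 = np.sum(w * (x - mu) ** 2) / len(x)     # M-step scale
    return mu

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 99), [50.0]])  # one gross outlier
print(np.mean(data))        # sample mean is pulled toward the outlier
print(t_location_em(data))  # t-based estimate stays near the true location 0
```

The same downweighting happens inside each HMM observation density when t-mixtures replace Gaussian mixtures, which is why outlier frames degrade the likelihood less.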
Visual tracking with online assessment and improved sampling strategy
The kernelized correlation filter (KCF) is one of the most successful trackers in computer vision today. However, its performance may degrade significantly under a wide range of challenging conditions, such as occlusion and out-of-view targets. For many applications, particularly safety-critical ones (e.g. autonomous driving), it is of profound importance to have consistent and reliable performance under all operating conditions. This paper addresses this issue for KCF-based trackers by introducing two novel modules: online assessment of the response map, and a strategy of combining cyclically shifted sampling with random sampling in deep feature space. The online assessment method evaluates tracking performance by constructing a 2-D Gaussian estimation model of the response map. When the assessment deems tracking unreliable, the combined sampling strategy is invoked to improve performance; the online assessment module can thus be regarded as the trigger for the second module. Quantitative and qualitative comparisons with state-of-the-art tracking algorithms on the OTB-2013 and OTB-2015 datasets verify that tracking performance is significantly improved, particularly in challenging conditions
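The online assessment step scores how trustworthy a correlation response map is. The paper fits a 2-D Gaussian model to the map; as a minimal stand-in the sketch below uses the related, widely used peak-to-sidelobe ratio (PSR). Function names and window sizes are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def peak_to_sidelobe_ratio(response, exclude=5):
    # Confidence of a correlation response map: the sharper the peak
    # relative to the surrounding sidelobe, the more reliable the track.
    r = np.asarray(response, dtype=float)
    py, px = np.unravel_index(np.argmax(r), r.shape)
    peak = r[py, px]
    # Mask out a window around the peak; the rest is the sidelobe.
    mask = np.ones_like(r, dtype=bool)
    mask[max(0, py - exclude):py + exclude + 1,
         max(0, px - exclude):px + exclude + 1] = False
    sidelobe = r[mask]
    return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-12)

# A sharp single peak scores far higher than a flat, noisy map.
yy, xx = np.mgrid[0:64, 0:64]
sharp = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / 8.0)
noisy = np.random.default_rng(1).random((64, 64))
print(peak_to_sidelobe_ratio(sharp) > peak_to_sidelobe_ratio(noisy))  # True
```

A tracker would compare such a score against a threshold and, when it drops, trigger the fallback sampling module rather than trusting the shifted samples alone.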
Perceptual recognition of objects of interest: application to the interpretation of instrumental activities of daily living for dementia studies
The rationale and motivation of this PhD thesis lie in the diagnosis, assessment, maintenance and promotion of self-independence of people with dementia in their Instrumental Activities of Daily Living (IADLs). In this context, a strong focus is placed on the task of automatically recognizing IADLs. Egocentric video analysis (cameras worn by a person) has recently gained much interest regarding this goal. Indeed, recent studies have demonstrated how crucial the recognition of active objects (manipulated or observed by the person wearing the camera) is for the activity recognition task, and egocentric videos present the advantage of a strong differentiation between active and passive objects (the latter associated with the background). One recent approach to finding active elements in a scene is the incorporation of visual saliency into object recognition paradigms. Modeling the selective process of human perception of visual scenes is an efficient way to drive scene analysis towards particular areas considered of interest, or salient, which in egocentric videos strongly correspond to the loci of objects of interest. The objective of this thesis is to design an object recognition system that relies on visual saliency maps to provide more precise object representations that are robust against background clutter and, therefore, improve the recognition of active objects for the IADL recognition task. This PhD thesis is conducted in the framework of the Dem@care European project. Within the vast field of visual saliency modeling, we investigate and propose contributions in both the Bottom-up (gaze driven by stimuli) and Top-down (gaze driven by semantics) areas, aiming to enhance the particular task of active object recognition in egocentric video content. Our first contribution, on Bottom-up models, originates from the fact that observers are attracted by a central stimulus (the center of an image). This biological phenomenon is known as central bias. In egocentric videos, however, this hypothesis does not always hold. We study saliency models with non-central-bias geometrical cues. The proposed visual saliency models are trained on eye fixations of observers and incorporated into spatio-temporal saliency models. When compared to state-of-the-art visual saliency models, the ones we present show promising results, as they highlight the necessity of a non-centered geometric saliency cue. For our Top-down contribution, we present a probabilistic visual attention model for manipulated object recognition in egocentric video content. Although arms often occlude objects and are usually considered a burden for many vision systems, they become an asset in our approach, as we extract both global and local features describing their geometric layout and pose, as well as the objects being manipulated. We integrate this information into a probabilistic generative model, provide update equations that automatically compute the model parameters optimizing the likelihood of the data, and design a method to generate maps of visual attention that are later used in an object recognition framework. This task-driven assessment reveals that the proposed method outperforms the state of the art in object recognition for egocentric video content. [...]
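The non-central geometric cue can be sketched as a 2-D Gaussian spatial prior whose peak is placed off-centre before it modulates a bottom-up saliency map. The positions, widths and function names below are illustrative assumptions, not the thesis's trained models:

```python
import numpy as np

def geometric_prior(h, w, cy, cx, sigma):
    # Spatial prior: a 2-D Gaussian whose peak need not lie at the
    # image centre (in egocentric video the locus of attention often
    # sits below centre, near the hands).
    yy, xx = np.mgrid[0:h, 0:w]
    g = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * sigma ** 2))
    return g / g.max()

def apply_prior(saliency, prior):
    # Modulate a bottom-up saliency map by the geometric prior.
    s = saliency * prior
    return s / (s.max() + 1e-12)

h, w = 48, 64
bottom_up = np.ones((h, w))  # uniform stand-in for a bottom-up map
prior = geometric_prior(h, w, cy=int(0.7 * h), cx=w // 2, sigma=10.0)
sal = apply_prior(bottom_up, prior)
peak = np.unravel_index(np.argmax(sal), sal.shape)
print(peak)  # the peak sits below the image centre
```

In the thesis the prior's parameters would be learned from recorded eye fixations rather than fixed by hand, but the modulation step is of this multiplicative form.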
Advances in Image Processing, Analysis and Recognition Technology
For many decades, researchers have been trying to make computers' analysis of images as effective as human vision. For this purpose, many algorithms and systems have been created. The whole process covers various stages, including image processing, representation and recognition. The results of this work can be applied to many computer-assisted areas of everyday life. They improve particular activities and provide handy tools, sometimes only for entertainment, but quite often they significantly increase our safety. Indeed, the range of practical implementations of image processing algorithms is particularly wide. Moreover, the rapid growth of computational power has allowed for the development of more sophisticated and effective algorithms and tools. Although significant progress has been made so far, many issues remain, resulting in the need for novel approaches
Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions
3D action recognition has broad applications in human-computer interaction
and intelligent surveillance. However, recognizing similar actions remains
challenging since previous literature fails to capture motion and shape cues
effectively from noisy depth data. In this paper, we propose a novel two-layer
Bag-of-Visual-Words (BoVW) model, which suppresses the noise disturbances and
jointly encodes both motion and shape cues. First, background clutter is
removed by a background modeling method that is designed for depth data. Then,
motion and shape cues are jointly used to generate robust and distinctive
spatial-temporal interest points (STIPs): motion-based STIPs and shape-based
STIPs. In the first layer of our model, a multi-scale 3D local steering kernel
(M3DLSK) descriptor is proposed to describe local appearances of cuboids around
motion-based STIPs. In the second layer, a spatial-temporal vector (STV)
descriptor is proposed to describe the spatial-temporal distributions of
shape-based STIPs. Using the BoVW model, motion and shape
cues are combined to form a fused action representation. Our model performs
favorably compared with common STIP detection and description methods. Thorough
experiments verify that our model is effective in distinguishing similar
actions and robust to background clutter, partial occlusions and pepper noise
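The fused representation in the two-layer model can be sketched as two BoVW histograms, one per descriptor type, concatenated into a single vector. Descriptor dimensions and codebook sizes below are illustrative; in the paper the codebooks would come from clustering actual M3DLSK and STV descriptors:

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    # Assign each local descriptor to its nearest visual word and
    # build a normalized word-occurrence histogram.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / (hist.sum() + 1e-12)

rng = np.random.default_rng(0)
motion_desc = rng.random((200, 32))  # stand-ins for M3DLSK descriptors
shape_desc = rng.random((150, 16))   # stand-ins for STV descriptors
motion_cb = rng.random((64, 32))     # codebooks (k-means centres in practice)
shape_cb = rng.random((32, 16))

# Fused action representation: concatenate the two BoVW histograms.
rep = np.concatenate([bovw_histogram(motion_desc, motion_cb),
                      bovw_histogram(shape_desc, shape_cb)])
print(rep.shape)  # (96,)
```

Keeping the two histograms in separate sub-vectors, rather than pooling all descriptors into one codebook, is what lets the local-appearance and global-distribution cues contribute independently to the classifier.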
Efficient resource allocation for automotive active vision systems
Individual mobility on roads has a noticeable impact upon people's lives, including
traffic accidents resulting in severe, or even lethal, injuries. Therefore, the main goal
when operating a vehicle is to safely participate in road traffic while minimising the
adverse effects on our environment. This goal is pursued by road safety measures ranging from
safety-oriented road design to driver assistance systems. The latter require exteroceptive
sensors to acquire information about the vehicle's current environment.
In this thesis an efficient resource allocation for automotive vision systems is proposed.
The notion of allocating resources implies the presence of processes that observe the whole
environment and that are able to efficiently direct attentive processes. Directing attention
constitutes a decision making process dependent upon the environment it operates in, the
goal it pursues, and the sensor resources and computational resources it allocates. The
sensor resources considered in this thesis are a subset of the multi-modal sensor system on
a test vehicle provided by Audi AG, which is also used to evaluate our proposed resource
allocation system.
This thesis presents an original contribution in three respects. First, a system architecture
designed to efficiently allocate both high-resolution sensor resources and computational
expensive processes based upon low-resolution sensor data is proposed. Second,
a novel method to estimate 3-D range motion, efficient scan-patterns for spin image based
classifiers, and an evaluation of track-to-track fusion algorithms present contributions in
the field of data processing methods. Third, a Pareto efficient multi-objective resource
allocation method is formalised, implemented, and evaluated using road traffic test sequences
…
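Pareto-efficient allocation amounts to keeping only the non-dominated candidate allocations. The sketch below assumes two objectives to be minimised; the objective names and toy scores are illustrative, not the thesis's actual cost model:

```python
import numpy as np

def pareto_front(costs):
    # Return indices of non-dominated rows (minimisation in every
    # objective): a candidate is kept unless some other candidate is
    # at least as good in all objectives and strictly better in one.
    costs = np.asarray(costs, dtype=float)
    keep = []
    for i, c in enumerate(costs):
        dominated = any(
            np.all(o <= c) and np.any(o < c)
            for j, o in enumerate(costs) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

# Toy allocations scored on (sensor cost, expected detection risk).
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 3.0), (4.0, 1.0)]
print(pareto_front(candidates))  # [0, 1, 3]
```

Candidate 2 is dominated by candidate 1 (equal risk at lower cost) and is discarded; a scheduler would then pick from the remaining front according to the current driving context.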