91 research outputs found

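    Reconnaissance perceptuelle des objets d’Intérêt : application à l’interprétation des activités instrumentales de la vie quotidienne pour les études de démence (Perceptual recognition of objects of interest: application to the interpretation of instrumental activities of daily living for dementia studies)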

    The rationale and motivation of this PhD thesis is in the diagnosis, assessment, maintenance and promotion of self-independence of people with dementia in their Instrumental Activities of Daily Living (IADLs). In this context a strong focus is held towards the task of automatically recognizing IADLs. Egocentric video analysis (cameras worn by a person) has recently gained much interest regarding this goal. Indeed recent studies have demonstrated how crucial is the recognition of active objects (manipulated or observed by the person wearing the camera) for the activity recognition task and egocentric videos present the advantage of holding a strong differentiation between active and passive objects (associated to background). One recent approach towards finding active elements in a scene is the incorporation of visual saliency in the object recognition paradigms. Modeling the selective process of human perception of visual scenes represents an efficient way to drive the scene analysis towards particular areas considered of interest or salient, which, in egocentric videos, strongly corresponds to the locus of objects of interest. The objective of this thesis is to design an object recognition system that relies on visual saliency-maps to provide more precise object representations, that are robust against background clutter and, therefore, improve the recognition of active object for the IADLs recognition task. This PhD thesis is conducted in the framework of the Dem@care European project. Regarding the vast field of visual saliency modeling, we investigate and propose a contribution in both Bottom-up (gaze driven by stimuli) and Top-down (gaze driven by semantics) areas that aim at enhancing the particular task of active object recognition in egocentric video content. Our first contribution on Bottom-up models originates from the fact that observers are attracted by a central stimulus (the center of an image). This biological phenomenon is known as central bias.
In egocentric videos, however, this hypothesis does not always hold. We study saliency models with non-central bias geometrical cues. The proposed visual saliency models are trained based on eye fixations of observers and incorporated into spatio-temporal saliency models. When compared to state-of-the-art visual saliency models, the ones we present show promising results as they highlight the necessity of a non-centered geometric saliency cue. For our top-down model contribution we present a probabilistic visual attention model for manipulated object recognition in egocentric video content. Although arms often occlude objects and are usually considered as a burden for many vision systems, they become an asset in our approach, as we extract both global and local features describing their geometric layout and pose, as well as the objects being manipulated. We integrate this information in a probabilistic generative model, provide update equations that automatically compute the model parameters optimizing the likelihood of the data, and design a method to generate maps of visual attention that are later used in an object-recognition framework. This task-driven assessment reveals that the proposed method outperforms the state-of-the-art in object recognition for egocentric video content. [...]
This thesis is motivated by the diagnosis, assessment, maintenance and promotion of the independence of people suffering from dementia in their activities of daily living. In this context we focus on the automatic recognition of activities of daily living. The analysis of egocentric videos (where the camera is worn by a person) has recently gained much interest for this task. Indeed, recent studies demonstrate the crucial importance of recognizing active objects (manipulated or observed by the patient) for activity recognition, and egocentric videos have the advantage of a strong differentiation between active and passive objects (associated with the background). One recent approach to recognizing active elements in a scene is the incorporation of visual saliency into object recognition algorithms. Modeling the selective process of the human visual system is an efficient way to focus the analysis of a scene on the areas considered of interest or salient, which, in egocentric videos, strongly correspond to the locations of objects of interest. The objective of this thesis is to enable object recognition systems to provide more precise detection of objects of interest through visual saliency, in order to improve the performance of recognition of everyday activities. This thesis is carried out within the framework of the European project Dem@care. In the vast field of visual saliency modeling, we study and propose a contribution in both the "Bottom-up" domain (gaze attracted by stimuli) and the "Top-down" domain (gaze attracted by semantics), with the aim of improving active object recognition in egocentric videos. Our first contribution for Bottom-up models stems from the fact that observers of a video are normally attracted by its center. This biological phenomenon is called central bias.
In egocentric videos, however, this hypothesis no longer holds. We propose and study saliency models based on this non-central bias phenomenon. The proposed models are trained on recorded eye fixations and incorporated into spatio-temporal models. When compared with state-of-the-art Bottom-up models, the ones we present show promising results that illustrate the need for a non-centered biased geometric model in this type of video. For our contribution in the Top-down domain, we present a probabilistic visual attention model for the recognition of manipulated objects in egocentric videos. Although arms are often a source of object occlusion and considered a burden, they become an asset in our approach. Indeed, we extract both global and local features that make it possible to estimate their geometric layout. We integrate this information into a probabilistic model, with update equations to optimize the likelihood of the model with respect to its parameters, and finally generate the visual attention maps for the recognition of manipulated objects. [...]
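
The central-bias idea in the abstract above can be illustrated with a small sketch. It is not the thesis's model: it assumes the geometric saliency cue is an axis-aligned 2D Gaussian, and that "training on eye fixations" reduces to taking the mean fixation position as the Gaussian's center. The names `geometric_saliency`, `fit_center`, `peak`, and the `sigma_frac` parameter are hypothetical.

```python
import math

def geometric_saliency(width, height, center=None, sigma_frac=0.25):
    """Return a height x width grid of Gaussian weights peaking at `center`.

    center: (x, y) in pixels. When omitted, the image center is used,
    which corresponds to the classic central-bias assumption.
    sigma_frac: assumed spread, as a fraction of each image dimension.
    """
    if center is None:
        center = ((width - 1) / 2.0, (height - 1) / 2.0)
    cx, cy = center
    sx = sigma_frac * width
    sy = sigma_frac * height
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            d = ((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2
            row.append(math.exp(-0.5 * d))
        grid.append(row)
    return grid

def fit_center(fixations):
    """Assumed 'training' step: the mean of recorded (x, y) eye fixations."""
    n = len(fixations)
    mean_x = sum(x for x, _ in fixations) / n
    mean_y = sum(y for _, y in fixations) / n
    return (mean_x, mean_y)

def peak(grid):
    """Return the (x, y) position of the largest weight in the grid."""
    best = max((v, x, y) for y, row in enumerate(grid) for x, v in enumerate(row))
    return (best[1], best[2])

# Central bias: the map peaks at the image center.
central = geometric_saliency(9, 7)

# Non-central bias: the peak follows the (made-up) fixation data instead.
fixations = [(2, 4), (4, 6), (6, 5)]
shifted = geometric_saliency(9, 7, center=fit_center(fixations))
```

In this toy setting the fitted map peaks below the image center, which is the kind of shift a non-centered geometric cue is meant to capture when the wearer's hands and manipulated objects sit low in the frame.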

    Using Surfaces and Surface Relations in an Early Cognitive Vision System

    The final publication is available at Springer via http://dx.doi.org/10.1007/s00138-015-0705-y
    We present a deep hierarchical visual system with two parallel hierarchies for edge and surface information. In the two hierarchies, complementary visual information is represented on different levels of granularity together with the associated uncertainties and confidences. At all levels, geometric and appearance information is coded explicitly in 2D and 3D, allowing this information to be accessed separately and linked across the different levels. We demonstrate the advantages of such hierarchies in three applications covering grasping, viewpoint-independent object representation, and pose estimation.
    European Community’s Seventh Framework Programme FP7/IC

    Unmasking Clever Hans Predictors and Assessing What Machines Really Learn

    Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted, to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem-solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis that provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. Furthermore, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner.
    Comment: Accepted for publication in Nature Communications

    An Analytic Training Approach for Recognition in Still Images and Videos

    This dissertation proposes a general framework to efficiently identify the objects of interest (OI) in still images, and its application can be further extended to human action recognition in videos. The frameworks utilized in this research to process still images and videos are similar in architecture, except that they have different content representations. Initially, global-level analysis is employed to extract distinctive feature sets from the input data. For the global analysis, bidirectional two-dimensional principal component analysis (2D-PCA) is employed to preserve correlation among neighboring pixels. Furthermore, to cope with the inherent limitations of the holistic approach, local information is introduced into the framework. The local information of OI is identified utilizing FERNS and affine SIFT (ASIFT) approaches for spatial and temporal datasets, respectively. For supportive local information, the feature detection is followed by an effective pruning strategy to divide these features into inliers and outliers. A cluster of inliers represents local features which exhibit stable behavior and geometric consistency. Incremental learning is a significant but often overlooked problem in action recognition. The final part of this dissertation proposes a new action recognition algorithm based on sequential learning and adaptive representation of the human body using Pyramid of Histogram of Oriented Gradients (PHOG) features. The changing shape and appearance of human body parts is tracked based on the weak appearance constancy assumption. The constantly changing shape of an OI is maximally covered by small blocks to approximate the body contour of a segmented foreground object. In addition, the analytically determined learning phase guarantees a lower computational burden for classification. The utilization of a minimum number of video frames in a causal way to recognize an action is also explored in this dissertation.
The use of PHOG features adaptively extracted from individual frames allows the recognition of an incoming action video using a small group of frames which eliminates the need of large look-ahead
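
As a rough illustration of the PHOG features mentioned in the abstract above, the sketch below builds a pyramid of orientation histograms for a single grayscale frame. It is a simplified stand-in, not the dissertation's implementation: it weights every pixel's gradient by its magnitude instead of working on an extracted edge or body contour, and the function names, the two pyramid levels, and the eight orientation bins are all assumptions.

```python
import math

def gradients(img):
    """Central-difference gradients of a 2D list of intensities.

    Returns a grid of (magnitude, orientation) pairs, with the orientation
    folded into [0, pi) so that opposite directions share a bin.
    """
    h, w = len(img), len(img[0])
    out = [[(0.0, 0.0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            out[y][x] = (math.hypot(gx, gy), math.atan2(gy, gx) % math.pi)
    return out

def cell_histogram(grad, x0, x1, y0, y1, bins):
    """Magnitude-weighted orientation histogram over one rectangular cell."""
    hist = [0.0] * bins
    for y in range(y0, y1):
        for x in range(x0, x1):
            mag, ang = grad[y][x]
            b = min(int(ang / math.pi * bins), bins - 1)
            hist[b] += mag
    return hist

def phog(img, levels=2, bins=8):
    """Concatenate cell histograms over a spatial pyramid and L1-normalize.

    Level 0 is the whole image, level 1 splits it into 2 x 2 cells, and so on.
    """
    grad = gradients(img)
    h, w = len(img), len(img[0])
    feat = []
    for level in range(levels):
        n = 2 ** level
        for cy in range(n):
            for cx in range(n):
                feat.extend(cell_histogram(
                    grad,
                    cx * w // n, (cx + 1) * w // n,
                    cy * h // n, (cy + 1) * h // n,
                    bins))
    total = sum(feat)
    return [v / total for v in feat] if total > 0 else feat

# Toy frame with a single vertical edge between dark and bright halves.
frame = [[0, 0, 1, 1] for _ in range(4)]
descriptor = phog(frame)
```

Because the descriptor is computed from one frame at a time, it can be extracted as each frame arrives, which fits the causal, small-group-of-frames setting the abstract describes.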