
    Probabilistic Pose Recovery Using Learned Hierarchical Object Models

    This paper presents a probabilistic representation for 3D objects and details the mechanism of inferring the pose of real-world objects from vision. Our object model takes the form of a hierarchy of increasingly expressive 3D features and probabilistically represents the 3D relations between them. Features at the bottom of the hierarchy are bound to local perceptions; while we currently use only visual features, the method can in principle incorporate features from diverse modalities within a coherent framework. Model instances are detected using a Nonparametric Belief Propagation algorithm, which propagates evidence through the hierarchy to infer globally consistent poses for every feature of the model. Belief updates are managed by an importance-sampling mechanism that is critical for efficient and precise propagation. We conclude with a series of pose estimation experiments on real objects, along with a quantitative performance evaluation.
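    A minimal sketch of the belief-update idea described above, under simplifying assumptions (scalar poses, particle-based beliefs, hand-picked offsets and noise levels; none of these values come from the paper): each child feature sends a message voting for candidate parent poses, and the parent's belief is refreshed by importance sampling from its current belief and reweighting by the product of incoming messages.

    import numpy as np

    rng = np.random.default_rng(0)

    def gaussian(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    def message_from_child(child_samples, child_weights, offset, noise, parent_samples):
        # The child predicts its parent at (child pose - offset); the message value at
        # each candidate parent pose is a kernel-density estimate over those predictions.
        predicted = child_samples - offset
        diffs = parent_samples[:, None] - predicted[None, :]
        return (child_weights[None, :] * gaussian(diffs, 0.0, noise)).sum(axis=1)

    def importance_update(belief_samples, belief_weights, child_msgs, n_samples=500):
        # Resample from the current belief (the proposal), then reweight each sample by
        # the product of the incoming child messages (the importance weights).
        idx = rng.choice(len(belief_samples), size=n_samples, p=belief_weights)
        samples = belief_samples[idx] + rng.normal(0.0, 0.05, n_samples)  # small jitter
        weights = np.ones(n_samples)
        for child_samples, child_weights, offset, noise in child_msgs:
            weights *= message_from_child(child_samples, child_weights, offset, noise, samples)
        return samples, weights / weights.sum()

    # Toy scene: two child features observed near 2.1 and 3.9, with relative offsets of
    # -1 and +1 to the parent, so both children vote for a parent pose near 3.
    child_a = (rng.normal(2.1, 0.1, 300), np.full(300, 1 / 300), -1.0, 0.15)
    child_b = (rng.normal(3.9, 0.1, 300), np.full(300, 1 / 300), +1.0, 0.15)
    prior_samples = rng.uniform(0.0, 6.0, 1000)
    samples, weights = importance_update(prior_samples, np.full(1000, 1 / 1000), [child_a, child_b])
    print("estimated parent pose:", round(float(np.sum(samples * weights)), 2))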

    Les attributs sous-tendant la reconnaissance d'objets visuels faits de deux composantes [The attributes underlying the recognition of visual objects made of two components]

    The main mediator of visual object recognition is shape perception. While there is a consensus that contour detection and spatial-frequency analysis are the foundations of early vision, the visual hierarchy and the nature of information processing in the subsequent stages involved in object recognition remain largely unknown. Available and relevant empirical data concerning the primitive features the human visual system actually uses to recognize objects are scarce, and none seem entirely conclusive. To overcome this lack of empirical data, this study aims to determine which regions of an image humans use when performing an object recognition task. The Bubbles technique revealed the diagnostic areas used by twelve adults, and by an ideal observer, to discriminate between the eight target objects of the study. Stimulus areas with a facilitatory or inhibitory effect on performance were identified. Humans used only a small subset of the available image information to recognize the targets, consisting mostly of contour segments presenting a discontinuity (i.e., convexities, concavities, and intersections). Object recognition seems to rest upon contrasting sets of features that allow an object to be discriminated from the others. The simplest and most useful information takes precedence, and when it suffices for the task, the visual system does not appear to engage in further processing, such as encoding more complex features or conjunctions of simple features. This implies that context influences the selection of features underlying human object recognition and suggests that the type of attributes used can vary according to their utility in a given context.
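    For illustration only, a small sketch of the Bubbles procedure in the general form described above (the aperture count, aperture size, and image size are assumptions, not the study's parameters): the stimulus is revealed through a few random Gaussian apertures, and accumulating the masks from correct versus incorrect trials maps the diagnostic regions.

    import numpy as np

    rng = np.random.default_rng(1)

    def bubbles_mask(height, width, n_bubbles=10, sigma=8.0):
        # Sum of Gaussian apertures at random locations, clipped to [0, 1].
        ys, xs = np.mgrid[0:height, 0:width]
        mask = np.zeros((height, width))
        for cy, cx in zip(rng.integers(0, height, n_bubbles),
                          rng.integers(0, width, n_bubbles)):
            mask += np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        return np.clip(mask, 0.0, 1.0)

    # Usage: multiply a grayscale stimulus by the mask on each trial; summing the masks
    # of correct trials and subtracting those of incorrect trials estimates which image
    # regions are diagnostic for recognition.
    stimulus = rng.random((128, 128))          # stand-in for an object image
    revealed = stimulus * bubbles_mask(128, 128)
    print(revealed.shape)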

    Maximum similarity based feature matching and adaptive multiple kernel learning for object recognition

    In this thesis, we perform object recognition using (i) maximum similarity based feature matching and (ii) adaptive multiple kernel learning. Images are more likely to be similar if they contain objects from the same categories, so measuring image similarities correctly and efficiently is one of the critical issues for object recognition. We first propose to match features between two images so that their similarity is maximized, and we employ support vector machines (SVMs) for recognition based on the resulting maximum-similarity matrix. Second, given several similarity matrices (kernels) created from different visual information in the images, we propose a novel adaptive multiple kernel learning technique that generates an optimal kernel from all the kernels via biconvex optimization. These two approaches are tested on recent image benchmark datasets, where their results equal or surpass the state of the art.
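    A rough sketch of the surrounding pipeline, not the thesis' formulation: the adaptive biconvex weighting is replaced here by a fixed convex combination, and the descriptors are random stand-ins. Pairwise image similarity is the symmetrised average of each local descriptor's best match, these similarities form per-cue kernels, and the blended kernel can be handed to an SVM with a precomputed kernel.

    import numpy as np

    def max_similarity(feats_a, feats_b):
        # For each descriptor in image A, take its best cosine match in image B,
        # do the same from B to A, and average the two directions.
        a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
        b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
        sims = a @ b.T                                   # pairwise cosine similarities
        return 0.5 * (sims.max(axis=1).mean() + sims.max(axis=0).mean())

    def kernel_matrix(images):
        # Gram matrix of pairwise maximum-similarity scores.
        n = len(images)
        K = np.zeros((n, n))
        for i in range(n):
            for j in range(i, n):
                K[i, j] = K[j, i] = max_similarity(images[i], images[j])
        return K

    def combine_kernels(kernels, weights):
        # Convex combination of several kernels (e.g. one per visual cue).
        weights = np.asarray(weights, dtype=float)
        weights /= weights.sum()
        return sum(w * K for w, K in zip(weights, kernels))

    rng = np.random.default_rng(2)
    cue_1 = [rng.random((rng.integers(20, 40), 64)) for _ in range(5)]   # toy descriptors
    cue_2 = [rng.random((rng.integers(20, 40), 32)) for _ in range(5)]   # a second toy cue
    K = combine_kernels([kernel_matrix(cue_1), kernel_matrix(cue_2)], [0.5, 0.5])
    print(K.shape)   # e.g. pass K to sklearn's SVC(kernel="precomputed")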

    Classifying imbalanced data sets using similarity based hierarchical decomposition

    Classification is difficult when the data are imbalanced and the classes overlap. In recent years, more research has focused on classifying imbalanced data, since real-world data are often skewed. Traditional methods are more successful at classifying the class with the most samples (the majority class) than the other classes (the minority classes). Different methods are available for classifying imbalanced data sets, each with its own advantages and shortcomings. In this study, we propose a new hierarchical decomposition method for imbalanced data sets that differs from previously proposed solutions to the class-imbalance problem. Additionally, it does not require the data pre-processing step that many other solutions need. The new method is based on clustering and outlier detection. The hierarchy is constructed using the similarity of labeled data subsets at each level, with different levels built from different data and feature subsets. Clustering is used to partition the data, while outlier detection is used to detect minority-class samples. A comparison of the proposed method with state-of-the-art methods on 20 public imbalanced data sets and 181 synthetic data sets showed that its classification performance is better than that of the state-of-the-art methods. It is especially successful when the minority class is sparser than the majority class. It performs accurately even when classes have sub-varieties and when the minority and majority classes overlap. Moreover, its performance is also good when the class-imbalance ratio is low, i.e., when the classes are more imbalanced.
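    A deliberately simplified sketch of the two named ingredients, not the proposed hierarchical method itself (the cluster count, the toy data, and the outlier rule are all assumptions): clustering partitions the data, and a distance-based outlier score inside each partition flags candidate minority-class samples.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(3)
    majority = rng.normal(0.0, 1.0, (500, 2))          # dense majority class
    minority = rng.normal(0.0, 3.0, (20, 2))           # sparse, scattered minority class
    X = np.vstack([majority, minority])

    # Partition the data, then flag points far from their cluster centre as outliers,
    # which in an imbalanced setting are often minority-class samples.
    km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
    flags = np.zeros(len(X), dtype=bool)
    for c in range(km.n_clusters):
        idx = np.where(km.labels_ == c)[0]
        dist = np.linalg.norm(X[idx] - km.cluster_centers_[c], axis=1)
        flags[idx[dist > dist.mean() + 2 * dist.std()]] = True   # crude outlier rule
    print("samples flagged as possible minority/outliers:", int(flags.sum()))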

    Recognizing simple human actions by exploiting regularities in pose sequences

    Thesis: M.Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. Cataloged from the PDF version of the thesis. Includes bibliographical references (pages 95-97). The human visual system represents a very complex and important part of brain activity, occupying a significant portion of the cortex's resources. It enables us to see colors, detect motion, and perceive dimensions and distance. It allows us to solve a wide range of problems such as image segmentation, object tracking, and object and activity recognition. We perform these tasks so easily that we are not even aware of their enormous complexity. How do we do that? This question has motivated decades of research in the field of computer vision. In this thesis, I make a contribution toward solving the particular problem of vision-based human-action recognition by exploiting the compositional nature of simple actions such as running, walking, or bending. Noting that simple actions consist of a series of atomic movements and can be represented as a structured sequence of poses, I designed and implemented a system that learns a model of actions based on human-pose classification from a single frame and on a model of transitions between poses through time. The system comprises three parts. The first part is the pose classifier, which infers a pose from a single frame: it takes an image as input and gives its best estimate of the pose in that image. The second part is a hidden Markov model of the transitions between poses. I exploit structural constraints in human motion to build a model that corrects some of the errors made by the independent single-frame pose classifier. Finally, in the third part, the corrected sequence of poses is used to recognize actions based on the frequency of pose patterns, the transitions between poses, and hidden Markov models of individual actions. I demonstrate and test my system on the public KTH dataset, which contains examples of running, walking, jogging, boxing, handclapping, and handwaving, as well as on a new dataset containing examples of not only running and walking but also jumping, crouching, crawling, kicking a ball, passing a basketball, and shooting a basketball. On these datasets, my system exhibits a 91% action-recognition recall rate. By Nikola Otasevic. M.Eng.
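    A minimal sketch of the second stage described above, with a toy pose set and hand-set probabilities rather than the thesis' learned models: Viterbi decoding over a pose-transition model corrects an implausible per-frame pose estimate before the sequence is passed on to action recognition.

    import numpy as np

    poses = ["stand", "stride", "crouch"]
    # Assumed transition model: standing and striding alternate during walking,
    # while a crouch tends to persist once entered.
    A = np.array([[0.60, 0.35, 0.05],
                  [0.35, 0.60, 0.05],
                  [0.10, 0.10, 0.80]])
    pi = np.full(3, 1 / 3)

    def viterbi(frame_probs, A, pi):
        # Most likely pose sequence given per-frame classifier probabilities.
        T, S = frame_probs.shape
        logd = np.log(pi) + np.log(frame_probs[0])
        back = np.zeros((T, S), dtype=int)
        for t in range(1, T):
            scores = logd[:, None] + np.log(A)
            back[t] = scores.argmax(axis=0)
            logd = scores.max(axis=0) + np.log(frame_probs[t])
        path = [int(logd.argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]

    # Per-frame classifier output for four frames; frame 2 mistakenly favours "crouch".
    frame_probs = np.array([[0.80, 0.15, 0.05],
                            [0.20, 0.70, 0.10],
                            [0.30, 0.25, 0.45],
                            [0.15, 0.80, 0.05]])
    print([poses[s] for s in viterbi(frame_probs, A, pi)])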

    Automatic object classification for surveillance videos.

    PhD thesis. The recent popularity of surveillance video systems, especially in urban scenarios, demands the development of visual techniques for monitoring purposes. A primary step towards intelligent surveillance video systems is automatic object classification, which remains an open research problem and the keystone for the development of more specific applications. Typically, object representation is based on inherent visual features. However, psychological studies have demonstrated that human beings routinely categorise objects according to their behaviour. The gap between the features a computer can extract automatically, such as appearance-based features, and the behavioural concepts that human beings perceive effortlessly but that remain unattainable for machines is commonly known as the semantic gap. Consequently, this thesis proposes to narrow the semantic gap and bring machine and human understanding together for object classification. A Surveillance Media Management framework is proposed to automatically detect and classify objects by analysing both the physical properties inherent in their appearance (machine understanding) and the behaviour patterns that require a higher level of understanding (human understanding). Finally, a probabilistic multimodal fusion algorithm bridges the gap by performing an automatic classification that considers both machine and human understanding. The performance of the proposed Surveillance Media Management framework has been thoroughly evaluated on outdoor surveillance datasets. The experiments conducted demonstrate that the combination of machine and human understanding substantially enhances object classification performance. Finally, the inclusion of human reasoning and understanding provides the essential information needed to bridge the semantic gap towards smart surveillance video systems.
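    A minimal sketch of the fusion step in the general form described above (the class set, the scores, and the weighting are hypothetical, not the thesis' framework): per-class posteriors from an appearance model and a behaviour model are combined with a weighted log-linear rule and renormalised.

    import numpy as np

    classes = ["pedestrian", "car", "cyclist"]

    def fuse(appearance_post, behaviour_post, w_appearance=0.5):
        # Weighted product of the two posteriors (a log-linear opinion pool).
        logp = (w_appearance * np.log(appearance_post)
                + (1.0 - w_appearance) * np.log(behaviour_post))
        p = np.exp(logp - logp.max())          # subtract max for numerical stability
        return p / p.sum()

    appearance = np.array([0.50, 0.40, 0.10])  # appearance cues are ambiguous
    behaviour = np.array([0.15, 0.05, 0.80])   # motion pattern strongly suggests cyclist
    fused = fuse(appearance, behaviour, w_appearance=0.4)
    print(dict(zip(classes, np.round(fused, 3))))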