
    Probabilistic Pose Recovery Using Learned Hierarchical Object Models

    This paper presents a probabilistic representation for 3D objects and details the mechanism of inferring the pose of real-world objects from vision. Our object model takes the form of a hierarchy of increasingly expressive 3D features and probabilistically represents the 3D relations between them. Features at the bottom of the hierarchy are bound to local perceptions; while we currently use only visual features, the method can in principle incorporate features from diverse modalities within a coherent framework. Model instances are detected using a Nonparametric Belief Propagation algorithm, which propagates evidence through the hierarchy to infer globally consistent poses for every feature of the model. Belief updates are managed by an importance-sampling mechanism that is critical for efficient and precise propagation. We conclude with a series of pose estimation experiments on real objects, along with a quantitative performance evaluation.
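    A minimal sketch of the belief-update idea described above, under simplifying assumptions (scalar poses, particle-based beliefs, hand-picked offsets and noise levels; none of these values come from the paper): each child feature sends a message voting for candidate parent poses, and the parent's belief is refreshed by importance sampling from its current belief and reweighting by the product of incoming messages.

    import numpy as np

    rng = np.random.default_rng(0)

    def gaussian(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    def message_from_child(child_samples, child_weights, offset, noise, parent_samples):
        # The child predicts its parent at (child pose - offset); the message value at
        # each candidate parent pose is a kernel-density estimate over those predictions.
        predicted = child_samples - offset
        diffs = parent_samples[:, None] - predicted[None, :]
        return (child_weights[None, :] * gaussian(diffs, 0.0, noise)).sum(axis=1)

    def importance_update(belief_samples, belief_weights, child_msgs, n_samples=500):
        # Resample from the current belief (the proposal), then reweight each sample by
        # the product of the incoming child messages (the importance weights).
        idx = rng.choice(len(belief_samples), size=n_samples, p=belief_weights)
        samples = belief_samples[idx] + rng.normal(0.0, 0.05, n_samples)  # small jitter
        weights = np.ones(n_samples)
        for child_samples, child_weights, offset, noise in child_msgs:
            weights *= message_from_child(child_samples, child_weights, offset, noise, samples)
        return samples, weights / weights.sum()

    # Toy scene: two child features observed near 2.1 and 3.9, with relative offsets of
    # -1 and +1 to the parent, so both children vote for a parent pose near 3.
    child_a = (rng.normal(2.1, 0.1, 300), np.full(300, 1 / 300), -1.0, 0.15)
    child_b = (rng.normal(3.9, 0.1, 300), np.full(300, 1 / 300), +1.0, 0.15)
    prior_samples = rng.uniform(0.0, 6.0, 1000)
    samples, weights = importance_update(prior_samples, np.full(1000, 1 / 1000), [child_a, child_b])
    print("estimated parent pose:", round(float(np.sum(samples * weights)), 2))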

    Les attributs sous-tendant la reconnaissance d'objets visuels faits de deux composantes [The attributes underlying the recognition of visual objects made of two components]

    The main mediator of visual object recognition is shape perception. While there is a consensus that contour detection and spatial-frequency analysis are the foundations of early vision, the visual hierarchy and the nature of information processing in the subsequent stages involved in object recognition remain largely unknown. Available and relevant empirical data concerning the primitive features the human visual system actually uses to recognize objects are scarce, and none seem entirely conclusive. To overcome this lack of empirical data, this study aims to determine which regions of an image humans use when performing an object recognition task. The Bubbles technique revealed the diagnostic areas used by twelve adults, and by an ideal observer, to discriminate between the eight target objects of the study. Stimulus areas with a facilitatory or inhibitory effect on performance were identified. Humans used only a small subset of the available image information to recognize the targets, consisting mostly of contour segments presenting a discontinuity (i.e., convexities, concavities, and intersections). Object recognition seems to rest upon contrasting sets of features that allow an object to be discriminated from the others. The simplest and most useful information takes precedence, and when it suffices for the task, the visual system does not appear to engage in further processing, such as encoding more complex features or conjunctions of simple features. This implies that context influences the selection of features underlying human object recognition and suggests that the type of attributes used can vary according to their utility in a given context.
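    For illustration only, a small sketch of the Bubbles procedure in the general form described above (the aperture count, aperture size, and image size are assumptions, not the study's parameters): the stimulus is revealed through a few random Gaussian apertures, and accumulating the masks from correct versus incorrect trials maps the diagnostic regions.

    import numpy as np

    rng = np.random.default_rng(1)

    def bubbles_mask(height, width, n_bubbles=10, sigma=8.0):
        # Sum of Gaussian apertures at random locations, clipped to [0, 1].
        ys, xs = np.mgrid[0:height, 0:width]
        mask = np.zeros((height, width))
        for cy, cx in zip(rng.integers(0, height, n_bubbles),
                          rng.integers(0, width, n_bubbles)):
            mask += np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        return np.clip(mask, 0.0, 1.0)

    # Usage: multiply a grayscale stimulus by the mask on each trial; summing the masks
    # of correct trials and subtracting those of incorrect trials estimates which image
    # regions are diagnostic for recognition.
    stimulus = rng.random((128, 128))          # stand-in for an object image
    revealed = stimulus * bubbles_mask(128, 128)
    print(revealed.shape)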

    Maximum similarity based feature matching and adaptive multiple kernel learning for object recognition

    In this thesis, we perform object recognition using (i) maximum similarity based feature matching and (ii) adaptive multiple kernel learning. Images are more likely to be similar if they contain objects from the same categories, so measuring image similarities correctly and efficiently is one of the critical issues for object recognition. We first propose to match features between two images so that their similarity is maximized, and we employ support vector machines (SVMs) for recognition based on the resulting maximum-similarity matrix. Second, given several similarity matrices (kernels) created from different visual information in the images, we propose a novel adaptive multiple kernel learning technique that generates an optimal kernel from all the kernels via biconvex optimization. These two approaches are tested on recent image benchmark datasets, where their results equal or surpass the state of the art.
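    A rough sketch of the surrounding pipeline, not the thesis' formulation: the adaptive biconvex weighting is replaced here by a fixed convex combination, and the descriptors are random stand-ins. Pairwise image similarity is the symmetrised average of each local descriptor's best match, these similarities form per-cue kernels, and the blended kernel can be handed to an SVM with a precomputed kernel.

    import numpy as np

    def max_similarity(feats_a, feats_b):
        # For each descriptor in image A, take its best cosine match in image B,
        # do the same from B to A, and average the two directions.
        a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
        b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
        sims = a @ b.T                                   # pairwise cosine similarities
        return 0.5 * (sims.max(axis=1).mean() + sims.max(axis=0).mean())

    def kernel_matrix(images):
        # Gram matrix of pairwise maximum-similarity scores.
        n = len(images)
        K = np.zeros((n, n))
        for i in range(n):
            for j in range(i, n):
                K[i, j] = K[j, i] = max_similarity(images[i], images[j])
        return K

    def combine_kernels(kernels, weights):
        # Convex combination of several kernels (e.g. one per visual cue).
        weights = np.asarray(weights, dtype=float)
        weights /= weights.sum()
        return sum(w * K for w, K in zip(weights, kernels))

    rng = np.random.default_rng(2)
    cue_1 = [rng.random((rng.integers(20, 40), 64)) for _ in range(5)]   # toy descriptors
    cue_2 = [rng.random((rng.integers(20, 40), 32)) for _ in range(5)]   # a second toy cue
    K = combine_kernels([kernel_matrix(cue_1), kernel_matrix(cue_2)], [0.5, 0.5])
    print(K.shape)   # e.g. pass K to sklearn's SVC(kernel="precomputed")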

    Classifying imbalanced data sets using similarity based hierarchical decomposition

    Classification is difficult when the data are imbalanced and the classes overlap. In recent years, more research has focused on classifying imbalanced data, since real-world data are often skewed. Traditional methods are more successful at classifying the class with the most samples (the majority class) than the other classes (the minority classes). Different methods are available for classifying imbalanced data sets, each with its own advantages and shortcomings. In this study, we propose a new hierarchical decomposition method for imbalanced data sets that differs from previously proposed solutions to the class-imbalance problem. Additionally, it does not require the data pre-processing step that many other solutions need. The new method is based on clustering and outlier detection. The hierarchy is constructed using the similarity of labeled data subsets at each level, with different levels built from different data and feature subsets. Clustering is used to partition the data, while outlier detection is used to detect minority-class samples. A comparison of the proposed method with state-of-the-art methods on 20 public imbalanced data sets and 181 synthetic data sets showed that its classification performance is better than that of the state-of-the-art methods. It is especially successful when the minority class is sparser than the majority class. It performs accurately even when classes have sub-varieties and when the minority and majority classes overlap. Moreover, its performance is also good when the class-imbalance ratio is low, i.e., when the classes are more imbalanced.
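    A deliberately simplified sketch of the two named ingredients, not the proposed hierarchical method itself (the cluster count, the toy data, and the outlier rule are all assumptions): clustering partitions the data, and a distance-based outlier score inside each partition flags candidate minority-class samples.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(3)
    majority = rng.normal(0.0, 1.0, (500, 2))          # dense majority class
    minority = rng.normal(0.0, 3.0, (20, 2))           # sparse, scattered minority class
    X = np.vstack([majority, minority])

    # Partition the data, then flag points far from their cluster centre as outliers,
    # which in an imbalanced setting are often minority-class samples.
    km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
    flags = np.zeros(len(X), dtype=bool)
    for c in range(km.n_clusters):
        idx = np.where(km.labels_ == c)[0]
        dist = np.linalg.norm(X[idx] - km.cluster_centers_[c], axis=1)
        flags[idx[dist > dist.mean() + 2 * dist.std()]] = True   # crude outlier rule
    print("samples flagged as possible minority/outliers:", int(flags.sum()))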

    Recognizing simple human actions by exploiting regularities in pose sequences

    Thesis: M.Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. Cataloged from the PDF version of the thesis. Includes bibliographical references (pages 95-97). The human visual system represents a very complex and important part of brain activity, occupying a significant portion of the cortex's resources. It enables us to see colors, detect motion, and perceive dimensions and distance. It allows us to solve a wide range of problems such as image segmentation, object tracking, and object and activity recognition. We perform these tasks so easily that we are not even aware of their enormous complexity. How do we do that? This question has motivated decades of research in the field of computer vision. In this thesis, I make a contribution toward solving the particular problem of vision-based human-action recognition by exploiting the compositional nature of simple actions such as running, walking, or bending. Noting that simple actions consist of a series of atomic movements and can be represented as a structured sequence of poses, I designed and implemented a system that learns a model of actions based on human-pose classification from a single frame and on a model of transitions between poses through time. The system comprises three parts. The first part is the pose classifier, which infers a pose from a single frame: it takes an image as input and gives its best estimate of the pose in that image. The second part is a hidden Markov model of the transitions between poses. I exploit structural constraints in human motion to build a model that corrects some of the errors made by the independent single-frame pose classifier. Finally, in the third part, the corrected sequence of poses is used to recognize actions based on the frequency of pose patterns, the transitions between poses, and hidden Markov models of individual actions. I demonstrate and test my system on the public KTH dataset, which contains examples of running, walking, jogging, boxing, handclapping, and handwaving, as well as on a new dataset containing examples of not only running and walking but also jumping, crouching, crawling, kicking a ball, passing a basketball, and shooting a basketball. On these datasets, my system exhibits a 91% action-recognition recall rate. By Nikola Otasevic. M.Eng.
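    A minimal sketch of the second stage described above, with a toy pose set and hand-set probabilities rather than the thesis' learned models: Viterbi decoding over a pose-transition model corrects an implausible per-frame pose estimate before the sequence is passed on to action recognition.

    import numpy as np

    poses = ["stand", "stride", "crouch"]
    # Assumed transition model: standing and striding alternate during walking,
    # while a crouch tends to persist once entered.
    A = np.array([[0.60, 0.35, 0.05],
                  [0.35, 0.60, 0.05],
                  [0.10, 0.10, 0.80]])
    pi = np.full(3, 1 / 3)

    def viterbi(frame_probs, A, pi):
        # Most likely pose sequence given per-frame classifier probabilities.
        T, S = frame_probs.shape
        logd = np.log(pi) + np.log(frame_probs[0])
        back = np.zeros((T, S), dtype=int)
        for t in range(1, T):
            scores = logd[:, None] + np.log(A)
            back[t] = scores.argmax(axis=0)
            logd = scores.max(axis=0) + np.log(frame_probs[t])
        path = [int(logd.argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]

    # Per-frame classifier output for four frames; frame 2 mistakenly favours "crouch".
    frame_probs = np.array([[0.80, 0.15, 0.05],
                            [0.20, 0.70, 0.10],
                            [0.30, 0.25, 0.45],
                            [0.15, 0.80, 0.05]])
    print([poses[s] for s in viterbi(frame_probs, A, pi)])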

    Automatic object classification for surveillance videos.

    PhD thesis. The recent popularity of surveillance video systems, especially in urban scenarios, demands the development of visual techniques for monitoring purposes. A primary step towards intelligent surveillance video systems is automatic object classification, which remains an open research problem and the keystone for the development of more specific applications. Typically, object representation is based on inherent visual features. However, psychological studies have demonstrated that human beings routinely categorise objects according to their behaviour. The gap between the features a computer can extract automatically, such as appearance-based features, and the behavioural concepts that human beings perceive effortlessly but that remain unattainable for machines is commonly known as the semantic gap. Consequently, this thesis proposes to narrow the semantic gap and bring machine and human understanding together for object classification. A Surveillance Media Management framework is proposed to automatically detect and classify objects by analysing both the physical properties inherent in their appearance (machine understanding) and the behaviour patterns that require a higher level of understanding (human understanding). Finally, a probabilistic multimodal fusion algorithm bridges the gap by performing an automatic classification that considers both machine and human understanding. The performance of the proposed Surveillance Media Management framework has been thoroughly evaluated on outdoor surveillance datasets. The experiments conducted demonstrate that the combination of machine and human understanding substantially enhances object classification performance. Finally, the inclusion of human reasoning and understanding provides the essential information needed to bridge the semantic gap towards smart surveillance video systems.
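    A minimal sketch of the fusion step in the general form described above (the class set, the scores, and the weighting are hypothetical, not the thesis' framework): per-class posteriors from an appearance model and a behaviour model are combined with a weighted log-linear rule and renormalised.

    import numpy as np

    classes = ["pedestrian", "car", "cyclist"]

    def fuse(appearance_post, behaviour_post, w_appearance=0.5):
        # Weighted product of the two posteriors (a log-linear opinion pool).
        logp = (w_appearance * np.log(appearance_post)
                + (1.0 - w_appearance) * np.log(behaviour_post))
        p = np.exp(logp - logp.max())          # subtract max for numerical stability
        return p / p.sum()

    appearance = np.array([0.50, 0.40, 0.10])  # appearance cues are ambiguous
    behaviour = np.array([0.15, 0.05, 0.80])   # motion pattern strongly suggests cyclist
    fused = fuse(appearance, behaviour, w_appearance=0.4)
    print(dict(zip(classes, np.round(fused, 3))))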