
    Categorization of indoor places by combining local binary pattern histograms of range and reflectance data from laser range finders

    This paper presents an approach to categorizing typical places in indoor environments using 3D scans provided by a laser range finder. Examples of such places are offices, laboratories, or kitchens. In our method, we combine the range and reflectance data from the laser scan for the final categorization of places. Range and reflectance images are transformed into histograms of local binary patterns and combined into a single feature vector. This vector is then classified using support vector machines. The results of the presented experiments demonstrate the capability of our technique to categorize indoor places with high accuracy. We also show that the combination of range and reflectance information improves the final categorization results in comparison with a single modality.
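    The pipeline described here (one local binary pattern histogram per modality, concatenation into a single feature vector, and an SVM classifier) could be sketched roughly as below. The LBP parameters, SVM kernel, and function names are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch: LBP histograms of range and reflectance images, concatenated
# and classified with an SVM. Parameter values are assumed, not from the paper.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

P, R = 8, 1          # assumed LBP neighbourhood: 8 samples at radius 1
N_BINS = P + 2       # number of codes produced by the 'uniform' LBP variant

def lbp_histogram(image):
    """Uniform LBP histogram of one single-channel image (range or reflectance)."""
    codes = local_binary_pattern(image, P, R, method="uniform")
    hist, _ = np.histogram(codes.ravel(), bins=N_BINS, range=(0, N_BINS), density=True)
    return hist

def place_feature(range_img, reflectance_img):
    """Concatenate the two modality histograms into one feature vector."""
    return np.concatenate([lbp_histogram(range_img), lbp_histogram(reflectance_img)])

def train_place_classifier(range_images, reflectance_images, labels):
    """Fit an SVM on combined range + reflectance features of labelled scans."""
    X = np.stack([place_feature(r, i) for r, i in zip(range_images, reflectance_images)])
    clf = SVC(kernel="rbf", C=1.0)   # kernel and C are assumptions, not the paper's settings
    clf.fit(X, labels)
    return clf
```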

    Learning Deep NBNN Representations for Robust Place Categorization

    This paper presents an approach for semantic place categorization using data obtained from RGB cameras. Previous studies on visual place recognition and classification have shown that, by considering features derived from pre-trained Convolutional Neural Networks (CNNs) in combination with part-based classification models, high recognition accuracy can be achieved, even in the presence of occlusions and severe viewpoint changes. Inspired by these works, we propose to exploit local deep representations, representing images as sets of regions and applying a Naïve Bayes Nearest Neighbor (NBNN) model for image classification. As opposed to previous methods where CNNs are merely used as feature extractors, our approach seamlessly integrates the NBNN model into a fully-convolutional neural network. Experimental results show that the proposed algorithm outperforms previous methods based on pre-trained CNN models and that, when employed in challenging robot place recognition tasks, it is robust to occlusions, environmental and sensor changes.
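    The underlying NBNN decision rule (before its integration into a fully-convolutional network) sums, over the local descriptors of an image, the distance to each class's nearest training descriptor and picks the class with the smallest total. A minimal sketch with a brute-force nearest-neighbor search, assuming generic descriptor arrays rather than the paper's end-to-end network, could look like this:

```python
# Sketch of the plain Naive Bayes Nearest Neighbor (NBNN) classification rule
# over local (e.g., CNN-derived) descriptors; names are illustrative only.
import numpy as np

def nbnn_classify(query_descriptors, class_descriptor_banks):
    """query_descriptors: (n, d) local descriptors of one test image.
    class_descriptor_banks: dict mapping class label -> (m_c, d) training descriptors.
    Returns the class minimizing the summed nearest-neighbor distances."""
    scores = {}
    for cls, bank in class_descriptor_banks.items():
        # Squared Euclidean distance from every query descriptor to every bank descriptor.
        d2 = ((query_descriptors[:, None, :] - bank[None, :, :]) ** 2).sum(-1)
        scores[cls] = d2.min(axis=1).sum()   # image-to-class distance
    return min(scores, key=scores.get)
```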

    Robust Place Categorization With Deep Domain Generalization

    Traditional place categorization approaches in robot vision assume that training and test images have a similar visual appearance. Therefore, any seasonal, illumination, or environmental change typically leads to severe degradation in performance. To cope with this problem, recent works have proposed adopting domain adaptation techniques. While effective, these methods assume that some prior information about the scenario where the robot will operate is available at training time. Unfortunately, in many cases this assumption does not hold, as we often do not know where a robot will be deployed. To overcome this issue, in this paper we present an approach that aims at learning classification models able to generalize to unseen scenarios. Specifically, we propose a novel deep learning framework for domain generalization. Our method develops from the intuition that, given a set of different classification models associated with known domains (e.g., corresponding to multiple environments or robots), the best model for a new sample in the novel domain can be computed directly at test time by optimally combining the known models. To implement our idea, we exploit recent advances in deep domain adaptation and design a convolutional neural network architecture with novel layers performing a weighted version of batch normalization. Our experiments, conducted on three common datasets for robot place categorization, confirm the validity of our contribution.
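    The key ingredient, a batch-normalization layer that mixes per-domain statistics with sample-specific weights at test time, can be illustrated with the following sketch. The simple convex mixing of means and variances and the function names are assumptions for illustration, not the paper's exact layer.

```python
# Sketch of a weighted batch-normalization step for domain generalization:
# keep one set of BN statistics per known source domain and normalize a test
# sample with a convex combination of them. Weighting scheme is assumed.
import numpy as np

def weighted_domain_bn(x, domain_means, domain_vars, weights, eps=1e-5):
    """x: (N, C) features; domain_means/domain_vars: (D, C) per-domain statistics;
    weights: (D,) non-negative weights summing to one for this test sample
    (e.g., obtained from a softmax over domain-assignment scores)."""
    mean = weights @ domain_means   # (C,) mixed mean across known domains
    var = weights @ domain_vars     # (C,) mixed variance (simple convex mix, an assumption)
    return (x - mean) / np.sqrt(var + eps)
```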

    Pandora: Description of a Painting Database for Art Movement Recognition with Baselines and Perspectives

    To facilitate computer analysis of visual art in the form of paintings, we introduce the Pandora (Paintings Dataset for Recognizing the Art movement) database, a collection of digitized paintings labelled with respect to the artistic movement. Noting that the set of databases available as evaluation benchmarks is very limited, and that most existing ones are restricted in variability and number of images, we propose a novel large-scale dataset of digital paintings. The database consists of more than 7700 images from 12 art movements. Each genre is illustrated by a number of images varying from 250 to nearly 1000. We investigate how local and global features and classification systems are able to recognize the art movement. Our experimental results suggest that accurate recognition is achievable by a combination of various categories.

    STV-based Video Feature Processing for Action Recognition

    In comparison to still image-based processing, video features can provide rich and intuitive information about dynamic events occurring over a period of time, such as human actions, crowd behaviours, and other changes in subject patterns. Although substantial progress has been made in image processing over the last decade, with successful applications in face matching and object recognition, video-based event detection still remains one of the most difficult challenges in computer vision research due to its complex continuous or discrete input signals, arbitrary dynamic feature definitions, and often ambiguous analytical methods. In this paper, a Spatio-Temporal Volume (STV) and region intersection (RI) based 3D shape-matching method is proposed to facilitate the definition and recognition of human actions recorded in videos. The distinctive characteristics and the performance gain of the devised approach stem from a coefficient factor-boosted 3D region intersection and matching mechanism developed in this research. This paper also reports an investigation into techniques for efficient STV data filtering to reduce the number of voxels (volumetric pixels) that need to be processed in each operational cycle of the implemented system. The encouraging features and improvements in operational performance registered in the experiments are discussed at the end.
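    Two of the building blocks mentioned here, stacking frames into a spatio-temporal volume and scoring the overlap between two action volumes, are easy to illustrate. The sketch below uses a simple temporal-gradient threshold for voxel filtering and an intersection-over-union style overlap score; the threshold and function names are assumptions and do not reproduce the paper's coefficient factor-boosted mechanism.

```python
# Sketch: building a spatio-temporal volume (STV), filtering low-motion voxels,
# and a region-intersection overlap score between two binary action volumes.
import numpy as np

def build_stv(frames):
    """frames: list of (H, W) grayscale frames -> (T, H, W) volume."""
    return np.stack(frames, axis=0)

def filter_voxels(stv, motion_threshold=10.0):
    """Keep only voxels with noticeable temporal change to cut the voxel count
    (threshold value is an assumption)."""
    motion = np.abs(np.gradient(stv.astype(float), axis=0))
    return stv * (motion > motion_threshold)

def region_intersection_score(volume_a, volume_b):
    """Overlap ratio between two binary STV shapes (an IoU-style measure)."""
    a, b = volume_a > 0, volume_b > 0
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0
```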

    Improving Multi-view Facial Expression Recognition in Unconstrained Environments

    Facial expression and emotion-related research has been a longstanding activity in psychology, while computerized/automatic facial expression recognition of emotion is a relatively recent, still emerging, but active research area. Although many automatic computer systems have been proposed to address facial expression recognition problems, the majority of them fail to cope with the requirements of many practical application scenarios, arising from either environmental factors or unexpected behavioural bias introduced by the users, such as illumination conditions and large head pose variation relative to the camera. In this thesis, two of the most influential and common issues raised in practical application scenarios when applying an automatic facial expression recognition system are comprehensively explored and investigated. Through a series of experiments carried out under a proposed texture-based system framework for multi-view facial expression recognition, several novel texture feature representations are introduced for implementing multi-view facial expression recognition systems in practical environments, for which state-of-the-art performance is achieved. In addition, a variety of novel categorization schemes for the configurations of an automatic multi-view facial expression recognition system are presented to address the impractical discrete categorization of facial expressions of emotion in real-world scenarios. A significant improvement is observed when using the proposed categorizations in the proposed system framework with a novel implementation of the block-based local ternary pattern approach.
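    The block-based local ternary pattern (LTP) descriptor the thesis builds on can be sketched as follows: each pixel's 8-neighbourhood is encoded as a ternary code using a tolerance threshold, the code is split into "upper" and "lower" binary maps, and per-block histograms of both maps are concatenated. The threshold, grid size, and helper names below follow the standard LTP formulation and are assumptions, not necessarily the thesis's exact implementation.

```python
# Sketch of a block-based local ternary pattern (LTP) feature; parameters are assumed.
import numpy as np

def ltp_codes(image, t=5):
    """Return upper/lower LTP code maps for the 8-neighbourhood of each interior pixel."""
    img = image.astype(int)
    c = img[1:-1, 1:-1]
    upper = np.zeros_like(c)
    lower = np.zeros_like(c)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(shifts):
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        upper |= (n >= c + t).astype(int) << bit   # neighbour clearly brighter than centre
        lower |= (n <= c - t).astype(int) << bit   # neighbour clearly darker than centre
    return upper, lower

def block_ltp_histogram(image, grid=(4, 4), t=5):
    """Concatenate upper/lower LTP histograms over a grid of image blocks."""
    upper, lower = ltp_codes(image, t)
    feats = []
    for code_map in (upper, lower):
        for rows in np.array_split(code_map, grid[0], axis=0):
            for block in np.array_split(rows, grid[1], axis=1):
                hist, _ = np.histogram(block, bins=256, range=(0, 256), density=True)
                feats.append(hist)
    return np.concatenate(feats)
```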