
    Visual scene recognition with biologically relevant generative models

    This research focuses on developing visual object categorization methodologies based on machine learning techniques and biologically inspired generative models of visual scene recognition. Modelling the statistical variability of visual patterns, in the space of features extracted from them by an appropriate low-level signal processing technique, is an important matter of investigation for both humans and machines. To study this problem, we have examined in detail two recent probabilistic models of vision: the simple multivariate Gaussian model suggested by Karklin and Lewicki (2009) and the restricted Boltzmann machine (RBM) proposed by Hinton (2002). Both models have been widely used for visual object classification and scene analysis tasks. This research shows that these models are not sufficiently discriminative on their own to perform the classification task, and suggests the Fisher kernel as a means of inducing discriminative power into them. Our empirical results on standard benchmark data sets reveal that the classification performance of these generative models can be boosted close to state-of-the-art performance by deriving a Fisher kernel from compact generative models, which computes the data labels in a fraction of the total computation time. We compare the proposed technique with other distance-based and kernel-based classifiers to show how computationally efficient the Fisher kernels are. To the best of our knowledge, a Fisher kernel has not been derived from an RBM before, so the work presented in this thesis is novel in both its idea and its application to vision problems.
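    The abstract does not come with an implementation, but the general recipe it describes can be sketched. The fragment below is a minimal illustration, not the author's code: it derives per-example Fisher scores from a binary RBM (all names such as fisher_scores_rbm, W, b_vis and b_hid are assumptions for the example), approximating the intractable model expectation with a short Gibbs chain and the Fisher information matrix with the identity.

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def fisher_scores_rbm(V, W, b_vis, b_hid, n_gibbs=1, rng=None):
            # Per-example Fisher scores d log p(v) / dW for a binary RBM.
            # The positive phase <v h>_data is exact; the negative phase
            # <v h>_model is approximated with a short Gibbs chain (CD-style).
            rng = np.random.default_rng() if rng is None else rng
            scores = []
            for v in V:
                h_prob = sigmoid(v @ W + b_hid)            # exact E[h | v]
                pos = np.outer(v, h_prob)                  # positive phase
                v_neg = v.copy()
                for _ in range(n_gibbs):                   # brief Gibbs chain
                    h_samp = (sigmoid(v_neg @ W + b_hid) > rng.random(W.shape[1])).astype(float)
                    v_neg = (sigmoid(h_samp @ W.T + b_vis) > rng.random(W.shape[0])).astype(float)
                neg = np.outer(v_neg, sigmoid(v_neg @ W + b_hid))
                scores.append((pos - neg).ravel())         # flattened score vector
            return np.asarray(scores)

        def fisher_kernel(scores_a, scores_b):
            # Identity approximation to the Fisher information matrix:
            # K(x, y) = U_x . U_y on L2-normalised score vectors.
            a = scores_a / (np.linalg.norm(scores_a, axis=1, keepdims=True) + 1e-12)
            b = scores_b / (np.linalg.norm(scores_b, axis=1, keepdims=True) + 1e-12)
            return a @ b.T

    The resulting kernel matrix can be handed to any off-the-shelf kernel classifier such as an SVM, which is where the discriminative power described in the abstract would come from.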

    Automated Semantic Content Extraction from Images

    In this study, an automatic semantic segmentation and object recognition methodology is implemented that bridges the semantic gap between low-level features of image content and high-level conceptual meaning. Semantically understanding an image is essential in modeling autonomous robots, targeting customers in marketing, or reverse engineering of building information modeling in the construction industry. To achieve an understanding of a room from a single image, we propose a new object recognition framework with four major components: segmentation, scene detection, conceptual cueing, and object recognition. The new segmentation methodology developed in this research extends Felzenszwalb's cost function to include new surface index and depth features, as well as color, texture, and normal features, to overcome the issues of occlusion and shadowing commonly found in images. Adding depth allows the object recognition stage to capture new features and achieve high accuracy compared to the current state of the art. The goal was to develop an approach to capture and label perceptually important regions, which often reflect a global representation and understanding of the image. We developed a system that uses contextual and common-sense information to improve object recognition and scene detection, and fuses the information from scene and objects to reduce the level of uncertainty. In addition to improving segmentation, scene detection, and object recognition, this study can be used in applications that require physical parsing of the image into objects, surfaces, and their relations. The applications include robotics, social networking, intelligence and anti-terrorism efforts, criminal investigations and security, marketing, and building information modeling in the construction industry. In this dissertation, a structural framework (ontology) is developed that generates text descriptions based on an understanding of the objects, structures, and attributes of an image.
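    As a rough illustration of the kind of cost-function extension described above, the sketch below folds colour, depth and surface-normal differences into a single Felzenszwalb-Huttenlocher-style edge weight. The function name and the feature weights are assumptions for the example, not values from the dissertation.

        import numpy as np

        def edge_weight(p, q, rgb, depth, normals, w_c=1.0, w_d=0.5, w_n=0.5):
            # Dissimilarity between neighbouring pixels p and q ((row, col) tuples),
            # combining colour, depth and surface-normal cues.
            dc = np.linalg.norm(rgb[p].astype(float) - rgb[q].astype(float))  # colour difference
            dd = abs(float(depth[p]) - float(depth[q]))                       # depth discontinuity
            dn = 1.0 - float(normals[p] @ normals[q])                         # normal disagreement (unit vectors)
            return w_c * dc + w_d * dd + w_n * dn

    In the standard graph-based formulation these weights then drive the greedy merge step: two components are joined when the cheapest edge between them is no larger than the smaller of their internal differences plus a granularity term k/|C|.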

    Expert Object Recognition in video

    A recent computer vision technique for object classification in still images is the biologically inspired Expert Object Recognition (EOR). This thesis adapts and extends the EOR approach for use with segmented video data. Properties of this data, such as segmentation masks and the visibility of an object over multiple frames, are exploited to decrease human supervision and increase accuracy. Several types of runtime learning are facilitated: class-level learning, in which object types not included in the training set are given artificial classes; viewpoint-level learning, in which novel views of training objects are associated with existing classes; and instance-level learning of images that are somewhat similar to training images. The architecture of EOR, consisting of feature extraction, clustering, and cluster-specific principal component analysis, is retained. However, the K-means clustering algorithm used in EOR is replaced in this system by an augmented version of fuzzy K-means. This algorithm runs incrementally over the lifetime of the system and automatically determines an appropriate number of partitions based on the data in memory and on a system parameter. In addition, the edge- and line-based feature extraction of EOR is replaced with a global application of principal component analysis, which increases accuracy when used with segmented video data. Classification output for the system consists of a multi-class hypothesis for each tracked object, from which a single-class hard hypothesis may be determined. The system, named VEOR (video expert object recognition), is designed for and tested with noisy, automatically segmented real-world data, consisting of both videos and still images of vehicle (car, pickup truck, and van) profiles.
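    For reference, plain fuzzy K-means (fuzzy c-means), which the augmented algorithm builds on, can be written compactly. The sketch below assumes a fixed partition count k; the incremental, partition-count-adapting behaviour the thesis describes is not reproduced here.

        import numpy as np

        def fuzzy_kmeans(X, k, m=2.0, n_iter=100, tol=1e-5, rng=None):
            # X: (n, d) data matrix; m > 1 is the fuzziness exponent.
            rng = np.random.default_rng() if rng is None else rng
            centers = X[rng.choice(len(X), size=k, replace=False)]
            for _ in range(n_iter):
                # squared distances to each centre, floored to avoid divide-by-zero
                d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
                d2 = np.maximum(d2, 1e-12)
                # membership update: u_ij proportional to d2_ij^(-1/(m-1))
                inv = d2 ** (-1.0 / (m - 1.0))
                U = inv / inv.sum(axis=1, keepdims=True)
                # centre update: fuzzily weighted mean of the data
                Um = U ** m
                new_centers = (Um.T @ X) / Um.T.sum(axis=1, keepdims=True)
                if np.linalg.norm(new_centers - centers) < tol:
                    centers = new_centers
                    break
                centers = new_centers
            return centers, U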

    Perceptual texture similarity estimation

    This thesis evaluates the ability of computational features to estimate perceptual texture similarity. In the first part of the thesis, we conducted two experiments evaluating the ability of 51 computational feature sets to estimate perceptual texture similarity, using two different evaluation methods: pair-of-pairs based and retrieval based. These experiments compared the computational features to two sets of human-derived ground-truth data, both of which are of higher resolution than those commonly used. The first was obtained by free-grouping and the second by pair-of-pairs experiments. Using these higher resolution data, we found that the feature sets do not perform well when compared to human judgements. Our analysis shows that these computational feature sets either (1) only exploit power spectrum information or (2) only compute higher-order statistics (HoS) on, at most, small local neighbourhoods. In other words, they cannot capture aperiodic, long-range spatial relationships. As we hypothesise that these long-range interactions are important for the human perception of texture similarity, we carried out two more pair-of-pairs experiments, the results of which indicate that long-range interactions do provide humans with important cues for the perception of texture similarity. In the second part of the thesis we develop new texture features that can encode such information. We first examine the importance of three different types of visual information for the human perception of texture. Our results show that contours are the most critical type of information for human discrimination of textures. Finally, we report the development of a new set of contour-based features which performed well on the free-grouping data and outperformed the 51 feature sets, as well as another contour-type feature set, on the pair-of-pairs data.
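    A pair-of-pairs evaluation of the sort described can be scored as the fraction of human judgements a feature set reproduces. The sketch below is one plausible reading of that protocol, not code from the thesis; the names and the default Euclidean metric are assumptions.

        import numpy as np

        def pair_of_pairs_agreement(features, quadruples, human_choices, metric=None):
            # quadruples[i] = (a, b, c, d) indexes four textures forming the
            # pairs (a, b) and (c, d); human_choices[i] is 0 if observers judged
            # (a, b) the more similar pair, 1 otherwise. A feature set agrees
            # when its distances order the two pairs the same way.
            if metric is None:
                metric = lambda x, y: np.linalg.norm(x - y)
            hits = 0
            for (a, b, c, d), choice in zip(quadruples, human_choices):
                model_says = 0 if metric(features[a], features[b]) < metric(features[c], features[d]) else 1
                hits += int(model_says == choice)
            return hits / len(quadruples)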

    Processing boundary and region features for perception

    A fundamental task for any visual system is the accurate detection of objects from background information, for example, distinguishing fruit from foliage or a predator in a forest. This is commonly referred to as figure-ground segregation, which occurs when the visual system locates differences in visual features across an image, such as colour or texture. Combinations of feature contrast define an object from its surrounds, though the exact nature of that combination is still debated. Two processes are likely to contribute to object conspicuity: the pooling of features within an object's bounds relative to those in the background ('region' contrast), and the detection of feature contrast at the boundary itself ('boundary' contrast). Investigations of the relative contributions of these two processes to perception have produced sometimes contradictory findings, some of which can be explained by the methodology adopted in those studies. For example, results from several studies adopting search-based methodologies have advocated a nonlinear interaction of the boundary and region processes, whereas results from more subjective methods have indicated a linear combination. This thesis aims to compare search and subjective methodologies to determine how visual features (region and boundary) interact, to highlight the limitations of these metrics, and then to unpack the contributions of boundary and region processes in greater detail. The first and second experiments investigated the relative contributions of boundary strength, regional orientation, and regional spatial frequency to object conspicuity. This was achieved via a comparison of search and subjective methodologies, which, as mentioned, have previously produced conflicting results in this domain. The results advocated a relatively strong contribution of boundary features compared to region-based features, and replicated the apparent incongruence between findings from search-based and subjective metrics: results from the search task suggest a nonlinear interaction, while those from the subjective task suggest a linear one. A unifying model that reconciles these seemingly contradictory findings (and those in the literature) is then presented, which considers the effect of metric sensitivity and performance ceilings in the paradigms employed. In light of the findings from the first and second experiments, which suggest a stronger contribution of boundary information to object conspicuity, the third and fourth experiments investigated boundary features in more detail. Anecdotal reports from observers in the earlier experiments suggested that the conspicuity of boundaries is modulated by information in the background, regardless of boundary structure. As such, the relative contributions of boundary-background contrast and boundary composition were investigated using a novel stimulus generation technique that enables their effective isolation. A novel metric for boundary composition that correlates well with perception is also outlined. Results from those experiments suggested a significant contribution of both sources of boundary information, though they advocate a critical role for boundary-background contrast. The final experiment explored the contribution of region-based information to object conspicuity in more detail, specifically how higher-order image structure, such as the components of complex texture, contributes to conspicuity. A state-of-the-art texture synthesis model, which reproduces textures via mechanisms that mimic processes in the human visual system, is evaluated with respect to its perceptual applicability. Previous evaluations of this synthesis model are extended via a novel approach that enables the isolation of the model's parameters (which simulate physiological mechanisms) for independent examination. An alternative metric for the efficacy of the model is also presented.
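    To make the linear-versus-nonlinear distinction concrete, the toy rules below combine normalised boundary and region contrast either as a weighted sum (the combination the subjective results point to) or as a soft maximum (one simple nonlinear interaction consistent with the search-based results). The weights and temperature are illustrative assumptions, not the model presented in the thesis.

        import numpy as np

        def conspicuity(boundary, region, w_b=0.7, w_r=0.3, mode="linear", temp=0.1):
            # boundary, region: contrast values normalised to [0, 1].
            if mode == "linear":
                return w_b * boundary + w_r * region        # additive combination
            # nonlinear: soft maximum, so the stronger cue dominates
            return temp * np.logaddexp(boundary / temp, region / temp)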