8 research outputs found

    Learning Visual Attributes

    We present a probabilistic generative model of visual attributes, together with an efficient learning algorithm. Attributes are visual qualities of objects, such as ‘red’, ‘striped’, or ‘spotted’. The model sees attributes as patterns of image segments, repeatedly sharing some characteristic properties. These can be any combination of appearance, shape, or the layout of segments within the pattern. Moreover, attributes with general appearance are taken into account, such as the pattern of alternation of any two colors which is characteristic for stripes. To enable learning from unsegmented training images, the model is learnt discriminatively, by optimizing a likelihood ratio. As demonstrated in the experimental evaluation, our model can learn in a weakly supervised setting and encompasses a broad range of attributes. We show that attributes can be learnt starting from a text query to Google image search, and can then be used to recognize the attribute and determine its spatial extent in novel real-world images.

    A Generative Model for Parts-based Object Segmentation

    The Shape Boltzmann Machine (SBM) [1] has recently been introduced as a state-of-the-art model of foreground/background object shape. We extend the SBM to account for the foreground object’s parts. Our new model, the Multinomial SBM (MSBM), can capture both local and global statistics of part shapes accurately. We combine the MSBM with an appearance model to form a fully generative model of images of objects. Parts-based object segmentations are obtained simply by performing probabilistic inference in the model. We apply the model to two challenging datasets which exhibit significant shape and appearance variability, and find that it obtains results that are comparable to the state-of-the-art. There has been significant focus in computer vision on object recognition and detection, e.g. [2], but a strong desire remains to obtain richer descriptions of objects than just their bounding boxes. One such description is a parts-based object segmentation, in which an image is partitioned into multiple sets of pixels, each belonging to either a part of the object of interest, or its background. The significance of parts in computer vision has been recognized since the earliest days of th

    Leveraging Colour Segmentation for Upper-Body Detection

    This paper presents an upper-body detection algorithm that extends classical shape-based detectors through the use of additional semantic colour segmentation cues. More precisely, candidate upper-body image patches produced by a base detector are soft-segmented using a multi-class probabilistic colour segmentation algorithm that leverages spatial as well as colour prior distributions for different semantic object regions (skin, hair, clothing, background). These multi-class soft segmentation maps are then classified as true or false upper-bodies. By further fusing the score of this latter classifier with the base detection score, the method shows a performance improvement on three different public datasets with two different upper-body base detectors, demonstrating the complementarity of the contextual semantic colour segmentation and the base detector.

    Person re-Identification over distributed spaces and time

    Replicating the human visual system and the cognitive abilities that the brain uses to process the information it receives is an area of substantial scientific interest. With the prevalence of video surveillance cameras, a portion of this scientific drive has gone into providing useful automated counterparts to human operators. A prominent task in visual surveillance is that of matching people between disjoint camera views, or re-identification. This allows operators to locate people of interest, to track people across cameras, and can be used as a precursory step to multi-camera activity analysis. However, due to the contrasting conditions between camera views and their effects on the appearance of people, re-identification is a non-trivial task. This thesis proposes solutions for reducing the visual ambiguity in observations of people between camera views. This thesis first looks at a method for mitigating the effects on the appearance of people under differing lighting conditions between camera views. This thesis builds on work modelling inter-camera illumination based on known pairs of images. A Cumulative Brightness Transfer Function (CBTF) is proposed to estimate the mapping of colour brightness values based on limited training samples. Unlike previous methods that use a mean-based representation for a set of training samples, the cumulative nature of the CBTF retains colour information from underrepresented samples in the training set. Additionally, the bi-directionality of the mapping function is explored to try to maximise re-identification accuracy by ensuring samples are accurately mapped between cameras. Secondly, an extension is proposed to the CBTF framework that addresses the issue of changing lighting conditions within a single camera. As the CBTF requires manually labelled training samples, it is limited to static lighting conditions and is less effective if the lighting changes. This Adaptive CBTF (A-CBTF) differs from previous approaches that either do not consider lighting change over time, or rely on camera transition time information to update. By utilising contextual information drawn from the background in each camera view, an estimation of the lighting change within a single camera can be made. This background lighting model allows the mapping of colour information back to the original training conditions and thus removes the need for retraining. Thirdly, a novel reformulation of re-identification as a ranking problem is proposed. Previous methods use a score based on a direct distance measure of set features to form a correct/incorrect match result. Rather than offering an operator a single outcome, the ranking paradigm is to give the operator a ranked list of possible matches and allow them to make the final decision. By utilising a Support Vector Machine (SVM) ranking method, a weighting on the appearance features can be learned that capitalises on the fact that not all image features are equally important to re-identification. Additionally, an Ensemble-RankSVM is proposed to address scalability issues by separating the training samples into smaller subsets and boosting the trained models. Finally, the thesis looks at a practical application of the ranking paradigm in a real-world system. The system encompasses both the re-identification stage and the precursory extraction and tracking stages to form an aid for CCTV operators. Segmentation and detection are combined to extract relevant information from the video, while several combinations of matching techniques are combined with temporal priors to form a more comprehensive overall matching criterion. The effectiveness of the proposed approaches is tested on datasets obtained from a variety of challenging environments including offices, apartment buildings, airports and outdoor public spaces.
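    The cumulative idea behind the CBTF can be illustrated with a small sketch. The abstract does not give the exact formulation, so the following assumes one plausible reading: brightness histograms are accumulated over all labelled training pairs before normalisation, and the mapping is then obtained by matching the two cumulative histograms. The function name `cumulative_btf` and its parameters are hypothetical and not taken from the thesis.

```python
import numpy as np

def cumulative_btf(samples_a, samples_b, levels=256):
    """Sketch of a cumulative brightness transfer function (assumed form).

    samples_a, samples_b: lists of integer brightness arrays, where the
    i-th entries show the same person in camera A and camera B.
    Returns a lookup table mapping each brightness level in A to a level in B.
    """
    hist_a = np.zeros(levels)
    hist_b = np.zeros(levels)
    # Accumulate raw counts over ALL training pairs before normalising,
    # so sparsely represented brightness values still contribute.
    for a, b in zip(samples_a, samples_b):
        hist_a += np.bincount(np.ravel(a), minlength=levels)[:levels]
        hist_b += np.bincount(np.ravel(b), minlength=levels)[:levels]
    cdf_a = np.cumsum(hist_a) / hist_a.sum()
    cdf_b = np.cumsum(hist_b) / hist_b.sum()
    # Histogram matching: for each level in A, pick the first level in B
    # whose cumulative mass reaches the cumulative mass at that level in A.
    mapping = np.searchsorted(cdf_b, cdf_a, side="left")
    return np.clip(mapping, 0, levels - 1).astype(np.uint8)

# Toy usage: one training pair in which camera B is uniformly brighter.
cam_a = [np.array([10, 10, 20, 20], dtype=np.uint8)]
cam_b = [np.array([50, 50, 90, 90], dtype=np.uint8)]
lut = cumulative_btf(cam_a, cam_b)
```

    Applying `lut[image_a]` would then translate brightness values observed in camera A into the range expected in camera B. A mean-based alternative would instead compute one transfer function per pair and average them, which is the behaviour the abstract contrasts against.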

    Count Data Modeling and Classification Using Statistical Hierarchical Approaches and Multi-topic Models

    In this thesis, we propose and develop various statistical models to improve the efficiency of statistical modeling of count data in various applications. The major emphasis of the work is on developing hierarchical models. Various schemes of hierarchical structures are thus developed and analyzed in this work, ranging from purely static hierarchies to dynamic models. The second part of the work concerns the development of multi-topic statistical models. It has been shown that these models provide more realistic modeling characteristics in comparison to mono-topic models. We proceed with developing several multi-topic models and we analyze their performance against benchmark models. We show that our proposed models in the majority of instances improve the modeling efficiency in comparison to some benchmark models, without drastically increasing the computational demands. In the last part of the work, we extend our proposed multi-topic models to include online learning capability and again we show the relative superiority of our models in comparison to the benchmark models. Various real-world applications such as object recognition, scene classification, text classification and action recognition are used for analyzing the strengths and weaknesses of our proposed models.

    Synthesizing and Editing Photo-realistic Visual Objects

    In this thesis we investigate novel methods of synthesizing new images of a deformable visual object using a collection of images of the object. We investigate both parametric and non-parametric methods as well as a combination of the two methods for the problem of image synthesis. Our main focus is on complex visual objects, specifically deformable objects and objects with varying numbers of visible parts. We first introduce a sketch-driven image synthesis system, which allows the user to draw ellipses and outlines in order to sketch a rough shape of animals as a constraint to the synthesized image. This system interactively provides feedback in the form of ellipse and contour suggestions to the partial sketch of the user. The user's sketch guides the non-parametric synthesis algorithm that blends patches from two exemplar images in a coarse-to-fine fashion to create a final image. We evaluate the method and synthesized images through two user studies. Instead of non-parametric blending of patches, a parametric model of the appearance is more desirable as its appearance representation is shared between all images of the dataset. Hence, we propose Context-Conditioned Component Analysis (C-CCA), a probabilistic generative parametric model, which describes images with a linear combination of basis functions. The basis functions are evaluated for each pixel using a context vector computed from the local shape information. We evaluate C-CCA qualitatively and quantitatively on inpainting, appearance transfer and reconstruction tasks. Drawing samples from C-CCA generates novel, globally-coherent images, which, unfortunately, lack high-frequency details due to dimensionality reduction and misalignment. We develop a non-parametric model that enhances the samples of C-CCA with locally-coherent, high-frequency details. The non-parametric model efficiently finds patches from the dataset that match the C-CCA sample and blends the patches together. We analyze the results of the combined method on the datasets of horse and elephant images.

    Capturing image structure with probabilistic index maps

    One of the major problems in modeling images for vision tasks is that images with very similar structure may locally have completely different appearance, e.g., images taken under different illumination conditions, or the images of pedestrians with different clothing. While there have been many successful attempts to address these problems in application-specific settings, we believe that underlying a large set of problems in vision is a representational deficiency of intensity-derived local measurements that are the basis of most efficient models. We argue that interesting structure in images is better captured when the image is defined as a matrix whose entries are discrete indices to a separate palette of possible intensities, colors or other features, much like the image representation often used to save on storage. In order to model the variability in images, we define an image class not by a single index map, but by a probability distribution over the index maps, which can be automatically estimated from the data, and which we call probabilistic index maps. The existing algorithms can be adapted to work with this representation, as we illustrate in this paper on the example of transformation-invariant clustering and background subtraction. Furthermore, the probabilistic index map representation leads to algorithms with computational costs proportional to either the size of the palette or the log of the size of the palette, making the cost of significantly increased invariance to non-structural changes quite bearable.
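    The index-map representation described above can be made concrete with a short sketch. It is only an illustration under stated assumptions: the per-pixel categorical prior is assumed to be independent across pixels, and the helper names `render` and `sample_index_map` are hypothetical rather than taken from the paper.

```python
import numpy as np

def render(index_map, palette):
    """Turn an (H, W) map of palette indices into an (H, W, C) image."""
    return palette[index_map]

def sample_index_map(probs, rng):
    """Draw one index map from per-pixel categorical distributions.

    probs: (H, W, K) array; probs[i, j] sums to 1 over the K palette slots.
    Assumes pixels are independent, which is a simplification.
    """
    cdf = np.cumsum(probs, axis=-1)
    u = rng.random(probs.shape[:2] + (1,))
    # Inverse-CDF sampling: count how many cumulative bins u has passed.
    idx = (u >= cdf).sum(axis=-1)
    return np.minimum(idx, probs.shape[-1] - 1)

# The same structure (index map) rendered with two image-specific palettes,
# e.g. the same scene under two illumination conditions.
index_map = np.array([[0, 1], [1, 0]])
palette_day = np.array([[0, 0, 0], [255, 255, 255]], dtype=np.uint8)
palette_dusk = np.array([[0, 0, 40], [120, 100, 90]], dtype=np.uint8)
img_day = render(index_map, palette_day)
img_dusk = render(index_map, palette_dusk)

# A degenerate (one-hot) distribution always reproduces its index map.
one_hot = np.eye(2)[index_map]
sampled = sample_index_map(one_hot, np.random.default_rng(0))
```

    The two rendered images differ in every pixel value yet share one index map, which is the kind of non-structural variation the representation is meant to factor out.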