22 research outputs found

    Semantically Selective Augmentation for Deep Compact Person Re-Identification

    We present a deep person re-identification approach that combines semantically selective, deep data augmentation with clustering-based network compression to generate high-performance, light and fast inference networks. In particular, we propose to augment limited training data by sampling from a deep convolutional generative adversarial network (DCGAN) whose discriminator is constrained by a semantic classifier to explicitly control the domain specificity of the generation process. Thereby, we encode information in the classifier network which can be utilized to steer adversarial synthesis, and which fuels our CondenseNet ID-network training. We provide a quantitative and qualitative analysis of the approach and its variants on a number of datasets, obtaining results that outperform the state-of-the-art on the LIMA dataset for long-term monitoring in indoor living spaces.
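The idea of semantically controlled generation can be illustrated with a minimal sketch. This is not the paper's method (which constrains the DCGAN's discriminator during training); it shows the simpler, related idea of filtering generator output with a semantic classifier so that only domain-consistent samples are kept for augmentation. Both `generator` and `semantic_score` are toy stand-ins, not real trained networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(n):
    # Stand-in for a trained DCGAN generator: returns n fake feature vectors.
    return rng.standard_normal((n, 16))

def semantic_score(x):
    # Stand-in semantic classifier: probability that a sample belongs to the
    # target identity domain (here a toy linear score squashed to [0, 1]).
    return 1.0 / (1.0 + np.exp(-x[:, 0]))

# Generate many candidates, keep only those the semantic classifier accepts
# with high confidence; the survivors augment the limited training set.
candidates = generator(1000)
keep = semantic_score(candidates) > 0.9
augmented = candidates[keep]
```

In the paper the semantic constraint acts during adversarial training rather than as post-hoc rejection, but the filtering view makes the role of the classifier explicit: it decides which synthesized samples are domain-specific enough to train on.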

    Agent-based framework for person re-identification

    In computer-based human object re-identification, a detected human is recognised to a level sufficient to re-identify a tracked person, either in a different camera capturing the same individual (often at a different angle) or in the same camera at a different time and/or with the person approaching the camera at a different angle. Instead of relying on face recognition technology, such systems study the clothing of the individuals being monitored and/or the objects being carried to establish correspondence and hence re-identify the human object. Unfortunately, present human object re-identification systems consider the entire human object as one connected region when making decisions about the similarity of the two objects being matched. This assumption has a major drawback: when a person is partially occluded, a part of the occluding foreground will be picked up and used in matching. Our research revealed that when a human observer carries out a manual re-identification task, their attention is often drawn to some parts of the human figure/body more than others (e.g. the face, a brightly coloured shirt, the presence of texture patterns in clothing), and occluding parts are ignored. In this thesis, a novel multi-agent-based framework is proposed for the design of a human object re-identification system. Initially, HOG-based feature extraction is used in an SVM-based classification of a human object as either full-body or half-body. Subsequently, the relative visual significance of the top and bottom parts of the human in re-identification is quantified by analysing Gray-Level Co-occurrence Matrix (GLCM) based texture features and colour histograms obtained in the HSV colour space. Accordingly, different weights are assigned to the top and bottom of the human body using a novel probabilistic approach. The weights are then used to modify the adopted Hybrid Spatiogram and Covariance Descriptor (HSCD) feature based re-identification algorithm.
A significant novelty of the human object re-identification systems proposed in this thesis is the agent-based design procedure adopted, which separates the use of computer vision algorithms for feature extraction, comparison, etc., from the decision-making process of re-identification. Multiple agents are assigned to execute different algorithmic tasks, and the agents communicate to make the required logical decisions. Detailed experimental results are provided to prove that the proposed multi-agent-based framework for human object re-identification performs significantly better than state-of-the-art algorithms. Further, it is shown that the design flexibility and scalability of the proposed system allow it to be effectively utilised in more complex computer-vision-based video analytic/forensic tasks often conducted within distributed, multi-camera systems.
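The initial full-body/half-body classification step described above can be sketched in a few lines. This is a toy illustration, not the thesis pipeline: it uses a simplified per-cell gradient-orientation histogram in place of a full HOG implementation, synthetic windows in place of real detections, and a linear SVM:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def hog_like(image, cell=16, bins=9):
    # Simplified HOG: per-cell histogram of unsigned gradient orientations,
    # weighted by gradient magnitude and L2-normalised (no block normalisation).
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # orientations in [0, pi)
    feats = []
    h, w = image.shape
    for i in range(0, h, cell):
        for j in range(0, w, cell):
            hist, _ = np.histogram(ang[i:i + cell, j:j + cell], bins=bins,
                                   range=(0.0, np.pi),
                                   weights=mag[i:i + cell, j:j + cell])
            feats.append(hist / (np.linalg.norm(hist) + 1e-6))
    return np.concatenate(feats)

# Synthetic 128x64 detection windows: "full body" = texture everywhere,
# "half body" = lower half blanked out, mimicking occlusion.
full = [rng.random((128, 64)) for _ in range(10)]
half = [np.vstack([rng.random((64, 64)), np.zeros((64, 64))]) for _ in range(10)]

X = np.array([hog_like(im) for im in full + half])
y = np.array([1] * 10 + [0] * 10)  # 1 = full body, 0 = half body

clf = SVC(kernel='linear').fit(X, y)
train_acc = clf.score(X, y)
```

Because the occluded lower half contributes near-zero gradient energy, its cell histograms vanish and the two classes become easy to separate linearly, which is precisely why the thesis can route full-body and half-body detections to differently weighted matching.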

    Dimensionality reduction and sparse representations in computer vision

    The proliferation of camera-equipped devices, such as netbooks, smartphones and game stations, has led to a significant increase in the production of visual content. This visual information could be used for understanding the environment and offering a natural interface between users and their surroundings. However, the massive amounts of data and the high computational cost associated with them encumber the transfer of sophisticated vision algorithms to real-life systems, especially ones that exhibit resource limitations such as restrictions in available memory, processing power and bandwidth. One approach to tackling these issues is to generate compact and descriptive representations of image data by exploiting inherent redundancies. We propose the investigation of dimensionality reduction and sparse representations in order to accomplish this task. In dimensionality reduction, the aim is to reduce the dimensions of the space where image data reside in order to allow resource-constrained systems to handle the data and, ideally, to provide a more insightful description. This goal is achieved by exploiting the inherent redundancies that many classes of images exhibit, such as faces under different illumination conditions and objects from different viewpoints. We explore the description of natural images by low-dimensional non-linear models called image manifolds and investigate the performance of computer vision tasks such as recognition and classification using these low-dimensional models. In addition to dimensionality reduction, we study a novel approach that represents images as sparse linear combinations of dictionary examples. We investigate how sparse image representations can be used for a variety of tasks, including low-level image modeling and higher-level semantic information extraction.
Using tools from dimensionality reduction and sparse representation, we propose the application of these methods to three hierarchical image layers, namely low-level features, mid-level structures and high-level attributes. Low-level features are image descriptors that can be extracted directly from the raw image pixels and include pixel intensities, histograms, and gradients. In the first part of this work, we explore how various techniques in dimensionality reduction, ranging from traditional image compression to the recently proposed Random Projections method, affect the performance of computer vision algorithms such as face detection and face recognition. In addition, we discuss a method that is able to increase the spatial resolution of a single image, without using any training examples, according to the sparse representations framework. In the second part, we explore mid-level structures, including image manifolds and sparse models, which are produced by abstracting information from low-level features and offer compact modeling of high-dimensional data. We propose novel techniques for generating more descriptive image representations and investigate their application in face recognition and object tracking. In the third part of this work, we propose the investigation of a novel framework for representing the semantic contents of images. This framework employs high-level semantic attributes that aim to bridge the gap between the visual information of an image and its textual description by utilizing low-level features and mid-level structures. This innovative paradigm offers revolutionary possibilities, including recognizing the category of an object from purely textual information without providing any explicit visual example.
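The core sparse-representation idea used throughout this thesis, describing a signal as a sparse linear combination of dictionary examples, can be demonstrated in a few lines with a generic sparse solver. This sketch uses scikit-learn's Orthogonal Matching Pursuit on a synthetic signal and dictionary; the thesis's actual dictionaries and solvers are not specified here:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(1)

# Overcomplete dictionary: 256 unit-norm atoms in R^64.
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)

# A "signal" that is, by construction, a combination of just 3 atoms.
coef_true = np.zeros(256)
coef_true[[3, 70, 190]] = [1.5, -2.0, 0.8]
x = D @ coef_true

# Recover the sparse code and reconstruct the signal from it.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3, fit_intercept=False).fit(D, x)
x_hat = D @ omp.coef_
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
```

With real images, the columns of `D` would be patch or image exemplars and the sparse code itself becomes the compact representation: a 64-dimensional patch is described by roughly three coefficients plus their atom indices.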

    Real-time person re-identification for interactive environments

    The work presented in this thesis was motivated by a vision of the future in which intelligent environments in public spaces such as galleries and museums deliver useful and personalised services to people via natural interaction, that is, without the need for people to provide explicit instructions via tangible interfaces. Delivering the right services to the right people requires a means of biometrically identifying individuals and then re-identifying them as they move freely through the environment. Delivering the service they desire requires sensing their context, for example, their location or proximity to resources. This thesis presents both a context-aware system and a person re-identification method. A tabletop display was designed and prototyped with an infrared person-sensing context function. In experimental evaluation it exhibited tracking performance comparable to other, more complex systems. A real-time, viewpoint-invariant person re-identification method is proposed based on a novel set of Viewpoint Invariant Multi-modal (ViMM) feature descriptors collected from depth-sensing cameras. The method uses colour and a combination of anthropometric properties logged as a function of body orientation. A neural network classifier is used to perform re-identification.
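The matching step of such a system can be sketched as follows. Everything here is hypothetical: the feature values are invented, and a nearest-neighbour matcher stands in for the thesis's neural network classifier, purely to show how combined anthropometric-plus-colour descriptors drive re-identification:

```python
import numpy as np

# Hypothetical gallery: one descriptor per enrolled person, combining
# anthropometric measurements (e.g. height, shoulder width in metres)
# with a coarse 3-bin colour summary.
gallery = {
    'person_a': np.array([1.82, 0.46, 0.70, 0.20, 0.10]),
    'person_b': np.array([1.65, 0.40, 0.10, 0.60, 0.30]),
}

def reidentify(probe, gallery):
    # Match the probe to the closest gallery entry (nearest-neighbour
    # stand-in for the thesis's neural network classifier).
    names = list(gallery)
    dists = [np.linalg.norm(probe - gallery[n]) for n in names]
    return names[int(np.argmin(dists))]

probe = np.array([1.80, 0.45, 0.65, 0.25, 0.12])  # a noisy view of person_a
match = reidentify(probe, gallery)
```

The viewpoint-invariance claim in the abstract amounts to making such descriptors stable under rotation, which is why the method logs anthropometric properties as a function of body orientation rather than assuming a frontal view.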

    Soft Biometric Analysis: Multi-Person and Real-Time Pedestrian Attribute Recognition in Crowded Urban Environments

    Traditionally, recognition systems were based only on hard human biometrics. However, ubiquitous CCTV cameras have raised the desire to analyze human biometrics from far distances, without the subjects' attendance in the acquisition process. High-resolution face close-shots are rarely available at far distances, so face-based systems cannot provide reliable results in surveillance applications. Human soft biometrics, such as body and clothing attributes, are believed to be more effective in analyzing human data collected by security cameras. This thesis contributes to human soft biometric analysis in uncontrolled environments and mainly focuses on two tasks: Pedestrian Attribute Recognition (PAR) and person re-identification (re-id). We first review the literature of both tasks and highlight the history of advancements, recent developments, and the existing benchmarks. PAR and person re-id difficulties are due to significant distances between intra-class samples, which originate from variations in several factors such as body pose, illumination, background, occlusion, and data resolution. Recent state-of-the-art approaches present end-to-end models that can extract discriminative and comprehensive feature representations from people. The correlation between different regions of the body and dealing with limited learning data are also the objective of many recent works. Moreover, class imbalance and correlation between human attributes are specific challenges associated with the PAR problem. We collect a large surveillance dataset to train a novel gender recognition model suitable for uncontrolled environments. We propose a deep residual network that extracts several pose-wise patches from samples and obtains a comprehensive feature representation. In the next step, we develop a model for recognizing multiple attributes at once.
Considering the correlation between human semantic attributes and the class imbalance, we respectively use a multi-task model and a weighted loss function. We also propose a multiplication layer on top of the backbone feature extraction layers to exclude background features from the final representation of samples and draw the model's attention to the foreground area. We address the problem of person re-id by implicitly defining the receptive fields of deep learning classification frameworks. The receptive fields of deep learning models determine the most significant regions of the input data for providing correct decisions. Therefore, we synthesize a set of learning data in which the destructive regions (e.g., background) in each pair of instances are interchanged. A segmentation module determines destructive and useful regions in each sample, and the label of each synthesized instance is inherited from the sample that contributed the useful regions to the synthesized image. The synthesized learning data are then used in the learning phase and help the model rapidly learn that identity and background regions are not correlated. Meanwhile, the proposed solution can be seen as a data augmentation approach that fully preserves the label information and is compatible with other data augmentation techniques. When re-id methods are learned in scenarios where the target person appears with identical garments in the gallery, the visual appearance of clothes is given the most importance in the final feature representation. Cloth-based representations are not reliable in long-term re-id settings, as people may change their clothes. Therefore, solutions that ignore clothing cues and focus on identity-relevant features are in demand. We transform the original data such that the identity-relevant information of people (e.g., face and body shape) is removed, while the identity-unrelated cues (i.e., color and texture of clothes) remain unchanged.
A model learned on the synthesized dataset predicts the identity-unrelated cues (short-term features). Therefore, we train a second model, coupled with the first, that learns the embeddings of the original data such that the similarity between the embeddings of the original and synthesized data is minimized. This way, the second model predicts based on the identity-related (long-term) representation of people. To evaluate the performance of the proposed models, we use PAR and person re-id datasets, namely BIODI, PETA, RAP, Market-1501, MSMT-V2, PRCC, LTCC, and MIT, and compare our experimental results with state-of-the-art methods in the field. In conclusion, the data collected from surveillance cameras have low resolution, such that the extraction of hard biometric features is not possible and face-based approaches produce poor results. In contrast, soft biometrics are robust to variations in data quality. We therefore propose approaches for both PAR and person re-id to learn discriminative features from each instance, and evaluate our proposed solutions on several publicly available benchmarks. This thesis was prepared at the University of Beira Interior and IT (Instituto de Telecomunicações), Soft Computing and Image Analysis Laboratory (SOCIA Lab), Covilhã Delegation, and was submitted to the University of Beira Interior for defense in a public examination session.
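The weighted loss mentioned for the class-imbalance problem can be illustrated with a toy sketch. The thesis's exact loss is not reproduced here; this shows a common form, a per-attribute weighted binary cross-entropy in which rare positive attributes receive a larger positive-class weight, with all rates and predictions invented for the example:

```python
import numpy as np

def weighted_bce(y_true, y_prob, pos_weight):
    # Per-attribute weighted binary cross-entropy: positives of rare
    # attributes are up-weighted so frequent attributes do not dominate.
    eps = 1e-12
    y_prob = np.clip(y_prob, eps, 1 - eps)
    loss = -(pos_weight * y_true * np.log(y_prob)
             + (1 - y_true) * np.log(1 - y_prob))
    return loss.mean()

# Hypothetical positive rates for 4 attributes; a common choice of weight
# is the negative-to-positive ratio per attribute.
pos_rate = np.array([0.5, 0.1, 0.05, 0.4])
pos_weight = (1 - pos_rate) / pos_rate  # -> [1.0, 9.0, 19.0, 1.5]

y_true = np.array([[1, 0, 1, 0]], dtype=float)   # ground-truth attributes
y_prob = np.array([[0.8, 0.2, 0.3, 0.1]])        # model's predicted scores
loss = weighted_bce(y_true, y_prob, pos_weight)
```

Note how the under-confident prediction on the rare third attribute (weight 19) dominates the loss, which is exactly the pressure that keeps the model from ignoring infrequent attributes.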

    Weakly Supervised Learning of Objects and Attributes.

    This thesis presents weakly supervised learning approaches that directly exploit image-level tags (e.g. objects, attributes) for comprehensive image understanding, including tasks such as object localisation, image description, image retrieval, semantic segmentation, person re-identification and person search. Unlike conventional approaches, which tackle the weakly supervised problem by learning a discriminative model, a generative Bayesian framework is proposed which provides better mechanisms to resolve the ambiguity problem. The proposed model differs significantly from existing approaches in that: (1) all foreground object classes are modelled jointly in a single generative model that encodes multiple-object co-existence, so that “explaining away” inference can resolve ambiguity and lead to better learning; (2) image backgrounds are shared across classes to better learn varying surroundings and “push out” objects of interest; (3) the Bayesian formulation enables the exploitation of various types of prior knowledge to compensate for the limited supervision offered by weakly labelled data, as well as Bayesian domain adaptation for transfer learning. Detecting objects is the first and critical component of the image understanding paradigm. Unlike conventional fully supervised object detection approaches, the proposed model aims to train an object detector from weakly labelled data. A novel framework based on a Bayesian latent topic model is proposed to address the problem of localising objects as bounding boxes in images and videos given image-level object labels. The inferred object locations can then be used as annotations to train a classic object detector with conventional approaches. However, objects cannot tell the whole story of an image. Beyond detecting objects, a general visual model should be able to describe objects and segment them at the pixel level.
Another limitation of the initial model is that it still requires an additional object detector. To remedy these two drawbacks, a novel weakly supervised non-parametric Bayesian model is presented that models objects, attributes and their associations automatically from weakly labelled images. Once learned, given a new image, the proposed model can describe the image with a combination of objects and attributes, as well as their locations and segmentations. Finally, this thesis further tackles the weakly supervised learning problem from a transfer learning perspective, considering the fact that some fully labelled or weakly labelled data are always available in a related domain while only insufficient labelled data exist for training in the target domain. A powerful semantic description is transferred from existing fashion photography datasets to surveillance data to solve the person re-identification problem.

    Human Body Pose Estimation for Gait Identification: A Comprehensive Survey of Datasets and Models

    Person identification is a problem that has received substantial attention, particularly in security domains. Gait recognition is one of the most convenient approaches, enabling person identification at a distance without the need for high-quality images. Several review studies address person identification via facial images, silhouette images, and wearable sensors. Although skeleton-based person identification is gaining popularity while overcoming the challenges of traditional approaches, existing survey studies lack a comprehensive review of skeleton-based approaches to gait identification. We present a detailed review of the human pose estimation and gait analysis techniques that make skeleton-based approaches possible. The study covers various types of related datasets, tools, methodologies, and evaluation metrics, with associated challenges, limitations, and application domains. Detailed comparisons are presented for each of these aspects, with recommendations for potential research and alternatives. A common trend throughout this paper is the positive impact that deep learning techniques are beginning to have on topics such as human pose estimation and gait identification. The survey outcomes might be useful for the related research community and other stakeholders in terms of performance analysis of existing methodologies, potential research gaps, application domains, and possible contributions in the future.

    Advances in Computer Recognition, Image Processing and Communications, Selected Papers from CORES 2021 and IP&C 2021

    As almost all human activities moved online due to the pandemic, novel, robust and efficient approaches and further research have been in higher demand in the fields of computer science and telecommunications. This reprint book therefore contains 13 high-quality papers presenting advancements in theoretical and practical aspects of computer recognition, pattern recognition, image processing and machine learning (shallow and deep), including, in particular, novel implementations of these techniques in the areas of modern telecommunications and cybersecurity.