95 research outputs found

    Real-time human body detection and tracking for augmented reality mobile applications

    Nowadays, more and more cultural experiences are enhanced by mobile applications, including those that use Augmented Reality (AR). These applications have grown in number of users, largely supported by the increased computing power of recent processors, the popularity of mobile devices (with high-definition cameras and Global Positioning Systems, GPS), and the widespread availability of Internet connections. With this context in mind, the Mobile Five Senses Augmented Reality System for Museums (M5SAR) project aims to develop an AR system to act as a guide in cultural and historical events and museums, complementing or replacing the traditional guidance given by guides or maps. The work described in this thesis is part of the M5SAR project. The complete system consists of a mobile application and a physical device, to be attached to the mobile device, which together aim to explore the five human senses: sight, hearing, touch, smell, and taste. The M5SAR project's main goals are (a) to detect museum pieces (e.g., paintings and statues (Pereira et al., 2017)), (b) to detect museum walls/environments (Veiga et al., 2017), and (c) to detect human shapes in order to superimpose Augmented Reality content (?). This thesis presents an approach to the last goal, combining information from human body joints with clothes overlapping methods. Existing systems for clothes overlapping that allow the user to move freely are based on three-dimensional (3D) sensors, e.g., the Kinect sensor (Erra et al., 2018), which are not portable. The contribution of this thesis is a portable solution based on the mobile phone's (RGB) camera that allows the user to move freely while performing full-body clothes overlapping.
In recent years, the capability of Convolutional Neural Networks (CNN) has been demonstrated in a wide variety of computer vision tasks, such as object classification and detection and face and text recognition (Amos et al., 2016; Ren et al., 2015a). One of the areas where CNNs are used is human pose estimation in real environments (Insafutdinov et al., 2017; Pishchulin et al., 2016). Recently, two popular CNN frameworks for human shape detection and segmentation have stood out: OpenPose (Cao et al., 2017; Wei et al., 2016) and Mask R-CNN (He et al., 2017). However, experimental tests showed that the original implementations are not suitable for mobile devices. Nevertheless, these frameworks are the basis for more recent implementations that make mobile use possible. One approach that achieves full-body pose estimation and segmentation is Mask R-CNN2Go (Jindal, 2018), based on the original Mask R-CNN structure. The main reason for its reduced processing time was the optimization of the number of convolution layers and the width of each layer. Another approach to human pose estimation on mobile devices was the modification of the original OpenPose architecture for mobile (Kim, 2018; Solano, 2018) and its combination with MobileNets (Howard et al., 2017). MobileNets, as the name suggests, is designed for mobile applications, making use of depthwise separable convolution layers. This modification reduces processing time but also reduces pose estimation accuracy compared to the original architecture. It is worth noting that, although person detection with clothes overlapping is a current topic, there are already applications available on the market, such as Pozus (GENTLEMINDS, 2018).
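The computational saving of the depthwise separable convolutions used by MobileNets can be quantified with simple multiplication counts. The layer shapes below are illustrative assumptions, not taken from any specific network:

```python
# Multiply-count comparison: standard convolution vs. depthwise separable
# convolution (the MobileNets building block). Shapes are illustrative.

def standard_conv_mults(k, m, n, f):
    """Multiplications for a k x k convolution with m input and n output
    channels applied over an f x f feature map."""
    return k * k * m * n * f * f

def separable_conv_mults(k, m, n, f):
    """Depthwise k x k convolution (one filter per input channel) followed
    by a 1 x 1 pointwise convolution that mixes channels."""
    depthwise = k * k * m * f * f
    pointwise = m * n * f * f
    return depthwise + pointwise

k, m, n, f = 3, 32, 64, 56          # kernel size, in/out channels, map size
std = standard_conv_mults(k, m, n, f)
sep = separable_conv_mults(k, m, n, f)
print(sep / std)                    # theoretical ratio: 1/n + 1/k**2
```

For a 3 x 3 kernel the separable layer needs roughly 1/n + 1/9 of the multiplications, around 8-9x fewer in this example, which is the source of both the speed-up and the accuracy drop noted above.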
    Pozus is available as a beta version that runs on the iOS operating system; it uses the phone's camera as input for human pose estimation, applying texture segments over the human body. However, Pozus does not fit the textures (clothes) to the person's shape. In this thesis, the OpenPose model was used to determine the body joints, and different approaches were used for clothes overlapping while a person moves in real environments. The first approach uses the GrabCut algorithm (Rother et al., 2004) for person segmentation, allowing clothes segments to be fitted. A second approach uses a bi-dimensional (2D) skeletal animation tool to allow deformations of 2D textures according to the estimated poses. The third approach is similar to the previous one but uses 3D models (volumes) to achieve a more realistic simulation of the clothes superimposition process. Results and a proof of concept are presented, and the results are consistent with a proof of concept. The tests revealed that, as future work, optimizations to improve the accuracy of the pose estimation model and its execution time are still needed for mobile devices. The final method used to overlay clothes on the body showed positive results, as it enabled a more realistic simulation of the clothes superimposition process.

    When it comes to visitors at museums and heritage places, objects speak for themselves. Nevertheless, it is important to give visitors the best experience possible, as this will lead to an increase in the number of visits and enhance the perception and value of the organization. With the aim of enhancing a traditional museum visit, a mobile Augmented Reality (AR) framework is being developed as part of the Mobile Five Senses Augmented Reality (M5SAR) project.
This thesis presents an initial approach to human shape detection and AR content superimposition in a mobile environment, achieved by combining information from human body joints with clothes overlapping methods. Existing systems for clothes overlapping that allow the user to move freely are based mainly on three-dimensional (3D) sensors (e.g., the Kinect sensor (Erra et al., 2018)), making them far from portable. The contribution of this thesis is a portable system that allows the user to move freely and performs full-body clothes overlapping. The OpenPose model (Kim, 2018; Solano, 2018) was used to compute the body joints, and different approaches were used for clothes overlapping while a person is moving in real environments. The first approach uses the GrabCut algorithm (Rother et al., 2004) for person segmentation, allowing clothes segments to be fitted. A second approach uses a bi-dimensional (2D) skeletal animation tool to allow deformations of 2D textures according to the estimated poses. The third approach is similar to the previous one but uses 3D clothes models (volumes) to achieve a more realistic simulation of the clothes superimposition process. Results and a proof of concept are shown.
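The core step of fitting a 2D clothes texture to estimated joints can be sketched as a similarity transform that maps anchor points on the texture onto the detected joints. The coordinates below are hypothetical, not OpenPose output, and this is a minimal 2D sketch rather than the thesis's implementation:

```python
import math

def similarity_from_two_points(src_a, src_b, dst_a, dst_b):
    """Similarity transform (scale, rotation, translation) mapping the
    segment src_a->src_b onto dst_a->dst_b; returns a point-mapping function."""
    sx, sy = src_b[0] - src_a[0], src_b[1] - src_a[1]
    dx, dy = dst_b[0] - dst_a[0], dst_b[1] - dst_a[1]
    scale = math.hypot(dx, dy) / math.hypot(sx, sy)
    angle = math.atan2(dy, dx) - math.atan2(sy, sx)
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    def apply(p):
        # Rotate/scale about src_a, then translate to dst_a.
        x, y = p[0] - src_a[0], p[1] - src_a[1]
        return (dst_a[0] + scale * (cos_a * x - sin_a * y),
                dst_a[1] + scale * (sin_a * x + cos_a * y))

    return apply

# Hypothetical texture shoulder anchors mapped onto detected shoulder joints.
warp = similarity_from_two_points((10, 20), (50, 20), (120, 80), (200, 80))
print(warp((30, 60)))  # a hem point follows the shoulders: (160.0, 160.0)
```

Every texture vertex can then be warped through the same function, so the garment scales and rotates with the person as the joints move between frames.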

    A Methodology for Extracting Human Bodies from Still Images

    Monitoring and surveillance of humans is one of today's most prominent applications, and it is expected to be part of many future aspects of our lives, for safety reasons, assisted living, and many others. Many efforts have been made towards automatic and robust solutions, but the general problem is very challenging and still remains open. In this PhD dissertation we examine the problem from many perspectives. First, we study the performance of a hardware architecture designed for large-scale surveillance systems. Then, we focus on the general problem of human activity recognition, present an extensive survey of methodologies that deal with this subject, and propose a maturity metric to evaluate them. Image segmentation is one of the most popular algorithms for image processing found in the field, and we propose a blind metric to evaluate segmentation results with respect to the activity in local regions. Finally, we propose a fully automatic system for segmenting and extracting human bodies from challenging single images, which is the main contribution of the dissertation. Our methodology is a novel bottom-up approach relying mostly on anthropometric constraints and is facilitated by our research in the fields of face, skin, and hands detection. Experimental results and comparison with state-of-the-art methodologies demonstrate the success of our approach.
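An anthropometric constraint of the kind mentioned above can be as simple as gating candidate body regions by their height relative to the detected head. The head-height multiples below are a common rule of thumb, chosen here for illustration and not taken from the dissertation:

```python
def plausible_body_height(head_h, body_h, min_heads=6.0, max_heads=8.5):
    """Accept a candidate body whose height is a plausible multiple of the
    detected head height; thresholds are illustrative assumptions."""
    return min_heads * head_h <= body_h <= max_heads * head_h

print(plausible_body_height(20, 150))  # 7.5 head heights -> True
print(plausible_body_height(20, 60))   # 3 head heights -> False
```

In a bottom-up pipeline such a cheap test prunes most implausible hypotheses before any expensive segmentation is run on them.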

    Stereo Pictorial Structure for 2D Articulated Human Pose Estimation

    In this paper, we consider the problem of 2D human pose estimation on stereo image pairs. In particular, we aim at estimating the location, orientation, and scale of the upper-body parts of people detected in stereo image pairs from realistic stereo videos that can be found on the Internet. To address this task, we propose a novel pictorial structure model that exploits the stereo information included in such stereo image pairs: the Stereo Pictorial Structure (SPS). To validate our proposed model, we contribute a new annotated dataset of stereo image pairs, the Stereo Human Pose Estimation Dataset (SHPED), obtained from YouTube stereoscopic video sequences, depicting people in challenging poses and diverse indoor and outdoor scenarios. The experimental results on SHPED indicate that SPS improves on state-of-the-art monocular models thanks to the appropriate use of the stereo information.
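The stereo cue such a model can exploit is the epipolar/disparity relation of a rectified pair: a body part at column x in the left image should reappear at column x - d in the right image, and its depth follows from the disparity. The sketch below is a toy consistency check under that assumption, not the SPS scoring function; all numbers are made up:

```python
def expected_right_x(x_left, disparity):
    """In a rectified pair, a left-image point reappears shifted left by d."""
    return x_left - disparity

def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Triangulated depth Z = f * B / d for a rectified stereo rig."""
    return focal_px * baseline_m / disparity_px

def stereo_consistent(x_left, x_right, disparity, tol=3.0):
    """Does the right-image part detection sit where the disparity predicts
    (within tol pixels)? A cheap gate on part hypotheses."""
    return abs(expected_right_x(x_left, disparity) - x_right) <= tol

print(depth_from_disparity(700, 0.1, 35))  # 2.0 (metres)
print(stereo_consistent(300, 288, 12))     # True
```

A pictorial structure can fold such a term into its part likelihoods, penalising left/right part pairs whose implied depths disagree.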

    Object detection, recognition and re-identification in video footage

    There have been a significant number of security concerns in recent times; as a result, security cameras have been installed to monitor activities and to prevent crimes in most public places. These analyses are performed either through automated video analytics or through forensic analysis based on human observation. To this end, within the research context of this thesis, a proactive machine-vision-based military recognition system has been developed to help monitor activities in the military environment. The proposed object detection, recognition, and re-identification systems are presented in this thesis. A novel technique for military personnel recognition is presented. Initially, the detected camouflaged personnel are segmented using the GrabCut segmentation algorithm. Since, in general, a camouflaged person's uniform appears similar at both the top and the bottom of the body, an image patch is initially extracted from the segmented foreground image and used as the region of interest. Subsequently, the colour and texture features are extracted from each patch and used for classification. A second approach to personnel recognition is proposed through the recognition of the badge on a military person's cap. A feature matching metric based on Speeded-Up Robust Features (SURF) extracted from the badge on a person's cap enabled the recognition of the person's arm of service. A state-of-the-art technique for recognising vehicle types irrespective of their view angle is also presented. Vehicles are initially detected and segmented using a Gaussian Mixture Model (GMM) based foreground/background segmentation algorithm. A Canny Edge Detection (CED) stage, followed by morphological operations, is used as a pre-processing stage to help enhance foreground vehicle detection and segmentation.
Subsequently, Region, Histogram of Oriented Gradients (HOG), and Local Binary Pattern (LBP) features are extracted from the refined foreground vehicle object and used as features for vehicle type recognition. Two different datasets with variant front/rear and angled views are used and combined for testing the proposed technique. For night-time video analytics and forensics, the thesis presents a novel approach to pedestrian detection and vehicle type recognition. A novel feature acquisition technique, named CENTROG, is proposed for pedestrian detection and vehicle type recognition. Thermal images containing pedestrians and vehicles are used to analyse the performance of the proposed algorithms. The video is initially segmented using a GMM-based foreground object segmentation algorithm. A CED-based pre-processing step is used to enhance segmentation accuracy prior to using Census Transforms for initial feature extraction. HOG features are then extracted from the Census-transformed images and used for the detection and recognition of human and vehicular objects, respectively, in thermal images. Finally, a novel technique for people re-identification is proposed based on low-level colour features and mid-level attributes. The low-level colour histogram bin values were normalised to the range [0, 1]. A publicly available dataset (VIPeR) and a self-constructed dataset were used in the experiments, conducted with 7 clothing attributes and low-level colour histogram features. These 7 attributes are detected using features extracted from 5 different regions of a detected human object using an SVM classifier. The low-level colour features were extracted from the same regions, which are obtained by human object segmentation and subsequent body-part sub-division. People are re-identified by computing the Euclidean distance between a probe and the gallery image sets.
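The Census Transform that CENTROG builds on replaces each pixel with a bit pattern recording how it compares with its 8 neighbours, which makes the representation robust to the illumination shifts typical of thermal imagery. The sketch below uses one common convention (bit set when the neighbour is darker than the centre); it is an illustrative implementation, not the thesis's code:

```python
def census_transform(img):
    """8-bit census transform of a 2D grayscale image (list of lists).
    Each interior pixel becomes a byte; each bit is 1 when the corresponding
    neighbour (scanned top-left to bottom-right) is darker than the centre."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            bits = 0
            for dy, dx in offsets:
                bits = (bits << 1) | (1 if img[y + dy][x + dx] < img[y][x] else 0)
            out[y][x] = bits
    return out

patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(census_transform(patch)[1][1])  # 0b11110000 -> 240
```

CENTROG then computes HOG over this census image rather than over raw intensities, so gradients are taken on comparison patterns instead of absolute brightness.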
The experiments conducted using the SVM classifier and Euclidean distance have shown that the proposed techniques attained all of the aforementioned goals. The colour and texture features proposed for camouflaged military personnel recognition surpass state-of-the-art methods. Similarly, experiments show that combined features performed best when recognising vehicles in different views subsequent to initial training based on multiple views. In the same vein, the proposed CENTROG technique performed better than the state-of-the-art CENTRIST technique for both pedestrian detection and vehicle type recognition at night-time using thermal images. Finally, we show that the proposed 7 mid-level attributes and the low-level features result in improved accuracy for people re-identification.
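The nearest-neighbour re-identification step described above, i.e. normalised colour histograms compared by Euclidean distance, can be sketched as follows. Bin counts and pixel values are illustrative assumptions; the thesis additionally fuses mid-level attribute scores, which are omitted here:

```python
import math

def colour_histogram(pixels, bins=4):
    """Per-channel histogram of RGB pixels (values 0-255), normalised to
    [0, 1] and flattened into a single feature vector."""
    hist = [0] * (3 * bins)
    for r, g, b in pixels:
        for c, v in enumerate((r, g, b)):
            hist[c * bins + min(v * bins // 256, bins - 1)] += 1
    n = len(pixels)
    return [h / n for h in hist]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def reidentify(probe, gallery):
    """Return the index of the gallery descriptor closest to the probe."""
    return min(range(len(gallery)), key=lambda i: euclidean(probe, gallery[i]))

# Toy data: a red-clad probe against a blue-clad and a red-clad gallery entry.
probe = colour_histogram([(250, 0, 0)] * 4)
gallery = [colour_histogram([(0, 0, 250)] * 4),
           colour_histogram([(240, 10, 5)] * 4)]
print(reidentify(probe, gallery))  # 1 (the red-clad person)
```

In practice one such descriptor is computed per body region and the per-region distances are summed before ranking the gallery.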

    A study of a clothing image segmentation method in complex conditions using a features fusion model

    Based on a priori knowledge of complex conditions, this paper proposes an unsupervised image segmentation algorithm for clothing images that combines colour and texture features. First, block truncation coding is used to divide the traditional three-dimensional colour space into a six-dimensional colour space so that finer colour features can be obtained. Then, a texture feature based on an improved local binary pattern (LBP) algorithm is designed and used, together with the colour features, to describe the clothing image. After that, according to the statistical appearance of the object region and background information in the clothing image, a bisection method is proposed for the segmentation operation. Since the image is divided into several sub-image blocks, bisection image segmentation is accomplished more efficiently. The experimental results show that the proposed algorithm can quickly and effectively extract clothing regions from complex scenes without any manually set parameters. The proposed clothing image segmentation method will play an important role in computer vision, machine learning applications, pattern recognition, and intelligent systems.
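A basic LBP texture descriptor of the kind this method improves upon works as follows: each pixel is encoded by comparing its 8 neighbours against it, and a histogram of the resulting codes describes the region's texture. This is the standard LBP formulation as a reference point, not the paper's improved variant:

```python
def lbp_code(img, y, x):
    """Basic 8-neighbour LBP code at (y, x): each bit is 1 when the
    neighbour (scanned clockwise from top-left) is >= the centre value."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    c = img[y][x]
    code = 0
    for dy, dx in offsets:
        code = (code << 1) | (1 if img[y + dy][x + dx] >= c else 0)
    return code

def lbp_histogram(img):
    """256-bin normalised LBP histogram over interior pixels, usable as a
    texture descriptor for a clothing region."""
    hist = [0] * 256
    h, w = len(img), len(img[0])
    count = 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            hist[lbp_code(img, y, x)] += 1
            count += 1
    return [v / count for v in hist]

flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
print(lbp_code(flat, 1, 1))  # all neighbours >= centre -> 255
```

Concatenating such a histogram with the six-dimensional colour features yields the kind of fused descriptor the segmentation above operates on.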