
    Contributions on 3D Biometric Face Recognition for point clouds in low-resolution devices

    Dissertation (Master's)—Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Mecânica, 2020. Funded by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). Recently, many automation processes have made use of knowledge related to computer vision, exploiting digital information in the form of images or data to assist the decision-making of these processes. 3D data recognition is a trending topic in computer vision and graphics tasks. Many methods have been proposed for 3D applications, aiming at better performance in terms of accuracy and robustness. The main goal of this manuscript is to contribute face recognition methods for low-resolution point cloud devices. In this manuscript, a face recognition process was carried out on a 31-subject database containing three color (RGB) images and three depth images per subject. The color images are used for face detection by a Haar Cascade algorithm, allowing the extraction of facial points from the depth image and the generation of a 3D face point cloud. From the point cloud, the normal intensity and the curvature index intensity of each point are extracted, allowing the construction of a two-dimensional image, called a curvature map, from which histograms are obtained to perform the face recognition task.
Along with the curvature maps, a novel matching method is proposed through an adaptation of the classic Bozorth's algorithm, forming a net-based 3D representation of facial landmarks in a low-resolution point cloud in order to provide a descriptor of the cloud's key points and extract a unique representation of each individual. The validation was carried out and compared against a baseline technique for 3D face recognition. The manuscript provides multiple testing scenarios (frontal faces, accuracy, scale and orientation) for both methods, achieving an accuracy of 98.92% in the best case of the curvature maps and a 100% accuracy in the best case of the adapted classic Bozorth's algorithm.
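    To make the first stages of this pipeline concrete, here is a minimal sketch in Python: Haar-cascade face detection on the RGB frame, back-projection of the face's depth pixels into a 3D point cloud, and a per-point curvature index. The camera intrinsics (fx, fy, cx, cy), the neighborhood size k, and the use of surface variation as the curvature index are illustrative assumptions, not the dissertation's exact choices.

```python
import cv2
import numpy as np
from scipy.spatial import cKDTree

def detect_face(rgb):
    """Locate a face in the RGB frame with OpenCV's stock Haar cascade."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, 1.1, 5)
    return faces[0] if len(faces) else None  # (x, y, w, h)

def face_point_cloud(depth, bbox, fx, fy, cx, cy):
    """Back-project depth pixels inside the face box into 3D points
    using the pinhole model; fx, fy, cx, cy are assumed intrinsics."""
    x, y, w, h = bbox
    v, u = np.mgrid[y:y + h, x:x + w]
    z = depth[y:y + h, x:x + w].astype(np.float32)
    m = z > 0  # discard invalid (zero) depth readings
    return np.stack([(u[m] - cx) * z[m] / fx,
                     (v[m] - cy) * z[m] / fy,
                     z[m]], axis=1)

def curvature_index(points, k=30):
    """Per-point surface variation, lambda_min / (sum of eigenvalues) of
    the local covariance -- a common curvature proxy on point clouds."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    curv = np.empty(len(points))
    for i, nbrs in enumerate(idx):
        eig = np.linalg.eigvalsh(np.cov(points[nbrs].T))
        curv[i] = eig[0] / max(eig.sum(), 1e-12)
    return curv
```

    Normalized to an 8-bit range and projected back onto the image grid, such curvature values would form the two-dimensional curvature map from which the recognition histograms are taken.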

    Efficient Human Activity Recognition in Large Image and Video Databases

    Vision-based human action recognition has attracted considerable interest in recent research for its applications to video surveillance, content-based search, healthcare, and interactive games. Most existing research deals with building informative feature descriptors, designing efficient and robust algorithms, proposing versatile and challenging datasets, and fusing multiple modalities. Often, these approaches build on certain conventions such as the use of motion cues to determine video descriptors, application of off-the-shelf classifiers, and single-factor classification of videos. In this thesis, we deal with important but overlooked issues such as efficiency, simplicity, and scalability of human activity recognition in different application scenarios: controlled video environments (e.g., indoor surveillance), unconstrained videos (e.g., YouTube), depth or skeletal data (e.g., captured by Kinect), and person images (e.g., Flickr). In particular, we are interested in answering questions like: (a) is it possible to efficiently recognize human actions in controlled videos without temporal cues? (b) given that large-scale unconstrained video data are often of a high-dimension, low-sample-size (HDLSS) nature, how can human actions be efficiently recognized in such data? (c) considering the rich 3D motion information available from depth or motion capture sensors, is it possible to recognize both the actions and the actors using only the motion dynamics of the underlying activities? and (d) can motion information from monocular videos be used to automatically determine saliency regions for recognizing actions in still images?

    People re-identification using depth and intensity information from an overhead sensor

    This work presents a new people re-identification method using depth and intensity images, both captured with a single static camera located in an overhead position. The proposed solution arises from the need, in many areas of application, to carry out identification and re-identification processes to determine, for example, how long people remain in a certain space, while fulfilling the requirement of preserving people's privacy. This work is a novelty compared to previous solutions, since the use of top-view depth and intensity images allows obtaining the information needed to identify and re-identify people while maintaining their privacy and reducing occlusions. In the identification and re-identification procedure, only three frames of intensity and depth are used: the first is obtained when the person enters the scene (frontal view), the second when they are in the central area of the scene (overhead view), and the third when they leave the scene (back view). The implemented method uses only information from the head and shoulders of people seen from these three perspectives. From these views, three feature vectors are obtained in a simple way, two of them related to depth information and the other related to intensity data. This increases the robustness of the method against lighting changes. The proposal has been evaluated on two different datasets and compared to other state-of-the-art proposals. The obtained results show a 96.7% success rate in re-identification, with sensors that use different operating principles, all of them providing depth and intensity information. Furthermore, the implemented method can work in real time on a PC, without using a GPU. Funded by the Ministerio de Economía y Competitividad, the Agencia Estatal de Investigación and the Universidad de Alcalá.
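    A minimal sketch of the three-view matching idea follows. The concrete features (plain depth and intensity histograms), the assumed sensor range, and the nearest-neighbour Euclidean matching rule are hypothetical stand-ins for the paper's actual feature vectors.

```python
import numpy as np

def view_descriptor(depth_roi, intensity_roi, bins=32, max_depth=4000):
    """Histogram features from the head-and-shoulders crop of one view.
    max_depth (in mm) is an assumed sensor range so that histograms from
    different samples share the same bin edges."""
    d = np.histogram(depth_roi[depth_roi > 0], bins=bins,
                     range=(0, max_depth), density=True)[0]
    i = np.histogram(intensity_roi, bins=bins, range=(0, 255),
                     density=True)[0]
    return np.concatenate([d, i])

def person_signature(views):
    """Concatenate the descriptors of the frontal, overhead and back views."""
    return np.concatenate([view_descriptor(d, i) for d, i in views])

def reidentify(query_sig, gallery):
    """Return the gallery id whose stored signature is closest to the query."""
    ids, sigs = zip(*gallery.items())
    dists = [np.linalg.norm(query_sig - s) for s in sigs]
    return ids[int(np.argmin(dists))]
```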

    Deep understanding of shopper behaviours and interactions using RGB-D vision

    In retail environments, understanding how shoppers move about in a store's spaces and interact with products is very valuable. While the retail environment has several characteristics favourable to computer vision, such as reasonable lighting, the large number and diversity of products sold, as well as the potential ambiguity of shoppers' movements, mean that accurately measuring shopper behaviour is still challenging. Over the past years, machine-learning and feature-based tools for people counting, interaction analytics and re-identification were developed with the aim of learning shopper skills based on occlusion-free RGB-D cameras in a top-view configuration. However, after moving into the era of multimedia big data, machine-learning approaches evolved into deep learning approaches, which are a more powerful and efficient way of dealing with the complexities of human behaviour. In this paper, a novel VRAI deep learning application is introduced that uses three convolutional neural networks to count the number of people passing or stopping in the camera area, perform top-view re-identification and measure shopper–shelf interactions from a single RGB-D video flow with near real-time performance. The framework is evaluated on three new publicly available datasets: TVHeads for people counting, HaDa for shopper–shelf interactions and TVPR2 for people re-identification. The experimental results show that the proposed methods significantly outperform all competitive state-of-the-art methods (accuracy of 99.5% on people counting, 92.6% on interaction classification and 74.5% on re-identification), yielding distinct and significant insights for implicit and extensive shopper behaviour analysis in marketing applications.
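    The paper's three CNNs cannot be reproduced from the abstract alone, but the classical top-view baseline such systems are typically compared against is easy to sketch: in an overhead depth frame, heads are the blobs nearest the ceiling-mounted camera, so thresholding height above the floor and counting connected components gives a rough people count. All thresholds below are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def count_heads(depth_mm, cam_to_floor_mm=3000,
                head_band=(1300, 2100), min_area=400):
    """Count head-like blobs in one overhead depth frame.
    depth_mm: per-pixel distance from the camera in millimetres.
    head_band: assumed range of head heights above the floor (mm)."""
    height = cam_to_floor_mm - depth_mm.astype(np.float32)
    mask = (height > head_band[0]) & (height < head_band[1])
    labels, n = ndimage.label(mask)                 # connected components
    sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
    return int(np.sum(sizes >= min_area))           # drop tiny noise blobs
```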

    Joint optimization of manifold learning and sparse representations for face and gesture analysis

    Face and gesture understanding algorithms are powerful enablers in intelligent vision systems for surveillance, security, entertainment, and smart spaces. In the future, complex networks of sensors and cameras may dispense directions to lost tourists, perform directory lookups in the office lobby, or contact the proper authorities in case of an emergency. To be effective, these systems will need to embrace human subtleties while interacting with people in their natural conditions. Computer vision and machine learning techniques have recently become adept at solving face and gesture tasks using posed datasets in controlled conditions. However, spontaneous human behavior under unconstrained conditions, or in the wild, is more complex and is subject to considerable variability from one person to the next. Uncontrolled conditions such as lighting, resolution, noise, occlusions, pose, and temporal variations complicate the matter further. This thesis advances the field of face and gesture analysis by introducing a new machine learning framework based upon dimensionality reduction and sparse representations that is shown to be robust in posed as well as natural conditions. Dimensionality reduction methods take complex objects, such as facial images, and attempt to learn lower dimensional representations embedded in the higher dimensional data. These alternate feature spaces are computationally more efficient and often more discriminative. The performance of various dimensionality reduction methods on geometric and appearance-based facial attributes is studied, leading to robust facial pose and expression recognition models. The parsimonious nature of sparse representations (SR) has successfully been exploited for the development of highly accurate classifiers for various applications. Despite the successes of SR techniques, large dictionaries and high dimensional data can make these classifiers computationally demanding. Further, sparse classifiers are subject to the adverse effects of a phenomenon known as coefficient contamination, where, for example, variations in pose may affect identity and expression recognition. This thesis analyzes the interaction between dimensionality reduction and sparse representations to present a unified sparse representation classification framework that addresses both issues of computational complexity and coefficient contamination. Semi-supervised dimensionality reduction is shown to mitigate the coefficient contamination problems associated with SR classifiers. The combination of semi-supervised dimensionality reduction with SR systems forms the cornerstone of a new face and gesture framework called Manifold based Sparse Representations (MSR). MSR is shown to deliver state-of-the-art facial understanding capabilities. To demonstrate the applicability of MSR to new domains, MSR is expanded to include temporal dynamics. The joint optimization of dimensionality reduction and SRs for classification purposes is a relatively new field. The combination of both concepts into a single objective function produces a relation that is neither convex nor directly solvable. This thesis studies this problem to introduce a new jointly optimized framework. This framework, termed LGE-KSVD, utilizes variants of Linear extension of Graph Embedding (LGE) along with modified K-SVD dictionary learning to jointly learn the dimensionality reduction matrix, the sparse representation dictionary, the sparse coefficients, and a sparsity-based classifier.
By injecting LGE concepts directly into the K-SVD learning procedure, this research removes the support constraints K-SVD imposes on dictionary element discovery. Results are shown for facial recognition, facial expression recognition, and human activity analysis; with the addition of a concept called active difference signatures, the framework delivers robust gesture recognition from Kinect or similar depth cameras.
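    For readers unfamiliar with sparse representation classification, its core decision rule can be sketched in a few lines. Here plain orthogonal matching pursuit over a dictionary of raw training columns stands in for the thesis's jointly learned LGE-KSVD dictionary and reduction matrix.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def src_classify(D, labels, y, n_nonzero=10):
    """Assign y to the class whose training atoms best reconstruct it.
    D: (d, n) dictionary with one L2-normalized training sample per column.
    labels: (n,) class label of each column.  y: (d,) test sample."""
    x = orthogonal_mp(D, y, n_nonzero_coefs=n_nonzero)  # sparse code of y
    best, best_res = None, np.inf
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)   # keep only class-c coefficients
        res = np.linalg.norm(y - D @ xc)     # class reconstruction residual
        if res < best_res:
            best, best_res = c, res
    return best
```

    Coefficient contamination appears in this picture when atoms of the wrong class absorb energy from nuisance factors such as pose; the semi-supervised dimensionality reduction described above is what keeps the class-wise residuals discriminative.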

    Comprehensive review of vision-based fall detection systems

    Vision-based fall detection systems have experienced fast development over the last years. To trace the course of this evolution and help new researchers, the main audience of this paper, a comprehensive review of all articles published in the main scientific databases in this area during the last five years has been carried out. After a selection process, detailed in the Materials and Methods Section, eighty-one systems were thoroughly reviewed. Their characterization and classification techniques were analyzed and categorized. Their performance data were also studied, and comparisons were made to determine which classification methods work best in this field. The evolution of artificial vision technology, very positively influenced by the incorporation of artificial neural networks, has allowed fall characterization to become more resistant to noise resulting from illumination phenomena or occlusion. Classification has also taken advantage of these networks, and the field is starting to use robots to make these systems mobile. However, the datasets used to train them lack real-world data, raising doubts about their performance when facing real falls by the elderly. In addition, there is no evidence of strong connections between the elderly and the research communities.

    Object Tracking

    Object tracking consists in estimating the trajectories of moving objects in a sequence of images. Automating computer object tracking is a difficult task: the dynamics of multiple changing parameters representing the features and motion of the objects, as well as temporary partial or full occlusion of the tracked objects, have to be considered. This monograph presents the development of object tracking algorithms, methods and systems. Both the state of the art of object tracking methods and new research trends are described in this book. Fourteen chapters are split into two sections: Section 1 presents new theoretical ideas, whereas Section 2 presents real-life applications. Despite the variety of topics contained in this monograph, it constitutes a consistent body of knowledge in the field of computer object tracking. The editor's intention was to follow the very quick progress in the development of these methods as well as the extension of their applications.

    Automated Semantic Content Extraction from Images

    In this study, an automatic semantic segmentation and object recognition methodology is implemented which bridges the semantic gap between low-level features of image content and high-level conceptual meaning. Semantically understanding an image is essential in modeling autonomous robots, targeting customers in marketing, or reverse engineering of building information modeling in the construction industry. To achieve an understanding of a room from a single image, we propose a new object recognition framework which has four major components: segmentation, scene detection, conceptual cueing and object recognition. The new segmentation methodology developed in this research extends Felzenszwalb's cost function to include new surface index and depth features as well as color, texture and normal features, to overcome issues of occlusion and shadowing commonly found in images. Adding depth allows capturing new features for the object recognition stage to achieve high accuracy compared to the current state of the art. The goal was to develop an approach to capture and label perceptually important regions, which often reflect a global representation and understanding of the image. We developed a system that uses contextual and common-sense information to improve object recognition and scene detection, and fused the information from scenes and objects to reduce the level of uncertainty. This study, in addition to improving segmentation, scene detection and object recognition, can be used in applications that require physically parsing the image into objects, surfaces and their relations. The applications include robotics, social networking, intelligence and anti-terrorism efforts, criminal investigations and security, marketing, and building information modeling in the construction industry. In this dissertation, a structural framework (ontology) is developed that generates text descriptions based on an understanding of the objects, structures and attributes of an image.
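    As a rough illustration of how depth and normal cues might extend a Felzenszwalb-style graph segmentation, the edge weight below mixes color, depth and normal dissimilarities between neighboring pixels. The linear combination and its coefficients are assumptions for illustration, not the study's actual cost function.

```python
import numpy as np

def edge_weight(p, q, w_color=1.0, w_depth=1.0, w_normal=1.0):
    """Dissimilarity between neighboring pixels p and q, each a dict with
    'rgb' (float array of 3), 'depth' (metres) and 'normal' (unit 3-vector).
    The weights w_* are hypothetical mixing coefficients."""
    d_color = np.linalg.norm(p["rgb"] - q["rgb"])
    d_depth = abs(p["depth"] - q["depth"])
    d_normal = 1.0 - float(np.dot(p["normal"], q["normal"]))  # angular term
    return w_color * d_color + w_depth * d_depth + w_normal * d_normal
```

    In Felzenszwalb's scheme such weights feed the usual merge predicate: two components are joined when the lightest edge between them is small relative to their internal differences plus a size-dependent threshold k/|C|.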

    Video metadata extraction in a videoMail system

    Currently, the world is swiftly adapting to visual communication. Online services like YouTube and Vine show that video is no longer the domain of broadcast television only. Video is used for different purposes like entertainment, information, education or communication. The rapid growth of today's video archives, with sparsely available editorial data, creates a big problem for their retrieval. Humans see a video as a complex interplay of cognitive concepts. As a result, there is a need to build a bridge between numeric values and semantic concepts, establishing a connection that will facilitate video retrieval by humans. The critical aspect of this bridge is video annotation. The process can be done manually or automatically. Manual annotation is very tedious, subjective and expensive; therefore, automatic annotation is being actively studied. In this thesis we focus on automatic annotation of multimedia content, namely the use of analysis techniques for information retrieval that allow metadata to be extracted automatically from video in a videomail system, including the identification of text, people, actions, spaces and objects, including animals and plants. Hence it will be possible to align multimedia content with the text presented in the email message and to create applications for semantic video database indexing and retrieval.

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one-, two- or three-dimensional; the processing is done in real time or takes hours and days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research in the field of pattern recognition.