
    3D Face Modelling, Analysis and Synthesis

    Human faces have always been of special interest to researchers in the computer vision and graphics areas. There has been an explosion in the number of studies on accurately modelling, analysing and synthesising realistic faces for various applications. The importance of human faces stems from the fact that they are an invaluable means of effective communication, recognition, behaviour analysis, conveying emotions, etc. Therefore, addressing the automatic visual perception of human faces efficiently could open up many influential applications in various domains, e.g. virtual/augmented reality, computer-aided surgery, security and surveillance, entertainment, and many more. However, the vast variability in the geometry and appearance of human faces captured in unconstrained videos and images renders their automatic analysis and understanding very challenging even today. The primary objective of this thesis is to develop novel methodologies of 3D computer vision for human faces that go beyond the state of the art and achieve unprecedented quality and robustness. In more detail, this thesis advances the state of the art in 3D facial shape reconstruction and tracking, fine-grained 3D facial motion estimation, expression recognition and facial synthesis with the aid of 3D face modelling. We give special attention to the case where the input comes from monocular imagery captured under uncontrolled settings, a.k.a. in-the-wild data. Such data are available in abundance on the internet nowadays. Analysing these data pushes the boundaries of currently available computer vision algorithms and opens up many new crucial applications in industry. We define the four targeted vision problems (3D facial reconstruction & tracking, fine-grained 3D facial motion estimation, expression recognition, facial synthesis) in this thesis as the four 3D-based essential systems for automatic facial behaviour understanding and show how they rely on each other. Finally, to aid the research conducted in this thesis, we collect and annotate a large-scale video dataset of monocular facial performances. All of our proposed methods demonstrate very promising quantitative and qualitative results when compared to the state-of-the-art methods.
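The 3D face modelling described above typically rests on a linear statistical face model. As a minimal illustrative sketch (not the thesis's actual model; the array sizes and variable names are invented for illustration), a 3D face is reconstructed as the mean shape plus a weighted combination of learned basis modes:

```python
import numpy as np

# Minimal sketch of the linear face model behind most 3D reconstruction
# pipelines: shape = mean + basis @ coefficients. The dimensions below
# are illustrative placeholders, not values from the thesis.

def reconstruct_shape(mean_shape, basis, coeffs):
    """Return a 3D face as mean + basis @ coeffs, reshaped to (N, 3) vertices."""
    flat = mean_shape + basis @ coeffs      # (3N,) + (3N, K) @ (K,)
    return flat.reshape(-1, 3)

rng = np.random.default_rng(0)
n_vertices, n_modes = 100, 5
mean_shape = rng.normal(size=3 * n_vertices)
basis = rng.normal(size=(3 * n_vertices, n_modes))
coeffs = np.zeros(n_modes)                  # zero coefficients -> mean face

face = reconstruct_shape(mean_shape, basis, coeffs)
print(face.shape)
```

Fitting such a model to an image amounts to estimating the coefficients (and pose) that best explain the observed face.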

    Intelligent computational sketching support for conceptual design

    Sketches, with their flexibility and suggestiveness, are in many ways ideal for expressing emerging design concepts. This can be seen from the fact that the process of representing early designs by free-hand drawings was used as far back as the early 15th century [1]. On the other hand, CAD systems have become widely accepted as an essential design tool in recent years, not least because they provide a base on which design analysis can be carried out. Efficient transfer of sketches into a CAD representation, therefore, is a powerful addition to the designers' armoury. It has been pointed out by many that a pen-on-paper system is the best tool for sketching. One of the crucial requirements of a computer-aided sketching system is its ability to recognise and interpret the elements of sketches. 'Sketch recognition', as it has come to be known, has been widely studied by people working in fields ranging from artificial intelligence to human-computer interaction and robotic vision. Despite the continuing efforts to solve the problem of appropriate conceptual design modelling, it is difficult to achieve completely accurate recognition of sketches, because sketches usually convey vague information, and idiosyncratic expression and interpretation differ from designer to designer.

    Attention in Computer Vision

    Thanks to deep learning, computer vision has advanced by a large margin. The attention mechanism, inspired by the human vision system and acting as a versatile module that is widely applied in current deep computer vision models, strengthens the power of deep models. However, most attention models are trained end-to-end. Why and how do those attention models work? How similar is the trained attention to the human attention that inspired it? These questions remain open, which hinders us from designing better attention models, architectures or algorithms that could further advance the computer vision field. In this thesis, we aim to unravel these mysterious attention models by studying attention mechanisms in computer vision during the deep learning era. In the first part of this thesis, we study bottom-up attention. Under the umbrella of saliency prediction, bottom-up attention has progressed a lot with the help of deep learning. However, deep saliency models are still a black box to us and their performance has reached a ceiling. Therefore, the first part of this thesis aims to understand what happens inside deep models when they are trained for saliency prediction. Concretely, this thesis dissects each individual unit inside a deep model that has been trained for saliency prediction. Our analysis discloses the secrets of deep models for saliency prediction as well as their limitations, and gives new insights for future saliency modelling. In the second part, we study top-down attention in computer vision. Top-down attention, a mechanism usually built on top of bottom-up attention, has achieved great success in many computer vision tasks. However, this success raises an interesting question, namely: is learned top-down attention similar to human attention under the same task? To answer this question, we collected a dataset which records human attention during the image captioning task. Using our collected dataset, we analyse the difference between the attention exploited by a deep model for image captioning and human attention under the same task. Our research shows that the currently widely used soft attention mechanism is different from human attention under the same task. Meanwhile, we use human attention as prior knowledge to help machines perform better in the image captioning task. In the third part, we study contextual attention. It is complementary to both bottom-up and top-down attention, and contextualises each informative region with attention. Prior contextual attention methods either adopt contextual modules from natural language processing that are only suitable for 1-D sequential inputs, or complex two-stream graph neural networks. Motivated by the difference in semantic units between sentences and images, we design a transformer-based architecture for image captioning. Our design widens the original transformer layer by using 2-D spatial relationships and achieves competitive performance for image captioning.
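The soft attention mechanism analysed above can be sketched in a few lines: a query scores a set of region features, the scores are softmax-normalised, and the output is the attention-weighted sum of the features. This is a generic illustration of soft attention, not the thesis's specific captioning model:

```python
import numpy as np

# Generic soft attention: scaled dot-product scores, softmax weights,
# weighted sum of region features. Shapes and data are illustrative.

def soft_attention(query, features):
    """query: (d,), features: (n, d). Returns (context, weights)."""
    scores = features @ query / np.sqrt(len(query))  # scaled dot products
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax over regions
    context = weights @ features                      # weighted sum, (d,)
    return context, weights

features = np.eye(4)             # four one-hot "region" features
query = np.array([10.0, 0, 0, 0])
context, weights = soft_attention(query, features)
print(weights.argmax())
```

Comparing such learned weight maps against recorded human fixation maps is precisely the kind of analysis the collected dataset enables.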

    Image segmentation for human motion analysis: methods and applications

    Human motion analysis is closely connected with the development of computational techniques capable of automatically identifying objects represented in image sequences and tracking and analysing their movement. Feature extraction is generally the first step in the study of human motion in image sequences and is strictly related to human motion modelling [1]. The next step is feature correspondence, where the problem of matching features between two consecutive image frames is addressed. Finally, high-level processing can be used in several applications of Computer Vision such as, for instance, the recognition of human movements, activities or poses. This work will focus on the study of image segmentation methods and applications for human motion analysis. Image segmentation methods related to human motion need to deal with several challenges, such as: dynamic backgrounds, for instance when the camera is in motion; lighting conditions that can change along the image sequence; occlusion problems, when the subject does not remain inside the workspace; or image sequences with more than one subject in the workspace at the same time. It is not easy to develop methods which can deal with all these challenges.
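The simplest motion-segmentation step underlying the pipeline above can be sketched as frame differencing: subtract consecutive frames and threshold to obtain a binary mask of moving pixels. This is an illustrative baseline only, and it assumes the static-background case; the challenges listed above (moving cameras, lighting changes) require stronger background models:

```python
import numpy as np

# Baseline motion segmentation by frame differencing. Works only for a
# static camera and stable lighting; shown purely to illustrate the idea.

def motion_mask(prev_frame, curr_frame, thresh=25):
    """Grayscale frames as (H, W) uint8 arrays -> boolean motion mask."""
    diff = np.abs(curr_frame.astype(int) - prev_frame.astype(int))
    return diff > thresh

prev_frame = np.zeros((4, 4), dtype=np.uint8)
curr_frame = prev_frame.copy()
curr_frame[1:3, 1:3] = 200      # a bright "object" appears
mask = motion_mask(prev_frame, curr_frame)
print(mask.sum())
```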

    Features-based moving objects tracking for smart video surveillances: A review

    Video surveillance is one of the most active research topics in computer vision due to the increasing need for security. Although surveillance systems are getting cheaper, the cost of having human operators monitor the video feed can be very expensive and inefficient. To overcome this problem, an automated visual surveillance system can be used to detect any suspicious activities that require immediate action. The framework of a video surveillance system encompasses a large scope in machine vision: background modelling, object detection, moving-object classification, tracking, and motion analysis, and requires the fusion of information from camera networks. This paper reviews recent techniques used by researchers for the detection and tracking of moving objects in order to solve many surveillance problems. The features and algorithms used for modelling object appearance and tracking multiple objects in outdoor and indoor environments are also reviewed. This paper summarizes recent work by previous researchers on moving-object tracking for single-camera and multiple-camera views. Nevertheless, despite the recent progress in surveillance technologies, there are still challenges that need to be solved before such systems can deliver reliable automated video surveillance.
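The features-based tracking stage surveyed here can be illustrated with the simplest possible association rule: match each existing track to its nearest detection. The centroid distance below is an invented stand-in for the richer appearance features real surveillance systems use:

```python
import numpy as np

# Toy features-based association: greedy nearest-centroid matching between
# existing tracks and new detections. Gating by max_dist rejects matches
# that are implausibly far away. Illustrative only.

def associate(tracks, detections, max_dist=50.0):
    """Greedy nearest-centroid matching. Returns {track_id: detection_idx}."""
    matches, used = {}, set()
    for tid, pos in tracks.items():
        dists = [np.linalg.norm(np.subtract(pos, d)) for d in detections]
        for j in np.argsort(dists):
            if int(j) not in used and dists[j] <= max_dist:
                matches[tid] = int(j)
                used.add(int(j))
                break
    return matches

tracks = {0: (10.0, 10.0), 1: (100.0, 100.0)}
detections = [(102.0, 98.0), (12.0, 9.0)]
print(associate(tracks, detections))
```

Production systems replace the greedy loop with optimal assignment and the centroid distance with appearance and motion cues, but the association structure is the same.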

    Tracking by Prediction: A Deep Generative Model for Multi-Person Localisation and Tracking

    Current multi-person localisation and tracking systems have an over-reliance on the use of appearance models for target re-identification, and almost no approaches employ a complete deep learning solution for both objectives. We present a novel, complete deep learning framework for multi-person localisation and tracking. In this context we first introduce a lightweight sequential Generative Adversarial Network architecture for person localisation, which overcomes issues related to occlusions and noisy detections typically found in a multi-person environment. In the proposed tracking framework we build upon recent advances in pedestrian trajectory prediction approaches and propose a novel data association scheme based on predicted trajectories. This removes the need for computationally expensive person re-identification systems based on appearance features and generates human-like trajectories with minimal fragmentation. The proposed method is evaluated on multiple public benchmarks, including both static and dynamic cameras, and is capable of generating outstanding performance, especially among other recently proposed deep neural network based approaches. Comment: To appear in IEEE Winter Conference on Applications of Computer Vision (WACV), 201
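The tracking-by-prediction idea can be sketched with a toy example: predict where each track will be next and match detections to the predictions, rather than re-identifying targets by appearance. A constant-velocity model stands in below for the paper's learned trajectory predictor, and all names and data are illustrative:

```python
import numpy as np

# Toy tracking-by-prediction: a constant-velocity extrapolation replaces
# the paper's deep trajectory predictor; association is by distance to the
# predicted position instead of appearance re-identification.

def predict(track):
    """track: list of past (x, y) positions. Predict the next position."""
    p = np.asarray(track, dtype=float)
    velocity = p[-1] - p[-2] if len(p) > 1 else np.zeros(2)
    return p[-1] + velocity

track = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
detections = np.array([(10.0, 0.0), (3.1, 2.9)])
pred = predict(track)                            # extrapolates to (3, 3)
best = int(np.argmin(np.linalg.norm(detections - pred, axis=1)))
print(best)
```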

    Visual region understanding: unsupervised extraction and abstraction

    The ability to gain a conceptual understanding of the world in uncontrolled environments is the ultimate goal of vision-based computer systems. Technological societies today are heavily reliant on surveillance and security infrastructure, robotics, medical image analysis, visual data categorisation and search, and smart device user interaction, to name a few. Of all the complex problems tackled by computer vision today in the context of these technologies, the one that lies closest to the original goals of the field is the subarea of unsupervised scene analysis or scene modelling. However, its common use of low-level features does not provide a good balance between generality and discriminative ability, both a result and a symptom of the sensory and semantic gaps existing between low-level computer representations and high-level human descriptions. In this research we explore a general framework that addresses the fundamental problem of universal unsupervised extraction of semantically meaningful visual regions and their behaviours. For this purpose we address issues related to (i) spatial and spatiotemporal segmentation for region extraction, (ii) region shape modelling, and (iii) the online categorisation of visual object classes and the spatiotemporal analysis of their behaviours. Under this framework we propose (a) a unified region merging method and spatiotemporal region reduction, (b) shape representation by the optimisation and novel simplification of contour-based growing neural gases, and (c) a foundation for the analysis of visual object motion properties using a shape- and appearance-based nearest-centroid classification algorithm and trajectory plots for the obtained region classes. Specifically, we formulate a region merging spatial segmentation mechanism that combines and adapts features shown previously to be individually useful, namely parallel region growing, the best merge criterion, a time-adaptive threshold, and region reduction techniques. For spatiotemporal region refinement we consider both scalar intensity differences and vector optical flow. To model the shapes of the visual regions thus obtained, we adapt the growing neural gas for rapid region contour representation and propose a contour simplification technique. A fast unsupervised nearest-centroid online learning technique then groups observed region instances into classes, for which we are able to analyse spatial presence and spatiotemporal trajectories. The analysis results show semantic correlations with real-world object behaviour. Performance evaluation of all steps across standard metrics and datasets validates their effectiveness.
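The unsupervised nearest-centroid online learning step described above can be sketched as follows: each region descriptor either joins the nearest existing class (updating that class's running-mean centroid) or founds a new class if it is too far from all of them. The class name, radius parameter, and 2-D descriptors are illustrative, not the thesis's actual design:

```python
import numpy as np

# Sketch of unsupervised online nearest-centroid clustering: assign each
# observation to the nearest centroid within `radius`, updating a running
# mean, or create a new class otherwise.

class OnlineNearestCentroid:
    def __init__(self, radius):
        self.radius = radius
        self.centroids = []   # running-mean descriptor per class
        self.counts = []

    def observe(self, x):
        x = np.asarray(x, dtype=float)
        if self.centroids:
            d = [np.linalg.norm(x - c) for c in self.centroids]
            k = int(np.argmin(d))
            if d[k] <= self.radius:
                self.counts[k] += 1
                self.centroids[k] += (x - self.centroids[k]) / self.counts[k]
                return k
        self.centroids.append(x.copy())   # too far from all: new class
        self.counts.append(1)
        return len(self.centroids) - 1

clf = OnlineNearestCentroid(radius=1.0)
labels = [clf.observe(p) for p in [(0, 0), (0.2, 0), (5, 5), (5.1, 4.9)]]
print(labels)
```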

    A 3D Face Modelling Approach for Pose-Invariant Face Recognition in a Human-Robot Environment

    Face analysis techniques have become a crucial component of human-machine interaction in the fields of assistive and humanoid robotics. However, the variations in head pose that arise naturally in these environments are still a great challenge. In this paper, we present a real-time capable 3D face modelling framework for 2D in-the-wild images that is applicable to robotics. The fitting of the 3D Morphable Model is based exclusively on automatically detected landmarks. After fitting, the face can be corrected in pose and transformed back to a frontal 2D representation that is more suitable for face recognition. We conduct face recognition experiments with non-frontal images from the MUCT database and uncontrolled, in-the-wild images from the PaSC database, the most challenging face recognition database to date, showing improved performance. Finally, we present our SCITOS G5 robot system, which incorporates our framework as a means of image pre-processing for face analysis.
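The landmark-driven pose correction step can be illustrated in 2D: estimate a similarity transform (scale, rotation, translation) that aligns detected landmarks to a frontal template, here via the closed-form SVD solution in the style of Umeyama's method. The real system fits a full 3D Morphable Model; this 2D version, with invented landmark coordinates, only shows the alignment idea:

```python
import numpy as np

# Closed-form 2D similarity alignment of landmarks to a template
# (Umeyama-style, reflection case ignored). Illustrative stand-in for
# the paper's landmark-based 3DMM fitting and pose normalisation.

def similarity_transform(src, dst):
    """Return (scale, R, t) with dst ~= scale * src @ R.T + t."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(0), dst.mean(0)
    sc, dc = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(dc.T @ sc)   # cross-covariance of centred sets
    R = U @ Vt
    scale = S.sum() / (sc ** 2).sum()
    t = mu_d - scale * mu_s @ R.T
    return scale, R, t

template = np.array([(0, 0), (2, 0), (1, 2)], float)   # "frontal" landmarks
theta = np.pi / 6                                       # simulated head pose
Rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
observed = 1.5 * template @ Rot.T + np.array([3.0, -1.0])

scale, R, t = similarity_transform(observed, template)
aligned = scale * observed @ R.T + t
print(np.allclose(aligned, template))
```

Applying the recovered transform warps the observed face toward the frontal template, which is the normalisation that makes downstream recognition easier.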