
    ModDrop: adaptive multi-modal gesture recognition

    We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels, so that it produces meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio.
    Comment: 14 pages, 7 figures
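    As an illustration of the channel-dropping idea, the NumPy sketch below zeroes out whole modality inputs at random during training; the function name, dropout probability and modality names are ours, not the authors' implementation, which applies the dropping inside the fusion network.

        import numpy as np

        def mod_drop(modalities, p_drop=0.15, rng=np.random):
            # Independently decide whether each modality survives this update.
            keep = rng.rand(len(modalities)) >= p_drop
            if not keep.any():                      # never drop every channel
                keep[rng.randint(len(modalities))] = True
            # Dropped channels are zeroed, so the fused network must learn to
            # predict from whatever subset of modalities remains available.
            return [m if k else np.zeros_like(m) for m, k in zip(modalities, keep)]

        # Applied per training sample; at test time all modalities pass through.
        video, depth, audio = (np.random.randn(64) for _ in range(3))
        inputs = mod_drop([video, depth, audio])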

    Deep Learning Face Attributes in the Wild

    Predicting face attributes in the wild is challenging due to complex face variations. We propose a novel deep learning framework for attribute prediction in the wild. It cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently. LNet is pre-trained on massive general object categories for face localization, while ANet is pre-trained on massive face identities for attribute prediction. This framework not only outperforms the state-of-the-art by a large margin, but also reveals valuable facts about learning face representations. (1) It shows how the performance of face localization (LNet) and attribute prediction (ANet) can be improved by different pre-training strategies. (2) It reveals that although the filters of LNet are fine-tuned only with image-level attribute tags, their response maps over entire images give a strong indication of face locations. This fact enables training LNet for face localization with only image-level annotations, without the face bounding boxes or landmarks required by previous attribute recognition works. (3) It also demonstrates that the high-level hidden neurons of ANet automatically discover semantic concepts after pre-training with massive face identities, and that such concepts are significantly enriched after fine-tuning with attribute tags. Each attribute can be well explained with a sparse linear combination of these concepts.
    Comment: To appear in International Conference on Computer Vision (ICCV) 2015
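    The sketch below shows the cascade in simplified form: a localization net's response map picks the face region, and an attribute net scores the crop. Here lnet and anet stand in for the two pre-trained CNNs (any callables returning a response map and attribute scores), and the fixed-size argmax crop is our placeholder for the paper's thresholding over the response map.

        import numpy as np

        def predict_attributes(image, lnet, anet, attribute_names, crop=128):
            # Stage 1: LNet's response map over the full image peaks at the
            # face, even though it saw only image-level attribute tags.
            response = lnet(image)                 # HxW face-evidence map
            y, x = np.unravel_index(response.argmax(), response.shape)
            half = crop // 2                       # placeholder crop size
            face = image[max(0, y - half): y + half, max(0, x - half): x + half]
            # Stage 2: ANet maps the localized face to one score per attribute.
            return dict(zip(attribute_names, anet(face)))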

    From Facial Parts Responses to Face Detection: A Deep Learning Approach

    In this paper, we propose a novel deep convolutional network (DCN) that achieves outstanding performance on FDDB, PASCAL Face, and AFW. Specifically, our method achieves a high recall rate of 90.99% on the challenging FDDB benchmark, outperforming the state-of-the-art method by a large margin of 2.91%. Importantly, we consider finding faces from a new perspective: scoring facial parts responses by their spatial structure and arrangement. The scoring mechanism is carefully formulated to handle challenging cases where faces are only partially visible. This allows our network to detect faces under severe occlusion and unconstrained pose variation, which are the main difficulty and bottleneck of most existing face detection approaches. We show that despite the use of a DCN, our network achieves practical runtime speed.
    Comment: To appear in ICCV 2015
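    The following sketch conveys the parts-scoring perspective in simplified form: a candidate window is scored by the part evidence it contains. The flat weighted sum is our stand-in for the paper's structured scoring; part names and weights are illustrative.

        import numpy as np

        PARTS = ("hair", "eyes", "nose", "mouth", "beard")

        def faceness_score(part_maps, window, weights=None):
            # Average the evidence each part detector leaves inside the window.
            # The paper's structured score also checks the parts' expected
            # arrangement (hair above eyes, mouth below nose), which is what
            # lets a half-occluded face score highly on its visible parts.
            y0, x0, y1, x1 = window
            weights = weights or {p: 1.0 for p in PARTS}
            return sum(weights[p] * part_maps[p][y0:y1, x0:x1].mean() for p in PARTS)

        # Rank candidate windows by faceness and keep the top-scoring ones.
        maps = {p: np.random.rand(240, 320) for p in PARTS}
        print(faceness_score(maps, (60, 80, 180, 200)))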

    Smart FRP Composite Sandwich Bridge Decks in Cold Regions

    INE/AUTC 12.0

    Spatial patterns of knowledge-intensive business services in cities of various sizes, morphologies and economies

    We compare the intra-urban localization patterns of advertising and IT companies in three large Czech cities. The main aim of our analysis is an empirically based contribution to the question of the extent to which knowledge bases affect the spatial distribution of various knowledge-intensive business industries. The central research question is: to what extent is the localization of these two industries influenced by different modes of innovation/knowledge bases (symbolic vs. synthetic), and to what extent by contextual factors such as urban size, morphology, position in the urban hierarchy and the economic profile of the given city? We found that urban contexts shape the localization patterns of advertising and IT companies more than differences in knowledge bases do: both industries cluster primarily in the inner cities and urban cores. Formation of more suburban IT "scientific neighborhoods" is limited.

    Gas Source Localization Strategies for Teleoperated Mobile Robots. An Experimental Analysis

    Gas source localization (GSL) is one of the most important and direct applications of a gas-sensitive mobile robot, and consists of searching for one or multiple volatile emission sources with a mobile robot that has enhanced sensing capabilities (e.g. olfaction, wind flow measurement). This work addresses GSL with a teleoperated mobile robot, and focuses on which search strategy is the most suitable for this teleoperated approach. Four different search strategies, namely chemotaxis, anemotaxis, gas mapping, and visual-aided search, are analyzed and evaluated according to a set of proposed indicators (e.g. accuracy, efficiency, success rate) to determine the most suitable one for a human-teleoperated mobile robot. Experimental validation is carried out on a large dataset composed of over 150 trials in which volunteer operators had to locate a gas leak in a virtual environment under varied and realistic environmental conditions (i.e. different wind flow patterns and gas source locations). We report several findings, of which we highlight that, contrary to intuition, visual-aided search is not always the best strategy; its effectiveness depends on the environmental conditions and the operator's ability to understand how the gas disperses.
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
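    Of the four strategies compared, chemotaxis is the simplest to state: follow the local concentration gradient uphill. The toy sketch below (our illustration on a synthetic plume, not the study's teleoperation setup) shows that logic.

        import numpy as np

        def chemotaxis_step(pos, sense, step=0.5, eps=0.05):
            # Probe the gas reading around the current position and move uphill.
            x, y = pos
            gx = sense(x + eps, y) - sense(x - eps, y)
            gy = sense(x, y + eps) - sense(x, y - eps)
            norm = np.hypot(gx, gy)
            if norm < 1e-12:      # no measurable gradient: hold position
                return pos        # (a real robot would fall back to exploration)
            return (x + step * gx / norm, y + step * gy / norm)

        # Toy plume whose concentration decays with distance from a source.
        sense = lambda x, y: np.exp(-((x - 3.0) ** 2 + (y - 4.0) ** 2))
        pos = (1.0, 2.0)
        for _ in range(20):
            pos = chemotaxis_step(pos, sense)   # walks toward (3, 4)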

    Multispectral Palmprint Encoding and Recognition

    Palmprints are emerging as a new entity in multi-modal biometrics for human identification and verification. Multispectral palmprint images captured in the visible and infrared spectrum not only contain the wrinkles and ridge structure of a palm, but also the underlying pattern of veins, making them a highly discriminating biometric identifier. In this paper, we propose a feature encoding scheme for robust and highly accurate representation and matching of multispectral palmprints. To facilitate compact storage of the feature, we design a binary hash table structure that allows for efficient matching in large databases. Comprehensive experiments for both identification and verification scenarios are performed on two public datasets -- one captured with a contact-based sensor (PolyU dataset), and the other with a contact-free sensor (CASIA dataset). Recognition results in various experimental setups show that the proposed method consistently outperforms existing state-of-the-art methods. The error rates achieved by our method (0.003% on PolyU and 0.2% on CASIA) are the lowest reported in the literature on both datasets and clearly indicate the viability of palmprints as a reliable and promising biometric. All source code is publicly available.
    Comment: A preliminary version of this manuscript was published in ICCV 2011: Z. Khan, A. Mian and Y. Hu, "Contour Code: Robust and Efficient Multispectral Palmprint Encoding for Human Recognition", International Conference on Computer Vision, 2011. MATLAB code available: https://sites.google.com/site/zohaibnet/Home/code
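    As an illustration of how a binary hash table narrows matching, the sketch below buckets stored codes by a short bit prefix and Hamming-matches a probe only within its bucket. The layout and parameters are hypothetical; the paper's structure differs in detail and must also tolerate bit errors within the key itself.

        import numpy as np
        from collections import defaultdict

        def build_table(codes, key_bits=16):
            # Bucket each stored binary code by its first key_bits bits so a
            # probe only has to be Hamming-matched against one small bucket.
            table = defaultdict(list)
            for i, code in enumerate(codes):
                table[code[:key_bits].tobytes()].append(i)
            return table

        def nearest(probe, codes, table, key_bits=16):
            # Empty bucket: fall back to scanning the whole gallery.
            bucket = table.get(probe[:key_bits].tobytes(), range(len(codes)))
            return min(bucket, key=lambda i: np.count_nonzero(codes[i] != probe))

        gallery = np.random.randint(0, 2, (10000, 256), dtype=np.uint8)
        table = build_table(gallery)
        probe = gallery[42].copy()
        probe[200:205] ^= 1                 # bit errors outside the hash key
        assert nearest(probe, gallery, table) == 42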