
    BigSmall: Efficient Multi-Task Learning for Disparate Spatial and Temporal Physiological Measurements

    Understanding of human visual perception has historically inspired the design of computer vision architectures. As an example, perception occurs at different scales both spatially and temporally, suggesting that the extraction of salient visual information may be made more effective by paying attention to specific features at varying scales. Visual changes in the body due to physiological processes also occur at different scales and with modality-specific characteristic properties. Inspired by this, we present BigSmall, an efficient architecture for physiological and behavioral measurement. We present the first joint camera-based facial action, cardiac, and pulmonary measurement model. We propose a multi-branch network with wrapping temporal shift modules that yields both accuracy and efficiency gains. We observe that fusing low-level features leads to suboptimal performance, but that fusing high-level features enables efficiency gains with negligible loss in accuracy. Experimental results demonstrate that BigSmall significantly reduces computational cost. Furthermore, compared to existing task-specific models, BigSmall achieves comparable or better results on multiple physiological measurement tasks simultaneously with a unified model.
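
    The abstract does not spell out how a wrapping temporal shift module works. The following is a minimal sketch, assuming it behaves like a standard temporal shift (one fold of channels takes values from the next frame, one fold from the previous frame, the rest stay put) except that frames shifted past either end of the clip wrap around instead of being zero-filled. The function name `wrapping_temporal_shift` and the parameter `fold_div` are illustrative and not taken from the paper.

```python
def wrapping_temporal_shift(frames, fold_div=3):
    """Circularly shift channel folds along the time axis.

    frames: list of T frames, each a list of C channel values
            (a stand-in for a T x C feature tensor).
    fold_div: hypothetical parameter; C // fold_div channels are shifted
              in each direction, the remainder are left unchanged.
    """
    t = len(frames)
    c = len(frames[0])
    fold = c // fold_div
    out = [list(f) for f in frames]  # copy so the input is not modified
    for i in range(t):
        # First fold: read from the next frame, wrapping at the clip end.
        for ch in range(fold):
            out[i][ch] = frames[(i + 1) % t][ch]
        # Second fold: read from the previous frame, wrapping at the start.
        for ch in range(fold, 2 * fold):
            out[i][ch] = frames[(i - 1) % t][ch]
        # Remaining channels keep their original values.
    return out


clip = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
shifted = wrapping_temporal_shift(clip, fold_div=3)
```

    In this toy clip each frame ends up mixing one channel from its successor, one from its predecessor, and one of its own, and no position is left empty because the indices wrap modulo the clip length.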

    Region-based saliency estimation for 3D shape analysis and understanding

    The detection of salient regions is an important pre-processing step for many 3D shape analysis and understanding tasks. This paper proposes a novel method for saliency detection in 3D free-form shapes. First, we smooth the surface normals with a bilateral filter, which smooths the surface while retaining local details. Second, a novel method is proposed for estimating the saliency value of each vertex. To this end, two new features are defined: the Retinex-based Importance Feature (RIF) and the Relative Normal Distance (RND). They are based on human visual perception characteristics and on surface geometry, respectively. Since the vertex-based method cannot guarantee that the detected salient regions are semantically continuous and complete, we propose to refine the saliency values based on surface patches. The detected saliency is finally used to guide existing techniques for mesh simplification, interest point detection, and overlapping point cloud registration. Comparative studies based on real data from three publicly accessible databases show that the proposed method usually outperforms five selected state-of-the-art methods both qualitatively and quantitatively for saliency detection and 3D shape analysis and understanding.
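
    The first step, bilateral filtering of surface normals, can be illustrated with a small sketch. It assumes the common formulation in which each neighbor's normal is weighted by a Gaussian on spatial distance and a Gaussian on normal difference; the abstract does not give the paper's exact kernel, and the names `bilateral_smooth_normals`, `sigma_s`, and `sigma_r` are hypothetical.

```python
import math


def bilateral_smooth_normals(points, normals, neighbors, sigma_s=1.0, sigma_r=0.5):
    """Smooth unit normals with a bilateral weighting (illustrative only).

    points, normals: lists of 3-tuples, one per vertex.
    neighbors: neighbors[i] lists the vertex indices adjacent to vertex i.
    sigma_s, sigma_r: assumed spatial and range bandwidths.
    """
    smoothed = []
    for i, n_i in enumerate(normals):
        acc = [0.0, 0.0, 0.0]
        for j in [i] + list(neighbors[i]):
            # Spatial term: nearby vertices count more.
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            # Range term: similar normals count more, which preserves edges.
            r2 = sum((a - b) ** 2 for a, b in zip(n_i, normals[j]))
            w = math.exp(-d2 / (2 * sigma_s ** 2)) * math.exp(-r2 / (2 * sigma_r ** 2))
            acc = [a + w * b for a, b in zip(acc, normals[j])]
        length = math.sqrt(sum(a * a for a in acc)) or 1.0
        smoothed.append([a / length for a in acc])  # renormalize to unit length
    return smoothed


pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
nrm = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.0, 1.0)]
adj = [[1], [0, 2], [1]]
result = bilateral_smooth_normals(pts, nrm, adj)
```

    In this example the tilted middle normal is pulled back toward its two upright neighbors, while a normal that differed sharply from its neighbors would receive little smoothing because the range term would suppress their weights.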

    General highlight detection in sport videos

    Attention is a psychological measure of human response to a stimulus. We propose a general framework for highlight detection that compares attention intensity during the watching of sports videos. Three steps are involved: adaptive selection of salient features, unified attention estimation, and highlight identification. Adaptive selection computes feature correlation to decide an optimal set of salient features. Unified estimation combines these features using the multi-resolution autoregressive (MAR) technique and thus creates a temporal curve of attention intensity. We rank the intensity of attention to discriminate the boundaries of highlights. Such a framework alleviates semantic uncertainty around sports highlights and leads to efficient and effective highlight detection. The advantages are as follows: (1) the capability of using data at coarse temporal resolutions; (2) robustness against noise caused by modality asynchronism, perception uncertainty, and feature mismatch; (3) the employment of Markovian constraints on content presentation; and (4) multi-resolution estimation of attention intensity, which enables the precise allocation of event boundaries.
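
    The final identification step, ranking attention intensity to find highlight boundaries, can be pictured with a deliberately simplified sketch. It assumes a precomputed attention curve and keeps the top-ranked fraction of frames, merging contiguous frames into segments; it does not implement the MAR estimation or the Markovian constraints, and `highlight_segments` and `top_fraction` are hypothetical names.

```python
def highlight_segments(attention, top_fraction=0.3):
    """Return (start, end) frame index pairs for the highest-ranked frames.

    attention: list of attention-intensity values, one per frame.
    top_fraction: assumed fraction of frames treated as highlight material.
    """
    n = len(attention)
    k = max(1, int(round(n * top_fraction)))
    # The k-th largest value acts as the rank-based cut-off.
    threshold = sorted(attention, reverse=True)[k - 1]
    segments, start = [], None
    for i, a in enumerate(attention):
        if a >= threshold and start is None:
            start = i  # a highlight begins
        elif a < threshold and start is not None:
            segments.append((start, i - 1))  # a highlight ends
            start = None
    if start is not None:
        segments.append((start, n - 1))
    return segments


curve = [0.1, 0.2, 0.9, 0.8, 0.1, 0.1, 0.7, 0.1, 0.1, 0.1]
found = highlight_segments(curve, top_fraction=0.3)
```

    Using a rank-based cut-off rather than a fixed intensity value keeps the selection relative to each video, which matches the abstract's emphasis on comparing attention intensity.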

    Beyond saliency: understanding convolutional neural networks from saliency prediction on layer-wise relevance propagation

    Despite the tremendous achievements of deep convolutional neural networks (CNNs) in many computer vision tasks, understanding how they actually work remains a significant challenge. In this paper, we propose a novel two-step understanding method, namely the Salient Relevance (SR) map, which aims to shed light on how deep CNNs recognize images and learn features from areas, referred to as attention areas, therein. Our proposed method starts with a layer-wise relevance propagation (LRP) step, which estimates a pixel-wise relevance map over the input image. Next, we construct a context-aware saliency map, the SR map, from the LRP-generated map; it predicts areas close to the foci of attention instead of the isolated pixels that LRP reveals. In the human visual system, information about regions is more important than information about pixels in recognition. Consequently, our proposed approach closely simulates human recognition. Experimental results using the ILSVRC2012 validation dataset in conjunction with two well-established deep CNN models, AlexNet and VGG-16, clearly demonstrate that our proposed approach concisely identifies not only key pixels but also attention areas that contribute to the underlying neural network's comprehension of the given images. As such, our proposed SR map constitutes a convenient visual interface which unveils the visual attention of the network and reveals which type of objects the model has learned to recognize after training. The source code is available at https://github.com/Hey1Li/Salient-Relevance-Propagation. Comment: 35 pages, 15 figures
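
    To make the LRP step concrete, here is a minimal sketch of relevance propagation through a single bias-free linear layer using the widely used epsilon rule. The abstract does not say which LRP rule the authors apply, so the rule choice is an assumption, and `lrp_epsilon_linear` is a hypothetical helper rather than code from the linked repository.

```python
def lrp_epsilon_linear(activations, weights, relevance_out, eps=1e-9):
    """Redistribute output relevance onto the inputs of one linear layer.

    activations: input activations a_i (length n_in).
    weights: weights[i][j] connects input i to output j (no bias term).
    relevance_out: relevance R_j assigned to each output (length n_out).
    eps: small stabilizer added to the denominator (epsilon rule).
    """
    n_in, n_out = len(activations), len(relevance_out)
    # Pre-activations z_j = sum_i a_i * w_ij.
    z = [sum(activations[i] * weights[i][j] for i in range(n_in)) for j in range(n_out)]
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        for i in range(n_in):
            # Input i receives a share proportional to its contribution a_i * w_ij.
            relevance_in[i] += activations[i] * weights[i][j] / denom * relevance_out[j]
    return relevance_in


r_in = lrp_epsilon_linear([1.0, 2.0], [[1.0, 0.0], [0.5, 1.0]], [2.0, 2.0])
```

    With no bias and a tiny `eps`, the total relevance is approximately conserved from outputs to inputs; applying such a rule layer by layer down to the input is what produces a pixel-wise relevance map.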