10,337 research outputs found

    Memory Based Online Learning of Deep Representations from Video Streams

    Full text link
    We present a novel online unsupervised method for face identity learning from video streams. The method exploits deep face descriptors together with a memory based learning mechanism that takes advantage of the temporal coherence of visual data. Specifically, we introduce a discriminative feature matching solution based on Reverse Nearest Neighbour and a feature forgetting strategy that detect redundant features and discard them appropriately while time progresses. It is shown that the proposed learning procedure is asymptotically stable and can be effectively used in relevant applications like multiple face identification and tracking from unconstrained video streams. Experimental results show that the proposed method achieves comparable results in the task of multiple face tracking and better performance in face identification with offline approaches exploiting future information. Code will be publicly available.Comment: arXiv admin note: text overlap with arXiv:1708.0361

    Data-Driven Approach to Simulating Realistic Human Joint Constraints

    Full text link
    Modeling realistic human joint limits is important for applications involving physical human-robot interaction. However, setting appropriate human joint limits is challenging because it is pose-dependent: the range of joint motion varies depending on the positions of other bones. The paper introduces a new technique to accurately simulate human joint limits in physics simulation. We propose to learn an implicit equation to represent the boundary of valid human joint configurations from real human data. The function in the implicit equation is represented by a fully connected neural network whose gradients can be efficiently computed via back-propagation. Using gradients, we can efficiently enforce realistic human joint limits through constraint forces in a physics engine or as constraints in an optimization problem.Comment: To appear at ICRA 2018; 6 pages, 9 figures; for associated video, see https://youtu.be/wzkoE7wCbu

    S3^3FD: Single Shot Scale-invariant Face Detector

    Full text link
    This paper presents a real-time face detector, named Single Shot Scale-invariant Face Detector (S3^3FD), which performs superiorly on various scales of faces with a single deep neural network, especially for small faces. Specifically, we try to solve the common problem that anchor-based detectors deteriorate dramatically as the objects become smaller. We make contributions in the following three aspects: 1) proposing a scale-equitable face detection framework to handle different scales of faces well. We tile anchors on a wide range of layers to ensure that all scales of faces have enough features for detection. Besides, we design anchor scales based on the effective receptive field and a proposed equal proportion interval principle; 2) improving the recall rate of small faces by a scale compensation anchor matching strategy; 3) reducing the false positive rate of small faces via a max-out background label. As a consequence, our method achieves state-of-the-art detection performance on all the common face detection benchmarks, including the AFW, PASCAL face, FDDB and WIDER FACE datasets, and can run at 36 FPS on a Nvidia Titan X (Pascal) for VGA-resolution images.Comment: Accepted by ICCV 2017 + its supplementary materials; Updated the latest results on WIDER FAC

    Benchmarking of Embedded Object Detection in Optical and RADAR Scenes

    Get PDF
    A portable, real-time vital sign estimation protoype is developed using neural network- based localization, multi-object tracking, and embedded processing optimizations. The system estimates heart and respiration rates of multiple subjects using directional of arrival techniques on RADAR data. This system is useful in many civilian and military applications including search and rescue. The primary contribution from this work is the implementation and benchmarking of neural networks for real time detection and localization on various systems including the testing of eight neural networks on a discrete GPU and Jetson Xavier devices. Mean average precision (mAP) and inference speed benchmarks were performed. We have shown fast and accurate detection and tracking using synthetic and real RADAR data. Another major contribution is the quantification of the relationship between neural network mAP performance and data augmentations. As an example, we focused on image and video compression methods, such as JPEG, WebP, H264, and H265. The results show WebP at a quantization level of 50 and H265 at a constant rate factor of 30 provide the best balance between compression and acceptable mAP. Other minor contributions are achieved in enhancing the functionality of the real-time prototype system. This includes the implementation and benchmarking of neural network op- timizations, such as quantization and pruning. Furthermore, an appearance-based synthetic RADAR and real RADAR datasets are developed. The latter contains simultaneous optical and RADAR data capture and cross-modal labels. Finally, multi-object tracking methods are benchmarked and a support vector machine is utilized for cross-modal association. In summary, the implementation, benchmarking, and optimization of methods for detection and tracking helped create a real-time vital sign system on a low-profile embedded device. Additionally, this work established a relationship between compression methods and different neural networks for optimal file compression and network performance. Finally, methods for RADAR and optical data collection and cross-modal association are implemented

    Practical Color-Based Motion Capture

    Get PDF
    Motion capture systems have been widely used for high quality content creation and virtual reality but are rarely used in consumer applications due to their price and setup cost. In this paper, we propose a motion capture system built from commodity components that can be deployed in a matter of minutes. Our approach uses one or more webcams and a color shirt to track the upper-body at interactive rates. We describe a robust color calibration system that enables our color-based tracking to work against cluttered backgrounds and under multiple illuminants. We demonstrate our system in several real-world indoor and outdoor settings
    corecore