
    Fine-Grained Head Pose Estimation Without Keypoints

    Estimating the head pose of a person is a crucial problem with a large number of applications such as aiding gaze estimation, modeling attention, fitting 3D models to video and performing face alignment. Traditionally head pose is computed by estimating some keypoints from the target face and solving the 2D to 3D correspondence problem with a mean human head model. We argue that this is a fragile method because it relies entirely on landmark detection performance, the extraneous head model and an ad-hoc fitting step. We present an elegant and robust way to determine pose by training a multi-loss convolutional neural network on 300W-LP, a large synthetically expanded dataset, to predict intrinsic Euler angles (yaw, pitch and roll) directly from image intensities through joint binned pose classification and regression. We present empirical tests on common in-the-wild pose benchmark datasets which show state-of-the-art results. Additionally, we test our method on a dataset usually used for pose estimation using depth and start to close the gap with state-of-the-art depth-based pose methods. We open-source our training and testing code and release our pre-trained models.
    Comment: Accepted to Computer Vision and Pattern Recognition Workshops (CVPRW), 2018 IEEE Conference on. IEEE, 2018
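
    The joint binned classification and regression described above can be pictured with a short sketch. Below is a minimal, hypothetical PyTorch illustration for a single Euler angle; the bin count, 3-degree bin width, loss weight alpha, and the random features standing in for a CNN backbone are all assumptions made for illustration, not the authors' exact architecture or released code.

        # Minimal sketch: joint binned pose classification + regression for one angle.
        # Assumed: 66 bins of 3 degrees over roughly [-99, 99]; alpha balances the terms.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        NUM_BINS = 66
        bin_centers = torch.arange(NUM_BINS, dtype=torch.float32) * 3 - 99

        class PoseHead(nn.Module):
            def __init__(self, feat_dim=512):
                super().__init__()
                # one such head per angle (yaw, pitch, roll) in a multi-loss setup
                self.fc = nn.Linear(feat_dim, NUM_BINS)

            def forward(self, feats):
                return self.fc(feats)  # bin logits

        def multi_loss(logits, angle_gt, alpha=0.5):
            """Cross-entropy on the binned label plus a regression term on the
            expected continuous angle recovered from the softmax."""
            bin_gt = torch.clamp(((angle_gt + 99) / 3).long(), 0, NUM_BINS - 1)
            cls_loss = F.cross_entropy(logits, bin_gt)
            expected = (F.softmax(logits, dim=1) * bin_centers).sum(dim=1)
            reg_loss = F.mse_loss(expected, angle_gt)
            return cls_loss + alpha * reg_loss

        # toy usage with random features standing in for a CNN backbone
        feats = torch.randn(8, 512)
        yaw_gt = torch.empty(8).uniform_(-90, 90)
        head = PoseHead()
        loss = multi_loss(head(feats), yaw_gt)
        loss.backward()

    In a full multi-loss model the same head and loss would simply be replicated for yaw, pitch and roll and summed.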

    Unsupervised Learning of Edges

    Data-driven approaches for edge detection have proven effective and achieve top results on modern benchmarks. However, all current data-driven edge detectors require manual supervision for training in the form of hand-labeled region segments or object boundaries. Specifically, human annotators mark semantically meaningful edges which are subsequently used for training. Is this form of strong, high-level supervision actually necessary to learn to accurately detect edges? In this work we present a simple yet effective approach for training edge detectors without human supervision. To this end we utilize motion: the only input to our method is noisy semi-dense matches between frames. We begin with only a rudimentary knowledge of edges (in the form of image gradients), and alternate between improving motion estimation and edge detection. Using a large corpus of video data, we show that edge detectors trained with our unsupervised scheme approach the performance of the same methods trained with full supervision (within 3-5%). Finally, we show that when using a deep network for the edge detector, our approach provides a novel pre-training scheme for object detection.
    Comment: Camera-ready version for CVPR 2016
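
    As a rough illustration of the alternating scheme described above, the toy loop below swaps between a motion-estimation step and an edge-update step. Everything here is a placeholder: the real method interpolates noisy semi-dense matches into dense, edge-aware flow and retrains a learned detector, whereas this numpy sketch only shows the structure of the alternation.

        # Schematic sketch of alternating motion estimation and edge estimation.
        import numpy as np

        def initial_edges(img):
            # rudimentary edge prior: image gradient magnitude
            gy, gx = np.gradient(img)
            return np.hypot(gx, gy)

        def estimate_motion(img0, img1, edges):
            # placeholder: a real system interpolates sparse matches into dense
            # flow while respecting the current edge map; here we return zeros
            h, w = img0.shape
            return np.zeros((h, w, 2))

        def edges_from_motion(flow):
            # motion discontinuities (gradients of the flow field) act as edge labels
            du = np.hypot(*np.gradient(flow[..., 0]))
            dv = np.hypot(*np.gradient(flow[..., 1]))
            return du + dv

        img0 = np.random.rand(64, 64)
        img1 = np.random.rand(64, 64)
        edges = initial_edges(img0)
        for _ in range(3):                      # alternate the two estimation problems
            flow = estimate_motion(img0, img1, edges)
            labels = edges_from_motion(flow)
            edges = 0.5 * edges + 0.5 * labels  # stand-in for retraining the detector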

    Does Continual Learning = Catastrophic Forgetting?

    Continual learning is known to suffer from catastrophic forgetting, a phenomenon where concepts learned earlier are forgotten in favor of more recent samples. In this work, we challenge the assumption that continual learning is inevitably associated with catastrophic forgetting by presenting a set of tasks that, surprisingly, do not suffer from catastrophic forgetting when learned continually. We provide evidence that these reconstruction-type tasks exhibit positive forward transfer and that single-view 3D shape reconstruction improves performance on both learned and novel categories over time. We provide a novel analysis of knowledge transfer ability by examining the output distribution shift across sequential learning tasks. Finally, we show that the robustness of these tasks points to the potential of using a proxy representation learning task for continual classification. The codebase, dataset, and pre-trained models released with this article can be found at https://github.com/rehg-lab/CLRec
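
    The output-distribution-shift analysis mentioned above can be probed, in spirit, with a loop like the toy sketch below: train on tasks sequentially and record the model's outputs on a fixed probe set after each task. The two-layer regressor, random tasks, and shift metric are placeholders invented for illustration; the paper itself studies single-view 3D shape reconstruction, not this synthetic setup.

        # Toy sketch: track output drift on a fixed probe set across sequential tasks.
        import torch
        import torch.nn as nn

        model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        probe = torch.randn(128, 16)            # fixed probe inputs reused after every task

        tasks = [(torch.randn(256, 16), torch.randn(256, 16)) for _ in range(4)]
        snapshots = []
        for x, y in tasks:                      # learn tasks one after another
            for _ in range(100):
                opt.zero_grad()
                loss = nn.functional.mse_loss(model(x), y)
                loss.backward()
                opt.step()
            with torch.no_grad():
                snapshots.append(model(probe))  # output distribution after this task

        # output distribution shift between consecutive tasks on the shared probe set
        for t in range(1, len(snapshots)):
            shift = (snapshots[t] - snapshots[t - 1]).norm(dim=1).mean().item()
            print(f"task {t}: mean output shift {shift:.3f}")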

    The Secrets of Salient Object Segmentation

    In this paper we provide an extensive evaluation of fixation prediction and salient object segmentation algorithms, as well as statistics of major datasets. Our analysis identifies a serious design flaw of existing salient object benchmarks, which we call dataset design bias: they over-emphasize stereotypical concepts of saliency. This dataset design bias not only creates a discomforting disconnect between fixations and salient object segmentation, but also misleads algorithm design. Based on our analysis, we propose a new high-quality dataset that offers both fixation and salient object segmentation ground truth. With fixations and salient objects presented simultaneously, we are able to bridge the gap between fixations and salient objects and propose a novel method for salient object segmentation. Finally, we report significant benchmark progress on three existing salient object segmentation datasets.
    Comment: 15 pages, 8 figures. Conference version was accepted by CVPR 2014
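
    One simple way to picture how fixations can drive salient object segmentation is to score region proposals by the fixation density they capture, as in the numpy sketch below. The random masks, fixation map, and scoring rule are placeholders for illustration; the paper's actual model couples segment proposals with a fixation prediction model rather than this toy rule.

        # Toy sketch: rank region proposals by the predicted fixation density inside them.
        import numpy as np

        h, w = 100, 100
        fixation_map = np.random.rand(h, w)                          # predicted fixation density
        proposals = [np.random.rand(h, w) > 0.7 for _ in range(20)]  # binary region masks

        def saliency_score(mask, fixations):
            # average fixation density inside the region (area-normalized)
            area = mask.sum() + 1e-6
            return fixations[mask].sum() / area

        scores = [saliency_score(m, fixation_map) for m in proposals]
        best = proposals[int(np.argmax(scores))]                     # chosen salient object mask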

    Learning to Localize and Align Fine-Grained Actions to Sparse Instructions

    Automatic generation of textual video descriptions that are time-aligned with video content is a long-standing goal in computer vision. The task is challenging due to the difficulty of bridging the semantic gap between the visual and natural language domains. This paper addresses the task of automatically generating an alignment between a set of instructions and a first-person video demonstrating an activity. The sparse descriptions and ambiguity of written instructions create significant alignment challenges. The key to our approach is the use of egocentric cues to generate a concise set of action proposals, which are then matched to recipe steps using object recognition and computational linguistic techniques. We obtain promising results on both the Extended GTEA Gaze+ dataset and the Bristol Egocentric Object Interactions Dataset.
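
    To make the matching idea concrete, the sketch below aligns a few toy action proposals to recipe steps using object-word overlap and a generic monotone dynamic-programming alignment. The word sets, similarity function, and DP recurrence are illustrative assumptions, not the authors' formulation.

        # Toy sketch: monotone alignment of action proposals to instruction steps.
        import numpy as np

        steps = [{"pan", "oil"}, {"egg", "pan"}, {"plate", "egg"}]               # nouns per step
        proposals = [{"oil", "pan"}, {"egg", "bowl"}, {"pan", "egg"}, {"egg", "plate"}]

        def sim(p, s):
            # Jaccard overlap between objects seen in the proposal and nouns in the step
            return len(p & s) / max(len(p | s), 1)

        S = np.array([[sim(p, s) for s in steps] for p in proposals])

        # monotone alignment: each proposal stays on the previous proposal's step
        # or advances to the next step, maximizing total similarity
        n, m = S.shape
        dp = np.full((n, m), -np.inf)
        dp[0, 0] = S[0, 0]
        for i in range(1, n):
            for j in range(m):
                prev = dp[i - 1, j] if j == 0 else max(dp[i - 1, j], dp[i - 1, j - 1])
                dp[i, j] = S[i, j] + prev
        print("best alignment score:", dp[-1].max())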