DeepKey: Towards End-to-End Physical Key Replication From a Single Photograph

Author: Alex Krizhevsky
EH Adelson
Jürgen Schmidhuber
Kaiming He
S Ren
Publication date: 04/11/2018

This paper describes DeepKey, an end-to-end deep neural architecture capable of taking a digital RGB image of an 'everyday' scene containing a pin tumbler key (e.g. lying on a table or carpet) and fully automatically inferring a printable 3D key model. We report on the key detection performance and describe how candidates can be transformed into physical prints. We show an example opening a real-world lock. Our system is described in detail, providing a breakdown of all components including key detection, pose normalisation, bitting segmentation and 3D model inference. We provide an in-depth evaluation and conclude by reflecting on limitations, applications, potential security risks and societal impact. We contribute the DeepKey Datasets of 5,300+ images covering a few test keys with bounding boxes, pose and unaligned mask data.

Comment: 14 pages, 12 figures

arXiv.org e-Print Archive

Crossref

Explore Bristol Research

Detect-and-Track: Efficient Pose Estimation in Videos

Author: Girdhar Rohit
Gkioxari Georgia
Paluri Manohar
Torresani Lorenzo
Tran Du
Publication date: 02/05/2018

This paper addresses the problem of estimating and tracking human body keypoints in complex, multi-person video. We propose an extremely lightweight yet highly effective approach that builds upon the latest advancements in human detection and video understanding. Our method operates in two stages: keypoint estimation in frames or short clips, followed by lightweight tracking to generate keypoint predictions linked over the entire video. For frame-level pose estimation we experiment with Mask R-CNN, as well as our own proposed 3D extension of this model, which leverages temporal information over small clips to generate more robust frame predictions. We conduct extensive ablative experiments on the newly released multi-person video pose estimation benchmark, PoseTrack, to validate various design choices of our model. Our approach achieves an accuracy of 55.2% on the validation set and 51.8% on the test set using the Multi-Object Tracking Accuracy (MOTA) metric, and achieves state-of-the-art performance on the ICCV 2017 PoseTrack keypoint tracking challenge.

Comment: In CVPR 2018. Ranked first in ICCV 2017 PoseTrack challenge (keypoint tracking in videos). Code: https://github.com/facebookresearch/DetectAndTrack and webpage: https://rohitgirdhar.github.io/DetectAndTrack
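The second stage described above, linking per-frame keypoint predictions into tracks, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a greedy nearest-pose matching between consecutive frames, and the names `pose_distance`, `link_poses` and the `max_dist` threshold are hypothetical choices made for illustration only.

```python
import math
from typing import List, Tuple

Pose = List[Tuple[float, float]]  # one (x, y) per keypoint; assumed non-empty


def pose_distance(a: Pose, b: Pose) -> float:
    """Mean Euclidean distance between corresponding keypoints."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def link_poses(frames: List[List[Pose]], max_dist: float = 50.0) -> List[List[int]]:
    """Assign a track id to every pose in every frame (greedy matching).

    max_dist is a hypothetical gating threshold in pixels: a pose farther
    than this from every previous pose starts a new track.
    """
    next_id = 0
    prev: List[Tuple[int, Pose]] = []  # (track id, pose) from the previous frame
    all_ids: List[List[int]] = []
    for poses in frames:
        ids = [-1] * len(poses)
        # All (distance, previous index, current index) pairs, closest first.
        pairs = sorted(
            (pose_distance(prev_pose, pose), k, i)
            for k, (_, prev_pose) in enumerate(prev)
            for i, pose in enumerate(poses)
        )
        used_prev = set()
        for dist, k, i in pairs:
            if dist > max_dist:
                break  # remaining pairs are even farther apart
            if k in used_prev or ids[i] != -1:
                continue  # one of the two poses is already matched
            ids[i] = prev[k][0]
            used_prev.add(k)
        # Unmatched poses start new tracks.
        for i in range(len(poses)):
            if ids[i] == -1:
                ids[i] = next_id
                next_id += 1
        prev = list(zip(ids, poses))
        all_ids.append(ids)
    return all_ids
```

Calling `link_poses(frames)` on a list of per-frame pose lists returns, for each frame, the track id assigned to each pose. An optimal bipartite assignment could replace the greedy loop; the greedy version is used here only to keep the sketch short.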

arXiv.org e-Print Archive

Crossref

Vehicle pose estimation using G-Net: multi-class localization and depth estimation

Author: Agudo Martínez Antonio
García López Javier
Moreno-Noguer Francesc
Publication venue: IOS Press
Publication date: 01/01/2018

In this paper we present a new network architecture, called G-Net, for 3D pose estimation on RGB images which is trained in a weakly supervised manner. We introduce a two-step pipeline based on region-based convolutional neural networks (CNNs) for feature localization, bounding box refinement based on non-maximum suppression, and depth estimation. The G-Net is able to estimate the depth from single monocular images with a self-tuned loss function. The combination of this predicted depth and the presented two-step localization allows the extraction of the 3D pose of the object. We show in experiments that our method achieves good results compared to other state-of-the-art approaches which are trained in a fully supervised manner.

Peer Reviewed. Postprint (author's final draft)
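The abstract mentions bounding box refinement based on non-maximum suppression. As a rough illustration of that general technique (not the G-Net code), here is a minimal sketch of standard greedy NMS over axis-aligned boxes. The function names and the `iou_threshold` default are assumptions for this example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2, y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes kept, highest score first.

    A box is dropped when it overlaps an already kept, higher-scoring box
    by more than iou_threshold (a hypothetical default).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For example, `nms(boxes, scores)` can be applied to the candidate boxes of a single object class. Multi-class use would typically run it once per class.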

UPCommons. Portal del coneixement obert de la UPC

Digital.CSIC

Informed MCMC with Bayesian Neural Networks for Facial Image Analysis

Author: Kortylewski Adam
Morel-Forster Andreas
Parbhoo Sonali
Roth Volker
Vetter Thomas
Wieczorek Aleksander
Wieser Mario
Publication date: 01/01/2018

Computer vision tasks are difficult because of the large variability in the data that is induced by changes in light, background, partial occlusion as well as the varying pose, texture, and shape of objects. Generative approaches to computer vision allow us to overcome this difficulty by explicitly modeling the physical image formation process. Using generative object models, the analysis of an observed image is performed via Bayesian inference of the posterior distribution. This conceptually simple approach tends to fail in practice because of several difficulties stemming from sampling the posterior distribution: high dimensionality and multi-modality of the posterior distribution as well as expensive simulation of the rendering process. The main difficulty of sampling approaches in a computer vision context is choosing the proposal distribution accurately so that maxima of the posterior are explored early and the algorithm quickly converges to a valid image interpretation. In this work, we propose to use a Bayesian Neural Network for estimating an image-dependent proposal distribution. Compared to a standard Gaussian random walk proposal, this accelerates the sampler in finding regions of the posterior with high value. In this way, we can significantly reduce the number of samples needed to perform facial image analysis.

Comment: Accepted to the Bayesian Deep Learning Workshop at NeurIPS 201
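The contrast between a Gaussian random walk proposal and an informed proposal can be illustrated on a toy problem. The sketch below is not the paper's method: it assumes a one-dimensional Gaussian as a stand-in for the posterior, and replaces the Bayesian Neural Network with fixed, hypothetical proposal parameters `MU_HAT` and `SIGMA_HAT`, as if a network had predicted them from the image. Both samplers use the standard Metropolis-Hastings acceptance rule.

```python
import math
import random


def log_target(x: float) -> float:
    # Toy stand-in for a posterior: unnormalised log-density of N(8, 1).
    return -0.5 * (x - 8.0) ** 2


def metropolis_hastings(propose, log_q, x0, n_steps, rng):
    """Generic Metropolis-Hastings sampler.

    propose(x, rng) draws a candidate given the current state x.
    log_q(x_to, x_from) is the log proposal density (up to a constant)
    of moving from x_from to x_to.
    """
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = propose(x, rng)
        log_alpha = (log_target(x_new) - log_target(x)
                     + log_q(x, x_new) - log_q(x_new, x))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = x_new
        samples.append(x)
    return samples


# Gaussian random walk proposal: symmetric, so its density cancels.
def rw_propose(x, rng):
    return x + rng.gauss(0.0, 0.5)


def rw_log_q(x_to, x_from):
    return 0.0


# Informed independence proposal. MU_HAT and SIGMA_HAT are hypothetical
# values standing in for a network's image-dependent prediction.
MU_HAT, SIGMA_HAT = 7.5, 1.5


def informed_propose(x, rng):
    return rng.gauss(MU_HAT, SIGMA_HAT)


def informed_log_q(x_to, x_from):
    return -0.5 * ((x_to - MU_HAT) / SIGMA_HAT) ** 2


def steps_to_reach(samples, center=8.0, tol=1.0):
    """Index of the first sample within tol of center (len(samples) if none)."""
    for i, s in enumerate(samples):
        if abs(s - center) < tol:
            return i
    return len(samples)
```

Running both samplers from the same poor starting point, for instance `metropolis_hastings(rw_propose, rw_log_q, 0.0, 500, random.Random(0))` and the informed counterpart, and comparing `steps_to_reach` on the two chains gives a rough sense of how a well-placed proposal shortens the search for the high-probability region.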

arXiv.org e-Print Archive

edoc