560 research outputs found

OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas.

Author: A Saxena
B Li
C Plagemann
F Liu
H Kim
K Karsch
K Karsch
K Matzen
M Ruder
N Silberman
N Srivastava
O Özyeşil
P Hedman
R Garg
R Hartley
S Li
T Rhee
Y Furukawa
Y Zhang
Publication date: 12/09/2018

Recent work on depth estimation has so far focused only on projective images, ignoring 360° content, which is now produced in growing volume and with increasing ease. We show that monocular depth estimation models trained on traditional images produce sub-optimal results on omnidirectional images, showcasing the need for training directly on 360° datasets, which, however, are hard to acquire. In this work, we circumvent the challenges associated with acquiring high-quality 360° datasets with ground-truth depth annotations by re-using recently released large-scale 3D datasets and re-purposing them to 360° via rendering. This dataset, which is considerably larger than similar projective datasets, is publicly offered to the community to enable future research in this direction. We use this dataset to learn, in an end-to-end fashion, the task of depth estimation from 360° images. We show promising results on our synthesized data as well as on unseen realistic images.

Crossref

ZENODO

BlobGAN: Spatially Disentangled Scene Representations

Author: Efros Alexei A.
Epstein Dave
Park Taesung
Shechtman Eli
Zhang Richard
Publication date: 29/07/2022

We propose an unsupervised, mid-level representation for a generative model of scenes. The representation is mid-level in that it is neither per-pixel nor per-image; rather, scenes are modeled as a collection of spatial, depth-ordered "blobs" of features. Blobs are differentiably placed onto a feature grid that is decoded into an image by a generative adversarial network. Due to the spatial uniformity of blobs and the locality inherent to convolution, our network learns to associate different blobs with different entities in a scene and to arrange these blobs to capture scene layout. We demonstrate this emergent behavior by showing that, despite training without any supervision, our method enables applications such as easy manipulation of objects within a scene (e.g., moving, removing, and restyling furniture), creation of feasible scenes given constraints (e.g., plausible rooms with drawers at a particular location), and parsing of real-world images into constituent parts. On a challenging multi-category dataset of indoor scenes, BlobGAN outperforms StyleGAN2 in image quality as measured by FID. See our project page for video results and interactive demo: https://www.dave.ml/blobgan
Comment: ECCV 2022. Project webpage available at https://www.dave.ml/blobga

arXiv.org e-Print Archive

Self Adversarial Training for Human Pose Estimation

Author: arjovsky
belagiannis
berthelot
bulat
cao
carreira
chen
chen
chu
gkioxari
gong
goodfellow
gulrajani
insafutdinov
isola
ledig
lifshitz
luc
mirza
newell
pan
pishchulin
radford
rafi
ramakrishna
tompson
wei
zhao
Publication date: 15/08/2017

This paper presents a deep learning based approach to the problem of human pose estimation. We employ generative adversarial networks as our learning paradigm in which we set up two stacked hourglass networks with the same architecture, one as the generator and the other as the discriminator. The generator is used as a human pose estimator after the training is done. The discriminator distinguishes ground-truth heatmaps from generated ones, and back-propagates the adversarial loss to the generator. This process enables the generator to learn plausible human body configurations and is shown to be useful for improving the prediction accuracy.
Comment: CVPR 2017 Workshop on Visual Understanding of Humans in Crowd Scene and the 1st Look Into Person (LIP) Challeng

arXiv.org e-Print Archive

Crossref

Dynamic in Static: Hybrid Visual Correspondence for Self-Supervised Video Object Segmentation

Author: Jiao Jianbo
Nie Liqiang
Pei Gensheng
Tang Jinhui
Wang Wenguan
Yao Yazhou
Publication venue: arXiv
Publication date: 21/04/2024

Conventional video object segmentation (VOS) methods usually require a substantial volume of pixel-level annotated video data for fully supervised learning. In this paper, we present HVC, a hybrid static-dynamic visual correspondence framework for self-supervised VOS. HVC extracts pseudo-dynamic signals from static images, enabling an efficient and scalable VOS model. Our approach utilizes a minimalist fully-convolutional architecture to capture static-dynamic visual correspondence in image-cropped views. To achieve this objective, we present a unified self-supervised approach to learn visual representations of static-dynamic feature similarity. First, we establish static correspondence by utilizing a priori coordinate information between cropped views to guide the formation of consistent static feature representations. Subsequently, we devise a concise convolutional layer to capture the forward/backward pseudo-dynamic signals between two views, serving as cues for dynamic representations. Finally, we propose a hybrid visual correspondence loss to learn joint static and dynamic consistency representations. Our approach, without bells and whistles, requires only one training session using static image data, significantly reducing memory consumption (~16GB) and training time (~2h). Moreover, HVC achieves state-of-the-art performance in several self-supervised VOS benchmarks and additional video label propagation tasks.

University of Birmingham Research Portal

ULISSE: an unsupervised algorithm for detecting reliable dependency parses

Author: Dell'Orletta Felice
Montemagni Simonetta
Venturi Giulia
Publication venue: Association for Computational Linguistics Stroudsburg, PA, USA

In this paper we present ULISSE, an unsupervised, linguistically-driven algorithm to select reliable parses from the output of a dependency parser. Different experiments were devised to show that the algorithm is robust enough to deal with the output of different parsers and with different languages, as well as to be used across different domains. In all cases, ULISSE appears to outperform the baseline algorithms.

PUblication MAnagement