Temporally Consistent Horizon Lines
The horizon line is an important geometric feature for many image processing
and scene understanding tasks in computer vision. For instance, in navigation
of autonomous vehicles or driver assistance, it can be used to improve 3D
reconstruction as well as for semantic interpretation of dynamic environments.
While both algorithms and datasets exist for single images, the problem of
horizon line estimation from video sequences has not gained attention. In this
paper, we show how convolutional neural networks are able to utilise the
temporal consistency imposed by video sequences in order to increase the
accuracy and reduce the variance of horizon line estimates. A novel CNN
architecture with an improved residual convolutional LSTM is presented for
temporally consistent horizon line estimation. We propose an adaptive loss
function that ensures stable training as well as accurate results. Furthermore,
we introduce an extension of the KITTI dataset which contains precise horizon
line labels for 43699 images across 72 video sequences. A comprehensive
evaluation shows that the proposed approach consistently achieves superior
performance compared with existing methods.
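The abstract does not spell out its evaluation metric. A common choice in the horizon-line literature, and a reasonable stand-in here, is the maximum vertical offset between the estimated and true horizon at the left and right image borders, normalised by image height. A minimal sketch, with the (slope, offset) line parameterisation as an assumption rather than the paper's actual representation:

```python
import numpy as np

def horizon_error(est, gt, width, height):
    """Maximum vertical distance between two horizon lines, evaluated
    at the left and right image borders, normalised by image height.

    Each line is given as (slope, offset): y = slope * x + offset,
    in pixel coordinates.
    """
    xs = np.array([0.0, float(width)])
    y_est = est[0] * xs + est[1]
    y_gt = gt[0] * xs + gt[1]
    return float(np.max(np.abs(y_est - y_gt)) / height)
```

Averaging this error over a video sequence, and looking at its variance across frames, is one way to quantify the temporal consistency the paper targets.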
CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus
We present a robust estimator for fitting multiple parametric models of the
same form to noisy measurements. Applications include finding multiple
vanishing points in man-made scenes, fitting planes to architectural imagery,
or estimating multiple rigid motions within the same sequence. In contrast to
previous works, which resorted to hand-crafted search strategies for multiple
model detection, we learn the search strategy from data. A neural network
conditioned on previously detected models guides a RANSAC estimator to
different subsets of all measurements, thereby finding model instances one
after another. We train our method supervised as well as self-supervised. For
supervised training of the search strategy, we contribute a new dataset for
vanishing point estimation. Leveraging this dataset, the proposed algorithm is
superior with respect to other robust estimators as well as to designated
vanishing point estimation algorithms. For self-supervised learning of the
search, we evaluate the proposed algorithm on multi-homography estimation and
demonstrate accuracy superior to state-of-the-art methods.
Comment: CVPR 2020
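CONSAC's learned, conditional sampling replaces the classic hand-crafted strategy of sequential RANSAC: fit one model, remove its inliers, repeat. A minimal NumPy sketch of that baseline for 2D line fitting makes the contrast concrete; the function names, threshold, and iteration count are illustrative, not from the paper:

```python
import numpy as np

def fit_line(p1, p2):
    """Line through two points as (a, b, c) with a*x + b*y + c = 0, |(a, b)| = 1."""
    d = p2 - p1
    n = np.array([-d[1], d[0]])
    n = n / np.linalg.norm(n)
    return np.array([n[0], n[1], -n @ p1])

def sequential_ransac(points, n_models=2, threshold=0.05, iters=200, seed=0):
    """Classic sequential RANSAC: fit one model, remove its inliers, repeat."""
    rng = np.random.default_rng(seed)
    remaining = points
    models = []
    for _ in range(n_models):
        best_inliers, best_model = None, None
        for _ in range(iters):
            # minimal sample: two distinct points define a line hypothesis
            i, j = rng.choice(len(remaining), size=2, replace=False)
            a, b, c = fit_line(remaining[i], remaining[j])
            dist = np.abs(remaining @ np.array([a, b]) + c)
            inliers = dist < threshold
            if best_inliers is None or inliers.sum() > best_inliers.sum():
                best_inliers, best_model = inliers, (a, b, c)
        models.append(best_model)
        remaining = remaining[~best_inliers]  # greedily remove explained points
    return models
```

Instead of this greedy remove-inliers loop, CONSAC lets a neural network, conditioned on the models found so far, decide where the next sample should be drawn.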
Using Object Detection on Social Media Images for Urban Bicycle Infrastructure Planning: A Case Study of Dresden
With cities reinforcing greener ways of urban mobility, encouraging urban cycling helps to reduce the number of motorized vehicles on the streets. However, this also leads to a significant increase in the number of bicycles in urban areas, making the planning of cycling infrastructure an important topic. In this paper, we introduce a new method for analyzing the demand for bicycle parking facilities in urban areas based on object detection of social media images. We use a subset of the YFCC100m dataset, a collection of posts from the social media platform Flickr, and utilize a state-of-the-art object detection algorithm to detect and classify moving and parked bicycles in the city of Dresden, Germany. We were able to retrieve the vast majority of bicycles while generating few false positives, and to classify them as either moving or stationary. We then conducted a case study in which we compared areas with a high density of parked bicycles to the number of currently available parking spots in the same areas, and identified potential locations where new bicycle parking facilities could be introduced. With the results of the case study, we show that our approach is a useful additional data source for urban bicycle infrastructure planning because it provides information that is otherwise hard to obtain.
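The pipeline ends with comparing areas of high parked-bicycle density against existing parking capacity. A toy sketch of that aggregation step, assuming geotagged detections binned into a simple latitude/longitude grid (the cell size, labels, and function name are illustrative, not from the paper):

```python
from collections import Counter

def parked_bike_hotspots(detections, cell_size=0.001, top_k=3):
    """Bin geotagged parked-bicycle detections into a lat/lon grid and
    return the most frequent cells as candidate parking locations.

    `detections` is a list of (lat, lon, label) tuples; only detections
    labelled 'parked' are counted.
    """
    counts = Counter()
    for lat, lon, label in detections:
        if label == 'parked':
            cell = (round(lat / cell_size), round(lon / cell_size))
            counts[cell] += 1
    return counts.most_common(top_k)
```

Cells with many parked-bicycle detections but few official parking spots would then be flagged as candidates for new facilities.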
PARSAC: Accelerating Robust Multi-Model Fitting with Parallel Sample Consensus
We present a real-time method for robust estimation of multiple instances of
geometric models from noisy data. Geometric models such as vanishing points,
planar homographies or fundamental matrices are essential for 3D scene
analysis. Previous approaches discover distinct model instances in an iterative
manner, thus limiting their potential for speedup via parallel computation. In
contrast, our method detects all model instances independently and in parallel.
A neural network segments the input data into clusters representing potential
model instances by predicting multiple sets of sample and inlier weights. Using
the predicted weights, we determine the model parameters for each potential
instance separately in a RANSAC-like fashion. We train the neural network via
task-specific loss functions, i.e. we do not require a ground-truth
segmentation of the input data. As suitable training data for homography and
fundamental matrix fitting is scarce, we additionally present two new synthetic
datasets. We demonstrate state-of-the-art performance on these as well as
multiple established datasets, with inference times as small as five
milliseconds per image.
Comment: AAAI 2024
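The core idea, fitting every instance independently once per-instance weights have been predicted, can be illustrated with a weighted total-least-squares line fit per weight vector. This sketch uses a closed-form fit rather than the paper's RANSAC-like hypothesis sampling, and all names are illustrative:

```python
import numpy as np

def fit_lines_parallel(points, weights):
    """Fit one 2D line per weight vector via weighted total least squares.

    points:  (N, 2) array of measurements
    weights: (K, N) array, one row of inlier weights per model instance
    Returns a (K, 3) array of lines (a, b, c) with a*x + b*y + c = 0.
    """
    w = weights / weights.sum(axis=1, keepdims=True)   # normalise per instance
    means = w @ points                                 # (K, 2) weighted centroids
    lines = np.empty((len(weights), 3))
    for k, (wk, mu) in enumerate(zip(w, means)):
        centred = points - mu
        cov = (centred * wk[:, None]).T @ centred      # weighted 2x2 covariance
        evals, evecs = np.linalg.eigh(cov)
        n = evecs[:, 0]                                # normal = smallest-eigenvalue direction
        lines[k] = [n[0], n[1], -n @ mu]
    return lines
```

Because each instance depends only on its own weight row, the loop body has no cross-instance dependencies and could be batched or run in parallel, which is the property PARSAC exploits for speed.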
Robust Shape Fitting for 3D Scene Abstraction
Humans perceive and construct the world as an arrangement of simple parametric models. In particular, we can often describe man-made environments using volumetric primitives such as cuboids or cylinders. Inferring these primitives is important for attaining high-level, abstract scene descriptions. Previous approaches for primitive-based abstraction estimate shape parameters directly and are only able to reproduce simple objects. In contrast, we propose a robust estimator for primitive fitting, which meaningfully abstracts complex real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to a depth map. We condition the network on previously detected parts of the scene, parsing it one-by-one. To obtain cuboids from single RGB images, we additionally optimise a depth estimation CNN end-to-end. Naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene. We thus propose an improved occlusion-aware distance metric correctly handling opaque scenes. Furthermore, we present a neural-network-based cuboid solver which provides more parsimonious scene abstractions while also reducing inference time. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.
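The naive point-to-primitive distance the abstract starts from can be sketched for the simplest case, an axis-aligned cuboid; the paper handles oriented cuboids and adds the occlusion-awareness that this sketch deliberately omits:

```python
import numpy as np

def point_to_cuboid_distance(points, center, half_extents):
    """Euclidean distance from 3D points to the surface of an
    axis-aligned cuboid; points inside the cuboid get distance 0.

    points:       (..., 3) array
    center:       (3,) cuboid centre
    half_extents: (3,) half side lengths along x, y, z
    """
    # per-axis signed distance to the faces, in the cuboid's frame
    q = np.abs(points - center) - half_extents
    # outside part: clamp negative components (inside the slab) to zero
    return np.linalg.norm(np.maximum(q, 0.0), axis=-1)
```

Minimising only this quantity is exactly what the abstract warns against: a huge cuboid swallowing the whole point cloud drives it to zero, which is why the paper replaces it with an occlusion-aware metric.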