Learning Dilation Factors for Semantic Segmentation of Street Scenes
Contextual information is crucial for semantic segmentation. However, finding
the optimal trade-off between preserving desired fine details and providing
sufficiently large receptive fields is non-trivial. This is even more so when
objects or classes present in an image vary significantly in size.
Dilated convolutions have proven valuable for semantic segmentation because
they allow the receptive field to be enlarged without sacrificing image
resolution. However, in current state-of-the-art methods, dilation
parameters are hand-tuned and fixed. In this paper, we present an approach for
learning dilation parameters adaptively per channel, consistently improving
semantic segmentation results on street-scene datasets like Cityscapes and
CamVid.
Comment: GCPR 2017
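As a rough illustration of the idea (not the authors' exact formulation, which learns real-valued dilation factors directly), a differentiable stand-in can mix a few integer dilation rates per output channel with learned softmax weights; the module and parameter names below are hypothetical:

```python
import torch
import torch.nn as nn


class SoftDilatedConv(nn.Module):
    """Hypothetical stand-in: a learned per-channel mixture over a small set of
    integer dilation rates, approximating a learnable dilation factor."""

    def __init__(self, in_ch, out_ch, k=3, rates=(1, 2, 4)):
        super().__init__()
        # one conv per candidate rate; padding keeps the spatial size for odd k
        self.convs = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=r * (k // 2), dilation=r)
            for r in rates
        )
        # learnable logits: one weight per (rate, output channel)
        self.logits = nn.Parameter(torch.zeros(len(rates), out_ch))

    def forward(self, x):
        w = torch.softmax(self.logits, dim=0)      # (R, C_out), sums to 1 per channel
        outs = [conv(x) for conv in self.convs]    # each (B, C_out, H, W)
        return sum(wi.view(1, -1, 1, 1) * o for wi, o in zip(w, outs))
```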
VIENA2: A Driving Anticipation Dataset
Action anticipation is critical in scenarios where one needs to react before
the action is finalized. This is the case, for instance, in automated driving,
where a car needs to avoid hitting pedestrians and respect traffic
lights. While solutions have been proposed to tackle subsets of the driving
anticipation tasks, by making use of diverse, task-specific sensors, there is
no single dataset or framework that addresses them all in a consistent manner.
In this paper, we therefore introduce a new, large-scale dataset, called
VIENA2, covering 5 generic driving scenarios, with a total of 25 distinct
action classes. It contains more than 15K full-HD, 5-second-long videos acquired
in various driving conditions, weather, times of day, and environments, complemented
with a common and realistic set of sensor measurements. This amounts to more
than 2.25M frames, each annotated with an action label, corresponding to 600
samples per action class. We discuss our data acquisition strategy and the
statistics of our dataset, and benchmark state-of-the-art action anticipation
techniques, including a new multi-modal LSTM architecture with an effective
loss function for action anticipation in driving scenarios.
Comment: Accepted at ACCV 2018
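A minimal sketch of what a multi-modal anticipation model of this kind could look like, assuming pre-extracted visual features and scalar sensor readings per frame; the architecture and the linearly ramped loss weights are illustrative choices, not the paper's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiModalLSTM(nn.Module):
    """Two-stream anticipation model: per-frame visual features and sensor
    readings are fused, run through an LSTM, and classified at every step."""

    def __init__(self, vis_dim=2048, sens_dim=10, hidden=256, n_classes=25):
        super().__init__()
        self.fuse = nn.Linear(vis_dim + sens_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, vis, sens):        # vis: (B, T, vis_dim), sens: (B, T, sens_dim)
        x = torch.relu(self.fuse(torch.cat([vis, sens], dim=-1)))
        h, _ = self.lstm(x)
        return self.head(h)              # per-step logits: (B, T, n_classes)


def anticipation_loss(logits, labels):
    """Cross-entropy at every timestep, weighted by a linear ramp so that late
    (better-informed) predictions count more. The ramp is an illustrative choice."""
    B, T, C = logits.shape
    ce = F.cross_entropy(logits.reshape(B * T, C),
                         labels.repeat_interleave(T),  # labels: (B,) -> (B*T,)
                         reduction="none").view(B, T)
    w = torch.linspace(1.0 / T, 1.0, T, device=logits.device)
    return (ce * w).mean()
```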
Dynamic Adaptation on Non-Stationary Visual Domains
Domain adaptation aims to learn models on a supervised source domain that
perform well on an unsupervised target. Prior work has examined domain
adaptation in the context of stationary domain shifts, i.e., static datasets.
However, with large-scale or dynamic data sources, data from a defined domain
is not usually available all at once. For instance, in a streaming data
scenario, dataset statistics effectively become a function of time. We
introduce a framework for adaptation over non-stationary distribution shifts
applicable to large-scale and streaming data scenarios. The model is adapted
sequentially over incoming unsupervised streaming data batches. This enables
improvements over several batches without the need for any additional
annotated data. To demonstrate the effectiveness of our proposed framework, we
modify associative domain adaptation to work well on source and target data
batches with unequal class distributions. We apply our method to several
adaptation benchmark datasets for classification and show improved classifier
accuracy not only for the currently adapted batch, but also when applied on
future stream batches. Furthermore, we show the applicability of our
associative learning modifications to semantic segmentation, where we achieve
competitive results.
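For reference, the associative domain adaptation objective the authors build on combines a round-trip "walker" loss with a "visit" loss over embedding similarities; a compact sketch of those two base losses (this paper's class-imbalance modifications are omitted):

```python
import torch
import torch.nn.functional as F


def associative_losses(src_emb, tgt_emb, src_labels, eps=1e-8):
    """Walker and visit losses of associative domain adaptation (sketch).
    src_emb: (Ns, D), tgt_emb: (Nt, D), src_labels: (Ns,) integer class ids."""
    sim = src_emb @ tgt_emb.t()                  # (Ns, Nt) similarities
    p_st = F.softmax(sim, dim=1)                 # source -> target transition probs
    p_ts = F.softmax(sim.t(), dim=1)             # target -> source
    p_sts = p_st @ p_ts                          # round-trip probabilities (Ns, Ns)

    # walker loss: round trips should end at a source sample of the same class
    same = (src_labels[:, None] == src_labels[None, :]).float()
    target = same / same.sum(dim=1, keepdim=True)
    walker = -(target * torch.log(p_sts + eps)).sum(dim=1).mean()

    # visit loss: every target sample should be visited with equal probability
    # (cross-entropy against the uniform distribution, up to a constant)
    visit = p_st.mean(dim=0)
    visit_loss = -torch.log(visit + eps).mean()
    return walker, visit_loss
```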
Modeling Camera Effects to Improve Visual Learning from Synthetic Data
Recent work has focused on generating synthetic imagery to increase the size
and variability of training data for learning visual tasks in urban scenes.
This includes increasing the occurrence of occlusions or varying environmental
and weather effects. However, few have addressed modeling variation in the
sensor domain. Sensor effects can degrade real images, limiting
generalizability of network performance on visual tasks trained on synthetic
data and tested in real environments. This paper proposes an efficient,
automatic, physically-based augmentation pipeline to vary sensor effects
(chromatic aberration, blur, exposure, noise, and color cast) for synthetic
imagery. In particular, this paper illustrates that augmenting synthetic
training datasets with the proposed pipeline reduces the domain gap between
synthetic and real domains for the task of object detection in urban driving
scenes.
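A toy version of such a sensor-effect augmentation might look as follows; the parameter ranges and the simple per-effect models are assumptions, not the paper's calibrated, physically-based pipeline:

```python
import numpy as np


def sensor_augment(img, rng=None):
    """Toy sensor-effect augmentation for an HxWx3 uint8 image.
    All parameter ranges are illustrative assumptions."""
    rng = rng or np.random.default_rng()
    img = img.astype(np.float32) / 255.0

    # chromatic aberration: shift R and B channels in opposite directions
    shift = int(rng.integers(-2, 3))
    img[..., 0] = np.roll(img[..., 0], shift, axis=1)
    img[..., 2] = np.roll(img[..., 2], -shift, axis=1)

    # mild blur: 3x3 box filter built from shifted copies
    acc = img.copy()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy or dx:
                acc += np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    img = acc / 9.0

    # exposure (global gain) and color cast (per-channel gains)
    img *= rng.uniform(0.7, 1.3)
    img *= rng.uniform(0.9, 1.1, size=3)

    # sensor noise: additive Gaussian as a simple stand-in
    img += rng.normal(0.0, 0.02, img.shape)
    return np.clip(img * 255.0, 0.0, 255.0).astype(np.uint8)
```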
Implicit 3D Orientation Learning for 6D Object Detection from RGB Images
We propose a real-time RGB-based pipeline for object detection and 6D pose
estimation. Our novel 3D orientation estimation is based on a variant of the
Denoising Autoencoder that is trained on simulated views of a 3D model using
Domain Randomization. This so-called Augmented Autoencoder has several
advantages over existing methods: It does not require real, pose-annotated
training data, generalizes to various test sensors and inherently handles
object and view symmetries. Instead of learning an explicit mapping from input
images to object poses, it provides an implicit representation of object
orientations defined by samples in a latent space. Our pipeline achieves
state-of-the-art performance on the T-LESS dataset both in the RGB and RGB-D
domain. We also evaluate on the LineMOD dataset where we can compete with other
synthetically trained approaches. We further increase performance by correcting
3D orientation estimates to account for perspective errors when the object
deviates from the image center, and show extended results.
Comment: Code available at: https://github.com/DLR-RM/AugmentedAutoencoder
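At test time, the Augmented Autoencoder resolves orientation by nearest-neighbor lookup in a codebook of latent codes computed from rendered views; a minimal sketch of that lookup, assuming a trained `encoder` and precomputed codebook tensors:

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def estimate_orientation(encoder, crop, codebook_z, codebook_R):
    """Nearest-neighbor orientation lookup (sketch). Assumes a trained encoder,
    codebook_z: (N, D) latent codes of rendered views, codebook_R: (N, 3, 3)
    rotations used to render them. crop: (C, H, W) detected object crop."""
    z = encoder(crop.unsqueeze(0))               # (1, D) latent code of the crop
    sims = F.cosine_similarity(z, codebook_z)    # (N,) similarity to each view
    return codebook_R[sims.argmax()]             # rotation of the most similar view
```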
CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering
Intrinsic image decomposition is a challenging, long-standing computer vision
problem for which ground truth data is very difficult to acquire. We explore
the use of synthetic data for training CNN-based intrinsic image decomposition
models, which we then apply to real-world images. To that end, we present
CGIntrinsics, a new, large-scale dataset of physically-based rendered images
of scenes with full ground truth decompositions. The rendering process we use
is carefully designed to yield high-quality, realistic images, which we find to
be crucial for this problem domain. We also propose a new end-to-end training
method that learns better decompositions by leveraging CGIntrinsics, and optionally IIW
and SAW, two recent datasets of sparse annotations on real-world images.
Surprisingly, we find that a decomposition network trained solely on our
synthetic data outperforms the state-of-the-art on both IIW and SAW, and
performance improves even further when IIW and SAW data is added during
training. Our work demonstrates the surprising effectiveness of
carefully rendered synthetic data for the intrinsic images task.
Comment: Paper for 'CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering', published in ECCV 2018
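Intrinsic decomposition factors an image into reflectance and shading, I = R · S, so supervision is typically applied in log space up to a global scale; a sketch of the standard scale-invariant MSE data term (the paper's full objective also includes ordinal terms from the IIW and SAW annotations):

```python
import torch


def si_mse(pred_log_r, gt_log_r, mask):
    """Scale-invariant MSE in log space: the global offset alpha absorbs the
    unrecoverable absolute scale of the reflectance. Data-term sketch only."""
    n = mask.sum().clamp(min=1.0)
    diff = (pred_log_r - gt_log_r) * mask
    alpha = diff.sum() / n                       # optimal global log-scale shift
    return (((pred_log_r - gt_log_r - alpha) * mask) ** 2).sum() / n
```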
Training Deep Learning Models via Synthetic Data: Application in Unmanned Aerial Vehicles
This paper describes preliminary work in the recent promising approach of
generating synthetic training data for facilitating the learning procedure of
deep learning (DL) models, with a focus on aerial photos produced by unmanned
aerial vehicles (UAV). The general concept and methodology are described, and
preliminary results are presented, based on a classification problem of fire
identification in forests, as well as a counting problem of estimating the number of
houses in urban areas. The proposed technique constitutes a new possibility for
the DL community, especially related to UAV-based imagery analysis, with much
potential, promising results, and unexplored ground for further research.
Comment: Workshop on Deep-learning based computer vision for UAV, in conjunction with CAIP 2019, Salerno, Italy, September 2019
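The underlying recipe is generic: fit a model on synthetic imagery only, then measure accuracy on real UAV imagery; a bare-bones sketch with placeholder model and datasets:

```python
import torch
from torch.utils.data import DataLoader


def train_and_eval(model, synthetic_ds, real_ds, epochs=10, lr=1e-3):
    """Generic synthetic-to-real loop; model and datasets are placeholders."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):                      # fit only on synthetic samples
        for x, y in DataLoader(synthetic_ds, batch_size=32, shuffle=True):
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    model.eval()                                 # measure the synthetic-to-real gap
    correct = total = 0
    with torch.no_grad():
        for x, y in DataLoader(real_ds, batch_size=32):
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / max(total, 1)
```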
AutoSimulate: (Quickly) Learning Synthetic Data Generation
Simulation is increasingly being used for generating large labelled datasets
in many machine learning problems. Recent methods have focused on adjusting
simulator parameters with the goal of maximising accuracy on a validation task,
usually relying on REINFORCE-like gradient estimators. However, these approaches
are very expensive as they treat the entire data generation, model training,
and validation pipeline as a black-box and require multiple costly objective
evaluations at each iteration. We propose an efficient alternative for optimal
synthetic data generation, based on a novel differentiable approximation of the
objective. This allows us to optimize the simulator, which may be
non-differentiable, requiring only one objective evaluation at each iteration
with little overhead. We demonstrate on a state-of-the-art photorealistic
renderer that the proposed method finds the optimal data distribution faster,
with significantly reduced training data generation, and better accuracy on
real-world test datasets than previous methods.
Comment: ECCV 2020
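The key idea is to make the validation loss differentiable with respect to the simulator parameters instead of treating the whole pipeline as a black box. A toy sketch of the general "differentiate through training" idea, using a single unrolled SGD step on one parameter tensor; AutoSimulate's actual derivation uses a cheaper local approximation of this bilevel objective, and all names below are placeholders:

```python
import torch


def simulator_grad(psi, theta, sample_batch, train_loss, val_loss, lr=0.1):
    """Toy one-step-unrolled estimate of d val_loss / d psi. All callables are
    placeholders: sample_batch(psi) must generate data differentiably in psi,
    and theta is a single model-parameter tensor with requires_grad=True."""
    batch = sample_batch(psi)                    # generated data, differentiable in psi
    g, = torch.autograd.grad(train_loss(theta, batch), theta, create_graph=True)
    theta_new = theta - lr * g                   # one inner SGD step, kept on the graph
    psi_grad, = torch.autograd.grad(val_loss(theta_new), psi)
    return psi_grad                              # gradient to update simulator params
```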
Calculating the energy spectra of magnetic molecules: application of real- and spin-space symmetries
Determining the energy spectra of small spin systems, such as those realized
by magnetic molecules, is a demanding numerical problem. In this work we
review numerical approaches to diagonalize the Heisenberg Hamiltonian that
employ symmetries; in particular we focus on the spin-rotational symmetry SU(2)
in combination with point-group symmetries. With these methods one is able to
block-diagonalize the Hamiltonian and thus to treat spin systems of
unprecedented size. In addition, these methods provide a spectroscopic labeling by
irreducible representations that is helpful when interpreting transitions
induced by Electron Paramagnetic Resonance (EPR), Nuclear Magnetic Resonance
(NMR) or Inelastic Neutron Scattering (INS). It is our aim to provide the
reader with detailed knowledge of how to set up such a diagonalization scheme.
Comment: 29 pages, many figures
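A much simpler symmetry than the full SU(2) scheme discussed here, conservation of total S^z, already illustrates the block-diagonalization idea; a self-contained sketch for a spin-1/2 Heisenberg ring:

```python
import numpy as np
from itertools import combinations


def heisenberg_spectrum(N, J=1.0):
    """Energy spectrum of a spin-1/2 Heisenberg ring of N >= 3 sites,
    H = J * sum_i S_i . S_{i+1}, block-diagonalized by total S^z."""
    bonds = [(i, (i + 1) % N) for i in range(N)]
    energies = []
    for n_up in range(N + 1):                    # one block per magnetization sector
        states = [sum(1 << i for i in c) for c in combinations(range(N), n_up)]
        index = {s: k for k, s in enumerate(states)}
        H = np.zeros((len(states), len(states)))
        for k, s in enumerate(states):
            for i, j in bonds:
                if ((s >> i) & 1) == ((s >> j) & 1):
                    H[k, k] += 0.25 * J          # Sz_i Sz_j, parallel spins
                else:
                    H[k, k] -= 0.25 * J          # Sz_i Sz_j, antiparallel spins
                    t = s ^ ((1 << i) | (1 << j))
                    H[index[t], k] += 0.5 * J    # (1/2)(S+_i S-_j + S-_i S+_j)
        energies.extend(np.linalg.eigvalsh(H))
    return np.sort(np.asarray(energies))
```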
Analysis and comparison of very large metagenomes with fast clustering and functional annotation
Background: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand.
Results: The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes".
Conclusion: RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.
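RAMMCAP's clustering stage builds on CD-HIT-style greedy incremental clustering; a naive O(n^2) sketch of that idea (the real tool uses k-mer filters and other optimizations to reach its speed):

```python
def greedy_cluster(reads, min_identity=0.9):
    """CD-HIT-style greedy incremental clustering (naive O(n^2) sketch).
    reads: list of sequence strings; returns a list of clusters."""
    def identity(a, b):
        n = min(len(a), len(b))
        return sum(x == y for x, y in zip(a, b)) / n if n else 0.0

    reps, clusters = [], []
    for read in sorted(reads, key=len, reverse=True):   # longest sequences first
        for idx, rep in enumerate(reps):
            if identity(read, rep) >= min_identity:     # join an existing cluster
                clusters[idx].append(read)
                break
        else:                                           # no representative matched
            reps.append(read)
            clusters.append([read])
    return clusters
```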