Synthesizing Training Data for Object Detection in Indoor Scenes
Detection of objects in cluttered indoor environments is one of the key
enabling functionalities for service robots. The best performing object
detection approaches in computer vision exploit deep Convolutional Neural
Networks (CNN) to simultaneously detect and categorize the objects of interest
in cluttered scenes. Training of such models typically requires large amounts
of annotated training data which is time consuming and costly to obtain. In
this work we explore the ability of using synthetically generated composite
images for training state-of-the-art object detectors, especially for object
instance detection. We superimpose 2D images of textured object models onto
images of real environments at a variety of locations and scales. Our experiments
evaluate different superimposition strategies ranging from purely image-based
blending all the way to depth and semantics informed positioning of the object
models into real scenes. We demonstrate the effectiveness of these object
detector training strategies on two publicly available datasets, the
GMU-Kitchens and the Washington RGB-D Scenes v2. As one observation, augmenting
some hand-labeled training data with synthetic examples carefully composed onto
scenes yields object detectors with comparable performance to using much more
hand-labeled data. Broadly, this work charts new opportunities for training
detectors for new objects by exploiting existing object model repositories in
either a purely automatic fashion or with only a very small number of
human-annotated examples.
Comment: Added more experiments and link to project webpage.
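The superimposition strategy this abstract describes, pasting 2D object crops into real scenes at varying locations and scales, can be sketched roughly as follows. The function name, the uniform alpha-blend, and the toy arrays are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np

def paste_object(background, obj_rgb, obj_mask, top, left):
    """Alpha-blend an object crop onto a background image at (top, left).

    background: (H, W, 3) float array; obj_rgb: (h, w, 3); obj_mask: (h, w) in [0, 1].
    """
    out = background.copy()
    h, w = obj_mask.shape
    region = out[top:top + h, left:left + w]
    alpha = obj_mask[..., None]              # broadcast the mask over RGB channels
    out[top:top + h, left:left + w] = alpha * obj_rgb + (1.0 - alpha) * region
    return out

# Purely image-based blending at a random location and scale (toy example).
rng = np.random.default_rng(0)
bg = np.zeros((64, 64, 3))
scale = int(rng.choice([8, 16, 32]))         # random object size in pixels
obj = np.ones((scale, scale, 3))             # stand-in for a textured object crop
mask = np.ones((scale, scale))               # stand-in for the object silhouette
top, left = (int(v) for v in rng.integers(0, 64 - scale, size=2))
composite = paste_object(bg, obj, mask, top, left)
```

Depth- or semantics-informed positioning, as evaluated in the paper, would constrain `top`, `left`, and `scale` using scene geometry rather than sampling them uniformly.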
Building synthetic simulated environments for configuring and training multi-camera systems for surveillance applications
[EN] Synthetic simulated environments are gaining popularity in the Deep Learning Era, as they can alleviate the
effort and cost of two critical tasks to build multi-camera systems for surveillance applications: setting up
the camera system to cover the use cases and generating the labeled dataset to train the required Deep Neural
Networks (DNNs). However, there are no simulated environments ready to solve them for all kinds of scenarios
and use cases. Typically, ‘ad hoc’ environments are built, which cannot be easily applied to other contexts.
In this work we present a methodology to build synthetic simulated environments with sufficient generality to
be usable in different contexts, with little effort. Our methodology tackles the challenges of the appropriate
parameterization of scene configurations, the strategies to randomly generate a wide and balanced range of
situations of interest for training DNNs with synthetic data, and the quick image capturing from virtual cameras
considering the rendering bottlenecks. We show a practical implementation example for the detection of
incorrectly placed luggage in aircraft cabins, including the qualitative and quantitative analysis of the data
generation process and its influence on DNN training, and the required modifications to adapt it to other
surveillance contexts.
This work has received funding from the Clean Sky 2 Joint Undertaking under the European Union's Horizon 2020 research and innovation program under grant agreement No. 865162, SmaCS (https://www.smacs.eu/).
Manipulating Attributes of Natural Scenes via Hallucination
In this study, we explore building a two-stage framework for enabling users
to directly manipulate high-level attributes of a natural scene. The key to our
approach is a deep generative network which can hallucinate images of a scene
as if they were taken at a different season (e.g. during winter), weather
condition (e.g. on a cloudy day) or time of day (e.g. at sunset). Once the
scene is hallucinated with the given attributes, the corresponding look is then
transferred to the input image while keeping the semantic details intact,
giving a photo-realistic manipulation result. As the proposed framework
hallucinates what the scene will look like, it does not require any reference
style image as commonly utilized in most of the appearance or style transfer
approaches. Moreover, it can simultaneously manipulate a given scene
according to a diverse set of transient attributes within a single model,
eliminating the need to train multiple networks for each translation task.
Our comprehensive set of qualitative and quantitative results demonstrates the
effectiveness of our approach against competing methods.
Comment: Accepted for publication in ACM Transactions on Graphics.
NASA patent abstracts bibliography: A continuing bibliography. Section 1: Abstracts (supplement 23)
Abstracts are cited for 129 patents and patent applications introduced into the NASA scientific and technical information system during the period January 1983 through June 1983. Each entry consists of a citation, an abstract, and in most cases, a key illustration selected from the patent or patent application
DUA-DA: Distillation-based Unbiased Alignment for Domain Adaptive Object Detection
Though feature-alignment based Domain Adaptive Object Detection (DAOD) methods
have achieved remarkable progress, they ignore the source bias issue, i.e., the
aligned features are more favorable towards the source domain, leading to a
sub-optimal adaptation. Furthermore, the presence of domain shift between the
source and target domains exacerbates the problem of inconsistent
classification and localization in general detection pipelines. To overcome
these challenges, we propose a novel Distillation-based Unbiased Alignment
(DUA) framework for DAOD, which can distill the source features towards a more
balanced position via a pre-trained teacher model during the training process,
alleviating the problem of source bias effectively. In addition, we design a
Target-Relevant Object Localization Network (TROLN), which can mine
target-related knowledge to produce two classification-free metrics (IoU and
centerness). Accordingly, we implement a Domain-aware Consistency Enhancing
(DCE) strategy that utilizes these two metrics to further refine classification
confidences, achieving a harmonization between classification and localization
in cross-domain scenarios. Extensive experiments have been conducted to
manifest the effectiveness of this method, which consistently improves the
strong baseline by large margins, outperforming existing alignment-based works.
Comment: 10 pages, 5 figures.
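The abstract names two classification-free metrics, IoU and centerness, without defining them. A hedged sketch of both, and of how they might rescale a classification score, follows; the centerness formula is the common FCOS-style definition, and `refined_confidence` is a hypothetical combination, since the paper's exact DCE rule is not given here:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box_a) + area(box_b) - inter)

def centerness(l, t, r, b):
    """FCOS-style centerness of a point, from its distances to the box sides."""
    return ((min(l, r) / max(l, r)) * (min(t, b) / max(t, b))) ** 0.5

def refined_confidence(cls_score, box_pred, box_ref, l, t, r, b):
    # Hypothetical refinement: scale the classification score by the two
    # localization-quality metrics to harmonize classification and localization.
    return cls_score * iou(box_pred, box_ref) * centerness(l, t, r, b)
```

Both metrics are "classification-free" in that they measure only localization quality, which is what makes them usable for refining confidences across domains.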
Grounding semantics in robots for Visual Question Answering
In this thesis I describe an operational implementation of an object detection and description system that is incorporated into an end-to-end Visual Question Answering system, and evaluate it on two visual question answering datasets for compositional language and elementary visual reasoning.
End-to-end Trainable Ship Detection in SAR Images with Single Level Features
Kongsberg Satellite Services (KSAT) uses machine learning, together with manual analysis by synthetic aperture radar (SAR) specialists, on SAR images in real time to provide a ship detection service.
KSAT's current machine learning model has a limited ability to distinguish ships close to each other. For this reason, we aim to employ an end-to-end trainable object detection model, as such models can better distinguish nearby objects: they are not limited by heuristic post-processing.
Since heuristic post-processing in object detection limits a model's ability to distinguish ships close to each other, we investigate challenges related to employing an end-to-end trainable ship detection model. Since access to ground truth annotations in SAR images is limited, size and rotation labels are not available for all ships, and rotation labels are inaccurate. Since KSAT's internal datasets are collected as part of a time-critical operational service, position labels are not exact. Since existing evaluation metrics for object detection are too strict, they do not reflect user needs for this service.
To tolerate missing size and rotation annotations, we base loss label assignment on the distance between objects instead of their IoU, and replace the DIoU bounding box loss with a novel size regression loss named Size IoU (SIoU), combined with a smooth L1 position loss. To tolerate inaccurate rotation labels, we propose angular direction vector (ADV) regression. To tolerate inaccurate position labels, the loss label assignment makes all predictions responsible for large overlapping regions instead of small disjoint regions. To compare model performance according to user needs, we propose an evaluation metric named Distance-AP (dAP), which is based on mAP but replaces the IoU overlap threshold with an object center point distance threshold. To reduce duplicate ship predictions, we propose multi-layer attention.
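The dAP metric described above keeps mAP's structure but swaps the IoU overlap test for a center-distance test. A minimal sketch of that matching rule follows; the greedy confidence-ordered assignment and the threshold value are assumptions, since the abstract does not spell them out:

```python
def center(box):
    """Center point of an (x1, y1, x2, y2) box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

def match_by_distance(preds, gts, thresh):
    """Greedily match predictions (highest confidence first) to ground truths.

    A prediction is a true positive if its center lies within `thresh` of an
    unmatched ground-truth center, replacing the usual IoU overlap test.
    Returns one True/False flag per prediction, in confidence order.
    """
    matched = set()
    flags = []
    for pred_box, _score in sorted(preds, key=lambda p: -p[1]):
        cx, cy = center(pred_box)
        hit = None
        for i, gt in enumerate(gts):
            if i in matched:
                continue
            gx, gy = center(gt)
            if ((cx - gx) ** 2 + (cy - gy) ** 2) ** 0.5 <= thresh:
                hit = i
                break
        if hit is not None:
            matched.add(hit)
        flags.append(hit is not None)
    return flags

preds = [((10, 10, 20, 20), 0.9), ((50, 50, 60, 60), 0.8)]  # (box, confidence)
gts = [(12, 12, 22, 22), (100, 100, 110, 110)]
flags = match_by_distance(preds, gts, thresh=5.0)
```

From these flags, precision-recall curves and the final dAP would be computed exactly as in mAP; only the matching criterion changes, which is why the metric tolerates the inexact position labels the thesis describes.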
Using the LS-SSDD SAR ship dataset, we find that replacing IoU-based label assignment with position-based label assignment increases dAP from 79% to 86%, and that replacing DIoU with SIoU decreases dAP by only 1%. Using a rotation regression benchmark where datasets have different amounts of rotation label noise, we find that ADV outperforms CSL in terms of mean predicted inaccuracy at all noise levels, and in median predicted inaccuracy at high noise levels. Using an object detection benchmark where the datasets have varying amounts of position label inaccuracy, we find that the proposed loss label assignment tolerates large amounts of noise without reduced performance. Using KSAT's dataset of Sentinel-1 images, we measure 83% dAP.
The proposed mechanisms allow effective training of a ship detection model despite the missing size and rotation annotations, inaccurate position annotations, and inaccurate rotation annotations. We believe this is useful for KSAT's ship detection service, as the model can better distinguish nearby ships. However, more work is required to compare its performance with the existing solution. Source code is available at https://github.com/matill/Ship-detectio