4,863 research outputs found

    Synthesizing Training Data for Object Detection in Indoor Scenes

    Detection of objects in cluttered indoor environments is one of the key enabling functionalities for service robots. The best performing object detection approaches in computer vision exploit deep Convolutional Neural Networks (CNNs) to simultaneously detect and categorize the objects of interest in cluttered scenes. Training such models typically requires large amounts of annotated data, which are time-consuming and costly to obtain. In this work we explore the use of synthetically generated composite images for training state-of-the-art object detectors, with a focus on object instance detection. We superimpose 2D images of textured object models onto images of real environments at a variety of locations and scales. Our experiments evaluate different superimposition strategies, ranging from purely image-based blending to depth- and semantics-informed positioning of the object models in real scenes. We demonstrate the effectiveness of these object detector training strategies on two publicly available datasets, the GMU-Kitchens and the Washington RGB-D Scenes v2. Notably, augmenting a small amount of hand-labeled training data with carefully composed synthetic examples yields object detectors with performance comparable to using much more hand-labeled data. Broadly, this work charts new opportunities for training detectors for new objects by exploiting existing object model repositories, either fully automatically or with only a very small number of human-annotated examples. Comment: Added more experiments and link to project webpage.
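
    The purely image-based blending strategy can be illustrated with a short Python sketch (hypothetical helper names, not the authors' code): an RGBA crop of a textured object model is pasted onto a real scene image at a random location and scale, and the corresponding ground-truth box is returned for training the detector.

        import cv2
        import numpy as np

        def composite(scene_bgr, obj_rgba, rng=np.random.default_rng()):
            # Pick a random scale and location for the object crop.
            H, W = scene_bgr.shape[:2]
            scale = rng.uniform(0.3, 1.0)
            obj = cv2.resize(obj_rgba, None, fx=scale, fy=scale)
            h, w = obj.shape[:2]
            x = int(rng.integers(0, W - w))
            y = int(rng.integers(0, H - h))
            # Alpha-blend the object into the chosen region of the scene.
            alpha = obj[:, :, 3:4].astype(np.float32) / 255.0
            roi = scene_bgr[y:y + h, x:x + w].astype(np.float32)
            blended = alpha * obj[:, :, :3].astype(np.float32) + (1.0 - alpha) * roi
            out = scene_bgr.copy()
            out[y:y + h, x:x + w] = blended.astype(np.uint8)
            return out, (x, y, x + w, y + h)  # composite image plus ground-truth box

    Depth- and semantics-informed variants would constrain (x, y, scale) using scene geometry instead of sampling them uniformly.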

    Building synthetic simulated environments for configuring and training multi-camera systems for surveillance applications

    Synthetic simulated environments are gaining popularity in the Deep Learning era, as they can alleviate the effort and cost of two critical tasks in building multi-camera systems for surveillance applications: setting up the camera system to cover the use cases, and generating the labeled dataset needed to train the required Deep Neural Networks (DNNs). However, no simulated environments are readily available that solve these tasks for all kinds of scenarios and use cases. Typically, ‘ad hoc’ environments are built, which cannot easily be applied to other contexts. In this work we present a methodology for building synthetic simulated environments that are general enough to be usable in different contexts with little effort. Our methodology tackles the challenges of appropriately parameterizing scene configurations, of randomly generating a wide and balanced range of situations of interest for training DNNs with synthetic data, and of capturing images quickly from virtual cameras given rendering bottlenecks. We show a practical implementation example for the detection of incorrectly placed luggage in aircraft cabins, including a qualitative and quantitative analysis of the data generation process and its influence on DNN training, and the modifications required to adapt it to other surveillance contexts. This work has received funding from the Clean Sky 2 Joint Undertaking under the European Union’s Horizon 2020 research and innovation program under grant agreement No. 865162, SmaCS (https://www.smacs.eu/).
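
    To make the idea of parameterized, randomly sampled scene configurations concrete, here is a minimal Python sketch; all parameter names and ranges are invented for illustration and are not the SmaCS implementation.

        import random
        from dataclasses import dataclass

        @dataclass
        class SceneConfig:
            camera_id: int          # which virtual camera renders the frame
            camera_height_m: float  # mounting height of the camera
            n_passengers: int       # passenger models spawned in the cabin
            n_misplaced_bags: int   # bags placed outside the overhead bins
            light_intensity: float  # global illumination level

        def sample_config(n_cameras=4):
            # One randomly sampled configuration drives one rendered image.
            return SceneConfig(
                camera_id=random.randrange(n_cameras),
                camera_height_m=random.uniform(1.8, 2.2),
                n_passengers=random.randint(0, 30),
                n_misplaced_bags=random.randint(0, 5),
                light_intensity=random.uniform(0.4, 1.0),
            )

        configs = [sample_config() for _ in range(1000)]  # one per synthetic image

    A balanced dataset would additionally constrain the sampler, e.g. forcing a fixed share of configurations to contain at least one misplaced bag.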

    Manipulating Attributes of Natural Scenes via Hallucination

    In this study, we explore building a two-stage framework that enables users to directly manipulate high-level attributes of a natural scene. The key to our approach is a deep generative network that can hallucinate images of a scene as if they were taken in a different season (e.g. during winter), weather condition (e.g. on a cloudy day) or time of day (e.g. at sunset). Once the scene is hallucinated with the given attributes, the corresponding look is transferred to the input image while keeping the semantic details intact, giving a photo-realistic manipulation result. Because the proposed framework hallucinates what the scene will look like, it does not require a reference style image, as is commonly needed in most appearance or style transfer approaches. Moreover, it can simultaneously manipulate a given scene according to a diverse set of transient attributes within a single model, eliminating the need to train a separate network for each translation task. Our comprehensive set of qualitative and quantitative results demonstrates the effectiveness of our approach against competing methods. Comment: Accepted for publication in ACM Transactions on Graphics.
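
    A rough numpy sketch of the second stage is given below: it transfers the hallucinated look to the input by matching per-channel color statistics. This crude statistic matching is only a stand-in for the paper's learned, semantics-preserving transfer, and the attribute-conditioned generative stage itself is not shown.

        import numpy as np

        def transfer_look(content, hallucinated, eps=1e-5):
            # content, hallucinated: HxWx3 uint8 images; `hallucinated` is assumed
            # to come from a generative model conditioned on the target attribute
            # (e.g. "sunset"). Match each channel's mean/std to the hallucination.
            out = np.empty(content.shape, dtype=np.float32)
            for c in range(content.shape[2]):
                src = content[:, :, c].astype(np.float32)
                ref = hallucinated[:, :, c].astype(np.float32)
                out[:, :, c] = (src - src.mean()) / (src.std() + eps) * ref.std() + ref.mean()
            return np.clip(out, 0, 255).astype(np.uint8)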

    NASA patent abstracts bibliography: A continuing bibliography. Section 1: Abstracts (supplement 23)

    Abstracts are cited for 129 patents and patent applications introduced into the NASA scientific and technical information system during the period January 1983 through June 1983. Each entry consists of a citation, an abstract, and, in most cases, a key illustration selected from the patent or patent application.

    DUA-DA: Distillation-based Unbiased Alignment for Domain Adaptive Object Detection

    Though feature-alignment based Domain Adaptive Object Detection (DAOD) methods have achieved remarkable progress, they ignore the source bias issue, i.e. the aligned features are more favorable towards the source domain, leading to sub-optimal adaptation. Furthermore, the domain shift between the source and target domains exacerbates the problem of inconsistent classification and localization in general detection pipelines. To overcome these challenges, we propose a novel Distillation-based Unbiased Alignment (DUA) framework for DAOD, which distills the source features towards a more balanced position via a pre-trained teacher model during training, effectively alleviating the source bias problem. In addition, we design a Target-Relevant Object Localization Network (TROLN), which mines target-related knowledge to produce two classification-free metrics (IoU and centerness). Accordingly, we implement a Domain-aware Consistency Enhancing (DCE) strategy that uses these two metrics to further refine classification confidences, harmonizing classification and localization in cross-domain scenarios. Extensive experiments demonstrate the effectiveness of this method, which consistently improves a strong baseline by large margins, outperforming existing alignment-based works. Comment: 10 pages, 5 figures.
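
    The underlying idea of refining classification confidence with classification-free quality estimates might be sketched as follows; the geometric-mean combination below is an assumption for illustration, not the paper's exact formula.

        import torch

        def refine_scores(cls_conf, pred_iou, pred_centerness):
            # cls_conf, pred_iou, pred_centerness: tensors of shape (N,) in [0, 1];
            # the latter two would come from a TROLN-like localization-quality head.
            quality = torch.sqrt(pred_iou * pred_centerness)
            return cls_conf * quality  # rescaled confidence used for ranking / NMS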

    Grounding semantics in robots for Visual Question Answering

    In this thesis I describe an operational implementation of an object detection and description system, incorporate it into an end-to-end Visual Question Answering (VQA) system, and evaluate it on two VQA datasets for compositional language and elementary visual reasoning.

    End-to-end Trainable Ship Detection in SAR Images with Single Level Features

    Kongsberg Satellite Services (KSAT) uses machine learning and manual analysis by synthetic aperture radar (SAR) specialists on SAR images in real time to provide a ship detection service. KSAT's current machine learning model has a limited ability to distinguish ships that are close to each other. For this reason, we aim to employ an end-to-end trainable object detection model, since such models are not limited by heuristic post-processing and can therefore better distinguish nearby objects, and we investigate the challenges involved in doing so. Since access to ground truth annotations in SAR images is limited, size and rotation labels are not available for all ships, and rotation labels are inaccurate. Since KSAT's internal datasets are collected as part of a time-critical operational service, position labels are not exact. And since existing evaluation metrics for object detection are too strict, they do not reflect the user needs of this service. To tolerate missing size and rotation annotations, we base loss label assignment on the distance between objects instead of their IoU, and replace the DIoU bounding box loss with a novel size regression loss named Size IoU (SIoU) combined with a smooth L1 position loss. To tolerate inaccurate rotation labels, we propose angular direction vector (ADV) regression. To tolerate inaccurate position labels, the loss label assignment makes all predictions responsible for large overlapping regions instead of small disjoint regions. To compare model performance according to user needs, we propose an evaluation metric named Distance-AP (dAP), which is based on mAP but replaces the IoU overlap threshold with an object center point distance threshold. To reduce duplicate ship predictions, we propose multi-layer attention. Using the LS-SSDD SAR ship dataset, we find that replacing IoU-based label assignment with position-based label assignment increases dAP from 79% to 86%, and that replacing DIoU with SIoU decreases dAP by only 1%. Using a rotation regression benchmark in which datasets have different amounts of rotation label noise, we find that ADV outperforms CSL in terms of mean predicted inaccuracy at all noise levels and median predicted inaccuracy at high noise levels. Using an object detection benchmark in which the datasets have varying amounts of position label inaccuracy, we find that the proposed loss label assignment tolerates large amounts of noise without reduced performance. Using KSAT's dataset of Sentinel-1 images, we measure 83% dAP. The proposed mechanisms allow effective training of a ship detection model despite missing size and rotation annotations, inaccurate position annotations, and inaccurate rotation annotations. We believe this is useful for KSAT's ship detection service, as the model can better distinguish nearby ships; however, more work is required to compare its performance with their existing solution. Source code is available at https://github.com/matill/Ship-detectio
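
    The matching rule behind the proposed dAP metric can be sketched in a few lines of Python: predictions are matched to ground-truth ships by center-point distance rather than IoU overlap. The greedy matching and the threshold value here are assumptions for illustration, not the thesis implementation.

        import numpy as np

        def match_by_center_distance(pred_centers, gt_centers, max_dist=20.0):
            # pred_centers: list/array of (x, y) sorted by descending confidence.
            # gt_centers: (M, 2) array of ground-truth ship centers.
            # Returns one True/False per prediction (true positive or not),
            # from which precision/recall and the dAP curve can be computed.
            matched_gt = set()
            hits = []
            for p in np.asarray(pred_centers):
                dists = np.linalg.norm(gt_centers - p, axis=1)
                hit = False
                for g in np.argsort(dists):
                    if dists[g] <= max_dist and g not in matched_gt:
                        matched_gt.add(g)
                        hit = True
                        break
                hits.append(hit)
            return hits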