Synthesizing Training Data for Object Detection in Indoor Scenes
Detection of objects in cluttered indoor environments is one of the key
enabling functionalities for service robots. The best performing object
detection approaches in computer vision exploit deep Convolutional Neural
Networks (CNN) to simultaneously detect and categorize the objects of interest
in cluttered scenes. Training of such models typically requires large amounts
of annotated training data which is time consuming and costly to obtain. In
this work we explore the ability of using synthetically generated composite
images for training state-of-the-art object detectors, especially for object
instance detection. We superimpose 2D images of textured object models onto
images of real environments at a variety of locations and scales. Our experiments
evaluate different superimposition strategies ranging from purely image-based
blending all the way to depth and semantics informed positioning of the object
models into real scenes. We demonstrate the effectiveness of these object
detector training strategies on two publicly available datasets, the
GMU-Kitchens and the Washington RGB-D Scenes v2. As one observation, augmenting
some hand-labeled training data with synthetic examples carefully composed onto
scenes yields object detectors with comparable performance to using much more
hand-labeled data. Broadly, this work charts new opportunities for training
detectors for new objects by exploiting existing object model repositories in
either a purely automatic fashion or with only a very small number of
human-annotated examples.
Comment: Added more experiments and link to project webpage.
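The superimposition strategy this abstract describes, pasting 2D object crops into real scenes at varying locations and scales, can be sketched roughly as follows. The function name, the uniform alpha-blend, and the toy arrays are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np

def paste_object(background, obj_rgb, obj_mask, top, left):
    """Alpha-blend an object crop onto a background image at (top, left).

    background: (H, W, 3) float array; obj_rgb: (h, w, 3); obj_mask: (h, w) in [0, 1].
    """
    out = background.copy()
    h, w = obj_mask.shape
    region = out[top:top + h, left:left + w]
    alpha = obj_mask[..., None]              # broadcast the mask over RGB channels
    out[top:top + h, left:left + w] = alpha * obj_rgb + (1.0 - alpha) * region
    return out

# Purely image-based blending at a random location and scale (toy example).
rng = np.random.default_rng(0)
bg = np.zeros((64, 64, 3))
scale = int(rng.choice([8, 16, 32]))         # random object size in pixels
obj = np.ones((scale, scale, 3))             # stand-in for a textured object crop
mask = np.ones((scale, scale))               # stand-in for the object silhouette
top, left = (int(v) for v in rng.integers(0, 64 - scale, size=2))
composite = paste_object(bg, obj, mask, top, left)
```

Depth- or semantics-informed positioning, as evaluated in the paper, would constrain `top`, `left`, and `scale` using scene geometry rather than sampling them uniformly.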
Building synthetic simulated environments for configuring and training multi-camera systems for surveillance applications
[EN] Synthetic simulated environments are gaining popularity in the Deep Learning Era, as they can alleviate the
effort and cost of two critical tasks to build multi-camera systems for surveillance applications: setting up
the camera system to cover the use cases and generating the labeled dataset to train the required Deep Neural
Networks (DNNs). However, there are no simulated environments ready to solve them for all kinds of scenarios
and use cases. Typically, ‘ad hoc’ environments are built, which cannot be easily applied to other contexts.
In this work we present a methodology to build synthetic simulated environments with sufficient generality to
be usable in different contexts, with little effort. Our methodology tackles the challenges of the appropriate
parameterization of scene configurations, the strategies to randomly generate a wide and balanced range of
situations of interest for training DNNs with synthetic data, and the quick image capturing from virtual cameras
considering the rendering bottlenecks. We show a practical implementation example for the detection of
incorrectly placed luggage in aircraft cabins, including the qualitative and quantitative analysis of the data
generation process and its influence on DNN training, and the required modifications to adapt it to other
surveillance contexts.
This work has received funding from the Clean Sky 2 Joint Undertaking under the European Union's Horizon 2020 research and innovation program under grant agreement No. 865162, SmaCS (https://www.smacs.eu/).
Manipulating Attributes of Natural Scenes via Hallucination
In this study, we explore building a two-stage framework for enabling users
to directly manipulate high-level attributes of a natural scene. The key to our
approach is a deep generative network which can hallucinate images of a scene
as if they were taken at a different season (e.g. during winter), weather
condition (e.g. on a cloudy day) or time of day (e.g. at sunset). Once the
scene is hallucinated with the given attributes, the corresponding look is then
transferred to the input image while keeping the semantic details intact,
giving a photo-realistic manipulation result. As the proposed framework
hallucinates what the scene will look like, it does not require any reference
style image as commonly utilized in most of the appearance or style transfer
approaches. Moreover, it can simultaneously manipulate a given scene
according to a diverse set of transient attributes within a single model,
eliminating the need to train multiple networks for each translation task.
Our comprehensive set of qualitative and quantitative results demonstrates the
effectiveness of our approach against competing methods.
Comment: Accepted for publication in ACM Transactions on Graphics.
NASA patent abstracts bibliography: A continuing bibliography. Section 1: Abstracts (supplement 23)
Abstracts are cited for 129 patents and patent applications introduced into the NASA scientific and technical information system during the period January 1983 through June 1983. Each entry consists of a citation, an abstract, and in most cases, a key illustration selected from the patent or patent application
DUA-DA: Distillation-based Unbiased Alignment for Domain Adaptive Object Detection
Though feature-alignment based Domain Adaptive Object Detection (DAOD) methods
have achieved remarkable progress, they ignore the source bias issue, i.e., the
aligned features are more favorable towards the source domain, leading to a
sub-optimal adaptation. Furthermore, the presence of domain shift between the
source and target domains exacerbates the problem of inconsistent
classification and localization in general detection pipelines. To overcome
these challenges, we propose a novel Distillation-based Unbiased Alignment
(DUA) framework for DAOD, which can distill the source features towards a more
balanced position via a pre-trained teacher model during the training process,
alleviating the problem of source bias effectively. In addition, we design a
Target-Relevant Object Localization Network (TROLN), which can mine
target-related knowledge to produce two classification-free metrics (IoU and
centerness). Accordingly, we implement a Domain-aware Consistency Enhancing
(DCE) strategy that utilizes these two metrics to further refine classification
confidences, achieving a harmonization between classification and localization
in cross-domain scenarios. Extensive experiments have been conducted to
manifest the effectiveness of this method, which consistently improves the
strong baseline by large margins, outperforming existing alignment-based works.
Comment: 10 pages, 5 figures.
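The abstract names two classification-free metrics, IoU and centerness, without defining them. A hedged sketch of both, and of how they might rescale a classification score, follows; the centerness formula is the common FCOS-style definition, and `refined_confidence` is a hypothetical combination, since the paper's exact DCE rule is not given here:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box_a) + area(box_b) - inter)

def centerness(l, t, r, b):
    """FCOS-style centerness of a point, from its distances to the box sides."""
    return ((min(l, r) / max(l, r)) * (min(t, b) / max(t, b))) ** 0.5

def refined_confidence(cls_score, box_pred, box_ref, l, t, r, b):
    # Hypothetical refinement: scale the classification score by the two
    # localization-quality metrics to harmonize classification and localization.
    return cls_score * iou(box_pred, box_ref) * centerness(l, t, r, b)
```

Both metrics are "classification-free" in that they measure only localization quality, which is what makes them usable for refining confidences across domains.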
Grounding semantics in robots for Visual Question Answering
In this thesis I describe an operational implementation of an object detection and description system that is incorporated into an end-to-end Visual Question Answering system, and evaluate it on two visual question answering datasets for compositional language and elementary visual reasoning.
End-to-end Trainable Ship Detection in SAR Images with Single Level Features
Kongsberg Satellite Services (KSAT) uses machine learning, together with manual analysis by synthetic aperture radar (SAR) specialists, on SAR images in real time to provide a ship detection service.
KSAT's current machine learning model has a limited ability to distinguish ships close to each other. For this reason, we aim to employ an end-to-end trainable object detection model, as such models can better distinguish nearby objects: they are not limited by heuristic post-processing.
Since heuristic post-processing in object detection limits a model's ability to distinguish ships close to each other, we investigate challenges related to employing an end-to-end trainable ship detection model. Since access to ground truth annotations in SAR images is limited, size and rotation labels are not available for all ships, and rotation labels are inaccurate. Since KSAT's internal datasets are collected as part of a time-critical operational service, position labels are not exact. Since existing evaluation metrics for object detection are too strict, they do not reflect user needs for this service.
To tolerate missing size and rotation annotations, we base loss label assignment on the distance between objects instead of their IoU, and replace the DIoU bounding box loss with a novel size regression loss named Size IoU (SIoU), combined with a smooth L1 position loss. To tolerate inaccurate rotation labels, we propose angular direction vector (ADV) regression. To tolerate inaccurate position labels, the loss label assignment makes all predictions responsible for large overlapping regions instead of small disjoint regions. To compare model performance according to user needs, we propose an evaluation metric named Distance-AP (dAP), which is based on mAP but replaces the IoU overlap threshold with an object center point distance threshold. To reduce duplicate ship predictions, we propose multi-layer attention.
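The dAP metric described above keeps mAP's structure but swaps the IoU overlap test for a center-distance test. A minimal sketch of that matching rule follows; the greedy confidence-ordered assignment and the threshold value are assumptions, since the abstract does not spell them out:

```python
def center(box):
    """Center point of an (x1, y1, x2, y2) box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

def match_by_distance(preds, gts, thresh):
    """Greedily match predictions (highest confidence first) to ground truths.

    A prediction is a true positive if its center lies within `thresh` of an
    unmatched ground-truth center, replacing the usual IoU overlap test.
    Returns one True/False flag per prediction, in confidence order.
    """
    matched = set()
    flags = []
    for pred_box, _score in sorted(preds, key=lambda p: -p[1]):
        cx, cy = center(pred_box)
        hit = None
        for i, gt in enumerate(gts):
            if i in matched:
                continue
            gx, gy = center(gt)
            if ((cx - gx) ** 2 + (cy - gy) ** 2) ** 0.5 <= thresh:
                hit = i
                break
        if hit is not None:
            matched.add(hit)
        flags.append(hit is not None)
    return flags

preds = [((10, 10, 20, 20), 0.9), ((50, 50, 60, 60), 0.8)]  # (box, confidence)
gts = [(12, 12, 22, 22), (100, 100, 110, 110)]
flags = match_by_distance(preds, gts, thresh=5.0)
```

From these flags, precision-recall curves and the final dAP would be computed exactly as in mAP; only the matching criterion changes, which is why the metric tolerates the inexact position labels the thesis describes.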
Using the LS-SSDD SAR ship dataset, we find that replacing IoU-based label assignment with position-based label assignment increases dAP from 79% to 86%, and that replacing DIoU with SIoU decreases dAP by only 1%. Using a rotation regression benchmark where datasets have different amounts of rotation label noise, we find that ADV outperforms CSL in terms of mean predicted inaccuracy at all noise levels, and in median predicted inaccuracy at high noise levels. Using an object detection benchmark where the datasets have varying amounts of position label inaccuracy, we find that the proposed loss label assignment tolerates large amounts of noise without reduced performance. Using KSAT's dataset of Sentinel-1 images, we measure 83% dAP.
The proposed mechanisms allow effective training of a ship detection model despite the missing size and rotation annotations, inaccurate position annotations, and inaccurate rotation annotations. We believe this is useful for KSAT's ship detection service, as the model can better distinguish nearby ships. However, more work is required to compare its performance with the existing solution. Source code is available at https://github.com/matill/Ship-detectio