
    FieldSAFE: Dataset for Obstacle Detection in Agriculture

    In this paper, we present a novel multi-modal dataset for obstacle detection in agriculture. The dataset comprises approximately 2 hours of raw sensor data from a tractor-mounted sensor system in a grass-mowing scenario in Denmark, October 2016. Sensing modalities include stereo camera, thermal camera, web camera, 360-degree camera, lidar, and radar, while precise localization is available from fused IMU and GNSS. Both static and moving obstacles are present, including humans, mannequin dolls, rocks, barrels, buildings, vehicles, and vegetation. All obstacles have ground-truth object labels and geographic coordinates. Comment: Submitted to special issue of MDPI Sensors: Sensors in Agriculture
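    As a rough illustration of how the dataset's geographic ground truth can be combined with the fused GNSS/IMU localization, the Python sketch below projects an obstacle's latitude/longitude into a tractor-centred local frame. The function name, the flat-earth approximation, and the example coordinates are illustrative assumptions, not part of the dataset's tooling.

```python
import math

# Hypothetical sketch: express a ground-truth obstacle's geographic
# coordinates in the tractor's local frame, given the fused GNSS/IMU pose.
# Uses a flat-earth (local tangent plane) approximation, adequate over a field.

EARTH_RADIUS_M = 6_378_137.0  # WGS-84 equatorial radius

def geodetic_to_local(lat_deg, lon_deg, ref_lat_deg, ref_lon_deg, ref_heading_rad):
    """Project (lat, lon) into a tractor-centred (forward, left) frame."""
    dlat = math.radians(lat_deg - ref_lat_deg)
    dlon = math.radians(lon_deg - ref_lon_deg)
    # East/north offsets in metres relative to the tractor position.
    east = dlon * EARTH_RADIUS_M * math.cos(math.radians(ref_lat_deg))
    north = dlat * EARTH_RADIUS_M
    # Rotate by heading (clockwise from north) so x points forward, y left.
    x = north * math.cos(ref_heading_rad) + east * math.sin(ref_heading_rad)
    y = north * math.sin(ref_heading_rad) - east * math.cos(ref_heading_rad)
    return x, y

# Example: an obstacle about 20 m north of a tractor heading due east
# (heading = pi/2) ends up roughly 20 m to the tractor's left.
print(geodetic_to_local(55.0001797, 10.0, 55.0, 10.0, math.pi / 2))
```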

    Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer

    Semantic annotations are vital for training models for object recognition, semantic segmentation, or scene understanding. Unfortunately, pixel-wise annotation of images at very large scale is labor-intensive, and little labeled data is available, particularly at instance level and for street scenes. In this paper, we propose to tackle this problem by lifting the semantic instance labeling task from 2D into 3D. Given reconstructions from stereo or laser data, we annotate static 3D scene elements with rough bounding primitives and develop a model which transfers this information into the image domain. We leverage our method to obtain 2D labels for a novel suburban video dataset which we have collected, resulting in 400k semantic and instance image annotations. A comparison of our method to state-of-the-art label transfer baselines reveals that 3D information enables more efficient annotation while at the same time resulting in improved accuracy and time-coherent labels. Comment: 10 pages, Conference on Computer Vision and Pattern Recognition (CVPR), 2016
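    The core geometric step behind such a label transfer is projecting annotated 3D primitives into each image so that their pixels can inherit the 3D label. The Python sketch below shows a minimal version of this projection for a 3D box under a pinhole camera model; the intrinsics, pose, and box parameters are placeholder assumptions, and the paper's actual transfer model is more sophisticated than this bounding-rectangle approximation.

```python
import numpy as np

# Minimal sketch: project the corners of a 3D bounding primitive (a box)
# into the image with a pinhole camera, yielding a 2D region whose pixels
# can inherit the primitive's semantic/instance label. K, R, t, and the
# box geometry below are illustrative placeholders.

def project_points(points_world, K, R, t):
    """Project Nx3 world points to Nx2 pixel coordinates."""
    cam = (R @ points_world.T + t.reshape(3, 1)).T   # world -> camera frame
    cam = cam[cam[:, 2] > 0]                          # keep points in front of camera
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:3]                     # perspective divide

def box_corners(center, size):
    """Eight corners of an axis-aligned 3D box."""
    offsets = np.array([[sx, sy, sz] for sx in (-1, 1)
                        for sy in (-1, 1) for sz in (-1, 1)]) * 0.5
    return center + offsets * size

K = np.array([[718.0, 0.0, 620.0],   # assumed focal length / principal point
              [0.0, 718.0, 187.0],
              [0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)        # camera at the world origin for simplicity

corners_2d = project_points(box_corners(np.array([2.0, 0.0, 10.0]),
                                        np.array([1.8, 1.5, 4.0])), K, R, t)
# A coarse 2D label region: the bounding rectangle of the projected corners.
print(corners_2d.min(axis=0), corners_2d.max(axis=0))
```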

    On Video Analysis of Omnidirectional Bee Traffic: Counting Bee Motions with Motion Detection and Image Classification

    Omnidirectional bee traffic is the number of bees moving in arbitrary directions in close proximity to the landing pad of a given hive over a given period of time. Video bee traffic analysis has the potential to automate the assessment of omnidirectional bee traffic levels, which, in turn, may lead to a complete or partial automation of honeybee colony health assessment. In this investigation, we proposed, implemented, and partially evaluated a two-tier method for counting bee motions to estimate levels of omnidirectional bee traffic in bee traffic videos. Our method couples motion detection with image classification: motion detection acts as a class-agnostic object location method that generates a set of regions with possible objects, and each such region is classified by a class-specific classifier, such as a convolutional neural network or a support vector machine, or by an ensemble of classifiers, such as a random forest. The method has been, and is being, iteratively field-tested in BeePi monitors, multi-sensor electronic beehive monitoring systems installed on live Langstroth beehives in real apiaries. Deployment of a BeePi monitor on top of a beehive does not require any structural modification of the beehive's woodenware and is not disruptive to natural beehive cycles. To ensure the replicability of the reported findings and to provide a performance benchmark for interested research communities and citizen scientists, we have made public our curated and labeled image datasets of 167,261 honeybee images and our omnidirectional bee traffic videos used in this investigation.
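    A minimal sketch of the two-tier idea, motion detection proposing class-agnostic regions that a class-specific classifier then filters, could look as follows in Python with OpenCV. The background-subtraction parameters and the area threshold are assumptions, and classify_region is a stub standing in for a trained convolutional network, support vector machine, or random forest.

```python
import cv2
import numpy as np

def classify_region(patch):
    """Placeholder for a trained bee/non-bee classifier (tier 2)."""
    return True  # in this sketch, every moving region counts as a bee

def count_bee_motions(video_path):
    """Tier 1: background subtraction proposes moving regions; tier 2 filters them."""
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2(history=100)
    total = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)
        # Remove speckle noise before extracting candidate regions.
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            x, y, w, h = cv2.boundingRect(c)
            if w * h < 50:          # assumed threshold: discard tiny blobs
                continue
            if classify_region(frame[y:y + h, x:x + w]):
                total += 1
    cap.release()
    return total
```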

    SIGS: Synthetic Imagery Generating Software for the development and evaluation of vision-based sense-and-avoid systems

    Unmanned Aerial Systems (UASs) have recently become a versatile platform for many civilian applications, including inspection, surveillance, and mapping. Sense-and-Avoid systems are essential for the autonomous safe operation of these systems in non-segregated airspaces. Vision-based Sense-and-Avoid systems are preferred to other alternatives as their price, physical dimensions, and weight are more suitable for small and medium-sized UASs, but obtaining real flight imagery of potential collision scenarios is hard and dangerous, which complicates the development of Vision-based detection and tracking algorithms. For this purpose, user-friendly software for synthetic imagery generation has been developed, allowing users to blend user-defined flight imagery of a simulated aircraft with real flight scenario images to produce realistic images with ground truth annotations. These are extremely useful for the development and benchmarking of Vision-based detection and tracking algorithms at a much lower cost and risk. An image processing algorithm has also been developed for automatic detection of the occlusions caused by certain parts of the UAV which carries the camera. The detected occlusions can later be used by our software to simulate the occlusions due to the UAV that would appear in a real flight with the same camera setup. Additionally, this algorithm could be used to mask out pixels which do not contain relevant information of the scene for the visual detection, making the image search process more efficient. Finally, an application example of the imagery obtained with our software for the benchmarking of a state-of-the-art visual tracker is presented.
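    The essential compositing step, blending a rendered aircraft sprite with a real flight image while recording ground truth, can be sketched as follows. This is an illustrative approximation, not the SIGS implementation; the function name and the assumption that the sprite carries a renderer-produced alpha channel and fits inside the background frame are ours.

```python
import numpy as np

def composite(background, sprite_rgba, top_left):
    """Alpha-blend an RGBA sprite onto an RGB background at (row, col).

    Returns the blended image and the ground-truth bounding box, which comes
    "for free" because we choose the insertion point ourselves.
    """
    out = background.astype(np.float32).copy()
    h, w = sprite_rgba.shape[:2]
    y, x = top_left
    # Normalized alpha from the renderer controls per-pixel blending.
    alpha = sprite_rgba[:, :, 3:4].astype(np.float32) / 255.0
    region = out[y:y + h, x:x + w]
    out[y:y + h, x:x + w] = alpha * sprite_rgba[:, :, :3] + (1 - alpha) * region
    bbox = (x, y, w, h)  # ground-truth annotation for detector benchmarking
    return out.astype(np.uint8), bbox
```

    A precomputed occlusion mask for UAV parts visible in the camera's field of view could be applied the same way, zeroing the alpha of pixels the airframe would cover before blending.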

    Data-centric Design and Training of Deep Neural Networks with Multiple Data Modalities for Vision-based Perception Systems

    224 p. Advances in computer vision and machine learning have revolutionized our ability to build systems that process and interpret digital data, enabling them to mimic human perception and opening the way to a wide range of applications. In recent years, both disciplines have made significant progress, driven by advances in deep learning techniques. Deep learning is a discipline that uses deep neural networks (DNNs) to teach machines to recognize patterns and make predictions based on data. Perception systems based on deep learning are increasingly common in fields where humans and machines collaborate to combine their strengths, including the automotive industry, manufacturing, and medicine, where improving safety, supporting diagnosis, and automating repetitive tasks are among the goals pursued. However, data is one of the key factors behind the success of deep learning algorithms, and this dependence on data strongly limits the creation and success of new DNNs. The availability of quality data for solving a specific problem is essential but hard to obtain, even impracticable, in most developments. Data-centric artificial intelligence emphasizes the importance of using high-quality data that effectively conveys what a model must learn. Motivated by these challenges and by the need for data, this thesis formulates and validates five hypotheses on the acquisition and impact of data in the design and training of DNNs. Specifically, we investigate and propose different methodologies for obtaining data suitable for training DNNs in problems with limited access to large-scale data sources. We explore two possible solutions for obtaining training data, both based on synthetic data generation. First, we investigate the generation of synthetic data using 3D graphics and the impact of different design choices on the accuracy of the resulting DNNs. In addition, we propose a methodology to automate the data generation process and produce varied annotated data by replicating a custom 3D environment from an input configuration file. Second, we propose a generative adversarial network (GAN) that generates annotated images using limited annotated datasets and unannotated data captured in uncontrolled environments.
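    As a toy illustration of the configuration-driven generation methodology described above, the following Python sketch samples scene variations from a JSON configuration file and hands each one to a rendering backend. The configuration schema and the render_scene placeholder are assumptions made for illustration, not the thesis implementation.

```python
import json
import random

def render_scene(scene_spec):
    """Placeholder for the 3D-graphics rendering backend."""
    return {"image": None, "labels": scene_spec["objects"]}

def generate_dataset(config_path, n_samples):
    """Replicate a configured 3D environment with randomized variations."""
    with open(config_path) as f:
        cfg = json.load(f)
    samples = []
    for _ in range(n_samples):
        # Each sample draws one variation of the configured environment.
        spec = {
            "lighting": random.choice(cfg["lighting_presets"]),
            "camera_height_m": random.uniform(*cfg["camera_height_range"]),
            "objects": [
                {"class": o["class"],
                 "pose": [random.uniform(*r) for r in o["pose_ranges"]]}
                for o in cfg["objects"]
            ],
        }
        samples.append(render_scene(spec))
    return samples
```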