Object Recognition from very few Training Examples for Enhancing Bicycle Maps
In recent years, data-driven methods have shown great success for extracting
information about the infrastructure in urban areas. These algorithms are
usually trained on large datasets consisting of thousands or millions of
labeled training examples. While large datasets have been published for
cars, very little labeled data is available for cyclists, although the
appearance, point of view, and positioning of the relevant objects differ. Unfortunately,
labeling data is costly and requires a huge amount of work. In this paper, we
thus address the problem of learning with very few labels. The aim is to
recognize particular traffic signs in crowdsourced data to collect information
which is of interest to cyclists. We propose a system for object recognition
that is trained with only 15 examples per class on average. To achieve this, we
combine the advantages of convolutional neural networks and random forests to
learn a patch-wise classifier. In the next step, we map the random forest to a
neural network and transform the classifier to a fully convolutional network.
Thereby, the processing of full images is significantly accelerated and
bounding boxes can be predicted. Finally, we integrate data of the Global
Positioning System (GPS) to localize the predictions on the map. In comparison
to Faster R-CNN and other networks for object recognition or algorithms for
transfer learning, we considerably reduce the required amount of labeled data.
We demonstrate good performance on the recognition of traffic signs for
cyclists as well as their localization in maps.

Comment: Submitted to IV 2018. This research was supported by German Research Foundation DFG within Priority Research Programme 1894 "Volunteered Geographic Information: Interpretation, Visualization and Social Computing".
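The step of transforming the patch-wise classifier into a fully convolutional network can be illustrated with a short sketch. This is a minimal, hypothetical example and not the paper's actual model (architecture, patch size, and names are assumptions): a fully connected head trained on fixed-size patches is rewritten as an equivalent convolution, so dense per-location class scores for a full image are obtained in a single forward pass.

```python
import torch
import torch.nn as nn

# Hypothetical patch-wise classifier: conv features followed by a
# fully connected head trained on fixed-size 32x32 patches.
class PatchClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 16x16 -> 8x8
        )
        self.fc = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

def to_fully_convolutional(model, num_classes=10):
    """Replace the FC head with an equivalent 8x8 convolution so the
    network slides over arbitrarily large images in one pass."""
    conv_head = nn.Conv2d(32, num_classes, kernel_size=8)
    # Reuse the trained FC weights: (classes, 32*8*8) -> (classes, 32, 8, 8)
    conv_head.weight.data = model.fc.weight.data.view(num_classes, 32, 8, 8)
    conv_head.bias.data = model.fc.bias.data
    return nn.Sequential(model.features, conv_head)

patch_net = PatchClassifier()
fcn = to_fully_convolutional(patch_net)
scores = fcn(torch.randn(1, 3, 480, 640))  # dense class-score map
print(scores.shape)  # (1, 10, 113, 153): one score per patch location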
Explainable and Advisable Learning for Self-driving Vehicles
Deep neural perception and control networks are likely to be a key component of self-driving vehicles. These models need to be explainable: they should provide easy-to-interpret rationales for their behavior, so that passengers, insurance companies, law enforcement, developers, etc., can understand what triggered a particular behavior. Explanations may be triggered by the neural controller (introspective explanations) or informed by the neural controller's output (rationalizations). Our work has focused on the challenge of generating introspective explanations of deep models for self-driving vehicles.

In Chapter 3, we begin by exploring the use of visual explanations. These explanations take the form of real-time highlighted regions of an image that causally influence the network's output (steering control). In the first stage, we use a visual attention model to train a convolutional network end-to-end from images to steering angle. The attention model highlights image regions that potentially influence the network's output. Some of these are true influences, but some are spurious. We then apply a causal filtering step to determine which input regions actually influence the output. This produces more succinct visual explanations and more accurately exposes the network's behavior.

In Chapter 4, we add an attention-based video-to-text model to produce textual explanations of model actions, e.g. "the car slows down because the road is wet". The attention maps of the controller and the explanation model are aligned so that explanations are grounded in the parts of the scene that mattered to the controller. We explore two approaches to attention alignment: strong and weak alignment.

These explainable systems represent an externalization of tacit knowledge. The network's opaque reasoning is simplified to a situation-specific dependence on a visible object in the image. This makes them brittle and potentially unsafe in situations that do not match the training data. In Chapter 5, we propose to address this issue by augmenting training data with natural language advice from a human. Advice includes guidance about what to do and where to attend. We present the first step toward advice-giving, where we train an end-to-end vehicle controller that accepts advice. The controller adapts the way it attends to the scene (visual attention) and the control (steering and speed).

Further, in Chapter 6, we propose a new approach that learns vehicle control with the help of long-term (global) human advice. Specifically, our system learns to summarize its visual observations in natural language, predict an appropriate action response (e.g. "I see a pedestrian crossing, so I stop"), and predict the controls accordingly.
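The visual-attention controller of Chapter 3 can be sketched compactly. The following is a toy illustration under assumed shapes, not the dissertation's actual model: spatial soft attention over CNN features produces both the steering prediction and an attention map that serves as the visual explanation. The causal filtering step could then be approximated by occluding attended regions and checking whether the prediction changes.

```python
import torch
import torch.nn as nn

class AttentionController(nn.Module):
    """Toy end-to-end controller: spatial soft attention over CNN
    features, followed by a steering-angle regressor. The attention
    map doubles as a visual explanation of what influenced the output."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
        )
        self.attn = nn.Conv2d(64, 1, 1)      # one attention logit per cell
        self.head = nn.Linear(64, 1)         # steering angle

    def forward(self, img):
        f = self.encoder(img)                               # (B, 64, H, W)
        a = torch.softmax(self.attn(f).flatten(2), dim=-1)  # (B, 1, H*W)
        ctx = (f.flatten(2) * a).sum(-1)     # attention-weighted pooling
        return self.head(ctx), a             # prediction + attention map

model = AttentionController()
steer, attention = model(torch.randn(1, 3, 90, 160))
```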
Reliable localization methods for intelligent vehicles based on environment perception
In the recent past, autonomous vehicles and Intelligent Transport
Systems (ITS) were seen as a potential future of transportation. Today, thanks to all the
technological advances in recent years, the feasibility of such systems is no longer a
question. Some of these autonomous driving technologies are already sharing our
roads, and even commercial vehicles are including more Advanced Driver-Assistance
Systems (ADAS) over the years. As a result, transportation is becoming more efficient
and the roads are considerably safer.
One of the fundamental pillars of an autonomous system is self-localization. An
accurate and reliable estimation of the vehicle’s pose in the world is essential to
navigation. Within the context of outdoor vehicles, the Global Navigation Satellite
System (GNSS) is the predominant localization system. However, these systems are
far from perfect, and their performance is degraded in environments with limited
satellite visibility. Additionally, their dependence on the environment can make
them unreliable if the environment changes.
Accordingly, the goal of this thesis is to exploit the perception of the environment
to enhance localization systems in intelligent vehicles, with special attention to
their reliability. To this end, this thesis presents several contributions: First, a study
on exploiting 3D semantic information in LiDAR odometry is presented, providing
interesting insights regarding the contribution to the odometry output of each type
of element in the scene. The experimental results have been obtained using a public
dataset and validated on a real-world platform. Second, a method to estimate the
localization error using landmark detections is proposed, which is later exploited
by a landmark placement optimization algorithm. This method, which has been
validated in a simulation environment, is able to determine a set of landmarks
such that the localization error never exceeds a predefined limit. Finally, a cooperative
localization algorithm based on a Genetic Particle Filter is proposed to utilize vehicle
detections in order to enhance the estimation provided by GNSS systems. Multiple
experiments are carried out in different simulation environments to validate the
proposed method.En un pasado no muy lejano, los vehículos autónomos y los Sistemas Inteligentes
del Transporte (ITS) se veían como un futuro para el transporte con gran potencial.
Hoy, gracias a todos los avances tecnológicos de los últimos años, la viabilidad
de estos sistemas ha dejado de ser una incógnita. Algunas de estas tecnologías
de conducción autónoma ya están compartiendo nuestras carreteras, e incluso los
vehículos comerciales cada vez incluyen más Sistemas Avanzados de Asistencia a la
Conducción (ADAS) con el paso de los años. Como resultado, el transporte es cada
vez más eficiente y las carreteras son considerablemente más seguras.
Uno de los pilares fundamentales de un sistema autónomo es la autolocalización.
Una estimación precisa y fiable de la posición del vehículo en el mundo es esencial
para la navegación. En el contexto de los vehículos circulando en exteriores, el
Sistema Global de Navegación por Satélite (GNSS) es el sistema de localización predominante.
Sin embargo, estos sistemas están lejos de ser perfectos, y su rendimiento
se degrada en entornos donde la visibilidad de los satélites es limitada. Además, los
cambios en el entorno pueden provocar cambios en la estimación, lo que los hace
poco fiables en ciertas situaciones.
Por ello, el objetivo de esta tesis es utilizar la percepción del entorno para mejorar
los sistemas de localización en vehículos inteligentes, con una especial atención a
la fiabilidad de estos sistemas. Para ello, esta tesis presenta varias aportaciones:
En primer lugar, se presenta un estudio sobre cómo aprovechar la información
semántica 3D en la odometría LiDAR, generando una base de conocimiento sobre la
contribución de cada tipo de elemento del entorno a la salida de la odometría. Los
resultados experimentales se han obtenido utilizando una base de datos pública y se
han validado en una plataforma de conducción del mundo real. En segundo lugar,
se propone un método para estimar el error de localización utilizando detecciones
de puntos de referencia, que posteriormente es explotado por un algoritmo de
optimización de posicionamiento de puntos de referencia. Este método, que ha
sido validado en un entorno de simulación, es capaz de determinar un conjunto de
puntos de referencia para el cual el error de localización nunca supere un límite
previamente fijado. Por último, se propone un algoritmo de localización cooperativa
basado en un Filtro Genético de Partículas para utilizar las detecciones de vehículos
con el fin de mejorar la estimación proporcionada por los sistemas GNSS. El método
propuesto ha sido validado mediante múltiples experimentos en diferentes entornos
de simulación.Programa de Doctorado en Ingeniería Eléctrica, Electrónica y Automática por la Universidad Carlos III de MadridSecretario: Joshué Manuel Pérez Rastelli.- Secretario: Jorge Villagrá Serrano.- Vocal: Enrique David Martí Muño
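The abstract names a Genetic Particle Filter but does not spell the algorithm out, so the following is only a generic sketch of the idea under assumed noise models: particles are weighted both by a GNSS fix and by a range measurement to a detected neighboring vehicle, then resampled with a mutation step in the style of genetic algorithms. All names, parameters, and distributions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def gnss_likelihood(particles, gnss_fix, sigma=3.0):
    """Weight particles by agreement with the (noisy) GNSS position."""
    d2 = np.sum((particles - gnss_fix) ** 2, axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

def detection_likelihood(particles, neighbor_pos, measured_range, sigma=0.5):
    """Weight particles by the range measured to a detected vehicle
    whose own (communicated) position estimate is neighbor_pos."""
    r = np.linalg.norm(particles - neighbor_pos, axis=1)
    return np.exp(-(r - measured_range) ** 2 / (2 * sigma ** 2))

def genetic_resample(particles, weights, mutation_std=0.3):
    """Resampling with a 'genetic' flavor: select parents in proportion
    to fitness (weights) and mutate offspring with Gaussian noise."""
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights / weights.sum())
    return particles[idx] + rng.normal(0, mutation_std, particles.shape)

# One cooperative update step: fuse GNSS with a vehicle detection.
particles = rng.normal([0.0, 0.0], 5.0, size=(500, 2))
w = gnss_likelihood(particles, gnss_fix=np.array([1.0, 2.0]))
w *= detection_likelihood(particles, neighbor_pos=np.array([10.0, 2.0]),
                          measured_range=9.0)
particles = genetic_resample(particles, w)
estimate = particles.mean(axis=0)  # fused pose estimate
```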
Perception and intelligent localization for autonomous driving
Computer vision and sensor fusion are relatively recent subjects, yet they are widely adopted in the development of autonomous robots that must adapt to their surrounding environment. This thesis approaches both in order to achieve perception in the context of autonomous driving. Using cameras to this end is a rather complex process: unlike classic sensors, which deterministically provide the same type of precise information, the successive images acquired by a camera are replete with the most varied information, all of it ambiguous and extremely difficult to extract. Still, cameras are the closest robotic sensing has come to the system of greatest importance in human perception, the visual system. Computer vision is a scientific discipline that encompasses areas such as signal processing, artificial intelligence, mathematics, control theory, neurobiology, and physics.
The platform supporting the work developed in this thesis is ROTA (RObô Triciclo Autónomo), together with all the elements that make up its environment. In this context, the thesis describes approaches introduced to solve the challenges the robot faces in its environment: detection of lane markings and their perception, detection of obstacles, traffic lights, the crosswalk zone, and the roadwork zone. It also describes a calibration system and the removal of the image perspective, developed to map the perceived elements to real-world distances. Building on the perception system, self-localization is addressed as well, integrated in a distributed architecture that allows navigation with long-term planning. All the work developed in the course of this thesis is essentially centered on robotic perception in the context of autonomous driving.
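The removal of the image perspective mentioned above is commonly implemented as an inverse perspective mapping (bird's-eye view) via a plane-to-plane homography. Below is a minimal sketch with OpenCV, assuming a flat road and four calibrated point correspondences; all coordinates and the frame itself are placeholders, not values from the thesis.

```python
import cv2
import numpy as np

# Four points on the road plane in the image (e.g. lane-marking corners,
# obtained during calibration) and their known ground coordinates in
# centimeters. All values here are illustrative placeholders.
image_pts = np.float32([[220, 460], [420, 460], [560, 680], [80, 680]])
ground_pts = np.float32([[0, 0], [200, 0], [200, 300], [0, 300]])

# Homography that removes the perspective of the (assumed planar) road.
H = cv2.getPerspectiveTransform(image_pts, ground_pts)

frame = np.zeros((720, 1280, 3), np.uint8)   # stand-in for a camera frame
birds_eye = cv2.warpPerspective(frame, H, (200, 300))

# Map a single detected point (e.g. a lane marking) to ground coordinates.
pt = cv2.perspectiveTransform(np.float32([[[300, 500]]]), H)[0, 0]
print(f"ground position: {pt[0]:.1f} cm right, {pt[1]:.1f} cm ahead")
```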
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and the recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and makes an in-depth
analysis of their challenges as well as technical improvements in recent years.

Comment: This work has been submitted to the IEEE TPAMI for possible publication.
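Among the "fundamental building blocks of the detection system" the survey covers, intersection over union (IoU) and greedy non-maximum suppression (NMS) appear in virtually every detector. A compact, standalone NumPy sketch of both (not code from the survey):

```python
import numpy as np

def iou(box, boxes):
    """Intersection over union between one box and an array of boxes,
    all in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    discard boxes that overlap it too much, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        overlaps = iou(boxes[i], boxes[order[1:]])
        order = order[1:][overlaps <= iou_threshold]
    return keep
```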
Cooperative Road Sign and Traffic Light Using Near Infrared Identification and Zigbee Smartdust Technologies
Vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I as well as I2V) applications are developing very fast. They rely on telecommunication and localization technologies to detect, identify and geo-localize the sources of information (such as vehicles, roadside objects, or pedestrians). This paper presents an original approach on how two different technologies (a near infrared identification sensor and a Zigbee smartdust sensor) can work together in order to create an improved system. After an introduction of these two sensors, two concrete applications will be presented: a road sign detection application and a cooperative traffic light application. These applications show how the coupling of the two sensors enables robust detection and how they complement each other to add dynamic information to road-side objects.
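The claimed benefit of coupling the two sensors is that each compensates the other's weakness: the near-infrared sensor physically detects and localizes an identified object, while the Zigbee smartdust channel carries its dynamic state. A toy sketch of that join on a shared identifier follows; the paper does not define message formats, so every field and name here is an assumption.

```python
from dataclasses import dataclass

# Illustrative data model only: the paper does not specify message
# formats, so all fields and names here are assumptions.
@dataclass
class IRDetection:          # near-infrared sensor: identity + bearing
    object_id: int
    bearing_deg: float

@dataclass
class ZigbeeMessage:        # smartdust node: identity + dynamic state
    object_id: int
    kind: str               # e.g. "traffic_light", "road_sign"
    state: str              # e.g. "red", "green", sign contents

def fuse(ir_detections, zigbee_messages):
    """Couple the two sensors by joining on the shared object identity:
    IR confirms the object is physically there (and where), Zigbee
    attaches dynamic information to it."""
    by_id = {m.object_id: m for m in zigbee_messages}
    fused = []
    for det in ir_detections:
        msg = by_id.get(det.object_id)
        if msg is not None:   # only trust objects seen by both sensors
            fused.append((det.object_id, msg.kind, msg.state,
                          det.bearing_deg))
    return fused

print(fuse([IRDetection(7, -12.5)],
           [ZigbeeMessage(7, "traffic_light", "red")]))
```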