Learning Birds-Eye View Representations for Autonomous Driving

Author: Roddick Thomas
Publication venue: University of Cambridge
Publication date: 19/03/2021

Over the past few years, progress towards the ambitious goal of widespread fully-autonomous vehicles on our roads has accelerated dramatically. This progress has been spurred largely by the success of highly accurate LiDAR sensors, as well as the use of detailed high-resolution maps, which together allow a vehicle to navigate its surroundings effectively. Often, however, one or both of these resources may be unavailable, whether due to cost, sensor failure, or the need to operate in an unmapped environment. The aim of this thesis is therefore to demonstrate that it is possible to build detailed three-dimensional representations of traffic scenes using only 2D monocular camera images as input. Such an approach faces many challenges, most notably that 2D images do not provide explicit 3D structure. We overcome this limitation by applying a combination of deep learning and geometry to transform image-based features into an orthographic birds-eye view representation of the scene, allowing algorithms to reason in a metric, 3D space. This approach is applied to solving two challenging perception tasks central to autonomous driving. The first part of this thesis addresses the problem of monocular 3D object detection, which involves determining the size and location of all objects in the scene. Our solution was based on a novel convolutional network architecture that processed features in both the image and birds-eye view perspectives. Results on the KITTI dataset showed that this network outperformed existing works at the time, and although more recent works have improved on these results, we conducted extensive analysis to find that our solution performed well in many difficult edge-case scenarios such as objects close to or distant from the camera. In the second part of the thesis, we consider the related problem of semantic map prediction.
This consists of estimating a birds-eye view map of the world visible from a given camera, encoding both static elements of the scene, such as pavement and road layout, and dynamic objects, such as vehicles and pedestrians. This was accomplished using a second network that built on the experience from the previous work and achieved convincing performance on two real-world driving datasets. By formulating the maps as an occupancy grid map (a widely used representation from robotics), we were able to demonstrate how predictions could be accumulated across multiple frames, and that doing so further improved the robustness of maps produced by our system. Toyota Motors Europ
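The accumulation of occupancy-grid predictions across frames mentioned in the abstract can be illustrated with a standard Bayesian log-odds update from robotics. This is only a minimal sketch, not the thesis's implementation: the function name `fuse_occupancy`, the uniform prior of 0.5, and the assumption that all frames are already registered to a common grid are hypothetical choices made for illustration.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def fuse_occupancy(frames, prior=0.5):
    """Fuse per-frame occupancy probabilities cell by cell.

    frames: list of 2D lists of probabilities, all the same shape and
            assumed to be aligned to one common grid (an assumption).
    prior:  assumed prior occupancy probability for every cell.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    prior_log_odds = logit(prior)
    log_odds = [[prior_log_odds] * cols for _ in range(rows)]
    for grid in frames:
        for r in range(rows):
            for c in range(cols):
                # Clamp to avoid infinite log-odds at exactly 0 or 1.
                p = min(max(grid[r][c], 1e-6), 1.0 - 1e-6)
                # Independent-observation update in log-odds space.
                log_odds[r][c] += logit(p) - prior_log_odds
    # Convert the accumulated log-odds back to probabilities.
    return [[1.0 / (1.0 + math.exp(-l)) for l in row] for row in log_odds]


fused = fuse_occupancy([[[0.7, 0.5]], [[0.7, 0.5]]])
```

In this example two agreeing observations of 0.7 raise the fused belief for the first cell above 0.7, while the uninformative 0.5 observations leave the second cell at the prior.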

Apollo (Cambridge)

Semantic Mapping of Road Scenes

Author: Sengupta S
Publication venue: Oxford Brookes University
Publication date: 01/01/2014

The problem of understanding road scenes has been at the forefront of the computer vision community for the last couple of years. Understanding road scenes enables autonomous systems to navigate and understand the surroundings in which they operate. It involves reconstructing the scene and estimating the objects present in it, such as ‘vehicles’, ‘road’, ‘pavements’ and ‘buildings’. This thesis focusses on these aspects and proposes solutions to address them. First, we propose a solution to generate a dense semantic map from multiple street-level images. This map can be imagined as the bird’s eye view of the region with associated semantic labels for tens of kilometres of street-level data. We generate the overhead semantic view from street-level images. This is in contrast to existing approaches that use satellite/overhead imagery for classification of urban regions, and it allows us to produce a detailed semantic map for a large-scale urban area. Then we describe a method to perform large-scale dense 3D reconstruction of road scenes with associated semantic labels. Our method fuses the depth maps generated from the stereo pairs across time into a global 3D volume in an online fashion, in order to accommodate arbitrarily long image sequences. The object class labels estimated from the street-level stereo image sequence are used to annotate the reconstructed volume. Then we exploit the scene structure in object class labelling by performing inference over the meshed representation of the scene. By performing labelling over the mesh we solve two issues. Firstly, images often contain redundant information, with multiple images describing the same scene. Solving these images separately is slow, whereas our method is approximately an order of magnitude faster in the inference stage than normal inference in the image domain. Secondly, multiple images often result in inconsistent labelling even though they describe the same scene.
By solving a single mesh, we remove the inconsistency of labelling across the images. Our mesh-based labelling also takes into account the object layout in the scene, which is often ambiguous in the image domain, thereby increasing the accuracy of object labelling. Finally, we perform labelling and structure computation through a hierarchical robust P^N Markov Random Field defined on voxels and super-voxels given by an octree. This allows us to infer the 3D structure and the object-class labels in a principled manner, through bounded approximate minimisation of a well-defined and well-studied energy functional. In this thesis, we also introduce two object-labelled datasets created from real-world data. The 15 kilometre Yotta Labelled dataset consists of 8,000 images per camera view of the roadways of the United Kingdom, with a subset of them annotated with object class labels, and the second dataset comprises ground truth object labels for the publicly available KITTI dataset. Both datasets are publicly available, and we hope they will be helpful to the vision research community.

Oxford Brookes University: RADAR

Vision based trail detection for all-terrain robots

Author: Alves Nelson Miguel Rosa
Publication venue: Faculdade de Ciências e Tecnologia
Publication date: 01/01/2010

Dissertation presented at the Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa for the degree of Master in Electrical and Computer Engineering. This dissertation proposes a model for trail detection based on the observation that trails are salient structures in the robot's visual field. Owing to the complexity of natural environments, a direct application of traditional visual saliency models is not robust enough to predict the location of trails. As in other detection tasks, robustness can be increased by modulating the saliency computation with implicit knowledge about the visual features (e.g. colour) that best represent the object being sought. This dissertation proposes using the object's global structure, which is a more stable and predictable feature in the case of natural trails. This new component of implicit knowledge is specified in terms of active perception rules, which control the behaviour of simple agents that act together to compute the saliency map of the input image. To accumulate historical information about the trail's location, a dynamic neural field with motion compensation is used. Experimental results on a large data set show that the model achieves a success rate of 91% at 20 Hz. The model proves robust in situations where other detectors would fail, such as when the trail does not emerge from the bottom of the image or when it is considerably interrupted.

Repositório da Universidade Nova de Lisboa

Paikkatieto-ohjattu tien automaattinen segmentointi käyttäen itseohjattuja piirteitä talviolosuhteissa

Author: Alamikkotervo Eerik
Publication date: 21/08/2023

Road segmentation is a critical task for enabling the safe operation of autonomous cars. Currently, most road segmentation models rely on manually labeled data (supervised learning), making the training process very resource-heavy. Also, there is a lack of data in adverse conditions like winter and supervised models generalize poorly to domains they have not seen during training. In this thesis, a road segmentation model that requires no manual labeling (self-supervised learning) is presented. Training and testing are conducted in challenging winter driving conditions but the method can be adapted to any domain with no modifications. The proposed method includes two parts: Position-Aided Road Auto-labeling with Self-supervised features (PARAS) and learning from the PARAS auto-labels. In PARAS auto-labeling, the driven area is extracted based on Global Navigation Satellite System (GNSS) poses, and then the rest of the road is collected by comparing the mean similarity to the driven area with a pre-trained self-supervised feature extractor. Then a segmentation model is trained with the autogenerated labels using a custom loss function. The proposed method improves the current state of the art of self-supervised road segmentation in the winter driving domain (74.8 IoU vs 73.0 IoU) but can't yet compete with supervised methods. Most of the error in our method is caused by the inability to collect all road pixels by comparing feature similarity with the driven area. Performance could be increased by using a more accurate feature extractor or more advanced similarity metric than the simple mean that is used here. The scalability of our proposed model is excellent as only GNSS and camera sensors are required and it avoids the label-assigning problem that is present in other approaches that utilize self-supervised features. 
In existing approaches, labels are assigned either by simply clustering similar features together or by using manually labeled data to learn a projection from features to classes.
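The auto-labeling idea described above, growing the road label outward from the driven area by feature similarity, can be sketched roughly as follows. This is an illustrative assumption rather than the PARAS implementation: the use of cosine similarity, the threshold of 0.8, the dictionary-based feature layout, and the function names are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def auto_label(features, driven, threshold=0.8):
    """Label cells as road from their mean similarity to the driven area.

    features:  dict mapping a cell (row, col) to its feature vector,
               e.g. from a pre-trained self-supervised extractor.
    driven:    set of cells known to be driven over (e.g. from GNSS poses).
    threshold: hypothetical cut-off on the mean similarity.
    """
    references = [features[cell] for cell in driven]
    labels = {}
    for cell, feature in features.items():
        if cell in driven:
            labels[cell] = True  # the driven area is road by construction
            continue
        mean_similarity = sum(cosine(feature, ref) for ref in references) / len(references)
        labels[cell] = mean_similarity >= threshold
    return labels


features = {(0, 0): [1.0, 0.0], (0, 1): [0.9, 0.1], (0, 2): [0.0, 1.0]}
labels = auto_label(features, driven={(0, 0)})
```

Here the cell whose feature closely matches the driven area is labeled as road, while the dissimilar cell is not. A simple mean like this is the kind of similarity measure the abstract names as the main source of error.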

Aaltodoc Publication Archive

Vision-based ego-lane analysis system : dataset and algorithms

Author: Berriel Rodrigo Ferreira
Publication venue: Mestrado em Informática
Publication date: 03/08/2016

Lane detection and analysis are important and challenging tasks in advanced driver assistance systems and autonomous driving. These tasks are required in order to help autonomous and semi-autonomous vehicles operate safely. Decreasing costs of vision sensors and advances in embedded hardware have boosted lane-related research (detection, estimation, tracking, etc.) in the past two decades. The interest in this topic has increased even more with the demand for advanced driver assistance systems (ADAS) and self-driving cars. Although these problems have been extensively studied independently, there is still a need for studies that propose a combined solution for the multiple problems related to the ego-lane, such as lane departure warning (LDW), lane change detection, lane marking type (LMT) classification, road marking detection and classification, and detection of adjacent lanes. This work proposes a real-time Ego-Lane Analysis System (ELAS) capable of estimating ego-lane position, classifying LMTs and road markings, performing LDW and detecting lane change events.
The proposed vision-based system works on a temporal sequence of images. Lane marking features are extracted in both perspective and Inverse Perspective Mapping (IPM) images, which are then combined to increase robustness. The final estimated lane is modeled as a spline using a combination of methods (Hough lines, Kalman filter and particle filter). Based on the estimated lane, all other events are detected. Moreover, the proposed system was integrated for experimentation into an autonomous car that is being developed by the High Performance Computing Laboratory (LCAD) of the Universidade Federal do Espírito Santo (UFES). To validate the proposed algorithms and cover the lack of lane datasets in the literature, a new dataset with more than 20 different scenes (in more than 15,000 frames) and considering a variety of scenarios (urban road, highways, traffic, shadows, etc.) was created. The dataset was manually annotated and made publicly available to enable evaluation of several events that are of interest to the research community (i.e. lane estimation, change, and centering; road markings; intersections; LMTs; crosswalks and adjacent lanes). Furthermore, the system was also validated qualitatively based on the integration with the autonomous vehicle. ELAS achieved high detection rates in all real-world events and proved to be ready for real-time applications. FAPE
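Of the methods the abstract lists for lane estimation, the Kalman filter is the simplest to illustrate in isolation. The sketch below is not the ELAS implementation: it assumes a single scalar state (for example, a lateral lane offset measured per frame), a constant-position motion model, and hypothetical noise values `q` and `r`.

```python
def kalman_smooth(measurements, q=0.01, r=0.25, x0=0.0, p0=1.0):
    """Smooth a sequence of noisy scalar measurements with a 1D Kalman filter.

    q:  assumed process noise variance (how much the state may drift per frame).
    r:  assumed measurement noise variance.
    x0: initial state estimate; p0: initial estimate variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: constant-position model, so only the uncertainty grows.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + r)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


smoothed = kalman_smooth([1.0, 1.0, 1.0])
```

With a constant measurement of 1.0 and an initial estimate of 0.0, the estimates move steadily towards 1.0 without overshooting. This is the smoothing behaviour that makes such a filter useful on a temporal sequence of images.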

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositório Institucional da Universidade Federal do Espirito Santo