29 research outputs found
Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation
Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics
This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography, used to propose regions of interest where objects may be found, and recursive Bayesian filtering, used to integrate observations over time. The proposal is evaluated on six virtual indoor environments, accounting for the detection of nine object classes over a total of ~7k frames. Results show that the proposal improves recall and F1-score by factors of 1.41 and 1.27, respectively, and achieves a significant reduction (58.8%) of the object categorization entropy when compared to a two-stage video object detection method used as a baseline, at the cost of a small time overhead (120 ms) and a slight precision loss (down to 0.92).
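The homography-based region proposal described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names and the translation-only homography are assumptions chosen for clarity.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 planar homography (row-major nested lists)."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def propagate_box(H, box):
    """Warp an axis-aligned box (x0, y0, x1, y1) into the next frame and
    return the axis-aligned bounding box of the warped corners, i.e. a
    region of interest where the object is expected to reappear."""
    x0, y0, x1, y1 = box
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    warped = [apply_homography(H, c) for c in corners]
    xs = [p[0] for p in warped]
    ys = [p[1] for p in warped]
    return (min(xs), min(ys), max(xs), max(ys))

# Toy camera motion: a pure translation of 10 px right and 5 px down.
H = [[1.0, 0.0, 10.0],
     [0.0, 1.0, 5.0],
     [0.0, 0.0, 1.0]]
roi = propagate_box(H, (100.0, 50.0, 150.0, 90.0))
```

In the full method, the propagated region would then be refined over time by a recursive Bayesian filter as new detections arrive.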
Deep learning & remote sensing: pushing the frontiers in image segmentation
Dissertation (Master's in Informatics) — Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, Brasília, 2022.

Image segmentation aims to simplify the understanding of digital images. Deep learning-based
methods using convolutional neural networks have been game-changing, allowing the exploration
of different tasks (e.g., semantic, instance, and panoptic segmentation). Semantic segmentation
assigns a class to every pixel in an image, instance segmentation classifies objects at a pixel
level with a unique identifier for each target, and panoptic segmentation combines instance-level predictions with different backgrounds. Remote sensing data largely benefits from those
methods, being very suitable for developing new DL algorithms and creating solutions using
top-view images. However, some peculiarities prevent remote sensing using orbital and aerial
imagery from growing when compared to traditional ground-level images (e.g., camera photos):
(1) the images are extensive, (2) they present different characteristics (e.g., number of channels and image format), (3) they require a high number of pre-processing and post-processing steps (e.g., extracting patches and classifying large scenes), and (4) most open software for labeling and deep learning applications is not friendly to remote sensing for the aforementioned reasons. This
dissertation aimed to improve all three main categories of image segmentation. Within the instance segmentation domain, we proposed three experiments. First, we enhanced the box-based
instance segmentation approach for classifying large scenes, allowing practical pipelines to be
implemented. Second, we created a bounding-box free method to reach instance segmentation
results by using semantic segmentation models in a scenario with sparse objects. Third, we
improved the previous method for crowded scenes and developed the first study considering
semi-supervised learning using remote sensing and GIS data. Subsequently, in the panoptic
segmentation domain, we presented the first remote sensing panoptic segmentation dataset, containing fourteen classes, and provided software and a methodology for converting GIS data into
the panoptic segmentation format. Since our first study considered RGB images, we extended
our approach to multispectral data. Finally, we leveraged the box-free method initially designed
for instance segmentation to the panoptic segmentation task. This dissertation analyzed various
segmentation methods and image types, and the developed solutions enable the exploration of
new tasks (such as panoptic segmentation), the simplification of labeling data (using the proposed semi-supervised learning procedure), and a simplified way to obtain instance and panoptic
predictions using simple semantic segmentation models.
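The box-free idea of recovering instance predictions from a plain semantic segmentation output can be sketched with connected-component labeling: each connected region of a binary class mask becomes one instance. This is a minimal illustration under the sparse-objects assumption; the function and the 4-connectivity choice are assumptions, not the dissertation's exact method.

```python
from collections import deque

def instances_from_semantic_mask(mask):
    """Label 4-connected components of a binary semantic mask (list of lists
    of 0/1). Returns the number of instances and a label map where each
    object instance receives a unique positive integer id."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_id = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] == 1 and labels[i][j] == 0:
                next_id += 1                     # new instance found
                labels[i][j] = next_id
                queue = deque([(i, j)])
                while queue:                     # flood-fill the component
                    r, c = queue.popleft()
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < h and 0 <= nc < w
                                and mask[nr][nc] == 1 and labels[nr][nc] == 0):
                            labels[nr][nc] = next_id
                            queue.append((nr, nc))
    return next_id, labels

# Two spatially separate objects in one semantic mask -> two instance ids.
mask = [[1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 1]]
n, labels = instances_from_semantic_mask(mask)
```

For crowded scenes, where touching objects would merge into one component, the dissertation's later refinements are needed; this sketch only covers the sparse case.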
Visual place recognition for improved open and uncertain navigation
Visual place recognition localises a query place image by comparing it against a reference database of known place images, a fundamental element of robotic navigation.
Recent work focuses on using deep learning to learn image descriptors for this task
that are invariant to appearance changes from dynamic lighting, weather and seasonal
conditions. However, these descriptors have several limitations: they require greater computational resources than are available on robotic hardware; few SLAM frameworks are designed to utilise them; they return a relative comparison between image descriptors that is difficult to interpret; they cannot provide appearance invariance in other navigation tasks such as scene classification; and they cannot identify query images from an open environment that have no true match in the reference database. This thesis addresses these
challenges with three contributions. The first is a lightweight visual place recognition
descriptor combined with a probabilistic filter to address a subset of the visual SLAM
problem in real-time. The second contribution combines visual place recognition and
scene classification for appearance invariant scene classification, which is extended
to recognise unknown scene classes when navigating an open environment. The final contribution uses comparisons between query and reference image descriptors to
classify whether they result in a true or false positive localisation, and whether a true match for the query image exists in the reference database.
Edinburgh Centre for Robotics and Engineering and Physical Sciences Research Council (EPSRC) funding.
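The core matching step of visual place recognition, including the open-environment case where a query has no true match, can be sketched as nearest-neighbour search over descriptors with a rejection threshold. This is an illustrative sketch only; the function names, cosine similarity metric, and threshold value are assumptions, not the thesis's descriptor or decision rule.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two descriptor vectors (plain lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def localise(query, reference_db, threshold=0.8):
    """Compare a query descriptor against every reference descriptor and
    return (best_index, best_score). Report no match (index -1) when the
    best score falls below the rejection threshold -- the open-set case
    where the query place is absent from the database."""
    best_idx, best_score = -1, -1.0
    for idx, ref in enumerate(reference_db):
        score = cosine_similarity(query, ref)
        if score > best_score:
            best_idx, best_score = idx, score
    if best_score < threshold:
        return -1, best_score
    return best_idx, best_score

# Toy database of two reference place descriptors.
db = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
match = localise([0.9, 0.1, 0.0], db)     # close to reference 0
no_match = localise([0.0, 0.0, 1.0], db)  # unlike anything in the database
```

The threshold is what turns a relative descriptor comparison into an interpretable accept/reject decision, which is the difficulty the thesis's final contribution addresses with a learned classifier rather than a fixed cutoff.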