Cross Pixel Optical Flow Similarity for Self-Supervised Learning
We propose a novel method for learning convolutional neural image
representations without manual supervision. We use motion cues in the form of
optical flow to supervise representations of static images. The obvious
approach of training a network to predict flow from a single image can be
needlessly difficult due to intrinsic ambiguities in this prediction task. We
instead propose a much simpler learning goal: embed pixels such that the
similarity between their embeddings matches that between their optical flow
vectors. At test time, the learned deep network can be used without access to
video or flow information and transferred to tasks such as image
classification, detection, and segmentation. Our method, which significantly
simplifies previous attempts at using motion for self-supervision, achieves
state-of-the-art results in self-supervision using motion cues, competitive
results for self-supervision in general, and is overall state of the art in
self-supervised pretraining for semantic image segmentation, as demonstrated on
standard benchmarks.
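The learning goal described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it reduces the objective to a plain squared gap between two cosine-similarity matrices, one over pixel embeddings and one over the pixels' optical-flow vectors (the function names and the toy data are assumptions for illustration).

```python
# Minimal sketch (assumed, not the paper's code) of the cross-pixel
# flow-similarity objective: train pixel embeddings so that the similarity
# between two embeddings matches the similarity between the corresponding
# optical-flow vectors.
import numpy as np

def cosine_sim(a, b, eps=1e-8):
    """Cosine similarity between all row pairs of a and b."""
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + eps)
    return a @ b.T

def cross_pixel_loss(embeddings, flows):
    """Mean squared gap between embedding and flow similarity matrices.

    embeddings: (N, D) pixel embeddings produced by the network.
    flows:      (N, 2) optical-flow vectors for the same N pixels.
    """
    s_emb = cosine_sim(embeddings, embeddings)
    s_flow = cosine_sim(flows, flows)
    return float(np.mean((s_emb - s_flow) ** 2))

# Toy check: if the embeddings are the flow vectors themselves, the two
# similarity matrices coincide and the loss is exactly zero.
flows = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(cross_pixel_loss(flows, flows))  # 0.0
```

In the real setting the loss would be backpropagated through the embedding network; the flow vectors only supervise training and are not needed at test time, matching the abstract.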
Point Cloud Registration for LiDAR and Photogrammetric Data: a Critical Synthesis and Performance Analysis on Classic and Deep Learning Algorithms
Recent advances in computer vision and deep learning have shown promising
performance in estimating rigid/similarity transformation between unregistered
point clouds of complex objects and scenes. However, their performances are
mostly evaluated using a limited number of datasets from a single sensor (e.g.
Kinect or RealSense cameras), lacking a comprehensive overview of their
applicability in photogrammetric 3D mapping scenarios. In this work, we provide
a comprehensive review of the state-of-the-art (SOTA) point cloud registration
methods, where we analyze and evaluate these methods using a diverse set of
point cloud data from indoor to satellite sources. The quantitative analysis
allows for exploring the strengths, applicability, challenges, and future
trends of these methods. In contrast to existing analysis works that introduce
point cloud registration as a holistic process, our experimental analysis is
based on its inherent two-step process to better comprehend these approaches
including feature/keypoint-based initial coarse registration and dense fine
registration through cloud-to-cloud (C2C) optimization. More than ten methods,
including classic hand-crafted, deep-learning-based feature correspondence, and
robust C2C methods were tested. We observed that the success rate of most of
the algorithms is below 40% on the datasets we tested, and that there is still
a large margin for improvement over existing algorithms concerning 3D sparse
correspondence search and the ability to register point clouds with complex
geometry and occlusions. Based on the evaluated statistics on three datasets,
we identify the best-performing methods for each step, provide our
recommendations, and offer an outlook on future efforts.
Comment: 7 figures
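The two-step process analyzed above starts from putative feature correspondences and ends with dense C2C refinement. A minimal numpy sketch of the closed-form rigid alignment used once correspondences are available (the Kabsch/Procrustes solution, not any specific surveyed method; the toy point clouds are synthetic) looks like this:

```python
# Illustrative sketch, not one of the surveyed implementations: estimate the
# rigid transform (R, t) from known correspondences, as a coarse-registration
# step that a dense cloud-to-cloud (C2C) method such as ICP would then refine.
import numpy as np

def kabsch(src, dst):
    """Best-fit rotation R and translation t mapping src points onto dst."""
    src_c, dst_c = src.mean(0), dst.mean(0)
    H = (src - src_c).T @ (dst - dst_c)          # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])                   # guard against reflections
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t

# Toy example: recover a known rotation about z plus a translation.
rng = np.random.default_rng(0)
src = rng.normal(size=(20, 3))
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
dst = src @ R_true.T + t_true

R, t = kabsch(src, dst)
print(np.allclose(R, R_true), np.allclose(t, t_true))  # True True
```

In practice the correspondences come from hand-crafted or learned feature matching and contain outliers, which is why the surveyed pipelines wrap this step in robust estimators before the fine C2C optimization.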
The Revisiting Problem in Simultaneous Localization and Mapping: A Survey on Visual Loop Closure Detection
Where am I? This is one of the most critical questions that any intelligent
system should answer to decide whether it navigates to a previously visited
area. This problem has long been acknowledged for its challenging nature in
simultaneous localization and mapping (SLAM), wherein the robot needs to
correctly associate the incoming sensory data with the database, allowing
consistent map generation. The significant advances in computer vision achieved
over the last 20 years, the increased computational power, and the growing
demand for long-term exploration contributed to efficiently performing such a
complex task with inexpensive perception sensors. In this article, visual loop
closure detection, which formulates a solution based solely on appearance input
data, is surveyed. We start by briefly introducing place recognition and SLAM
concepts in robotics. Then, we describe a loop closure detection system's
structure, covering an extensive collection of topics, including the feature
extraction, the environment representation, the decision-making step, and the
evaluation process. We conclude by discussing open and new research challenges,
particularly concerning the robustness in dynamic environments, the
computational complexity, and scalability in long-term operations. The article
aims to serve as a tutorial and a position paper for newcomers to visual loop
closure detection.
Comment: 25 pages, 15 figures
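The structure surveyed above (feature extraction, environment representation, decision making) can be reduced to a toy appearance-only pipeline. The sketch below is hypothetical and not from the survey: each frame is summarized by a global descriptor (here a simple visual-word histogram), queried against a database of past places, and a similarity threshold drives the loop/no-loop decision; the class name and the 0.9 threshold are assumptions.

```python
# Toy sketch (hypothetical) of appearance-only visual loop closure detection:
# global descriptors are matched against previously stored places and a
# cosine-similarity threshold makes the revisiting decision.
import numpy as np

def normalize(h, eps=1e-8):
    return h / (np.linalg.norm(h) + eps)

class LoopCloser:
    def __init__(self, threshold=0.9):
        self.db = []              # descriptors of previously visited places
        self.threshold = threshold

    def query(self, hist):
        """Return (matched_index, score); matched_index is None if no loop."""
        d = normalize(np.asarray(hist, float))
        if not self.db:
            self.db.append(d)
            return None, 0.0
        scores = np.array([float(d @ past) for past in self.db])
        best = int(np.argmax(scores))
        self.db.append(d)
        if scores[best] >= self.threshold:
            return best, float(scores[best])
        return None, float(scores[best])

lc = LoopCloser(threshold=0.9)
lc.query([5, 0, 1, 0])               # place A; database empty -> no loop
lc.query([0, 4, 0, 2])               # place B; dissimilar -> no loop
idx, score = lc.query([5, 0, 1, 0])  # revisiting place A
print(idx, round(score, 3))          # 0 1.0
```

Real systems replace the raw histogram with bag-of-visual-words or learned global descriptors and add temporal-consistency and geometric-verification checks before accepting a loop, as the survey's decision-making section discusses.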
Brain Tumor Detection and Segmentation in Multisequence MRI
This work deals with brain tumor detection and segmentation in multisequence MR images with a particular focus on high- and low-grade gliomas.
Three methods are proposed for this purpose. The first method detects the presence of brain tumor structures in axial and coronal slices. It is based on multi-resolution symmetry analysis and was tested on T1, T2, T1C, and FLAIR images. The second method extracts the whole brain tumor region, including the tumor core and edema, in FLAIR and T2 images, and can do so from both 2D and 3D data. It also uses the symmetry analysis approach, followed by automatic determination of the intensity threshold from the most asymmetric parts. The third method is based on local structure prediction and is able to segment the whole tumor region as well as the tumor core and the active tumor. It takes advantage of the fact that most medical images feature a high similarity in intensities of nearby pixels and a strong correlation of intensity profiles across different image modalities. One way of dealing with -- and even exploiting -- this correlation is the use of local image patches. In the same way, there is a high correlation between nearby labels in image annotation, a feature that has been used in the "local structure prediction" of local label patches. A convolutional neural network is chosen as the learning algorithm, as it is known to be suited to dealing with correlation between features. All three methods were evaluated on a public data set of 254 multisequence MR volumes and reached results comparable to state-of-the-art methods in much shorter computing time (on the order of seconds on a CPU), providing means, for example, to do online updates when aiming at interactive segmentation.
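The symmetry-analysis idea behind the first two methods can be illustrated with a toy numpy sketch. This is an assumption-laden simplification, not the thesis implementation: it compares intensity histograms of the left and right halves of an axial slice, and a large histogram distance flags an asymmetric, potentially tumorous slice.

```python
# Simplified sketch (assumed, not the thesis code) of symmetry analysis for
# tumor presence detection: mirror one half of the slice and measure the
# total-variation distance between the two halves' intensity histograms.
import numpy as np

def asymmetry_score(slice2d, bins=32):
    """0 for a perfectly mirror-symmetric slice, larger when asymmetric."""
    h, w = slice2d.shape
    left, right = slice2d[:, : w // 2], slice2d[:, w - w // 2 :]
    value_range = (slice2d.min(), slice2d.max() + 1e-8)
    hl, _ = np.histogram(left, bins=bins, range=value_range)
    hr, _ = np.histogram(np.fliplr(right), bins=bins, range=value_range)
    hl = hl / hl.sum()
    hr = hr / hr.sum()
    return 0.5 * float(np.abs(hl - hr).sum())   # total variation distance

rng = np.random.default_rng(1)
half = rng.random((64, 32))
healthy = np.hstack([half, np.fliplr(half)])    # mirror-symmetric slice
lesioned = healthy.copy()
lesioned[20:40, 40:60] += 3.0                   # bright "lesion" on one side
print(asymmetry_score(healthy) < asymmetry_score(lesioned))  # True
```

The actual method works at multiple image resolutions and localizes the most asymmetric parts to seed the threshold estimation, but the core signal is this left/right intensity asymmetry.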
A Computational Framework for Learning from Complex Data: Formulations, Algorithms, and Applications
Many real-world processes are dynamically changing over time. As a consequence, the observed complex data generated by these processes also evolve smoothly. For example, in computational biology, the expression data matrices are evolving, since gene expression controls are deployed sequentially during development in many biological processes. Investigations into the spatial and temporal gene expression dynamics are essential for understanding the regulatory biology governing development. In this dissertation, I mainly focus on two types of complex data: genome-wide spatial gene expression patterns in the model organism fruit fly and Allen Brain Atlas mouse brain data. I provide a framework to explore spatiotemporal regulation of gene expression during development. I develop an evolutionary co-clustering formulation to identify co-expressed domains and the associated genes simultaneously over different temporal stages using a mesh-generation pipeline. I also propose to employ deep convolutional neural networks as a multi-layer feature extractor to generate generic representations for gene expression pattern images from in situ hybridization (ISH). Furthermore, I employ the multi-task learning method to fine-tune the pre-trained models with labeled ISH images. My proposed computational methods are evaluated using synthetic data sets and real biological data sets, including the gene expression data from the fruit fly BDGP data sets and the Allen Developing Mouse Brain Atlas, in comparison with existing baseline methods. Experimental results indicate that the proposed representations, formulations, and methods are efficient and effective in annotating and analyzing the large-scale biological data sets.
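The shared-extractor-plus-multi-task setup described above can be sketched in a few lines of numpy. Everything here is a hypothetical stand-in: a fixed random projection plays the role of the pre-trained CNN feature extractor, and two synthetic binary annotation terms play the role of labeled ISH annotation tasks trained jointly.

```python
# Minimal hypothetical sketch of multi-task learning over shared features:
# a frozen extractor (random projection standing in for a pre-trained CNN)
# feeds one linear head per annotation task, trained jointly.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                    # 200 "images", 50 raw dims
W_shared = rng.normal(size=(50, 16)) / 7.0        # frozen shared extractor
F = np.maximum(X @ W_shared, 0.0)                 # ReLU features
F = np.hstack([F, np.ones((len(F), 1))])          # bias feature

# Two synthetic binary annotation terms derived from the shared features.
y = np.stack([(F[:, 0] > F[:, 1]).astype(float),
              (F[:, 2] > 0.2).astype(float)], axis=1)

heads = np.zeros((17, 2))                         # one weight column per task
for _ in range(2000):                             # joint multi-task training
    p = 1.0 / (1.0 + np.exp(-F @ heads))          # per-task sigmoid
    heads -= 0.5 * F.T @ (p - y) / len(F)         # averaged gradient step

acc = ((1.0 / (1.0 + np.exp(-F @ heads)) > 0.5) == y).mean(axis=0)
print(acc)  # per-task training accuracy, well above chance
```

In the dissertation's setting, fine-tuning would also update the shared extractor on labeled ISH images; here it is kept frozen to keep the sketch short.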