Distributed bundle adjustment with block-based sparse matrix compression for super large scale datasets
We propose a distributed bundle adjustment (DBA) method using the exact
Levenberg-Marquardt (LM) algorithm for super large-scale datasets. Most of the
existing methods partition the global map into small submaps and conduct
bundle adjustment within them. To fit the parallel framework, they use
approximate solutions instead of the exact LM algorithm, which often
yields sub-optimal results. In contrast, we utilize the exact LM
algorithm to conduct global bundle adjustment where the formation of the
reduced camera system (RCS) is actually parallelized and executed in a
distributed way. To store the large RCS, we compress it with a block-based
sparse matrix compression format (BSMC), which fully exploits its block
feature. The BSMC format also enables the distributed storage and updating of
the global RCS. The proposed method is extensively evaluated and compared with
the state-of-the-art pipelines using both synthetic and real datasets.
Preliminary results demonstrate the efficient memory usage and vast scalability
of the proposed method compared with the baselines. For the first time, we
conducted parallel bundle adjustment using the LM algorithm on a real dataset
with 1.18 million images and a synthetic dataset with 10 million images (about
500 times the size handled by the state-of-the-art LM-based BA) on a
distributed computing system.
Comment: camera ready version for ICCV202
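The block-based sparse compression the abstract describes can be illustrated with a minimal sketch: only nonzero c-by-c camera blocks of the reduced camera system are stored, keyed by block coordinates, so memory scales with the number of interacting camera pairs rather than the full matrix. This is a simplified illustration, not the paper's BSMC format; the class and method names are hypothetical.

```python
import numpy as np

class BlockSparseMatrix:
    """Toy block-sparse storage: only nonzero c-by-c blocks are kept,
    keyed by their (row, column) block coordinates."""
    def __init__(self, n_blocks, block_size):
        self.n = n_blocks
        self.c = block_size
        self.blocks = {}  # (i, j) -> (c, c) ndarray

    def add_block(self, i, j, block):
        # accumulate: several point eliminations can touch the same camera pair
        key = (i, j)
        self.blocks[key] = self.blocks.get(key, np.zeros((self.c, self.c))) + block

    def to_dense(self):
        M = np.zeros((self.n * self.c, self.n * self.c))
        for (i, j), b in self.blocks.items():
            M[i*self.c:(i+1)*self.c, j*self.c:(j+1)*self.c] = b
        return M

    def matvec(self, x):
        # block-wise product; each row stripe could live on a different worker,
        # which is what makes distributed storage and updating natural
        y = np.zeros(self.n * self.c)
        for (i, j), b in self.blocks.items():
            y[i*self.c:(i+1)*self.c] += b @ x[j*self.c:(j+1)*self.c]
        return y
```

A distributed variant would partition `self.blocks` by row stripe across machines; the dictionary-of-blocks layout makes that partitioning trivial.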
RADA: Robust Adversarial Data Augmentation for Camera Localization in Challenging Conditions
Camera localization is a fundamental problem for many applications in computer vision, robotics, and autonomy. Despite recent deep learning-based approaches, a lack of robustness in challenging conditions persists due to appearance changes caused by texture-less planes, repeating structures, reflective surfaces, motion blur, and illumination changes. Data augmentation is an attractive solution, but standard image perturbation methods fail to improve localization robustness. To address this, we propose RADA, which concentrates on perturbing the most vulnerable pixels, generating fewer but more perplexing image perturbations for the network. Our method outperforms previous augmentation techniques, achieving up to twice the accuracy of state-of-the-art models even under ’unseen’ challenging weather conditions. Videos of our results can be found at https://youtu.be/niOv7-fJeCA. The source code for RADA is publicly available at https://github.com/jialuwang123321/RAD
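The idea of perturbing only the most vulnerable pixels can be sketched as follows: rank pixels by loss-gradient magnitude, keep the top fraction, and apply a signed step only there. This is a hypothetical simplification of RADA for illustration, not the authors' actual augmentation pipeline; `k_frac` and `eps` are made-up parameters.

```python
import numpy as np

def sparse_adversarial_perturbation(image, grad, k_frac=0.01, eps=0.05):
    """Perturb only the k_frac fraction of pixels with the largest
    loss-gradient magnitude, leaving the rest of the image untouched."""
    mag = np.abs(grad).reshape(-1)
    k = max(1, int(k_frac * mag.size))
    idx = np.argpartition(mag, -k)[-k:]      # indices of most vulnerable pixels
    mask = np.zeros(grad.size, dtype=bool)
    mask[idx] = True
    mask = mask.reshape(grad.shape)
    # FGSM-style signed step, restricted to the selected pixels
    perturbed = image + eps * np.sign(grad) * mask
    return np.clip(perturbed, 0.0, 1.0)
```

Restricting the perturbation to a small pixel budget is what yields "relatively fewer" but still confusing augmentations.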
A Temporal Sequence Learning for Action Recognition and Prediction
In this work\footnote {This work was supported in part by the National
Science Foundation under grant IIS-1212948.}, we present a method to represent
a video with a sequence of words, and learn the temporal sequencing of such
words as the key information for predicting and recognizing human actions. We
leverage core concepts from the Natural Language Processing (NLP) literature
used in sentence classification to solve the problems of action prediction and
action recognition. Each frame is converted into a word that is represented as
a vector using the Bag of Visual Words (BoW) encoding method. The words are
then combined into a sentence that represents the video. The
sequence of words in different actions is learned with a simple but effective
Temporal Convolutional Neural Network (T-CNN) that captures the temporal
sequencing of information in a video sentence. We demonstrate that a key
characteristic of the proposed method is its low-latency, i.e. its ability to
predict an action accurately with a partial sequence (sentence). Experiments on
two datasets, \textit{UCF101} and \textit{HMDB51}, show that the method on
average reaches 95\% of its accuracy within half the video frames. Results
also demonstrate that our method achieves performance comparable to the state
of the art in action recognition (i.e. at the completion of the sentence), in
addition to action prediction.
Comment: 10 pages, 8 figures, 2018 IEEE Winter Conference on Applications of
Computer Vision (WACV
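The frame-to-word quantization and the temporal convolution over the resulting "sentence" can be sketched in a few lines. This is an illustrative stand-in, not the paper's T-CNN: nearest-centroid quantization plays the role of BoW encoding, and a single hand-rolled 1-D convolution stands in for the network; all names here are hypothetical.

```python
import numpy as np

def frames_to_sentence(frame_descriptors, codebook):
    """BoW-style quantization: map each frame descriptor to the id of its
    nearest codebook centroid, turning the video into a sentence of word ids."""
    d2 = ((frame_descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def temporal_conv_scores(word_embeddings, sentence, kernel):
    """One 1-D temporal convolution over the embedded word sequence; returns
    one score per valid window position. A prediction made from a prefix of
    the sentence is what gives the method its low latency."""
    seq = word_embeddings[sentence]              # (T, d) embedded sentence
    T, _ = seq.shape
    kT = kernel.shape[0]
    return np.array([(seq[t:t+kT] * kernel).sum() for t in range(T - kT + 1)])
```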
Visual-LiDAR Odometry and Mapping with Monocular Scale Correction and Motion Compensation
This paper presents a novel visual-LiDAR odometry and mapping method with
low-drift characteristics. The proposed method is based on two popular
approaches, ORB-SLAM and A-LOAM, with monocular scale correction and
visual-assisted LiDAR motion compensation modifications. The scale corrector
calculates the proportion between the depth of image keypoints recovered by
triangulation and that provided by LiDAR, using an outlier rejection process
for accuracy improvement. Concerning LiDAR motion compensation, the visual
odometry approach gives the initial guesses of LiDAR motions for better
performance. This methodology is not only applicable to high-resolution LiDAR
but can also adapt to low-resolution LiDAR. To evaluate the proposed SLAM
system's robustness and accuracy, we conducted experiments on the KITTI
Odometry and S3E datasets. Experimental results illustrate that our method
significantly outperforms standalone ORB-SLAM2 and A-LOAM. Furthermore,
regarding the accuracy of visual odometry with scale correction, our method
performs similarly to stereo-mode ORB-SLAM2.
Comment: 7 pages, 7 figures, 31 references
Self-supervised Interest Point Detection and Description for Fisheye and Perspective Images
Keypoint detection and matching is a fundamental task in many computer vision
problems, from shape reconstruction, to structure from motion, to AR/VR
applications and robotics. It is a well-studied problem with remarkable
successes such as SIFT, and more recent deep learning approaches. While great
robustness is exhibited by these techniques with respect to noise, illumination
variation, and rigid motion transformations, less attention has been placed on
image distortion sensitivity. In this work, we focus on the case when this is
caused by the geometry of the cameras used for image acquisition, and consider
the keypoint detection and matching problem between the hybrid scenario of a
fisheye and a projective image. We build on a state-of-the-art approach and
derive a self-supervised procedure that enables training an interest point
detector and descriptor network. We also collected two new datasets for
additional training and testing in this unexplored scenario, and we demonstrate
that current approaches are suboptimal because they are designed to work in
traditional projective conditions, while the proposed approach turns out to be
the most effective.
Comment: CVPR Workshop on Omnidirectional Computer Vision, 202
CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
Deriving reliable region-word alignment from image-text pairs is critical to
learn object-level vision-language representations for open-vocabulary object
detection. Existing methods typically rely on pre-trained or self-trained
vision-language models for alignment, which are prone to limitations in
localization accuracy or generalization capabilities. In this paper, we propose
CoDet, a novel approach that overcomes the reliance on pre-aligned
vision-language space by reformulating region-word alignment as a co-occurring
object discovery problem. Intuitively, by grouping images that mention a shared
concept in their captions, objects corresponding to the shared concept shall
exhibit high co-occurrence among the group. CoDet then leverages visual
similarities to discover the co-occurring objects and align them with the
shared concept. Extensive experiments demonstrate that CoDet has superior
performance and compelling scalability in open-vocabulary detection, e.g., by
scaling up the visual backbone, CoDet achieves 37.0 and 44.7 on OV-LVIS,
surpassing the previous SoTA by 4.2 and 9.8. Code is available at
https://github.com/CVMI-Lab/CoDet.
Comment: Accepted by NeurIPS 202
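The co-occurrence intuition — group images whose captions mention a shared concept, then find the regions that recur across the group — can be sketched with cosine similarity over region features. This is a toy illustration of the grouping-and-discovery idea, not CoDet's actual training objective; the function names are hypothetical.

```python
import numpy as np

def group_by_concept(captions, concept):
    """Collect indices of images whose captions mention the concept word."""
    return [i for i, c in enumerate(captions) if concept in c.lower().split()]

def co_occurring_regions(region_feats, group):
    """For each image in the group, pick the region whose feature is most
    similar (max cosine similarity, summed over the other images) to some
    region elsewhere in the group -- the 'co-occurring object'."""
    picks = {}
    for i in group:
        fi = region_feats[i]
        fi = fi / np.linalg.norm(fi, axis=1, keepdims=True)
        score = np.zeros(fi.shape[0])
        for j in group:
            if j == i:
                continue
            fj = region_feats[j]
            fj = fj / np.linalg.norm(fj, axis=1, keepdims=True)
            score += (fi @ fj.T).max(axis=1)   # best match in image j
        picks[i] = int(score.argmax())
    return picks
```

The picked regions can then be aligned with the shared caption concept, replacing a pre-aligned vision-language space.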