Improving Image Classification with Location Context

Author: Bourdev Lubomir
Fei-Fei Li
Fergus Rob
Paluri Manohar
Tang Kevin
Publication date: 14/05/2015

With the widespread availability of cellphones and cameras that have GPS capabilities, it is common for images being uploaded to the Internet today to have GPS coordinates associated with them. In addition to research that tries to predict GPS coordinates from visual features, this also opens up the door to problems that are conditioned on the availability of GPS coordinates. In this work, we tackle the problem of performing image classification with location context, in which we are given the GPS coordinates for images in both the train and test phases. We explore different ways of encoding and extracting features from the GPS coordinates, and show how to naturally incorporate these features into a Convolutional Neural Network (CNN), the current state-of-the-art for most image classification and recognition problems. We also show how it is possible to simultaneously learn the optimal pooling radii for a subset of our features within the CNN framework. To evaluate our model and to help promote research in this area, we identify a set of location-sensitive concepts and annotate a subset of the Yahoo Flickr Creative Commons 100M dataset that has GPS coordinates with these concepts, which we make publicly available. By leveraging location context, we are able to achieve almost a 7% gain in mean average precision.

arXiv.org e-Print Archive

Crossref

Find your Way by Observing the Sun and Other Semantic Cues

Author: Brubaker Marcus A.
Fidler Sanja
Ma Wei-Chiu
Urtasun Raquel
Wang Shenlong
Publication date: 23/06/2016

In this paper we present a robust, efficient and affordable approach to self-localization which requires neither GPS nor knowledge about the appearance of the world. Towards this goal, we utilize freely available cartographic maps and derive a probabilistic model that exploits semantic cues in the form of sun direction, presence of an intersection, road type, speed limit as well as the ego-car trajectory in order to produce very reliable localization results. Our experimental evaluation shows that our approach can localize much faster (in terms of driving time) with less computation and more robustly than competing approaches, which ignore semantic information.

arXiv.org e-Print Archive

Crossref

Disparate View Matching

Author: Bansal Mayank
Publication venue: ScholarlyCommons
Publication date: 01/01/2015

Matching of disparate views has gained significance in computer vision due to its role in many novel application areas. Being able to match images of the same scene captured during day and night, between a historic and contemporary picture of a scene, and between aerial and ground-level views of a building facade all enable novel applications ranging from loop-closure detection for structure-from-motion and re-photography to geo-localization of a street-level image using reference imagery captured from the air. The goal of this work is to develop novel features and methods that address matching problems where direct appearance-based correspondences are either difficult to obtain or infeasible because of the lack of appearance similarity altogether. To address these problems, we propose methods that span the appearance-geometry spectrum in terms of both the use of these cues as well as the ability of each method to handle variations in appearance and geometry. First, we consider the problem of geo-localization of a query street-level image using a reference database of building facades captured from a bird's eye view. To address this wide-baseline facade matching problem, a novel scale-selective self-similarity feature that avoids direct comparison of appearance between disparate facade images is presented. Next, to address image matching problems with more extreme appearance variation, a novel representation for matchable images expressed in terms of the eigen-functions of the joint graph of the two images is presented. This representation is used to derive features that are persistent across wide variations in appearance. Next, the problem setting of matching between a street-level image and a digital elevation map (DEM) is considered. Given the limited appearance information available in this scenario, the matching approach has to rely more significantly on geometric cues.
Therefore, a purely geometric method to establish correspondences between building corners in the DEM and the visible corners in the query image is presented. Finally, to generalize this problem setting we address the problem of establishing correspondences between 3D and 2D point clouds using geometric means alone. A novel framework for incorporating purely geometric constraints into a higher-order graph matching framework is presented with specific formulations for the three-point calibrated absolute camera pose problem (P3P), two-point upright camera pose problem (Up2p) and the three-plus-one relative camera pose problem

ScholarlyCommons@Penn

Improving Efficiency of DNN-based Relocalization Module for Autonomous Driving with Server-side Computing

Author: Cheng Jieren
Li Dengbo
Liu Boyi
Publication date: 30/11/2023

In this work, we present a novel framework for camera relocation in autonomous vehicles, leveraging deep neural networks (DNN). While existing literature offers various DNN-based camera relocation methods, their deployment is hindered by their high computational demands during inference. In contrast, our approach addresses this challenge through edge-cloud collaboration. Specifically, we strategically offload certain modules of the neural network to the server and evaluate the inference time of data frames under different network segmentation schemes to guide our offloading decisions. Our findings highlight the vital role of server-side offloading in DNN-based camera relocation for autonomous vehicles, and we also discuss the results of data fusion. Finally, we validate the effectiveness of our proposed framework through experimental evaluation.

arXiv.org e-Print Archive

Exploiting Sparse Semantic HD Maps for Self-Driving Vehicle Localization

Author: Bai Min
Bârsan Ioan Andrei
Homayounfar Namdar
Lakshmikanth Shrinidhi Kowshika
Ma Wei-Chiu
Mattyus Gellert
Pokrovsky Andrei
Tartavull Ignacio
Urtasun Raquel
Wang Shenlong
Publication date: 08/08/2019

In this paper we propose a novel semantic localization algorithm that exploits multiple sensors and has precision on the order of a few centimeters. Our approach does not require detailed knowledge about the appearance of the world, and our maps require orders of magnitude less storage than maps utilized by traditional geometry- and LiDAR intensity-based localizers. This is important as self-driving cars need to operate in large environments. Towards this goal, we formulate the problem in a Bayesian filtering framework, and exploit lanes, traffic signs, as well as vehicle dynamics to localize robustly with respect to a sparse semantic map. We validate the effectiveness of our method on a new highway dataset consisting of 312km of roads. Our experiments show that the proposed approach is able to achieve 0.05m lateral accuracy and 1.12m longitudinal accuracy on average while taking up only 0.3% of the storage required by previous LiDAR intensity-based approaches.

Comment: 8 pages, 4 figures, 4 tables, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019)

arXiv.org e-Print Archive

Crossref

RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization

Author: Arandjelović Relja
Chen Hui
Chiu Han-Pang
Cordts Marius
Cummins Mark J
Faghri Fartash
Gong Yunchao
Hu Sixing
Huang Feiran
Hubert Tsai Yao-Hung
Levinson Jesse
Mahmood Faisal
Mao Junhua
Mithun Niluthpol Chowdhury
Pronobis Andrzej
Razavian Sharif
Rottmann Axel
Schönberger Johannes L
Seymour Zachary
Toft Carl
Wang Tan
Wu Jianxin
Wu Yiling
Zadeh Amir
Zhou Bolei
Zolanvari SM
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/09/2020

We study an important, yet largely unexplored problem of large-scale cross-modal visual localization by matching ground RGB images to a geo-referenced aerial LIDAR 3D point cloud (rendered as depth images). Prior works were demonstrated on small datasets and did not lend themselves to scaling up for large-scale applications. To enable large-scale evaluation, we introduce a new dataset containing over 550K pairs (covering 143 km^2 area) of RGB and aerial LIDAR depth images. We propose a novel joint embedding based method that effectively combines the appearance and semantic cues from both modalities to handle drastic cross-modal variations. Experiments on the proposed dataset show that our model achieves a strong result of a median rank of 5 in matching across a large test set of 50K location pairs collected from a 14km^2 area. This represents a significant advancement over prior works in performance and scale. We conclude with qualitative results to highlight the challenging nature of this task and the benefits of the proposed model. Our work provides a foundation for further research in cross-modal visual localization.

Comment: ACM Multimedia 202

arXiv.org e-Print Archive

Crossref