PlaNet - Photo Geolocation with Convolutional Neural Networks

Author: A Babenko
A Graves
A Mikulík
AR Zamir
G Baatz
H Jegou
H Jégou
J Duchi
J Elman
J Hays
J Knopp
MD Zeiler
S Cao
S Hochreiter
T Sattler
Y Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Is it possible to build a system to determine the location where a photo was taken using just its pixels? In general, the problem seems exceptionally difficult: it is trivial to construct situations where no location can be inferred. Yet images often contain informative cues such as landmarks, weather patterns, vegetation, road markings, and architectural details, which in combination may allow one to determine an approximate location and occasionally an exact location. Websites such as GeoGuessr and View from your Window suggest that humans are relatively good at integrating these cues to geolocate images, especially en masse. In computer vision, the photo geolocation problem is usually approached using image retrieval methods. In contrast, we pose the problem as one of classification by subdividing the surface of the earth into thousands of multi-scale geographic cells, and train a deep network using millions of geotagged images. While previous approaches only recognize landmarks or perform approximate matching using global image descriptors, our model is able to use and integrate multiple visible cues. We show that the resulting model, called PlaNet, outperforms previous approaches and even attains superhuman levels of accuracy in some cases. Moreover, we extend our model to photo albums by combining it with a long short-term memory (LSTM) architecture. By learning to exploit temporal coherence to geolocate uncertain photos, we demonstrate that this model achieves a 50% performance improvement over the single-image model.

arXiv.org e-Print Archive

Crossref

Publikationsserver der RWTH Aachen University

Large-Scale Mapping of Human Activity using Geo-Tagged Videos

Author: Liu Sen
Newsam Shawn
Zhu Yi
Publication date: 28/11/2017

This paper is the first work to perform spatio-temporal mapping of human activity using the visual content of geo-tagged videos. We utilize a recent deep-learning-based video analysis framework, termed hidden two-stream networks, to recognize a range of activities in YouTube videos. This framework is efficient and can run in real time or faster, which is important for recognizing events as they occur in streaming video or for reducing latency in analyzing already captured video. This is, in turn, important for using video in smart-city applications. We perform a series of experiments to show that our approach is able to accurately map activities both spatially and temporally. We also demonstrate the advantages of using the visual content over the tags/titles. Comment: Accepted at ACM SIGSPATIAL 201

arXiv.org e-Print Archive

Crossref

Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition

Author: A Owens
F Zhang
G Cheng
GS Xia
K Nogueira
Lvd Maaten
Q Zou
T Baltrušaitis
WL Zheng
Publication date: 15/07/2020

Aerial scene recognition is a fundamental task in remote sensing and has recently received increased interest. While the visual information from overhead images, combined with powerful models and efficient algorithms, yields considerable performance on scene recognition, it still suffers from variation in ground objects, lighting conditions, etc. Inspired by the multi-channel perception theory in cognitive science, in this paper we explore a novel audiovisual aerial scene recognition task that uses both images and sounds as input in order to improve aerial scene recognition performance. Based on the observation that some specific sound events are more likely to be heard at a given geographic location, we propose to exploit knowledge from sound events to improve performance on aerial scene recognition. For this purpose, we have constructed a new dataset named AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE). With the help of this dataset, we evaluate three proposed approaches for transferring sound event knowledge to the aerial scene recognition task in a multimodal learning framework, and show the benefit of exploiting audio information for aerial scene recognition. The source code is publicly available for reproducibility purposes. Comment: ECCV 202

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Crossref

Leveraging Overhead Imagery for Localization, Mapping, and Understanding

Author: Workman Scott
Publication venue: UKnowledge
Publication date: 01/01/2018

Ground-level and overhead images provide complementary viewpoints of the world. This thesis proposes methods that leverage dense overhead imagery, in addition to sparsely distributed ground-level imagery, to advance traditional computer vision problems, such as ground-level image localization and fine-grained urban mapping. Our work focuses on three primary research areas: learning a joint feature representation between ground-level and overhead imagery to enable direct comparison for the task of image geolocalization, incorporating unlabeled overhead images by inferring labels from nearby ground-level images to improve image-driven mapping, and fusing ground-level imagery with overhead imagery to enhance understanding. The ultimate contribution of this thesis is a general framework for estimating geospatial functions, such as land cover or land use, which integrates visual evidence from both ground-level and overhead image viewpoints.

University of Kentucky

Describing and Understanding Neighborhood Characteristics through Online Social Media

Author: Cramer Henriette
Kafsi Mohamed
Shamma David A.
Thomee Bart
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/03/2015

Geotagged data can be used to describe regions in the world and discover local themes. However, not all data produced within a region is necessarily specifically descriptive of that area. To surface the content that is characteristic of a region, we present the geographical hierarchy model (GHM), a probabilistic model based on the assumption that data observed in a region is a random mixture of content that pertains to different levels of a hierarchy. We apply the GHM to a dataset of 8 million Flickr photos in order to discriminate between content (i.e., tags) that specifically characterizes a region (e.g., neighborhood) and content that characterizes surrounding areas or more general themes. Knowledge of the discriminative and non-discriminative terms used throughout the hierarchy enables us to quantify the uniqueness of a given region and to compare similar but distant regions. Our evaluation demonstrates that our model improves upon traditional Naive Bayes classification by 47% and hierarchical TF-IDF by 27%. We further highlight the differences and commonalities with human reasoning about what is locally characteristic of a neighborhood, distilled from ten interviews and a survey that covered themes such as time, events, and prior regional knowledge. Comment: Accepted in WWW 2015, 2015, Florence, Ital

arXiv.org e-Print Archive

CiteSeerX

Find the one you like! Profiling Swiss parks with user generated content

Author: Komossa Franziska
Marino Daniela
Michel Annina Helena
Purves Ross S
Publication venue: Elsevier
Publication date: 01/08/2023

The establishment of national parks originated from the desire to preserve scenic landscape areas of national or regional importance. With more recent diversification of protected area types and goals, obtaining knowledge on how parks are recreationally used has become more challenging for (local) policy makers and park managements, as there is a general lack of systematic and publicly available visitor monitoring data. We analyze recreational park use for 20 Swiss parks of national importance and develop park profiles, using user-generated content from Flickr. The 20 Swiss parks are described by 111,437 unique images taken by 6,468 unique users between 2007 and 2020. We fill an existing research gap by defining park use across three dimensions (space, time and users) and combining these in our analyses. The park profiles provide information on diversity of recreational use and serve as a starting point for analyzing how the three dimensions contribute to this diversity. Our results show diverging park uses for the three dimensions, indicating that park location matters, especially in terms of peri-urbanity and geographic region. Our method can be translated into European-scale analyses, provided that different languages are considered. Park profiles are easy-to-communicate and easy-to-interpret tools for (local) policy-makers and park managers to segment the tourism market and develop new park marketing strategies to, e.g., streamline visitation flows and reduce the negative impacts of outdoor recreation. In broader terms, our study serves as input for future recreation policy to protect, restore and promote sustainable use of protected areas.

Management implications

• We offer insights into recreational park use in terms of users, space and time, particularly valuable for parks where monitoring visitation is vital (i.e. protected areas).
• Park profiles provide easy-to-communicate and easy-to-interpret extensible tools for (local) policy-makers and park managers.
• Park profiles can be used for, among others, the development of new park marketing strategies catering to the preferences of differentiated user groups.
• Since different recreational user groups differ in their recreational behavior and in turn their environmental impact, park profiles can help in streamlining visitors as a potentially effective management measure for increased park sustainability (e.g. biodiversity conservation).

ZORA