30 research outputs found
Cataloging Public Objects Using Aerial and Street-Level Images – Urban Trees
Each corner of the inhabited world is imaged from multiple viewpoints with increasing frequency. Online map services like Google Maps or Here Maps provide direct access to huge amounts of densely sampled, georeferenced images from street view and aerial perspective. There is an opportunity to design computer vision systems that will help us search, catalog and monitor public infrastructure, buildings and artifacts. We explore the architecture and feasibility of such a system. The main technical challenge is combining test-time information from multiple views of each geographic location (e.g., aerial and street views). We implement two modules: det2geo, which detects the set of locations of objects belonging to a given category, and geo2cat, which computes the fine-grained category of the object at a given location. We introduce a solution that adapts state-of-the-art CNN-based object detectors and classifiers. We test our method on “Pasadena Urban Trees”, a new dataset of 80,000 trees with geographic and species annotations, and show that combining multiple views significantly improves both tree detection and tree species classification, rivaling human performance.
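To illustrate what combining test-time information from several views can look like, here is a minimal sketch of late fusion of per-view class probabilities. It is not the method of the paper: the function name `fuse_views`, the example species, and the assumption that views are conditionally independent (so their log-probabilities can be summed) are all illustrative choices made here.

```python
import math

def fuse_views(view_probs):
    """Fuse per-view class probabilities for one geographic location.

    view_probs: list of dicts mapping class name -> probability, one dict
    per view (e.g. one aerial image and one street-level image).
    Assumption (ours, not the paper's): views are conditionally independent,
    so log-probabilities are summed and then renormalised.
    """
    classes = list(view_probs[0].keys())
    # Sum log-probabilities across views; clamp to avoid log(0).
    log_scores = {
        c: sum(math.log(max(v[c], 1e-12)) for v in view_probs)
        for c in classes
    }
    # Numerically stable softmax over the summed log-scores.
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {c: e / total for c, e in exp_scores.items()}

# Hypothetical per-view outputs for a single tree location.
aerial = {"oak": 0.6, "palm": 0.4}
street = {"oak": 0.7, "palm": 0.3}
fused = fuse_views([aerial, street])
best = max(fused, key=fused.get)
```

With these made-up inputs both views lean towards the same class, so the fused distribution is more confident than either view alone.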
The iNaturalist Species Classification and Detection Dataset
Existing image classification datasets used in computer vision tend to have a
uniform distribution of images across object categories. In contrast, the
natural world is heavily imbalanced, as some species are more abundant and
easier to photograph than others. To encourage further progress in challenging
real world conditions we present the iNaturalist species classification and
detection dataset, consisting of 859,000 images from over 5,000 different
species of plants and animals. It features visually similar species, captured
in a wide variety of situations, from all over the world. Images were collected
with different camera types, have varying image quality, feature a large class
imbalance, and have been verified by multiple citizen scientists. We discuss
the collection of the dataset and present extensive baseline experiments using
state-of-the-art computer vision classification and detection models. Results
show that current non-ensemble based methods achieve only 67% top-1
classification accuracy, illustrating the difficulty of the dataset.
Specifically, we observe poor results for classes with small numbers of
training examples, suggesting more attention is needed in low-shot learning.
Comment: CVPR 201
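The gap between overall top-1 accuracy and accuracy on rare classes can be made concrete with a small sketch. The labels and predictions below are invented for illustration, and the function names are hypothetical; this is not the evaluation code of the dataset.

```python
from collections import defaultdict

def top1_accuracy(y_true, y_pred):
    """Fraction of examples whose top prediction matches the label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def per_class_accuracy(y_true, y_pred):
    """Top-1 accuracy computed separately for each ground-truth class."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        counts[t] += 1
        hits[t] += int(t == p)
    return {c: hits[c] / counts[c] for c in counts}

def macro_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracies; every class counts equally."""
    acc = per_class_accuracy(y_true, y_pred)
    return sum(acc.values()) / len(acc)

# Invented, imbalanced toy data: one abundant class and one rare class.
y_true = ["oak"] * 8 + ["rare_fern"] * 2
# A classifier that always predicts the abundant class.
y_pred = ["oak"] * 10

overall = top1_accuracy(y_true, y_pred)      # dominated by the abundant class
by_class = per_class_accuracy(y_true, y_pred)
macro = macro_accuracy(y_true, y_pred)       # exposes the failure on the rare class
```

Reporting per-class or macro-averaged accuracy alongside overall top-1 accuracy is one simple way to surface poor results on classes with few training examples.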
Treepedia 2.0: Applying Deep Learning for Large-scale Quantification of Urban Tree Cover
Recent advances in deep learning have made it possible to quantify urban
metrics at fine resolution, and over large extents using street-level images.
Here, we focus on measuring urban tree cover using Google Street View (GSV)
images. First, we provide a small-scale labelled validation dataset and propose
standard metrics to compare the performance of automated estimations of street
tree cover using GSV. We apply state-of-the-art deep learning models, and
compare their performance to a previously established benchmark of an
unsupervised method. Our training procedure for deep learning models is novel;
we utilize the abundance of openly available and similarly labelled
street-level image datasets to pre-train our model. We then perform additional
training on a small training dataset consisting of GSV images. We find that
deep learning models significantly outperform the unsupervised benchmark
method. Our semantic segmentation model increased mean intersection-over-union
(IoU) from 44.10% to 60.42% relative to the unsupervised method and our
end-to-end model decreased Mean Absolute Error from 10.04% to 4.67%. We also
employ a recently developed method called gradient-weighted class activation
map (Grad-CAM) to interpret the features learned by the end-to-end model. This
technique confirms that the end-to-end model has accurately learned to identify
tree cover area as key features for predicting percentage tree cover. Our paper
provides an example of applying advanced deep learning techniques on a
large-scale, geo-tagged and image-based dataset to efficiently estimate
important urban metrics. The results demonstrate that deep learning models are
highly accurate, can be interpretable, and can also be efficient in terms of
data-labelling effort and computational resources.
Comment: Accepted and will appear in IEEE BigData Congress 2018 Conference
Proceeding
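The two metrics quoted in this abstract, intersection-over-union for segmentation masks and mean absolute error for percentage tree cover, can be sketched as follows. This is a minimal illustration under stated assumptions: masks are small binary grids given as lists of rows, IoU is averaged per image for a single vegetation class, and all function names are hypothetical. The paper's exact averaging scheme may differ.

```python
def iou(pred, truth):
    """Intersection-over-union of two equally sized binary masks."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; define their IoU as 1.
    return inter / union if union else 1.0

def mean_iou(preds, truths):
    """Average of per-image IoU scores (an assumption about the averaging)."""
    scores = [iou(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)

def cover_percent(mask):
    """Percentage of pixels labelled as tree cover in a binary mask."""
    total = sum(len(row) for row in mask)
    return 100.0 * sum(sum(row) for row in mask) / total

def mean_absolute_error(pred_pct, true_pct):
    """Mean absolute difference between predicted and true cover percentages."""
    return sum(abs(p - t) for p, t in zip(pred_pct, true_pct)) / len(pred_pct)

# Toy 2x2 masks: 1 marks a tree-cover pixel.
pred_mask = [[1, 1], [0, 0]]
true_mask = [[1, 0], [0, 0]]
score = iou(pred_mask, true_mask)
error = mean_absolute_error(
    [cover_percent(pred_mask)], [cover_percent(true_mask)]
)
```

A segmentation model is scored on the masks themselves, whereas an end-to-end model that predicts the cover percentage directly can only be scored with the second metric.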
The Devil is in the Tails: Fine-grained Classification in the Wild
The world is long-tailed. What does this mean for computer vision and visual
recognition? The two main implications are (1) the number of categories we need
to consider in applications can be very large, and (2) the number of training
examples for most categories can be very small. Current visual recognition
algorithms have achieved excellent classification accuracy. However, they
require many training examples to reach peak performance, which suggests that
long-tailed distributions will not be dealt with well. We analyze this question
in the context of eBird, a large fine-grained classification dataset, and a
state-of-the-art deep network classification algorithm. We find that (a) peak
classification performance on well-represented categories is excellent, (b)
given enough data, classification performance suffers only minimally from an
increase in the number of classes, (c) classification performance decays
precipitously as the number of training examples decreases, (d) surprisingly,
transfer learning is virtually absent in current methods. Our findings suggest
that our community should come to grips with the question of long tails.
SEMI-AUTOMATIC QUANTIFICATION AND GEOREFERENCING OF URBAN TREES
An inventory of urban trees is important for knowing which species are present, and geolocation contributes to efficient management. The literature presents a variety of techniques for carrying out such an inventory, and their productivity and performance vary widely. Many studies in the national literature use sampling to make inferences about the species population of a municipality, in order to reduce survey time and, consequently, costs. This work presents an innovative methodology for producing inventory data, consisting of a mobile mapping unit, a set of cameras and GNSS sensors for the semi-automatic georeferencing of the individuals present on the streets and avenues of the urban area of the city of Monte Carmelo, MG. All individuals present along the roads were georeferenced, totaling 7337 trees, in a production time (approximately 100 hours) that is considerably low compared with other survey techniques. In future work, we intend to record species, height and health condition, and to make the data available in a web GIS.