PlaNet - Photo Geolocation with Convolutional Neural Networks
Is it possible to build a system to determine the location where a photo was
taken using just its pixels? In general, the problem seems exceptionally
difficult: it is trivial to construct situations where no location can be
inferred. Yet images often contain informative cues such as landmarks, weather
patterns, vegetation, road markings, and architectural details, which in
combination may allow one to determine an approximate location and occasionally
an exact location. Websites such as GeoGuessr and View from your Window suggest
that humans are relatively good at integrating these cues to geolocate images,
especially en masse. In computer vision, the photo geolocation problem is
usually approached using image retrieval methods. In contrast, we pose the
problem as one of classification by subdividing the surface of the earth into
thousands of multi-scale geographic cells, and train a deep network using
millions of geotagged images. While previous approaches only recognize
landmarks or perform approximate matching using global image descriptors, our
model is able to use and integrate multiple visible cues. We show that the
resulting model, called PlaNet, outperforms previous approaches and even
attains superhuman levels of accuracy in some cases. Moreover, we extend our
model to photo albums by combining it with a long short-term memory (LSTM)
architecture. By learning to exploit temporal coherence to geolocate uncertain
photos, we demonstrate that this model achieves a 50% performance improvement
over the single-image model.
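The multi-scale cell subdivision can be illustrated with a toy sketch. PlaNet itself builds its cells from Google's S2 geometry; the latitude/longitude quadtree below is a simplified stand-in (an assumption for illustration, not the paper's implementation) that only shows the adaptive idea: cells containing many geotagged photos split further, so densely photographed regions become finer classification targets.

```python
# Simplified sketch of adaptive geographic cell subdivision, loosely
# inspired by PlaNet's setup. The paper partitions the sphere with S2
# cells; this toy version recursively splits lat/lon rectangles instead.

def subdivide(photos, cell, max_photos=2, min_size=1.0):
    """Recursively split a (lat0, lat1, lon0, lon1) cell until each
    leaf holds at most max_photos geotagged photos or is too small.
    Returns the list of leaf cells (the classification targets)."""
    lat0, lat1, lon0, lon1 = cell
    inside = [(la, lo) for la, lo in photos
              if lat0 <= la < lat1 and lon0 <= lo < lon1]
    if len(inside) <= max_photos or (lat1 - lat0) <= min_size:
        return [cell]
    mlat, mlon = (lat0 + lat1) / 2, (lon0 + lon1) / 2
    leaves = []
    for sub in [(lat0, mlat, lon0, mlon), (lat0, mlat, mlon, lon1),
                (mlat, lat1, lon0, mlon), (mlat, lat1, mlon, lon1)]:
        leaves += subdivide(inside, sub, max_photos, min_size)
    return leaves

# A dense cluster near (48, 2) forces deeper splits there than elsewhere.
photos = [(48.1, 2.1), (48.2, 2.2), (48.3, 2.3), (48.4, 2.4), (-30.0, 140.0)]
cells = subdivide(photos, (-90.0, 90.0, -180.0, 180.0))
```

The resulting leaves are uneven on purpose: sparse oceans stay as huge cells, while the photo cluster ends up in a cell about one degree across, which is what lets a single softmax output layer cover the globe at mixed resolutions.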
Automating image analysis by annotating landmarks with deep neural networks
Image and video analysis is often a crucial step in the study of animal behavior and kinematics. These analyses often require that the positions of one or more animal landmarks be annotated (marked) in numerous images. Annotating landmarks can demand a significant amount of time and tedious labor, which motivates the need for algorithms that annotate landmarks automatically. In the community of scientists who use image and video analysis to study the 3D flight of animals, there has been a trend toward more automated approaches for annotating landmarks, yet these fall short of being generally applicable. Inspired by the success of Deep Neural Networks (DNNs) on many problems in the field of computer vision, we investigate how suitable DNNs are for accurate and automatic annotation of landmarks in video datasets representative of those collected by scientists studying animals.
Our work shows, through extensive experimentation on videos of hawkmoths, that DNNs are suitable for automatic and accurate landmark localization. In particular, we show that one of our proposed DNNs is more accurate than the current best algorithm for automatic localization of landmarks on hawkmoth videos. Moreover, we demonstrate how these annotations can be used to quantitatively analyze the 3D flight of a hawkmoth. To facilitate the use of DNNs by scientists from many different fields, we provide a self-contained explanation of what DNNs are, how they work, and how to apply them to other datasets using the freely available library Caffe and supplemental code that we provide. https://arxiv.org/abs/1702.00583 (published version)
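The abstract does not spell out the network design, but a common DNN landmark-localization pattern (assumed here for illustration, not taken from the paper) has the network output one confidence heatmap per landmark and reads the predicted pixel off as the heatmap's peak:

```python
import numpy as np

# Generic post-processing step for heatmap-based landmark localization:
# the network emits one confidence map per landmark, and the predicted
# landmark position is the map's peak. This is a common design sketch,
# not the specific architecture evaluated in the paper.

def landmarks_from_heatmaps(heatmaps):
    """heatmaps: array of shape (n_landmarks, H, W).
    Returns a list of (row, col) peak positions, one per landmark."""
    coords = []
    for hm in heatmaps:
        idx = np.unravel_index(np.argmax(hm), hm.shape)
        coords.append((int(idx[0]), int(idx[1])))
    return coords

# Toy example: two 5x5 heatmaps with known peaks.
hms = np.zeros((2, 5, 5))
hms[0, 1, 3] = 1.0
hms[1, 4, 0] = 0.7
print(landmarks_from_heatmaps(hms))  # → [(1, 3), (4, 0)]
```

Repeating this per frame yields the 2D tracks that, combined across calibrated camera views, give the 3D flight trajectories the abstract describes.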
Computationally efficient cardiac views projection using 3D Convolutional Neural Networks
4D Flow is an MRI sequence which allows acquisition of 3D images of the
heart. The data is typically acquired volumetrically, so it must be reformatted
to generate cardiac long axis and short axis views for diagnostic
interpretation. These views may be generated by placing 6 landmarks: the left
and right ventricle apex, and the aortic, mitral, pulmonary, and tricuspid
valves. In this paper, we propose an automatic method to localize landmarks in
order to compute the cardiac views. Our approach consists of first calculating
a bounding box that tightly crops the heart, followed by a landmark
localization step within this bounded region. Both steps are based on a 3D
extension of the recently introduced ENet. We demonstrate that the long and
short axis projections computed with our automated method are of equivalent
quality to projections created with landmarks placed by an experienced cardiac
radiologist, based on a blinded test administered to a different cardiac
radiologist.
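As a rough illustration of how a view plane could follow from the localized landmarks (the paper's exact projection step is not detailed in this abstract): by standard cardiac imaging convention, the short-axis plane is perpendicular to the LV long axis running from the left-ventricle apex to the mitral valve. A minimal sketch, with made-up coordinates:

```python
import numpy as np

# Sketch of deriving a short-axis view plane from two of the six
# landmarks: the left-ventricle apex and the mitral valve. The
# short-axis plane is conventionally perpendicular to the LV long axis
# (the apex-to-mitral-valve line). Coordinates below are illustrative
# values, not data from the paper.

def short_axis_plane(lv_apex, mitral_valve, fraction=0.5):
    """Return (point, unit_normal) of a short-axis plane placed at
    `fraction` of the way from apex to mitral valve."""
    apex = np.asarray(lv_apex, dtype=float)
    mv = np.asarray(mitral_valve, dtype=float)
    axis = mv - apex                      # LV long-axis direction
    normal = axis / np.linalg.norm(axis)  # plane normal = long axis
    point = apex + fraction * axis        # mid-ventricular position
    return point, normal

point, normal = short_axis_plane((0, 0, 0), (0, 0, 80), fraction=0.5)
print(point, normal)  # plane through (0, 0, 40) with normal (0, 0, 1)
```

Resampling the 4D Flow volume along such planes is what turns raw volumetric data into the familiar diagnostic views.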
Experiences on a motivational learning approach for robotics in undergraduate courses
This paper presents an educational experience carried out in undergraduate robotics courses from two different degrees, Computer Science and Industrial Engineering, whose students have diverse capabilities and motivations. The experience compares two learning strategies for the practical lessons of these courses: the first relies on Matlab code snippets to tackle typical robotics problems such as robot motion, localization, and mapping, while the second uses the ROS framework to develop algorithms for a competitive challenge, e.g. exploration algorithms. The students' opinions were instructive, reporting, for example, that although they found ROS harder to master than Matlab, they expected it to be more useful in their (robotics-related) professional careers, which strengthened their willingness to study it. They also considered that the challenge exercises, besides motivating them, developed their skills as engineers to a greater extent than the skeleton-code-based ones. These and other conclusions will be useful in subsequent courses to boost students' interest and motivation.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech