Reduced Memory Region Based Deep Convolutional Neural Network Detection
Accurate pedestrian detection has a primary role in automotive safety: for
example, by issuing warnings to the driver or acting on the car's brakes,
it helps decrease the probability of injuries and human fatalities. In order
to achieve very high accuracy, recent pedestrian detectors have been based on
Convolutional Neural Networks (CNN). Unfortunately, such approaches require
vast amounts of computational power and memory, preventing efficient
implementations on embedded systems. This work proposes a CNN-based detector,
adapting a general-purpose convolutional network to the task at hand. By
thoroughly analyzing and optimizing each step of the detection pipeline, we
develop an architecture that outperforms methods based on traditional image
features and achieves an accuracy close to the state-of-the-art while having
low computational complexity. Furthermore, the model is compressed in order to
fit the tight constraints of low-power devices with a limited amount of embedded
memory available. This paper makes two main contributions: (1) it proves that a
region based deep neural network can be finely tuned to achieve adequate
accuracy for pedestrian detection; (2) it achieves very low memory usage
without reducing detection accuracy on the Caltech Pedestrian dataset.
Comment: IEEE 2016 ICCE-Berli
Web-Based Visualization of Very Large Scientific Astronomy Imagery
Visualizing and navigating through large astronomy images from a remote
location with current astronomy display tools can be a frustrating experience
in terms of speed and ergonomics, especially on mobile devices. In this paper,
we present a high performance, versatile and robust client-server system for
remote visualization and analysis of extremely large scientific images.
Applications of this work include survey image quality control, interactive
data query and exploration, citizen science, as well as public outreach. The
proposed software is entirely open source and is designed to be generic and
applicable to a variety of datasets. It provides access to floating point data
at terabyte scales, with the ability to precisely adjust image settings in
real-time. The proposed clients are light-weight, platform-independent web
applications built on standard HTML5 web technologies and compatible with both
touch and mouse-based devices. We test the system, assess its performance, and
show that a single server can comfortably handle more than a hundred
simultaneous users accessing full-precision 32-bit astronomy data.
Comment: Published in Astronomy & Computing. IIPImage server available from
http://iipimage.sourceforge.net . Visiomatic code and demos available from
http://www.visiomatic.org
Approximate Correspondences in High Dimensions
Pyramid intersection is an efficient method for computing an approximate partial matching between two sets of feature vectors. We introduce a novel pyramid embedding based on a hierarchy of non-uniformly shaped bins that takes advantage of the underlying structure of the feature space and remains accurate even for sets with high-dimensional feature vectors. The matching similarity is computed in linear time and forms a Mercer kernel. We also show how the matching itself (a correspondence field) may be extracted for a small increase in computational cost. Whereas previous matching approximation algorithms suffer from distortion factors that increase linearly with the feature dimension, we demonstrate that our approach can maintain constant accuracy even as the feature dimension increases. When used as a kernel in a discriminative classifier, our approach achieves improved object recognition results over a state-of-the-art set kernel.
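As a rough illustration of the pyramid intersection idea mentioned in this abstract, the following is a minimal sketch of the simpler uniform-bin variant, not the non-uniform bins the paper proposes. The function names, the bin side lengths (doubling per level) and the 1/side weighting are assumptions chosen for illustration.

```python
from collections import Counter
from math import floor


def grid_histogram(points, side):
    """Count how many points fall in each axis-aligned cell of width `side`."""
    return Counter(tuple(floor(v / side) for v in p) for p in points)


def intersection(h1, h2):
    """Histogram intersection: sum of per-cell minimum counts."""
    return sum(min(count, h2[cell]) for cell, count in h1.items())


def pyramid_match(x, y, levels):
    """Approximate partial-matching score between two point sets.

    Assumed scheme: level i uses cells of side 2**i; matches that first
    appear at level i are weighted by 1 / 2**i, so matches found in
    finer cells count more.
    """
    score = 0.0
    previous = 0  # matches already counted at finer levels
    for i in range(levels):
        side = 2 ** i
        current = intersection(grid_histogram(x, side), grid_histogram(y, side))
        score += (current - previous) / side
        previous = current
    return score


if __name__ == "__main__":
    x = [[0.2, 0.3], [5.5, 5.5]]
    y = [[0.4, 0.1], [6.5, 5.5]]
    print(pyramid_match(x, y, levels=4))
```

Each level costs one pass over the two sets, which is where the linear-time behaviour comes from. With uniform cells the number of occupied bins grows quickly with the feature dimension, which is the weakness the abstract says its non-uniform bins address.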
State-of-the-Art and Trends in Scalable Video Compression with Wavelet Based Approaches
Scalable Video Coding (SVC) differs from traditional single point approaches mainly because it allows several working points, corresponding to different quality, picture size and frame rate, to be encoded in a single bit stream. This work describes the current state-of-the-art in SVC, focusing on wavelet based motion-compensated approaches (WSVC). It reviews individual components that have been designed to address the problem over the years and how such components are typically combined to achieve meaningful WSVC architectures. Coding schemes which mainly differ in the space-time order in which the wavelet transforms operate are compared here, discussing strengths and weaknesses of the resulting implementations. An evaluation of the achievable coding performance is provided, considering the reference architectures studied and developed by ISO/MPEG in its exploration on WSVC. The paper also attempts to list the major differences between wavelet based solutions and the SVC standard jointly targeted by ITU and ISO/MPEG. Major emphasis is given to a promising WSVC solution, named STP-tool, which presents architectural similarities to the SVC standard. The paper ends by outlining some evolution trends for WSVC systems and giving insights into video coding applications which could benefit from a wavelet based approach.
Adami, Nicola; Signoroni, Alberto; Leonardi, Riccard
Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification
Designing discriminative, powerful texture features robust to realistic
imaging conditions is a challenging computer vision problem with many
applications, including material recognition and analysis of satellite or
aerial imagery. In the past, most texture description approaches were based on
dense orderless statistical distribution of local features. However, most
recent approaches to texture recognition and remote sensing scene
classification are based on Convolutional Neural Networks (CNNs). The de facto
practice when learning these CNN models is to use RGB patches as input with
training performed on large amounts of labeled data (ImageNet). In this paper,
we show that Binary Patterns encoded CNN models, codenamed TEX-Nets, trained
using mapped coded images with explicit texture information provide
complementary information to the standard RGB deep models. Additionally, two
deep architectures, namely early and late fusion, are investigated to combine
the texture and color information. To the best of our knowledge, we are the
first to investigate Binary Patterns encoded CNNs and different deep network
fusion architectures for texture recognition and remote sensing scene
classification. We perform comprehensive experiments on four texture
recognition datasets and four remote sensing scene classification benchmarks:
UC-Merced with 21 scene categories, WHU-RS19 with 19 scene classes, RSSCN7 with
7 categories and the recently introduced large scale aerial image dataset (AID)
with 30 aerial scene types. We demonstrate that TEX-Nets provide complementary
information to the standard RGB deep model of the same network architecture. Our
late fusion TEX-Net architecture always improves the overall performance
compared to the standard RGB network on both recognition problems. Our final
combination outperforms the state-of-the-art without employing fine-tuning or
ensemble of RGB network architectures.
Comment: To appear in ISPRS Journal of Photogrammetry and Remote Sensin
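To make the notion of a binary-pattern coded image more concrete, here is a minimal sketch of the classic 8-neighbour local binary pattern (LBP) code. The abstract does not say which binary-pattern variant or which code-to-image mapping TEX-Nets use, so the function name, the neighbour ordering and the `>=` comparison are all assumptions for illustration.

```python
def lbp_codes(image):
    """Return the 8-bit LBP code of every interior pixel.

    `image` is a 2D list of grey levels. Each neighbour that is at least
    as bright as the centre pixel sets one bit of the code. Border pixels
    are skipped, so the output is two rows and two columns smaller.
    """
    # Clockwise neighbour offsets starting at the top-left (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row = []
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


if __name__ == "__main__":
    patch = [[9, 0, 0],
             [0, 5, 0],
             [0, 0, 0]]
    print(lbp_codes(patch))
```

The resulting code map records local texture structure rather than colour, which is the kind of explicit texture input the abstract contrasts with RGB patches.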
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed up techniques, and the recent state of the art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and makes an in-depth
analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible
publicatio