7,630 research outputs found
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, of advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as it relates to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing the DL.Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensin
Object Detection in 20 Years: A Survey
Object detection, as of one the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed up techniques, and the recent state of the art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc, and makes an in-deep
analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible
publicatio
Advanced Driver-Assistance System with Traffic Sign Recognition for Safe and Efficient Driving
Advanced Driver-Assistance Systems (ADAS) coupled with traffic sign recognition could lead to safer driving environments. This study presents a sophisticated, yet robust and accurate traffic sign detection system using computer vision and ML, for ADAS. Unavailability of large local traffic sign datasets and the unbalances of traffic sign distribution are the key bottlenecks of this research. Hence, we choose to work with support vector machines (SVM) with a custom-built unbalance dataset, to build a lightweight model with excellent classification accuracy. The SVM model delivered optimum performance with the radial basis kernel, C=10, and gamma=0.0001. In the proposed method, same priority was given to processing time (testing time) and accuracy, as traffic sign identification is time critical. The final accuracy obtained was 87% (with confidence interval 84%-90%) with a processing time of 0.64s (with confidence interval of 0.57s-0.67s) for correct detection at testing, which emphasizes the effectiveness of the proposed method
Dynamic texture recognition using time-causal and time-recursive spatio-temporal receptive fields
This work presents a first evaluation of using spatio-temporal receptive
fields from a recently proposed time-causal spatio-temporal scale-space
framework as primitives for video analysis. We propose a new family of video
descriptors based on regional statistics of spatio-temporal receptive field
responses and evaluate this approach on the problem of dynamic texture
recognition. Our approach generalises a previously used method, based on joint
histograms of receptive field responses, from the spatial to the
spatio-temporal domain and from object recognition to dynamic texture
recognition. The time-recursive formulation enables computationally efficient
time-causal recognition. The experimental evaluation demonstrates competitive
performance compared to state-of-the-art. Especially, it is shown that binary
versions of our dynamic texture descriptors achieve improved performance
compared to a large range of similar methods using different primitives either
handcrafted or learned from data. Further, our qualitative and quantitative
investigation into parameter choices and the use of different sets of receptive
fields highlights the robustness and flexibility of our approach. Together,
these results support the descriptive power of this family of time-causal
spatio-temporal receptive fields, validate our approach for dynamic texture
recognition and point towards the possibility of designing a range of video
analysis methods based on these new time-causal spatio-temporal primitives.Comment: 29 pages, 16 figure
Monocular Camera Viewpoint-Invariant Vehicular Traffic Segmentation and Classification Utilizing Small Datasets
The work presented here develops a computer vision framework that is view angle independent for vehicle segmentation and classification from roadway traffic systems installed by the Virginia Department of Transportation (VDOT). An automated technique for extracting a region of interest is discussed to speed up the processing. The VDOT traffic videos are analyzed for vehicle segmentation using an improved robust low-rank matrix decomposition technique. It presents a new and effective thresholding method that improves segmentation accuracy and simultaneously speeds up the segmentation processing. Size and shape physical descriptors from morphological properties and textural features from the Histogram of Oriented Gradients (HOG) are extracted from the segmented traffic. Furthermore, a multi-class support vector machine classifier is employed to categorize different traffic vehicle types, including passenger cars, passenger trucks, motorcycles, buses, and small and large utility trucks. It handles multiple vehicle detections through an iterative k-means clustering over-segmentation process. The proposed algorithm reduced the processed data by an average of 40%. Compared to recent techniques, it showed an average improvement of 15% in segmentation accuracy, and it is 55% faster than the compared segmentation techniques on average. Moreover, a comparative analysis of 23 different deep learning architectures is presented. The resulting algorithm outperformed the compared deep learning algorithms for the quality of vehicle classification accuracy. Furthermore, the timing analysis showed that it could operate in real-time scenarios
Representing 3D shape in sparse range images for urban object classification
This thesis develops techniques for interpreting 3D range images acquired in outdoor environments at a low resolution. It focuses on the task of robustly capturing the shapes that comprise objects, in order to classify them. With the recent development of 3D sensors such as the Velodyne, it is now possible to capture range images at video frame rates, allowing mobile robots to observe dynamic scenes in 3D. To classify objects in these scenes, features are extracted from the data, which allows different regions to be matched. However, range images acquired at this speed are of low resolution, and there are often significant changes in sensor viewpoint and occlusion. In this context, existing methods for feature extraction do not perform well. This thesis contributes algorithms for the robust abstraction from 3D points to object classes. Efficient region-of-interest and surface normal extraction are evaluated, resulting in a keypoint algorithm that provides stable orientations. These build towards a novel feature, called the ‘line image,’ that is designed to consistently capture local shape, regardless of sensor viewpoint. It does this by explicitly reasoning about the difference between known empty space, and space that has not been measured due to occlusion or sparse sensing. A dataset of urban objects scanned with a Velodyne was collected and hand labelled, in order to compare this feature with several others on the task of classification. First, a simple k-nearest neighbours approach was used, where the line image showed improvements. Second, more complex classifiers were applied, requiring the features to be clustered. The clusters were used in topic modelling, allowing specific sub-parts of objects to be learnt across multiple scales, improving accuracy by 10%. This work is applicable to any range image data. In general, it demonstrates the advantages in using the inherent density and occupancy information in a range image during 3D point cloud processing
- …