Geospatial Computer Vision Based on Multi-Modal Data – How Valuable Is Shape Information for the Extraction of Semantic Information?
In this paper, we investigate the value of different modalities and their combination for the analysis of geospatial data of low spatial resolution. For this purpose, we present a framework that allows for the enrichment of geospatial data with additional semantics based on given color information, hyperspectral information, and shape information. While the different types of information are used to define a variety of features, classification based on these features is performed using a random forest classifier. To draw conclusions about the relevance of different modalities and their combination for scene analysis, we present and discuss results which have been achieved with our framework on the MUUFL Gulfport Hyperspectral and LiDAR Airborne Data Set.
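As a rough illustration of the pipeline this abstract describes (per-point features from several modalities, fused and fed to a random forest), the sketch below uses synthetic data; the feature dimensions, the concatenation-based fusion, and the scikit-learn usage are assumptions, not the authors' implementation.

```python
# Minimal sketch: fuse color, hyperspectral, and shape features per point,
# then classify with a random forest. All data here is synthetic and the
# feature dimensions are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300
color = rng.normal(size=(n, 3))           # e.g. RGB per point
hyperspectral = rng.normal(size=(n, 10))  # e.g. 10 narrow spectral bands
shape = rng.normal(size=(n, 4))           # e.g. planarity, sphericity, ...

# Synthetic ground truth that depends on two of the modalities.
labels = (color[:, 0] + hyperspectral[:, 0] > 0).astype(int)

# Modality fusion by simple feature concatenation.
X = np.hstack([color, hyperspectral, shape])
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)
acc = clf.score(X, labels)
```

Dropping one modality from the `np.hstack` call is then enough to probe how much that modality contributes, which is the kind of ablation the paper's question implies.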
Multi-Scale Hierarchical Conditional Random Field for Railway Electrification Scene Classification Using Mobile Laser Scanning Data
With the recent rapid development of high-speed railways in many countries, precise inspection of railway electrification systems has become more significant for ensuring safe railway operation. However, time-consuming manual inspection cannot satisfy this demanding task, so a safe, fast, and automatic inspection method is required. With LiDAR (Light Detection and Ranging) data becoming more available, accurate railway electrification scene understanding from LiDAR data becomes feasible, a step towards automatic and precise 3D inspection.
This thesis presents a supervised learning method to classify railway electrification objects from Mobile Laser Scanning (MLS) data. First, a multi-range Conditional Random Field (CRF) is implemented and tested, which characterizes in one probabilistic graphical model not only labeling homogeneity at short range but also the layout compatibility between different objects at middle range. This multi-range CRF model is then extended and improved into a hierarchical CRF model that considers multi-scale layout compatibility at full range. The proposed method is evaluated on a dataset collected in Korea in a complex railway electrification environment. The experiments show the effectiveness of the proposed model.
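The core CRF ingredients named here (unary appearance terms plus label-compatibility terms between neighbouring sites) can be sketched on a toy chain. Iterated conditional modes (ICM) below is a deliberately simple stand-in for the thesis's actual multi-range inference, and all costs are invented for illustration.

```python
import numpy as np

def icm_chain(unary, pairwise, n_iters=10):
    """Iterated conditional modes on a chain-structured CRF.

    unary[i, l]   : cost of assigning label l to site i
    pairwise[a, b]: cost of adjacent sites taking labels (a, b)
    """
    n, _ = unary.shape
    labels = unary.argmin(axis=1)  # start from the purely local decision
    for _ in range(n_iters):
        for i in range(n):
            costs = unary[i].copy()
            if i > 0:
                costs += pairwise[labels[i - 1]]     # left neighbour
            if i < n - 1:
                costs += pairwise[:, labels[i + 1]]  # right neighbour
            labels[i] = costs.argmin()
    return labels

# Five sites, two labels; site 2 weakly prefers the "wrong" label, and a
# Potts smoothness term (cost 1 for disagreeing neighbours) corrects it.
unary = np.array([[0.0, 1.0], [0.0, 1.0], [0.6, 0.4], [0.0, 1.0], [0.0, 1.0]])
potts = 1.0 * (1.0 - np.eye(2))
labels = icm_chain(unary, potts)
```

The multi-range and hierarchical models in the thesis add further edge sets (middle- and full-range layout terms) on top of this short-range smoothness, but the energy-minimization structure is the same.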
Non-Associative Higher-Order Markov Networks for Point Cloud Classification
In this paper, we introduce a non-associative higher-order graphical model to tackle the problem of semantic labeling of 3D point clouds. For this task, existing higher-order models overlook the relationships between the different classes and simply encourage the nodes in the cliques to have consistent labelings. We address this issue by devising a set of non-associative context patterns that describe higher-order geometric relationships between different class labels within the cliques. To this end, we propose a method to extract informative cliques in 3D point clouds that provide more knowledge about the context of the scene. We evaluate our approach on three challenging outdoor point cloud datasets. Our experiments demonstrate the benefits of our non-associative higher-order Markov networks over state-of-the-art point cloud labeling techniques.
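The distinction the abstract draws can be made concrete with a toy potential: an associative higher-order term only rewards label agreement within a clique, while a non-associative one assigns costs to specific mixed-label patterns. The class names and costs below are illustrative assumptions, not the paper's learned patterns.

```python
# Hypothetical sketch of a non-associative higher-order potential: the cost
# of a clique depends on the full label pattern, not merely on whether all
# labels in the clique agree.
DEFAULT_COST = 1.0
PATTERN_COST = {
    ("ground", "ground", "ground"): 0.1,  # associative case: agreement is cheap
    ("ground", "car", "ground"): 0.3,     # a plausible mixed-label layout
    ("car", "ground", "car"): 1.5,        # an implausible mixed-label layout
}

def clique_energy(clique_labels):
    """Return the energy of one clique labeling."""
    return PATTERN_COST.get(tuple(clique_labels), DEFAULT_COST)
```

A purely associative model would have to charge both mixed patterns the same penalty; the pattern table is what lets the model prefer one arrangement of differing classes over another.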
Line-Based Multi-Range Asymmetric Conditional Random Field for Terrestrial Laser Scanning Data Classification
Terrestrial Laser Scanning (TLS) is a ground-based, active imaging method that rapidly acquires accurate, highly dense three-dimensional point clouds of object surfaces by laser range finding. To fully exploit its benefits, a robust method for classifying the many objects of interest in huge amounts of laser point cloud data is urgently required. However, classifying massive TLS data faces many challenges, such as complex urban scenes and partial data acquisition due to occlusion. To achieve automatic, accurate, and robust TLS data classification, we present a line-based multi-range asymmetric Conditional Random Field algorithm.
The first contribution is to propose a line-based TLS data classification method. In this thesis, we are interested in seven classes: building, roof, pedestrian road (PR), tree, low man-made object (LMO), vehicle road (VR), and low vegetation (LV). The line-based classification is implemented in each scan profile, following the line-profiling nature of the laser scanning mechanism. Ten conventional local classifiers are tested, including popular generative and discriminative classifiers, and experimental results validate that the line-based method can achieve satisfying classification performance. However, local classifiers label each line independently of its neighborhood, and such inference often suffers from similar local appearance across different object classes. The second contribution is to propose a multi-range asymmetric Conditional Random Field (maCRF) model, which uses object context as post-classification to improve the performance of a local generative classifier. The maCRF incorporates appearance, a local smoothness constraint, and global scene-layout regularity into one probabilistic graphical model. The local smoothness encourages lines in a local area to take the same class label, while the scene layout favours an asymmetric regularity of spatial arrangement between different object classes at long range, considered in both the vertical (above-below) and horizontal (front-behind) directions. The asymmetric regularity captures the directional spatial arrangement between pairs of objects (e.g. it allows the ground to be lower than a building, but not vice versa). The third contribution is to extend the maCRF model by adding across-scan-profile context, yielding the Across-scan-profile Multi-range Asymmetric Conditional Random Field (amaCRF) model.
Due to the sweeping nature of laser scanning, sequentially acquired TLS data has strong spatial dependency, and the across-scan-profile context provides additional contextual information. The final contribution is to propose a sequential classification strategy: along the sweeping direction of the laser scanner, amaCRF models are constructed sequentially, and by dynamically updating the posterior probabilities of shared scan profiles, contextual information propagates through adjacent scan profiles.
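The asymmetric above-below regularity described above can be illustrated with a small directional cost matrix: unlike a symmetric (e.g. Potts) pairwise term, the cost of "ground below building" differs from "building below ground". The class set and cost values here are illustrative assumptions, not the thesis's learned potentials.

```python
import numpy as np

CLASSES = ["ground", "building", "roof"]
IDX = {c: i for i, c in enumerate(CLASSES)}

# A[i, j] = cost of class i appearing BELOW class j in a vertical pair.
# The asymmetry is the point: ground below a building is plausible,
# a building below the ground is not.
A = np.full((3, 3), 1.0)
A[IDX["ground"], IDX["building"]] = 0.1
A[IDX["building"], IDX["ground"]] = 2.0
A[IDX["building"], IDX["roof"]] = 0.1
A[IDX["roof"], IDX["building"]] = 2.0

def vertical_cost(label_below, label_above):
    """Directional pairwise cost for a vertically adjacent pair of lines."""
    return A[IDX[label_below], IDX[label_above]]
```

A symmetric pairwise model would force `A` to equal its transpose and so could not express this preference; the front-behind horizontal term in the thesis works the same way along the other axis.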
Scene Parsing using Multiple Modalities
Scene parsing is the task of assigning a semantic class label to the elements of a scene. It has many applications in autonomous systems that need to understand the visual data captured from their environment. Different sensing modalities, such as RGB cameras, multi-spectral cameras, and Lidar sensors, can be beneficial when pursuing this goal. Scene analysis using multiple modalities aims at leveraging the complementary information captured by the different sensors: when multiple modalities are used together, the strength of each modality can combat the weaknesses of the others, giving us powerful tools for scene analysis. However, the possible gains of using multiple modalities come with new challenges, such as dealing with misalignments between modalities. In this thesis, our aim is to take advantage of multiple modalities to improve outdoor scene parsing and to address the associated challenges.

We initially investigate the potential of multi-spectral imaging for outdoor scene analysis. Our approach combines the discriminative strength of the multi-spectral signature in each pixel with the nature of the surrounding texture. Many materials that appear similar when viewed by a common RGB camera show discriminating properties when viewed by a camera capturing a greater number of separated wavelengths. When using imagery data for scene parsing, a number of challenges stem from, e.g., color saturation, shadow, and occlusion. To address such challenges, we focus on scene parsing using multiple modalities, panoramic RGB images and 3D Lidar data in particular, and propose a multi-view approach to select the best 2D view that describes each element in the 3D point cloud data.

Keeping our focus on multiple modalities, we then introduce a multi-modal graphical model to address the problem of scene parsing using 2D-3D data exhibiting extensive many-to-one correspondences. Existing methods often impose a hard correspondence between the 2D and 3D data, where corresponding 2D and 3D regions are forced to receive identical labels; this degrades performance due to misalignments, 3D-2D projection errors, and occlusions. We address this issue by defining a graph over the entire set of data that models soft correspondences between the two modalities. This graph encourages each region in one modality to leverage the information from its corresponding regions in the other modality to better estimate its class label. Finally, we introduce latent nodes to explicitly model inconsistencies between the modalities. The latent nodes allow us not only to leverage information from various domains to improve the labeling of the modalities, but also to cut the edges between inconsistent regions. To eliminate the need for hand-tuning the parameters of our model, we propose to learn the potential functions from training data. In addition to demonstrating the benefits of the proposed approaches on publicly available multi-modality datasets, we introduce a new multi-modal dataset of panoramic images and 3D point cloud data captured from outdoor scenes (the NICTA/2D3D Dataset).
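The latent-node idea described above can be reduced to a one-line energy: for each cross-modal pair of corresponding regions, the model either enforces agreement (paying a label-mismatch cost) or lets the latent node cut the edge for a fixed penalty. The cost values and class names below are illustrative assumptions, not the thesis's learned potentials.

```python
# Hypothetical sketch of a cross-modal edge with a latent on/off state.
def edge_energy(label_2d, label_3d, mismatch_cost=2.0, cut_penalty=0.8):
    """Energy of a 2D-3D edge, minimized over the latent cut variable.

    Keeping the edge costs 0 if the labels agree and mismatch_cost if they
    disagree; cutting the edge always costs cut_penalty.
    """
    agree_energy = 0.0 if label_2d == label_3d else mismatch_cost
    return min(agree_energy, cut_penalty)  # latent node may cut the edge
```

With a hard correspondence there is no cut option, so every misaligned or occluded pair pays the full mismatch cost; the latent node caps that damage at `cut_penalty`, which is why inconsistent regions stop dragging each other toward wrong labels.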