2,316 research outputs found
A Multiple-Expert Binarization Framework for Multispectral Images
In this work, a multiple-expert binarization framework for multispectral
images is proposed. The framework is based on a constrained subspace selection
limited to the spectral bands combined with state-of-the-art gray-level
binarization methods. The framework uses a binarization wrapper to enhance the
performance of the gray-level binarization. Nonlinear preprocessing of the
individual spectral bands is used to enhance the textual information. An
evolutionary optimizer is considered to obtain the optimal and some suboptimal
3-band subspaces from which an ensemble of experts is then formed. The
framework is applied to a ground truth multispectral dataset with promising
results. In addition, a generalization to the cross-validation approach is
developed that not only evaluates generalizability of the framework but also
provides a practical instance of the selected experts that could then be
applied to unseen inputs despite the small size of the given ground truth
dataset. Comment: 12 pages, 8 figures, 6 tables. Presented at ICDAR'1
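The abstract above does not say how the ensemble of experts is combined. As a minimal sketch, assuming a plain pixel-wise majority vote over the binary masks produced by each expert (the function name `majority_vote` and the toy masks are hypothetical, not taken from the paper):

```python
def majority_vote(masks):
    """Combine equally sized binary masks (lists of rows of 0/1) pixel by pixel."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        # A pixel is foreground (1) when more than half of the experts say so.
        [1 if 2 * sum(m[r][c] for m in masks) > n else 0 for c in range(cols)]
        for r in range(rows)
    ]

# Three hypothetical experts, each giving a 2x3 binarization of the same patch.
expert_a = [[1, 0, 1], [0, 0, 1]]
expert_b = [[1, 1, 0], [0, 0, 1]]
expert_c = [[0, 0, 1], [0, 1, 1]]

combined = majority_vote([expert_a, expert_b, expert_c])
print(combined)
```

An odd number of experts avoids ties; with an even number, this sketch resolves a tie to background (0).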
Graph-based Data Modeling and Analysis for Data Fusion in Remote Sensing
Hyperspectral imaging provides the capability of increased sensitivity and discrimination over traditional imaging methods by combining standard digital imaging with spectroscopic methods. For each individual pixel in a hyperspectral image (HSI), a continuous spectrum is sampled as the spectral reflectance/radiance signature to facilitate identification of ground cover and surface material. The abundant spectral knowledge allows all available information from the data to be mined. These qualities give hyperspectral imaging wide applications such as mineral exploration, agriculture monitoring, and ecological surveillance. The processing of massive high-dimensional HSI datasets is a challenge since many data processing techniques have a computational complexity that grows exponentially with the dimension. Besides, an HSI dataset may contain a limited number of degrees of freedom due to the high correlations between data points and among the spectra. On the other hand, merely taking advantage of the sampled spectrum of an individual HSI data point may produce inaccurate results due to the mixed nature of raw HSI data, such as mixed pixels and optical interference.
Fusion strategies are widely adopted in data processing to achieve better performance, especially in the field of classification and clustering. There are mainly three types of fusion strategies, namely low-level data fusion, intermediate-level feature fusion, and high-level decision fusion. Low-level data fusion combines multi-source data that is expected to be complementary or cooperative. Intermediate-level feature fusion aims at selection and combination of features to remove redundant information. Decision-level fusion exploits a set of classifiers to provide more accurate results. The fusion strategies have wide applications including HSI data processing. With the fast development of multiple remote sensing modalities, e.g. Very High Resolution (VHR) optical sensors and LiDAR, fusion of multi-source data can in principle produce more detailed information than each single source. On the other hand, besides the abundant spectral information contained in HSI data, features such as texture and shape may be employed to represent data points from a spatial perspective. Furthermore, feature fusion also includes the strategy of removing redundant and noisy features in the dataset.
One of the major problems in machine learning and pattern recognition is to develop appropriate representations for complex nonlinear data. In HSI processing, a particular data point is usually described as a vector with coordinates corresponding to the intensities measured in the spectral bands. This vector representation permits the application of linear and nonlinear transformations with linear algebra to find an alternative representation of the data. More generally, HSI is multi-dimensional in nature and the vector representation may lose the contextual correlations. Tensor representation provides a more sophisticated modeling technique and a higher-order generalization to linear subspace analysis.
In graph theory, data points can be generalized as nodes with connectivities measured from the proximity of a local neighborhood. The graph-based framework efficiently characterizes the relationships among the data and allows for convenient mathematical manipulation in many applications, such as data clustering, feature extraction, feature selection and data alignment. In this thesis, graph-based approaches to multi-source feature and data fusion in remote sensing are explored. We mainly investigate the fusion of spatial, spectral and LiDAR information with linear and multilinear algebra under a graph-based framework for data clustering and classification problems.
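To make the idea of nodes connected by local-neighbourhood proximity concrete, here is a minimal sketch that builds a k-nearest-neighbour graph and its unnormalised Laplacian. The helper names `knn_graph` and `laplacian`, the 0/1 edge weights, and the toy three-band pixel values are all assumptions for illustration; the thesis may use different weights and constructions.

```python
import math

def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour adjacency matrix with 0/1 weights."""
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # Sort the other points by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            adj[i][j] = 1
            adj[j][i] = 1  # symmetrise: keep an edge if either end selected it
    return adj

def laplacian(adj):
    """Unnormalised graph Laplacian L = D - W (degree matrix minus adjacency)."""
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]

# Toy "spectra": two well-separated groups of 3-band pixels (made-up values).
pixels = [(0.1, 0.2, 0.1), (0.12, 0.22, 0.1), (0.9, 0.8, 0.95), (0.88, 0.82, 0.9)]
W = knn_graph(pixels, k=1)
L = laplacian(W)
print(W)
print(L)
```

In this toy case the graph splits into two connected components, one per group of similar spectra, which is the kind of structure that graph-based clustering exploits.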
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining the class boundary using only knowledge of the positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research. Comment: 24 pages + 11 pages of references, 8 figure
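As a toy illustration of defining a class boundary from positive examples only, the sketch below fits a centre and radius to the positive samples and accepts anything inside that ball. This is a deliberately simple baseline assumed for illustration (the names `fit_one_class` and `is_positive` and the data are hypothetical); it is not one of the specific techniques reviewed in the paper.

```python
import math

def fit_one_class(positives):
    """Fit a ball around the positive samples: centre = mean, radius = max distance."""
    dim = len(positives[0])
    center = tuple(sum(p[d] for p in positives) / len(positives) for d in range(dim))
    radius = max(math.dist(p, center) for p in positives)
    return center, radius

def is_positive(x, model):
    """Accept x if it lies inside the ball learned from positive data only."""
    center, radius = model
    return math.dist(x, center) <= radius

# Only positive-class samples are available at training time (made-up values).
positives = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.0, 1.2)]
model = fit_one_class(positives)

print(is_positive((1.05, 1.0), model))  # close to the training samples
print(is_positive((3.0, 3.0), model))   # far from anything seen in training
```

Using the maximum distance as the radius makes the boundary sensitive to a single outlying positive sample; a percentile of the distances would be a more robust choice.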
Improving Deep Reinforcement Learning Using Graph Convolution and Visual Domain Transfer
Recent developments in Deep Reinforcement Learning (DRL) have shown tremendous progress in robotics control, Atari games, and board games such as Go. However, model-free DRL still has limited use cases due to its poor sampling efficiency and generalization on a variety of tasks. In this thesis, two particular drawbacks of DRL are investigated: 1) the poor generalization abilities of model-free DRL, more specifically, how to generalize an agent's policy to unseen environments and to task performance on different data representations (e.g. image based or graph based); and 2) the reality gap issue in DRL, that is, how to effectively transfer a policy learned in a simulator to the real world. This thesis makes several novel contributions to the field of DRL, which are outlined sequentially in the following. Among these contributions is the generalized value iteration network (GVIN) algorithm, which is an end-to-end neural network planning module extending the work of Value Iteration Networks (VIN). GVIN emulates the value iteration algorithm by using a novel graph convolution operator, which enables GVIN to learn and plan on irregular spatial graphs. Additionally, this thesis proposes three novel, differentiable kernels as graph convolution operators and shows that the embedding-based kernel achieves the best performance. Furthermore, an improvement upon traditional n-step Q-learning that stabilizes training for VIN and GVIN is demonstrated. Additionally, the equivalence between GVIN and graph neural networks is outlined, and it is shown that GVIN can be further extended to address both control and inference problems. The final subject in the graph domain studied in this thesis is graph embeddings. Specifically, this work studies a general graph embedding framework, GEM-F, that unifies most of the previous graph embedding algorithms.
Based on the contributions made during the analysis of GEM-F, a novel algorithm called WarpMap, which outperforms DeepWalk and node2vec in unsupervised learning settings, is proposed. The aforementioned reality gap in DRL prevents a significant portion of research from reaching real-world settings. The latter part of this work studies and analyzes domain transfer techniques in an effort to bridge this gap. Typically, domain transfer in RL consists of representation transfer and policy transfer. In this work, the focus is on representation transfer for vision-based applications, more specifically, on aligning the feature representation from the source domain to the target domain in an unsupervised fashion. In this approach, a linear mapping function is considered to fuse modules that are trained in different domains. Two improved adversarial learning methods are proposed to enhance the training quality of the mapping function. Finally, the thesis demonstrates the effectiveness of domain alignment among different weather conditions in the CARLA autonomous driving simulator.
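For readers unfamiliar with the value iteration algorithm that GVIN is said to emulate, the following sketch runs classical (non-neural) value iteration on a small irregular graph. It is not GVIN: there is no learned graph convolution here, and the graph, the step reward of -1, the deterministic edge moves, and the function name `value_iteration` are all assumptions chosen for illustration.

```python
def value_iteration(neighbors, goal, step_reward=-1.0, gamma=1.0, sweeps=50):
    """Classical value iteration on a graph where an action moves along one edge."""
    values = {node: 0.0 for node in neighbors}
    for _ in range(sweeps):
        new_values = {}
        for node, nbrs in neighbors.items():
            if node == goal:
                new_values[node] = 0.0  # absorbing goal state
            else:
                # Bellman backup: pay the step cost, then take the best neighbour.
                new_values[node] = max(step_reward + gamma * values[n] for n in nbrs)
        values = new_values
    return values

# A small irregular graph (hypothetical): a - b - c - e, with d hanging off b.
graph = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b", "e"],
    "d": ["b"],
    "e": ["c"],
}
values = value_iteration(graph, goal="e")

# Greedy policy: from each non-goal node, step to the neighbour with the highest value.
policy = {n: max(graph[n], key=values.get) for n in graph if n != "e"}
print(values)
print(policy)
```

With a step reward of -1 and no discounting, the converged value of each node is the negative of its shortest-path distance to the goal, so the greedy policy follows shortest paths.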
Two Decades of Colorization and Decolorization for Images and Videos
Colorization is a computer-aided process, which aims to give color to a gray
image or video. It can be used to enhance black-and-white images, including
black-and-white photos, old-fashioned films, and scientific imaging results. On
the contrary, decolorization is to convert a color image or video into a
grayscale one. A grayscale image or video refers to an image or video with only
brightness information without color information. It is the basis of some
downstream image processing applications such as pattern recognition, image
segmentation, and image enhancement. Different from image decolorization, video
decolorization should not only consider the image contrast preservation in each
video frame, but also respect the temporal and spatial consistency between
video frames. Researchers have been devoted to developing decolorization
methods that balance spatial-temporal consistency and algorithm efficiency.
With the prevalence of digital cameras and mobile phones, image and video
colorization and decolorization have received more and more attention from
researchers. This paper gives an overview of the progress of image and video
colorization and decolorization methods in the last two decades. Comment: 12 pages, 19 figure
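The simplest form of decolorization is a fixed weighted sum of the colour channels. The sketch below uses the common Rec. 601 luma weights (0.299, 0.587, 0.114); this choice and the function name `to_gray` are assumptions for illustration, not a method from the survey. The example also hints at why contrast preservation is hard: two clearly different colours can land on almost the same gray level.

```python
def to_gray(pixel):
    """Map an (R, G, B) pixel to a single brightness value with Rec. 601 weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

white = (255, 255, 255)
black = (0, 0, 0)
red = (200, 0, 0)
green = (0, 102, 0)

print(to_gray(white), to_gray(black))
# Red and green are easy to tell apart in colour, yet their gray values
# differ by less than one gray level, so the contrast between them is lost.
print(to_gray(red), to_gray(green))
```

Contrast-preserving decolorization methods aim to avoid exactly this collapse, and video methods additionally have to keep the mapping consistent from frame to frame.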
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines,
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science, and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much the powerful tool,
and not the questionable black box. Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra
figures, some minor additions to the tex
ADVANCED REPRESENTATION LEARNING STRATEGIES FOR BIG DATA ANALYSIS
With the fast technological advancement in data storage and machine learning, big data analytics has become a core component of various practical applications ranging from industrial automation to medical diagnosis and from cyber-security to space exploration. Recent studies show that every day, more than 1.8 billion photos/images are posted on social media, and 720 thousand hours of videos are uploaded to YouTube. Thus, to handle this large amount of visual data efficiently, image/video classification, object detection/recognition, and segmentation tasks have gathered a lot of attention over the past decade. Consequently, researchers in this domain have proposed various feature extraction, feature learning, and feature encoding algorithms for improving the generalization performance of the aforesaid tasks. For example, the generalization performance of image classification models mainly depends on the choice of data representation. These models aim at building comprehensive representation learning (RL) strategies to encode the relationship among the input and output attributes from the raw big data.
Existing RL strategies can be divided into three general categories: statistical approaches (e.g., probabilistic-based analysis and correlation-based measures), unsupervised learning (e.g., autoencoders), and supervised learning (e.g., deep convolutional neural network (DCNN)). Among these categories, the unsupervised and supervised learning strategies using artificial neural networks (ANNs) have been widely adopted. In this direction, several auxiliary ideas have been proposed over the past decade to improve the learning capability of ANNs. For instance, the Moore-Penrose (MP) inverse is exploited to refine the parameters (weights and biases) of a trained network. However, the existing MP inverse-based RL methods have an important limitation. The representations learned through the MP inverse-based strategies suffer from loosely connected feature coding, resulting in a poor representation of objects that lacks discriminative power. To address this issue, this dissertation proposes a set of eight novel MP inverse-based RL algorithms.
The first part of this dissertation, from Chapter 4 to Chapter 7, is dedicated to proposing novel width-growth models based on subnet neural network (SNN) for representation learning and image classification. In this part, a novel feature learning algorithm, coined Wi-HSNN, is proposed, followed by an improved batch-by-batch learning algorithm, called OS-HSNN. Then, two novel SNNs are introduced to detect extreme outliers for one-class classification (OCC). Finally, a semi-supervised SNN, named SS-HSNN, is introduced to extend the strategy from the supervised learning domain to the semi-supervised learning domain.
The second part of this thesis, comprising Chapter 8 and Chapter 9, focuses on improving the performance of existing multilayer neural networks by harnessing the MP inverse. Here, a novel weight optimization strategy is proposed to improve the performance of multilayer extreme learning machines (ELMs), where the MP inverse is used to feed back the classification imprecision information from the output layer to the hidden layers. Then, a novel fast retraining framework is proposed to enhance the efficiency of transfer learning of DCNNs.
The effectiveness of the proposed subnet- and retraining-based algorithms has been evaluated on several widely used image classification datasets, such as ImageNet and Places-365. Furthermore, we validated the performance of the proposed strategies in some extended domains, such as ship-target detection, food image classification, camera model identification and misinformation identification. The experimental results illustrate the superiority of the proposed algorithms.
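The role of the Moore-Penrose inverse in this family of methods can be illustrated with the basic single-hidden-layer extreme learning machine: the hidden layer is random and fixed, and the output weights are obtained in one step from the pseudoinverse. The sketch below is that textbook construction under assumed settings (XOR toy data, 20 tanh units, a fixed random seed); it is not any of the dissertation's proposed algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR toy problem: 4 samples, 2 inputs, 1 target column.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [0.0]])

# Random, untrained hidden layer with 20 tanh units.
W_in = rng.normal(size=(2, 20))
b = rng.normal(size=(1, 20))
H = np.tanh(X @ W_in + b)  # hidden-layer output matrix, shape (4, 20)

# Output weights from the Moore-Penrose inverse: beta = pinv(H) @ T.
# This is the minimum-norm least-squares solution of H @ beta = T.
beta = np.linalg.pinv(H) @ T
pred = H @ beta

print(np.round(pred, 3).ravel())
```

Because there are more hidden units than training samples here, the pseudoinverse solution fits the four training targets essentially exactly; on real data, regularisation or fewer hidden units would be needed to avoid overfitting.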
Deep learning in food category recognition
Integrating artificial intelligence with food category recognition has been a field of interest for research for the
past few decades. It is potentially one of the next steps in revolutionizing human interaction with food. The
modern advent of big data and the development of data-oriented fields like deep learning have provided advancements
in food category recognition. With increasing computational power and ever-larger food datasets,
the approach’s potential has yet to be realized. This survey provides an overview of methods that can be applied
to various food category recognition tasks, including detecting type, ingredients, quality, and quantity. We
survey the core components for constructing a machine learning system for food category recognition, including
datasets, data augmentation, hand-crafted feature extraction, and machine learning algorithms. We place a
particular focus on the field of deep learning, including the utilization of convolutional neural networks, transfer
learning, and semi-supervised learning. We provide an overview of relevant studies to promote further developments
in food category recognition for research and industrial applications.
MRC (MC_PC_17171); Royal Society (RP202G0230); BHF (AA/18/3/34220); Hope Foundation for Cancer Research (RM60G0680); GCRF (P202PF11); Sino-UK Industrial Fund (RP202G0289); LIAS (P202ED10); Data Science Enhancement Fund (P202RE237); Fight for Sight (24NN201); Sino-UK Education Fund (OP202006); BBSRC (RM32G0178B8
Spatial-Spectral Manifold Embedding of Hyperspectral Data
In recent years, hyperspectral imaging, also known as imaging spectroscopy,
has attracted increasing interest in the geoscience and remote sensing
community. Hyperspectral imagery is characterized by very rich spectral
information, which enables us to recognize the materials of interest lying on
the surface of the Earth more easily. We have to admit, however, that high
spectral dimension inevitably brings some drawbacks, such as expensive data
storage and transmission, information redundancy, etc. Therefore, to reduce the
spectral dimensionality effectively and learn more discriminative spectral
low-dimensional embedding, in this paper we propose a novel hyperspectral
embedding approach by simultaneously considering spatial and spectral
information, called spatial-spectral manifold embedding (SSME). Beyond the
pixel-wise spectral embedding approaches, SSME models the spatial and spectral
information jointly in a patch-based fashion. SSME not only learns the spectral
embedding by using the adjacency matrix obtained by similarity measurement
between spectral signatures, but also models the spatial neighbours of a target
pixel in hyperspectral scene by sharing the same weights (or edges) in the
process of learning embedding. Classification is explored as a potential
application for quantitatively evaluating the performance of the learned
embedding representations. Extensive experiments conducted on widely used
hyperspectral datasets demonstrate the superiority and effectiveness of the
proposed SSME as compared to several state-of-the-art embedding methods.