212 research outputs found
Robust Brain MRI Image Classification with SIBOW-SVM
The majority of primary Central Nervous System (CNS) tumors in the brain are
among the most aggressive diseases affecting humans. Early detection of brain
tumor types, whether benign or malignant, glial or non-glial, is critical for
cancer prevention and treatment, ultimately improving human life expectancy.
Magnetic Resonance Imaging (MRI) stands as the most effective technique to
detect brain tumors by generating comprehensive brain images through scans.
However, human examination can be error-prone and inefficient due to the
complexity, size, and location variability of brain tumors. Recently, automated
classification techniques using machine learning (ML) methods, such as
Convolutional Neural Network (CNN), have demonstrated significantly higher
accuracy than manual screening, while maintaining low computational costs.
Nonetheless, deep learning-based image classification methods, including CNN,
face challenges in estimating class probabilities without proper model
calibration. In this paper, we propose a novel brain tumor image classification
method, called SIBOW-SVM, which integrates the Bag-of-Features (BoF) model with
SIFT feature extraction and weighted Support Vector Machines (wSVMs). This new
approach effectively captures hidden image features, enabling the
differentiation of various tumor types and accurate label predictions.
Additionally, the SIBOW-SVM is able to estimate the probabilities of images
belonging to each class, thereby providing high-confidence classification
decisions. We have also developed scalable and parallelizable algorithms to
facilitate the practical implementation of SIBOW-SVM for massive image
collections. As a benchmark, we apply the SIBOW-SVM to a public data set of
brain tumor MRI images containing four classes: glioma, meningioma, pituitary,
and normal. Our results show that the new method outperforms state-of-the-art
methods, including CNNs.
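The SIFT-BoF-wSVM pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the SIFT step is stubbed with random local descriptors (a real pipeline would use an actual SIFT extractor), and scikit-learn's class-weighted `SVC` with `probability=True` stands in for the paper's weighted SVMs; all names and parameters are illustrative.

```python
# Sketch of a SIFT -> Bag-of-Features -> weighted-SVM pipeline (SIBOW-SVM
# style). SIFT is stubbed with random 128-d local descriptors; SVC with
# class_weight + probability stands in for the paper's wSVMs.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def fake_sift(n_keypoints=50, dim=128, offset=0.0):
    # Stand-in for a SIFT extractor: one 128-d descriptor per keypoint.
    return rng.normal(offset, 1.0, size=(n_keypoints, dim))

# Two toy "classes" of images, separated by a mean shift in descriptor space.
images = [fake_sift(offset=0.0) for _ in range(10)] + \
         [fake_sift(offset=3.0) for _ in range(10)]
labels = np.array([0] * 10 + [1] * 10)

# 1) Build a visual vocabulary by clustering all local descriptors.
vocab = KMeans(n_clusters=8, n_init=10, random_state=0)
vocab.fit(np.vstack(images))

# 2) Encode each image as a normalized histogram of visual words (BoF).
def bof_histogram(desc):
    words = vocab.predict(desc)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()

X = np.array([bof_histogram(d) for d in images])

# 3) Class-weighted SVM with per-class probability estimates.
clf = SVC(kernel="rbf", class_weight="balanced", probability=True,
          random_state=0).fit(X, labels)
proba = clf.predict_proba(X)  # probability of each image belonging to each class
```

The probability estimates in step 3 are what give the high-confidence classification decisions the abstract refers to; a real deployment would of course evaluate them on held-out MRI data.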
Learning to compress and search visual data in large-scale systems
The problem of high-dimensional and large-scale representation of visual data
is addressed from an unsupervised learning perspective. The emphasis is put on
discrete representations, where the description length can be measured in bits
and hence the model capacity can be controlled. The algorithmic infrastructure
is developed based on the synthesis and analysis prior models whose
rate-distortion properties, as well as capacity vs. sample complexity
trade-offs are carefully optimized. These models are then extended to
multi-layers, namely the RRQ and the ML-STC frameworks, where the latter is
further evolved as a powerful deep neural network architecture with fast and
sample-efficient training and discrete representations. For the developed
algorithms, three important applications are developed. First, the problem of
large-scale similarity search in retrieval systems is addressed, where a
double-stage solution is proposed leading to faster query times and shorter
database storage. Second, the problem of learned image compression is targeted,
where the proposed models can capture more redundancies from the training
images than the conventional compression codecs. Finally, the proposed
algorithms are used to solve ill-posed inverse problems. In particular, the
problems of image denoising and compressive sensing are addressed with
promising results.
Comment: PhD thesis dissertation
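The large-scale similarity-search idea — discrete codes whose description length is measured in bits, searched much faster than raw vectors — can be illustrated with a toy binary-hashing sketch. Random-hyperplane hashing is used here purely for illustration and is not the thesis's RRQ or ML-STC model; all sizes are arbitrary.

```python
# Toy similarity search over discrete (binary) codes: encode vectors into
# n_bits-bit codes, then retrieve by Hamming distance. Illustrative only;
# random-hyperplane hashing stands in for the learned models.
import numpy as np

rng = np.random.default_rng(1)
dim, n_bits, n_db = 64, 32, 1000

database = rng.normal(size=(n_db, dim))
planes = rng.normal(size=(dim, n_bits))  # random projection hyperplanes

def encode(x):
    # Sign of each projection -> one bit; description length = n_bits.
    return (x @ planes > 0).astype(np.uint8)

codes = encode(database)  # compact database storage: n_bits per item

def search(query, k=5):
    q = encode(query)
    # Hamming distance = number of differing bits; cheap to compute.
    dists = (codes != q).sum(axis=1)
    return np.argsort(dists)[:k]

# A query that is a slightly perturbed copy of database item 0 should
# retrieve item 0 first.
query = database[0] + 0.001 * rng.normal(size=dim)
top = search(query)
```

Comparing 32-bit codes instead of 64-dimensional floats is what yields the shorter database storage and faster query times the abstract claims; a two-stage system would re-rank these candidates with a finer representation.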
Self-organising maps: statistical analysis, treatment and applications.
This thesis presents some substantial theoretical analyses and optimal treatments
of Kohonen's self-organising map (SOM) algorithm, and explores the practical
application potential of the algorithm for vector quantisation, pattern classification,
and image processing. It consists of two major parts. In the first part, the SOM
algorithm is investigated and analysed from a statistical viewpoint. The proof of its
universal convergence for any dimensionality is obtained using a novel and
extended form of the Central Limit Theorem. Its feature space is shown to be an
approximate multivariate Gaussian process, which will eventually converge and
form a mapping, which minimises the mean-square distortion between the feature
and input spaces. The diminishing effect of the initial states and implicit effects of
the learning rate and neighbourhood function on its convergence and ordering are
analysed and discussed. Distinct and meaningful definitions, and associated
measures, of its ordering are presented in relation to the map's fault-tolerance. The
SOM algorithm is further enhanced by incorporating a proposed constraint, or
Bayesian modification, in order to achieve optimal vector quantisation or pattern
classification. The second part of this thesis addresses the task of unsupervised
texture-image segmentation by means of SOM networks and model-based
descriptions. A brief review of texture analysis in terms of definitions, perceptions,
and approaches is given. Markov random field model-based approaches are
discussed in detail. Arising from this a hierarchical self-organised segmentation
structure, which consists of a local MRF parameter estimator, a SOM network, and
a simple voting layer, is proposed and is shown, by theoretical analysis and
practical experiment, to achieve a maximum likelihood or maximum a posteriori
segmentation. A fast, simple, but efficient boundary relaxation algorithm is
proposed as a post-processor to further refine the resulting segmentation. The class
number validation problem in a fully unsupervised segmentation is approached by
a classical, simple, and on-line minimum mean-square-error method. Experimental
results indicate that this method is very efficient for texture segmentation
problems. The thesis concludes with some suggestions for further work on SOM
neural networks
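The SOM update that the analysis above concerns — best-matching unit, decaying learning rate, and shrinking neighbourhood function — can be sketched minimally as follows. This is a generic Kohonen update for a 1-D map over 2-D data, with illustrative schedules, not the thesis's optimally treated variant.

```python
# Minimal Kohonen SOM sketch: 1-D map of 10 units quantising the unit
# square. Learning rate and Gaussian neighbourhood width both decay,
# which is exactly the interplay the thesis's convergence analysis studies.
import numpy as np

rng = np.random.default_rng(2)
data = rng.uniform(0, 1, size=(500, 2))    # inputs in the unit square
n_units = 10
weights = rng.uniform(0, 1, size=(n_units, 2))
grid = np.arange(n_units)                  # 1-D map topology

n_iters = 2000
for t in range(n_iters):
    x = data[rng.integers(len(data))]
    # Best-matching unit: nearest weight vector to the input.
    bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
    # Diminishing learning rate and neighbourhood width.
    lr = 0.5 * (1 - t / n_iters)
    sigma = max(0.5, 3.0 * (1 - t / n_iters))
    h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
    # Move the BMU and its map neighbours towards the input.
    weights += lr * h[:, None] * (x - weights)

# Mean-square quantisation error between feature and input spaces.
err = np.mean([((weights - x) ** 2).sum(axis=1).min() for x in data])
```

After training, the weight vectors approximate a distortion-minimising codebook, which is the vector-quantisation role of the SOM discussed in the first part of the thesis.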
Attention Mechanism for Recognition in Computer Vision
It has been proven that humans do not focus their attention on an entire scene at once when they perform a recognition task. Instead, they pay attention to the most important parts of the scene to extract the most discriminative information. Inspired by this observation, this dissertation studies the importance of the attention mechanism in recognition tasks in computer vision by designing novel attention-based models. Specifically, four scenarios are investigated that represent the most important aspects of the attention mechanism. First, an attention-based model is designed to reduce the dimensionality of visual features by selectively processing only a small subset of the data. We study this aspect of the attention mechanism in a framework based on object recognition in distributed camera networks. Second, an attention-based image retrieval system (i.e., person re-identification) is proposed which learns to focus on the most discriminative regions of the person's image and to process those regions with higher computational power using a deep convolutional neural network. Furthermore, we show how visualizing the attention maps can make deep neural networks more interpretable: by visualizing the attention maps, we can observe the regions of the input image on which the neural network relies in order to make a decision. Third, a model is proposed for estimating the importance of the objects in a scene based on a given task. More specifically, the proposed model estimates the importance of the road users that a driver (or an autonomous vehicle) should pay attention to in a driving scenario in order to navigate safely. In this scenario, the attention estimate is the final output of the model.
Fourth, an attention-based module and a new loss function in a meta-learning-based few-shot learning system are proposed in order to incorporate the context of the task into the feature representations of the samples and to increase few-shot recognition accuracy. In this dissertation, we showed that attention can be multi-faceted, and we studied the attention mechanism from the perspectives of feature selection, reduced computational cost, interpretable deep learning models, task-driven importance estimation, and context incorporation. Through the study of these four scenarios, we further advanced the field where ''attention is all you need''.
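The attention maps referred to above — a normalized weighting over regions that indicates where the model "looks" — reduce, in their simplest form, to scaled dot-product attention. The sketch below is the generic formulation, not any of the dissertation's specific models; all shapes and values are illustrative.

```python
# Scaled dot-product attention: Attention(Q, K, V) = softmax(QK^T/sqrt(d_k)) V.
# The softmax output is the "attention map": a distribution over keys that
# says which parts of the input each query attends to.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # query-key similarities
    weights = softmax(scores, axis=-1)    # attention map (rows sum to 1)
    return weights @ V, weights           # weighted sum of values

rng = np.random.default_rng(3)
Q = rng.normal(size=(4, 8))   # 4 queries, d_k = 8
K = rng.normal(size=(6, 8))   # 6 keys
V = rng.normal(size=(6, 8))   # 6 values
out, attn_map = attention(Q, K, V)
```

Because each row of `attn_map` is a probability distribution over input regions, it can be visualized directly over the image, which is the interpretability property the dissertation exploits.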
A Survey on Generative Diffusion Model
Deep learning shows excellent potential in generation tasks thanks to deep
latent representations. Generative models are classes of models that can
generate observations randomly with respect to certain implied parameters.
Recently, the diffusion model has become a rising class of generative models
owing to its powerful generation ability, and great achievements have already
been reached; more applications beyond computer vision, speech generation,
bioinformatics, and natural language processing remain to be explored in this
field. However, the diffusion model has inherent drawbacks: a slow generation
process, restriction to single data types, low likelihood, and the inability
to perform dimension reduction. These drawbacks have led to many improved
works. This survey summarizes the field of the diffusion model. We first state
the main problem with two landmark works -- DDPM and DSM -- and a unifying
landmark work -- Score SDE. Then, we present improved techniques for existing
problems in the diffusion-based model field, including model speed-up,
data-structure diversification, likelihood optimization, and dimension
reduction. Regarding existing models, we also provide a benchmark of FID
score, IS, and NLL according to specific NFE. Moreover, applications of
diffusion models are introduced, including computer vision, sequence modeling,
audio, and AI for science. Finally, we summarize this field together with its
limitations and further directions. A summary of existing well-classified
methods is on our
GitHub: https://github.com/chq1155/A-Survey-on-Generative-Diffusion-Model
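The DDPM formulation the survey starts from admits a compact sketch of its forward (noising) process, since q(x_t | x_0) has a closed form under a fixed noise schedule. The schedule values below are the commonly used linear schedule and are illustrative, not prescribed by the survey.

```python
# Forward (noising) process of a DDPM with a linear beta schedule:
#   q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I),
# where alpha_bar_t is the cumulative product of (1 - beta_t).
import numpy as np

rng = np.random.default_rng(4)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)       # \bar{alpha}_t

def q_sample(x0, t):
    # Sample x_t directly from x_0 in closed form (no step-by-step loop).
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * noise

x0 = rng.normal(size=(16,))
x_mid = q_sample(x0, 500)      # partially noised sample
x_end = q_sample(x0, T - 1)    # nearly pure Gaussian noise
```

Generation reverses this chain with a learned denoiser, one step per NFE, which is exactly why the slow generation process the survey lists as a drawback arises and why the speed-up techniques it reviews matter.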