Theoretical foundations for efficient clustering
Clustering aims to group together data instances which are similar while simultaneously separating the dissimilar instances. The task of clustering is challenging due to many factors. The most well-studied is the high computational cost. The clustering task can be viewed as an optimization problem where the goal is to minimize a certain cost function (like k-means cost or k-median cost). Not only are the minimization problems NP-hard but often also NP-hard to approximate (within a constant factor). There are two other major issues in clustering, namely under-specificity and noise-robustness. The focus of this thesis is tackling these two issues while simultaneously ensuring low computational cost.
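The k-means cost mentioned above is simple to state even though minimising it is NP-hard; a minimal sketch (all names illustrative, not from the thesis):

```python
import numpy as np

def kmeans_cost(X, centers, labels):
    """k-means cost: sum of squared distances from each point to its
    assigned center; minimising this exactly is NP-hard in general."""
    return float(((X - centers[labels]) ** 2).sum())

# Two tight groups on a line with well-placed centers.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
centers = np.array([[0.5], [10.5]])
labels = np.array([0, 0, 1, 1])
cost = kmeans_cost(X, centers, labels)  # 4 * 0.5**2 = 1.0
```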
Clustering is an under-specified task. The same dataset may need to be clustered in different ways depending upon the intended application, and different solution requirements need different approaches. In such situations, domain knowledge is needed to better define the clustering problem. We incorporate this by allowing the clustering algorithm to interact with an oracle by asking whether two points belong to the same or different clusters. In a preliminary work, we show that access to a small number of same-cluster queries makes an otherwise NP-hard k-means clustering problem computationally tractable. Next, we consider the problem of clustering for data de-duplication: detecting records which correspond to the same physical entity in a database. We propose a correlation-clustering-like framework to model such record de-duplication problems. We show that access to a small number of same-cluster queries can help us solve the 'restricted' version of correlation clustering. Rather surprisingly, more relaxed versions of correlation clustering are intractable even when allowed to make a 'large' number of same-cluster queries.
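A same-cluster oracle can be sketched as follows; in practice the answers would come from a domain expert, and the stored ground truth here is a hypothetical stand-in:

```python
def make_same_cluster_oracle(ground_truth):
    """Simulated same-cluster oracle. `ground_truth` maps a point id to
    its true cluster id (a hypothetical stand-in for a domain expert)."""
    def query(i, j):
        # Do points i and j belong to the same cluster?
        return ground_truth[i] == ground_truth[j]
    return query

truth = {0: "a", 1: "a", 2: "b"}
ask = make_same_cluster_oracle(truth)
```

A clustering algorithm with access to `ask` can spend a small query budget to disambiguate exactly the points its objective cannot separate on its own.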
Next, we explore the issue of noise-robustness of clustering algorithms. Many real-world datasets have, on top of cohesive subsets, a significant number of points which are 'unstructured'. The addition of these noisy points makes it difficult to detect the structure of the remaining points. In the first line of work, we define noise as not having significantly large dense subsets. We provide computationally efficient clustering algorithms that capture all meaningful clusterings of the dataset, where the clusters are cohesive (defined formally by notions of clusterability) and where the noise satisfies the gray background assumption. We complement our results by showing that when either the notions of structure or the noise requirements are relaxed, no such results are possible. In the second line of work, we develop a generic procedure that can transform objective-based clustering algorithms into ones that are robust to outliers (as long as the number of such points is not 'too large'). In particular, we develop efficient noise-robust versions of two common clustering algorithms and prove robustness guarantees for them.
Sparse representation for face images
This thesis addresses issues in face recognition with multi-view face images. Several effective methods are proposed and compared with the current state of the art. A novel framework is proposed that generalises existing sparse representation-based methods to exploit shared information and counteract pose variations in face images.
Class distribution-aware adaptive margins and cluster embedding for classification of fruit and vegetables at supermarket self-checkouts
The complex task of vision-based fruit and vegetable classification at a supermarket self-checkout poses significant challenges. These challenges include the highly variable physical features of fruit and vegetables, i.e. colour, texture, shape and size, which are dependent upon ripeness and storage conditions in a supermarket, as well as general product variation. Supermarket environments are also significantly variable with respect to lighting conditions. Attempting to build an exhaustive dataset to capture all these variations, for example a dataset of a fruit consisting of all possible colour variations, is nearly impossible. Moreover, some fruit and vegetable classes have significantly similar physical features, e.g. the colour and texture of cabbage and lettuce. Current state-of-the-art classification techniques such as those based on Deep Convolutional Neural Networks (DCNNs) are highly prone to errors resulting from the inter-class similarities and intra-class variations of fruit and vegetable images. The deep features of highly variable classes can invade the features of neighbouring similar classes in a learned feature space of the DCNN, resulting in confused classification hyper-planes. To overcome these limitations of current classification techniques, we have proposed a class distribution-aware adaptive margins approach with cluster embedding for classification of fruit and vegetables. We have tested the proposed technique for cluster-based feature embedding and classification effectiveness. It is observed that the introduction of adaptive classification margins proportional to the class distribution can achieve significant improvements in clustering and classification effectiveness. The proposed technique is tested for both clustering and classification, and promising results have been obtained.
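One way to realise margins "proportional to the class distribution" is to scale a base margin by each class's spread in feature space; this is a hypothetical sketch of the idea, and the thesis's exact rule may differ:

```python
import numpy as np

def adaptive_margins(features, labels, base=0.5):
    """Assign each class a margin proportional to its average distance
    from the class centroid, so highly variable classes (e.g. ripening
    fruit) get wider margins than tightly clustered ones."""
    spreads = {}
    for c in np.unique(labels):
        Fc = features[labels == c]
        spreads[int(c)] = float(
            np.linalg.norm(Fc - Fc.mean(axis=0), axis=1).mean())
    avg = sum(spreads.values()) / len(spreads)
    return {c: base * s / avg for c, s in spreads.items()}

# A tight class (0) and a spread-out class (1) in a 2-D feature space.
feats = np.array([[0, 0], [0.1, 0], [-0.1, 0],
                  [5, 5], [7, 5], [3, 5]], dtype=float)
margins = adaptive_margins(feats, np.array([0, 0, 0, 1, 1, 1]))
```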
A review on deep learning techniques for 3D sensed data classification
Over the past decade deep learning has driven progress in 2D image
understanding. Despite these advancements, techniques for automatically
understanding 3D sensed data, such as point clouds, are comparatively immature. However,
with a range of important applications from indoor robotics navigation to
national scale remote sensing there is a high demand for algorithms that can
learn to automatically understand and classify 3D sensed data. In this paper we
review the current state-of-the-art deep learning architectures for processing
unstructured Euclidean data. We begin by addressing the background concepts and
traditional methodologies. We review the current main approaches, including
RGB-D, multi-view, volumetric and fully end-to-end architecture designs.
Datasets for each category are documented and explained. Finally, we give a
detailed discussion about the future of deep learning for 3D sensed data, using
literature to justify the areas where future research would be most valuable.
Comment: 25 pages, 9 figures. Review paper.
Projection Based Models for High Dimensional Data
In recent years, many machine learning applications have arisen which deal with the
problem of finding patterns in high dimensional data. Principal component analysis
(PCA) has become ubiquitous in this setting. PCA performs dimensionality reduction
by estimating latent factors which minimise the reconstruction error between
the original data and its low-dimensional projection. We initially consider a situation
where influential observations exist within the dataset which have a large,
adverse effect on the estimated PCA model. We propose a measure of “predictive
influence” to detect these points based on the contribution of each point to the
leave-one-out reconstruction error of the model using an analytic PRedicted REsidual
Sum of Squares (PRESS) statistic. We then develop a robust alternative to PCA
to deal with the presence of influential observations and outliers which minimizes
the predictive reconstruction error.
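The leave-one-out reconstruction error described above can be computed by brute force; the thesis derives an analytic PRESS shortcut, whereas this sketch simply refits the model n times:

```python
import numpy as np

def pca_fit(X, k):
    """Rank-k PCA of the centered data via SVD."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def loo_press(X, k):
    """Brute-force leave-one-out PRESS: refit PCA without each point and
    measure how badly the held-out point is reconstructed. Influential
    observations get large values because the model fitted without them
    no longer explains them."""
    errs = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        mu, V = pca_fit(X[mask], k)
        xc = X[i] - mu
        resid = xc - V.T @ (V @ xc)
        errs.append(float(resid @ resid))
    return np.array(errs)

# Five points on a line plus one influential observation off it.
X = np.array([[0, 0], [1, 1], [2, 2], [3, 3], [4, 4], [5, -5]], dtype=float)
influence = loo_press(X, 1)  # largest entry flags the off-line point
```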
In some applications there may be unobserved clusters in the data, for which
fitting PCA models to subsets of the data would provide a better fit. This is known
as the subspace clustering problem. We develop a novel algorithm for subspace
clustering which iteratively fits PCA models to subsets of the data and assigns observations
to clusters based on their predictive influence on the reconstruction error.
We study the convergence of the algorithm and compare its performance to a number
of subspace clustering methods on simulated data and in real applications from
computer vision involving clustering object trajectories in video sequences and images
of faces.
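The alternating scheme can be sketched as a k-subspaces loop; here plain reconstruction residuals stand in for the thesis's predictive-influence criterion, and all names are illustrative:

```python
import numpy as np

def k_subspaces(X, k, dim, init, iters=10):
    """Iteratively fit a rank-`dim` PCA model to each cluster and
    reassign every point to the subspace that reconstructs it best."""
    labels = init.copy()
    for _ in range(iters):
        models = []
        for c in range(k):
            Xc = X[labels == c] if (labels == c).sum() > dim else X
            mu = Xc.mean(axis=0)
            _, _, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
            models.append((mu, Vt[:dim]))
        # Residual of every point under every cluster's PCA model.
        resid = np.stack(
            [(((X - mu) - (X - mu) @ V.T @ V) ** 2).sum(axis=1)
             for mu, V in models], axis=1)
        labels = resid.argmin(axis=1)
    return labels

# Two one-dimensional subspaces (the coordinate axes) in the plane.
X = np.array([[1, 0], [2, 0], [3, 0], [4, 0],
              [0, 1], [0, 2], [0, 3], [0, 4]], dtype=float)
labels = k_subspaces(X, 2, 1, init=np.array([0, 0, 0, 0, 1, 1, 1, 1]))
```

With a correct initial assignment the loop is at a fixed point: each axis reconstructs its own points with zero residual, so no point switches cluster.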
We extend our predictive clustering framework to a setting where two high-dimensional
views of data have been obtained. Often, only either clustering or predictive modelling is performed between the views. Instead, we aim to recover
clusters which are maximally predictive between the views. In this setting two block
partial least squares (TB-PLS) is a useful model. TB-PLS performs dimensionality
reduction in both views by estimating latent factors that are highly predictive. We
fit TB-PLS models to subsets of data and assign points to clusters based on their
predictive influence under each model which is evaluated using a PRESS statistic.
We compare our method to state-of-the-art algorithms in real applications in webpage
and document clustering and find that our approach to predictive clustering
yields superior results.
Finally, we propose a method for dynamically tracking multivariate data streams
based on PLS. Our method learns a linear regression function from multivariate
input and output streaming data in an incremental fashion while also performing
dimensionality reduction and variable selection. Moreover, the recursive regression
model is able to adapt to sudden changes in the data generating mechanism and also
identifies the number of latent factors. We apply our method to the enhanced index
tracking problem in computational finance.
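The incremental regression described above is in the spirit of recursive least squares; a minimal sketch under that assumption (the forgetting factor and names are illustrative, not the thesis's PLS formulation):

```python
import numpy as np

def rls_step(theta, P, x, y, lam=1.0):
    """One recursive least-squares update with forgetting factor `lam`;
    lam < 1 discounts old data, letting the fit adapt to sudden changes
    in the data-generating mechanism."""
    x = x.reshape(-1, 1)
    gain = (P @ x) / float(lam + x.T @ P @ x)
    err = y - float(x.T @ theta.reshape(-1, 1))
    theta = theta + (gain * err).ravel()
    P = (P - gain @ x.T @ P) / lam
    return theta, P

# Stream y = 2x; the coefficient estimate converges to 2 online.
theta, P = np.zeros(1), 100.0 * np.eye(1)
for t in range(1, 21):
    x = np.array([float(t % 5 + 1)])
    theta, P = rls_step(theta, P, x, 2.0 * x[0])
```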
Robust Subspace Estimation via Low-Rank and Sparse Decomposition and Applications in Computer Vision
Recent advances in robust subspace estimation have made dimensionality reduction and
noise and outlier suppression an active area of research, along with continuous
improvements in computer vision applications. Due to the nature of image and video
signals that need a high-dimensional representation, the storage, processing, transmission,
and analysis of such signals are often difficult. It is therefore desirable to obtain a
low-dimensional representation for such signals, and at the same time correct for corruptions,
errors, and outliers, so that the signals could be readily used for later processing.
Major recent advances in low-rank modelling in this context were initiated by the work of
Candès et al. [17] where the authors provided a solution for the long-standing problem of
decomposing a matrix into low-rank and sparse components in a Robust Principal Component
Analysis (RPCA) framework. However, for computer vision applications RPCA
is often too complex, and/or may not yield desirable results. The low-rank component
obtained by the RPCA usually has an unnecessarily high rank, while in certain tasks
lower dimensional representations are required. The RPCA has the ability to robustly
estimate noise and outliers and separate them from the low-rank component into a sparse
part. However, it has no mechanism for providing insight into the structure of the sparse
solution, nor a way to further decompose the sparse part into a random noise and a structured
sparse component that would be advantageous in many computer vision tasks. As
video signals are usually captured by a moving camera, obtaining a low-rank
component by RPCA becomes impossible. In this thesis, novel Approximated RPCA
algorithms are presented, targeting different shortcomings of the RPCA. The RPCA
was analysed to identify its most time-consuming steps, which were replaced
with simpler yet tractable alternatives. The proposed method is
able to obtain the exact desired rank for the low-rank component while estimating a
global transformation to describe camera-induced motion. Furthermore, it is able to
decompose the sparse part into a foreground sparse component, and a random noise
part that contains no useful information for computer vision processing. The foreground
sparse component is obtained by several novel structured sparsity-inducing norms, that
better encapsulate the needed pixel structure in visual signals. Moreover, algorithms for
reducing complexity of low-rank estimation have been proposed that achieve significant
complexity reduction without sacrificing the visual representation of video and image
information. The proposed algorithms are applied to several fundamental computer
vision tasks, namely, high efficiency video coding, batch image alignment, inpainting,
and recovery, video stabilisation, background modelling and foreground segmentation,
robust subspace clustering and motion estimation, face recognition, and ultra-high-definition
image and video super-resolution. The algorithms proposed in this thesis, including
batch image alignment and recovery, background modelling and foreground segmentation,
robust subspace clustering and motion segmentation, and ultra-high-definition
image and video super-resolution, achieve either state-of-the-art or comparable results to
existing methods.
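RPCA-style solvers are built from two proximal operators: soft-thresholding for the l1 term and singular value thresholding for the nuclear norm. A minimal sketch of both (full solvers such as inexact ALM alternate between them with a dual update, omitted here):

```python
import numpy as np

def shrink(M, tau):
    """Soft-thresholding: proximal operator of the l1 norm, used to
    produce the sparse component S."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svt(M, tau):
    """Singular value thresholding: proximal operator of the nuclear
    norm, used to produce the low-rank component L."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```

Thresholding singular values at `tau` discards weak directions, which is what drives the rank of L down; thresholding entries at `tau` zeroes small residuals, which is what makes S sparse.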
Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients
We present QP-SBGD, a novel layer-wise stochastic optimiser tailored towards
training neural networks with binary weights, known as binary neural networks
(BNNs), on quantum hardware. BNNs reduce the computational requirements and
energy consumption of deep learning models with minimal loss in accuracy.
However, training them in practice remains an open challenge. Most known
BNN-optimisers either rely on projected updates or binarise weights
post-training. Instead, QP-SBGD approximately maps the gradient onto binary
variables, by solving a quadratic constrained binary optimisation. Under
practically reasonable assumptions, we show that this update rule converges
at a provable rate. Moreover, we show how the
NP-hard projection can be effectively executed on an adiabatic
quantum annealer, harnessing recent advancements in quantum computation. We
also introduce a projected version of this update rule and prove that if a
fixed point exists in the binary variable space, the modified updates will
converge to it. Last but not least, our algorithm is implemented layer-wise,
making it suitable to train larger networks on resource-limited quantum
hardware. Through extensive evaluations, we show that QP-SBGD outperforms or is
on par with competitive and well-established baselines such as BinaryConnect,
signSGD and ProxQuant when optimising the Rosenbrock function, training BNNs as
well as binary graph neural networks.
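The projection step amounts to a small binary quadratic programme; the following is a classical brute-force stand-in for the annealer (illustrative only, using spin variables in {-1, +1}):

```python
import itertools
import numpy as np

def brute_force_binary_qp(Q, c):
    """Exhaustively minimise b^T Q b + c^T b over b in {-1, +1}^n -- the
    kind of problem QP-SBGD ships to an adiabatic annealer. Brute force
    only scales to a handful of variables, which is exactly why quantum
    hardware is attractive here."""
    best_b, best_val = None, np.inf
    for bits in itertools.product((-1.0, 1.0), repeat=len(c)):
        b = np.array(bits)
        val = float(b @ Q @ b + c @ b)
        if val < best_val:
            best_b, best_val = b, val
    return best_b, best_val

# Binarise a real-valued target w = [0.9, -0.8]:
# argmin ||b - w||^2 expands to b^T I b - 2 w^T b + const.
w = np.array([0.9, -0.8])
b, _ = brute_force_binary_qp(np.eye(2), -2.0 * w)
```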
Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field
that aims to design computer agents with intelligent capabilities such as
understanding, reasoning, and learning through integrating multiple
communicative modalities, including linguistic, acoustic, visual, tactile, and
physiological messages. With the recent interest in video understanding,
embodied autonomous agents, text-to-image generation, and multisensor fusion in
application domains such as healthcare and robotics, multimodal machine
learning has brought unique computational and theoretical challenges to the
machine learning community given the heterogeneity of data sources and the
interconnections often found between modalities. However, the breadth of
progress in multimodal research has made it difficult to identify the common
themes and open questions in the field. By synthesizing a broad range of
application domains and theoretical frameworks from both historical and recent
perspectives, this paper is designed to provide an overview of the
computational and theoretical foundations of multimodal machine learning. We
start by defining two key principles of modality heterogeneity and
interconnections that have driven subsequent innovations, and propose a
taxonomy of 6 core technical challenges: representation, alignment, reasoning,
generation, transference, and quantification covering historical and recent
trends. Recent technical achievements will be presented through the lens of
this taxonomy, allowing researchers to understand the similarities and
differences across new approaches. We end by motivating several open problems
for future research as identified by our taxonomy.