68 research outputs found

    Theoretical foundations for efficient clustering

    Get PDF
    Clustering aims to group together data instances which are similar while simultaneously separating the dissimilar instances. The task of clustering is challenging due to many factors. The most well-studied is the high computational cost. The clustering task can be viewed as an optimization problem where the goal is to minimize a certain cost function (like k-means cost or k-median cost). Not only are the minimization problems NP-hard but often also NP-hard to approximate (within a constant factor). There are two other major issues in clustering, namely under-specificity and noise-robustness. The focus of this thesis is tackling these two issues while simultaneously ensuring low computational cost. Clustering is an under-specified task. The same dataset may need to be clustered in different ways depending upon the intended application. Different solution requirements need different approaches. In such situations, domain knowledge is needed to better define the clustering problem. We incorporate this by allowing the clustering algorithm to interact with an oracle by asking whether two points belong to the same or different cluster. In a preliminary work, we show that access to a small number of same-cluster queries makes an otherwise NP-hard k-means clustering problem computationally tractable. Next, we consider the problem of clustering for data de-duplication; detecting records which correspond to the same physical entity in a database. We propose a correlation clustering like framework to model such record de-duplication problems. We show that access to a small number of same-cluster queries can help us solve the 'restricted' version of correlation clustering. Rather surprisingly, more relaxed versions of correlation clustering are intractable even when allowed to make a 'large' number of same-cluster queries. Next, we explore the issue of noise-robustness of clustering algorithms. Many real-world datasets, have on top of cohesive subsets, a significant amount of points which are `unstructured'. The addition of these noisy points makes it difficult to detect the structure of the remaining points. In the first line of work, we define noise as not having significantly large dense subsets. We provide computationally efficient clustering algorithms that capture all meaningful clusterings of the dataset; where the clusters are cohesive (defined formally by notions of clusterability) and where the noise satisfies the gray background assumption. We complement our results by showing that when either the notions of structure or the noise requirements are relaxed, no such results are possible. In the second line of work, we develop a generic procedure that can transform objective-based clustering algorithms into one that is robust to outliers (as long the number of such points is not 'too large'). In particular, we develop efficient noise-robust versions of two common clustering algorithms and prove robustness guarantees for them

    Sparse representation for face images.

    Full text link
    This thesis address issues for face recognition with multi-view face images. Several effective methods are proposed and compared with current state of the art. A novel framework that generalises existing sparse representation-based methods in order to exploit the sharing information to against pose variations of face images is proposed

    Class distribution-aware adaptive margins and cluster embedding for classification of fruit and vegetables at supermarket self-checkouts

    Get PDF
    The complex task of vision based fruit and vegetables classification at a supermarket self-checkout poses significant challenges. These challenges include the highly variable physical features of fruit and vegetables i.e. colour, texture shape and size which are dependent upon ripeness and storage conditions in a supermarket as well as general product variation. Supermarket environments are also significantly variable with respect to lighting conditions. Attempting to build an exhaustive dataset to capture all these variations, for example a dataset of a fruit consisting of all possible colour variations, is nearly impossible. Moreover, some fruit and vegetable classes have significant similar physical features e.g. the colour and texture of cabbage and lettuce. Current state-of-the-art classification techniques such as those based on Deep Convolutional Neural Networks (DCNNs) are highly prone to errors resulting from the inter-class similarities and intra-class variations of fruit and vegetable images. The deep features of highly variable classes can invade the features of neighbouring similar classes in a learned feature space of the DCNN, resulting in confused classification hyper-planes. To overcome these limitations of current classification techniques we have proposed a class distribution-aware adaptive margins approach with cluster embedding for classification of fruit and vegetables. We have tested the proposed technique for cluster-based feature embedding and classification effectiveness. It is observed that introduction of adaptive classification margins proportional to the class distribution can achieve significant improvements in clustering and classification effectiveness. The proposed technique is tested for both clustering and classification, and promising results have been obtained

    A review on deep learning techniques for 3D sensed data classification

    Get PDF
    Over the past decade deep learning has driven progress in 2D image understanding. Despite these advancements, techniques for automatic 3D sensed data understanding, such as point clouds, is comparatively immature. However, with a range of important applications from indoor robotics navigation to national scale remote sensing there is a high demand for algorithms that can learn to automatically understand and classify 3D sensed data. In this paper we review the current state-of-the-art deep learning architectures for processing unstructured Euclidean data. We begin by addressing the background concepts and traditional methodologies. We review the current main approaches including; RGB-D, multi-view, volumetric and fully end-to-end architecture designs. Datasets for each category are documented and explained. Finally, we give a detailed discussion about the future of deep learning for 3D sensed data, using literature to justify the areas where future research would be most valuable.Comment: 25 pages, 9 figures. Review pape

    Projection Based Models for High Dimensional Data

    Get PDF
    In recent years, many machine learning applications have arisen which deal with the problem of finding patterns in high dimensional data. Principal component analysis (PCA) has become ubiquitous in this setting. PCA performs dimensionality reduction by estimating latent factors which minimise the reconstruction error between the original data and its low-dimensional projection. We initially consider a situation where influential observations exist within the dataset which have a large, adverse affect on the estimated PCA model. We propose a measure of “predictive influence” to detect these points based on the contribution of each point to the leave-one-out reconstruction error of the model using an analytic PRedicted REsidual Sum of Squares (PRESS) statistic. We then develop a robust alternative to PCA to deal with the presence of influential observations and outliers which minimizes the predictive reconstruction error. In some applications there may be unobserved clusters in the data, for which fitting PCA models to subsets of the data would provide a better fit. This is known as the subspace clustering problem. We develop a novel algorithm for subspace clustering which iteratively fits PCA models to subsets of the data and assigns observations to clusters based on their predictive influence on the reconstruction error. We study the convergence of the algorithm and compare its performance to a number of subspace clustering methods on simulated data and in real applications from computer vision involving clustering object trajectories in video sequences and images of faces. We extend our predictive clustering framework to a setting where two high-dimensional views of data have been obtained. Often, only either clustering or predictive modelling is performed between the views. Instead, we aim to recover clusters which are maximally predictive between the views. In this setting two block partial least squares (TB-PLS) is a useful model. TB-PLS performs dimensionality reduction in both views by estimating latent factors that are highly predictive. We fit TB-PLS models to subsets of data and assign points to clusters based on their predictive influence under each model which is evaluated using a PRESS statistic. We compare our method to state of the art algorithms in real applications in webpage and document clustering and find that our approach to predictive clustering yields superior results. Finally, we propose a method for dynamically tracking multivariate data streams based on PLS. Our method learns a linear regression function from multivariate input and output streaming data in an incremental fashion while also performing dimensionality reduction and variable selection. Moreover, the recursive regression model is able to adapt to sudden changes in the data generating mechanism and also identifies the number of latent factors. We apply our method to the enhanced index tracking problem in computational finance

    Robust Subspace Estimation via Low-Rank and Sparse Decomposition and Applications in Computer Vision

    Get PDF
    PhDRecent advances in robust subspace estimation have made dimensionality reduction and noise and outlier suppression an area of interest for research, along with continuous improvements in computer vision applications. Due to the nature of image and video signals that need a high dimensional representation, often storage, processing, transmission, and analysis of such signals is a difficult task. It is therefore desirable to obtain a low-dimensional representation for such signals, and at the same time correct for corruptions, errors, and outliers, so that the signals could be readily used for later processing. Major recent advances in low-rank modelling in this context were initiated by the work of Cand`es et al. [17] where the authors provided a solution for the long-standing problem of decomposing a matrix into low-rank and sparse components in a Robust Principal Component Analysis (RPCA) framework. However, for computer vision applications RPCA is often too complex, and/or may not yield desirable results. The low-rank component obtained by the RPCA has usually an unnecessarily high rank, while in certain tasks lower dimensional representations are required. The RPCA has the ability to robustly estimate noise and outliers and separate them from the low-rank component, by a sparse part. But, it has no mechanism of providing an insight into the structure of the sparse solution, nor a way to further decompose the sparse part into a random noise and a structured sparse component that would be advantageous in many computer vision tasks. As videos signals are usually captured by a camera that is moving, obtaining a low-rank component by RPCA becomes impossible. In this thesis, novel Approximated RPCA algorithms are presented, targeting different shortcomings of the RPCA. The Approximated RPCA was analysed to identify the most time consuming RPCA solutions, and replace them with simpler yet tractable alternative solutions. The proposed method is able to obtain the exact desired rank for the low-rank component while estimating a global transformation to describe camera-induced motion. Furthermore, it is able to decompose the sparse part into a foreground sparse component, and a random noise part that contains no useful information for computer vision processing. The foreground sparse component is obtained by several novel structured sparsity-inducing norms, that better encapsulate the needed pixel structure in visual signals. Moreover, algorithms for reducing complexity of low-rank estimation have been proposed that achieve significant complexity reduction without sacrificing the visual representation of video and image information. The proposed algorithms are applied to several fundamental computer vision tasks, namely, high efficiency video coding, batch image alignment, inpainting, and recovery, video stabilisation, background modelling and foreground segmentation, robust subspace clustering and motion estimation, face recognition, and ultra high definition image and video super-resolution. The algorithms proposed in this thesis including batch image alignment and recovery, background modelling and foreground segmentation, robust subspace clustering and motion segmentation, and ultra high definition image and video super-resolution achieve either state-of-the-art or comparable results to existing methods

    Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients

    Full text link
    We present, QP-SBGD, a novel layer-wise stochastic optimiser tailored towards training neural networks with binary weights, known as binary neural networks (BNNs), on quantum hardware. BNNs reduce the computational requirements and energy consumption of deep learning models with minimal loss in accuracy. However, training them in practice remains to be an open challenge. Most known BNN-optimisers either rely on projected updates or binarise weights post-training. Instead, QP-SBGD approximately maps the gradient onto binary variables, by solving a quadratic constrained binary optimisation. Under practically reasonable assumptions, we show that this update rule converges with a rate of O(1/T)\mathcal{O}(1 / \sqrt{T}). Moreover, we show how the NP\mathcal{NP}-hard projection can be effectively executed on an adiabatic quantum annealer, harnessing recent advancements in quantum computation. We also introduce a projected version of this update rule and prove that if a fixed point exists in the binary variable space, the modified updates will converge to it. Last but not least, our algorithm is implemented layer-wise, making it suitable to train larger networks on resource-limited quantum hardware. Through extensive evaluations, we show that QP-SBGD outperforms or is on par with competitive and well-established baselines such as BinaryConnect, signSGD and ProxQuant when optimising the Rosenbrock function, training BNNs as well as binary graph neural networks

    Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

    Full text link
    Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper is designed to provide an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining two key principles of modality heterogeneity and interconnections that have driven subsequent innovations, and propose a taxonomy of 6 core technical challenges: representation, alignment, reasoning, generation, transference, and quantification covering historical and recent trends. Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy
    • …
    corecore