25 research outputs found

    Fast and accurate image and video analysis on Riemannian manifolds

    Get PDF

    Learning to hash for large scale image retrieval

    Get PDF
    This thesis is concerned with improving the effectiveness of nearest neighbour search. Nearest neighbour search is the problem of finding the most similar data-points to a query in a database, and is a fundamental operation that has found wide applicability in many fields. In this thesis the focus is placed on hashing-based approximate nearest neighbour search methods that generate similar binary hashcodes for similar data-points. These hashcodes can be used as the indices into the buckets of hashtables for fast search. This work explores how the quality of search can be improved by learning task specific binary hashcodes. The generation of a binary hashcode comprises two main steps carried out sequentially: projection of the image feature vector onto the normal vectors of a set of hyperplanes partitioning the input feature space followed by a quantisation operation that uses a single threshold to binarise the resulting projections to obtain the hashcodes. The degree to which these operations preserve the relative distances between the datapoints in the input feature space has a direct influence on the effectiveness of using the resulting hashcodes for nearest neighbour search. In this thesis I argue that the retrieval effectiveness of existing hashing-based nearest neighbour search methods can be increased by learning the thresholds and hyperplanes based on the distribution of the input data. The first contribution is a model for learning multiple quantisation thresholds. I demonstrate that the best threshold positioning is projection specific and introduce a novel clustering algorithm for threshold optimisation. The second contribution extends this algorithm by learning the optimal allocation of quantisation thresholds per hyperplane. In doing so I argue that some hyperplanes are naturally more effective than others at capturing the distribution of the data and should therefore attract a greater allocation of quantisation thresholds. The third contribution focuses on the complementary problem of learning the hashing hyperplanes. I introduce a multi-step iterative model that, in the first step, regularises the hashcodes over a data-point adjacency graph, which encourages similar data-points to be assigned similar hashcodes. In the second step, binary classifiers are learnt to separate opposing bits with maximum margin. This algorithm is extended to learn hyperplanes that can generate similar hashcodes for similar data-points in two different feature spaces (e.g. text and images). Individually the performance of these algorithms is often superior to competitive baselines. I unify my contributions by demonstrating that learning hyperplanes and thresholds as part of the same model can yield an additive increase in retrieval effectiveness

    Vision-based retargeting for endoscopic navigation

    Get PDF
    Endoscopy is a standard procedure for visualising the human gastrointestinal tract. With the advances in biophotonics, imaging techniques such as narrow band imaging, confocal laser endomicroscopy, and optical coherence tomography can be combined with normal endoscopy for assisting the early diagnosis of diseases, such as cancer. In the past decade, optical biopsy has emerged to be an effective tool for tissue analysis, allowing in vivo and in situ assessment of pathological sites with real-time feature-enhanced microscopic images. However, the non-invasive nature of optical biopsy leads to an intra-examination retargeting problem, which is associated with the difficulty of re-localising a biopsied site consistently throughout the whole examination. In addition to intra-examination retargeting, retargeting of a pathological site is even more challenging across examinations, due to tissue deformation and changing tissue morphologies and appearances. The purpose of this thesis is to address both the intra- and inter-examination retargeting problems associated with optical biopsy. We propose a novel vision-based framework for intra-examination retargeting. The proposed framework is based on combining visual tracking and detection with online learning of the appearance of the biopsied site. Furthermore, a novel cascaded detection approach based on random forests and structured support vector machines is developed to achieve efficient retargeting. To cater for reliable inter-examination retargeting, the solution provided in this thesis is achieved by solving an image retrieval problem, for which an online scene association approach is proposed to summarise an endoscopic video collected in the first examination into distinctive scenes. A hashing-based approach is then used to learn the intrinsic representations of these scenes, such that retargeting can be achieved in subsequent examinations by retrieving the relevant images using the learnt representations. For performance evaluation of the proposed frameworks, extensive phantom, ex vivo and in vivo experiments have been conducted, with results demonstrating the robustness and potential clinical values of the methods proposed.Open Acces

    Scalable and Compact 3D Action Recognition with Approximated RBF Kernel Machines

    Get PDF
    Despite the recent deep learning (DL) revolution, kernel machines still remain powerful methods for action recognition. DL has brought the use of large datasets and this is typically a problem for kernel approaches, which are not scaling up eciently due to kernel Gram matrices. Nevertheless, kernel methods are still attractive and more generally applicable since they can equally manage dierent sizes of the datasets, also in cases where DL techniques show some limitations. This work investigates these issues by proposing an explicit ap- proximated representation that, together with a linear model, is an equivalent, yet scalable, implementation of a kernel machine. Our approximation is directly inspired by the exact feature map that is induced by an RBF Gaussian kernel but, unlike the latter, it is nite dimensional and very compact. We justify the soundness of our idea with a theoretical analysis which proves the unbiasedness of the approximation, and provides a vanishing bound for its variance, which is shown to decrease much rapidly than in alternative methods in the literature. In a broad experimental validation, we assess the superiority of our approximation in terms of 1) ease and speed of training, 2) compactness of the model, and 3) improvements with respect to the state-of-the-art performance

    A Survey on Evolutionary Computation for Computer Vision and Image Analysis: Past, Present, and Future Trends

    Get PDF
    Computer vision (CV) is a big and important field in artificial intelligence covering a wide range of applications. Image analysis is a major task in CV aiming to extract, analyse and understand the visual content of images. However, imagerelated tasks are very challenging due to many factors, e.g., high variations across images, high dimensionality, domain expertise requirement, and image distortions. Evolutionary computation (EC) approaches have been widely used for image analysis with significant achievement. However, there is no comprehensive survey of existing EC approaches to image analysis. To fill this gap, this paper provides a comprehensive survey covering all essential EC approaches to important image analysis tasks including edge detection, image segmentation, image feature analysis, image classification, object detection, and others. This survey aims to provide a better understanding of evolutionary computer vision (ECV) by discussing the contributions of different approaches and exploring how and why EC is used for CV and image analysis. The applications, challenges, issues, and trends associated to this research field are also discussed and summarised to provide further guidelines and opportunities for future research

    Deep Multi-View Learning for Visual Understanding

    Get PDF
    PhD ThesisMulti-view data is the result of an entity being perceived or represented from multiple perspectives. Plenty of applications in visual understanding contain multi-view data. For example, the face images for training a recognition system are usually captured by different devices from multiple angles. This thesis focuses on the cross-view visual recognition problems, e.g., identifying the face images of the same person across different cameras. Several representative multi-view settings, from the supervised multi-view learning to the more challenging unsupervised domain adaptive (UDA) multi-view learning, are investigated. Novel multi-view learning algorithms are proposed correspondingly. To be more specific, the proposed methods are based on the advanced deep neural network (DNN) architectures for better handling visual data. However, directly combining the multi-view learning objectives with DNN can result in different issues, e.g., on scalability, and limit the application scenarios and model performance. Corresponding novelties in DNN methods are thus required to solve them. This thesis is organised into three parts. Each chapter focuses on a multi-view learning setting with novel solutions and is detailed as follows: Chapter 3 A supervised multi-view learning setting with two different views are studied. To recognise the data samples across views, one strategy is aligning them in a common feature space via correlation maximisation. It is also known as canonical correlation analysis (CCA). Deep CCA has been proposed for better performance with the non-linear projection via deep neural networks. Existing deep CCA models typically decorrelate the deep feature dimensions of each view before their Euclidean distances are minimised in the common space. This feature decorrelation is achieved by enforcing an exact decorrelation constraint which is computationally expensive due to the matrix inversion or SVD operations. Therefore, existing deep CCA models are inefficient and have scalability issues. Furthermore, the exact decorrelation is incompatible with the gradient based deep model training and results in sub-optimal solution. To overcome these aforementioned issues, a novel deep CCA model Soft CCA is introduced in this thesis. Specifically, the exact decorrelation is replaced by soft decorrelation via a mini-batch based Stochastic Decorrelation Loss (SDL). It can be jointly optimised with the other training objectives. In addition, our SDL loss can be applied to other deep models beyond multi-view learning. Chapter 4 The supervised multi-view learning setting, whereby more than two views exist, are studied in this chapter. Recently developed deep multi-view learning algorithms either learn a latent visual representation based on a single semantic level and/or require laborious human annotation of these factors as attributes. A novel deep neural network architecture, called Multi- Level Factorisation Net (MLFN), is proposed to automatically factorise the visual appearance into latent discriminative factors at multiple semantic levels without manual annotation. The main purpose is forcing different views share the same latent factors so that they are can be aligned at all layers. Specifically, MLFN is composed of multiple stacked blocks. Each block contains multiple factor modules to model latent factors at a specific level, and factor selection modules that dynamically select the factor modules to interpret the content of each input image. The outputs of the factor selection modules also provide a compact latent factor descriptor that is complementary to the conventional deeply learned feature, and they can be fused efficiently. The effectiveness of the proposed MLFN is demonstrated by not only the large-scale cross-view recognition problems but also the general object categorisation tasks. Chapter 5 The last problem is a special unsupervised domain adaptation setting called unsupervised domain adaptive (UDA) multi-view learning. It contains a fully annotated dataset as the source domain and another unsupervised dataset with relevant tasks as the target domain. The main purpose is to improve the performance of the unlabelled dataset with the annotated data from the other dataset. More importantly, this setting further requires both the source and target domains are multi-view datasets with relevant tasks. Therefore, the assumption of the aligned label space across domains is inappropriate in the UDA multi-view learning. For example, the person re-identification (Re-ID) datasets built on different surveillance scenarios are with images of different people captured and should be given disjoint person identity labels. Existing methods for UDA multi-view learning problems are aligning different domains either in the raw image space or a feature embedding space for domain alignment. In this thesis, a different framework, multi-task learning, is adopted with the domain specific objectives for a common space learning. Specifically, such common space is proposed to enable the knowledge transfer. The conventional supervised losses can be used for the labelled source data while the unsupervised objectives for the target domain play the key roles in domain adaptation. Two novel unsupervised objectives are introduced for UDA multi-view learning and result in two models as below. The first model, termed common factorised space model (CFSM), is built on the assumptions that the semantic latent attributes are shared between the source and target domains since they are relevant multi-view learning tasks. Different from the existing methods that based on domain alignment, CFSM emphasizes on transferring the information across domains via discovering discriminative latent factors in the proposed common space. However, the multi-view data from target domain is without labels. Therefore, an unsupervised factorisation loss is derived and applied on the common space for latent factors discovery across domains. The second model still learns a shared embedding space with multi-view data from both domains but with a different assumption. It attempts to discover the latent correspondence of multi-view data in the unsupervised target data. The target data’s contribution comes from a clustering process. Each cluster thus reveals the underlying cross-view correspondences across multiple views in target domain. To this end, a novel Stochastic Inference for Deep Clustering (SIDC) method is proposed. It reduces self-reinforcing errors that lead to premature convergence to a sub-optimal solution by changing the conventional deterministic cluster assignment to a stochastic one

    16th SC@RUG 2019 proceedings 2018-2019

    Get PDF
    corecore