115 research outputs found

    Invariant encoding schemes for visual recognition

    Many encoding schemes, such as the Scale Invariant Feature Transform (SIFT) and Histograms of Oriented Gradients (HOG), make use of templates of histograms to enable a loose encoding of the spatial position of basic features such as oriented gradients. Whilst such schemes have been applied successfully, the use of a template may limit their potential, as it forces the histograms to conform to a rigid spatial arrangement. In this work we develop novel histogram-based schemes that do not require a template yet offer good levels of performance in visual recognition tasks. To do this, we look at the way the basic feature type changes across scale at individual locations. This gives rise to the notion of column features, which capture this change by concatenating the feature types observed at a given scale separation. As well as applying this idea to oriented gradients, we make wide use of Basic Image Features (BIFs) and oriented Basic Image Features (oBIFs), which encode local symmetry information. This results in a range of encoding schemes, which we tested on problems of current interest in three application areas. First, the recognition of characters taken from natural images, where our system outperformed existing methods. Second, a texture problem involving the discrimination of quartz grains by their surface texture, where the system achieved near-perfect performance on the first task and a level of performance comparable to that of an expert human on the second. Third, writer identification, where the system achieved a perfect score and outperformed other methods when tested on the Arabic handwriting dataset as part of the ICDAR 2011 Competition.
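    The column-feature idea can be illustrated with a short sketch: the feature type observed at the same pixel at two different scales is concatenated into a single code, and the image is summarised by a histogram of these codes with no spatial template. As a simplifying assumption, the sketch below uses quantised gradient orientations as the feature type rather than the oBIFs used in the thesis; the scales, orientation bins and histogram comparison are illustrative choices, not the thesis configuration.

```python
# Minimal sketch of "column features": the feature type at the same pixel at
# two scales is concatenated into one code, and the image is summarised by a
# histogram of these codes (no spatial template).
# Assumption: feature types are quantised gradient orientations, not oBIFs.
import numpy as np
from scipy.ndimage import gaussian_filter

def feature_types(img, sigma, n_orient=8):
    """Quantised gradient orientation at a given smoothing scale."""
    smooth = gaussian_filter(img, sigma)
    gy, gx = np.gradient(smooth)
    angle = np.arctan2(gy, gx) % np.pi            # orientation in [0, pi)
    return np.floor(angle / np.pi * n_orient).astype(int).clip(0, n_orient - 1)

def column_feature_histogram(img, sigma=1.0, scale_sep=2.0, n_orient=8):
    """Histogram of (type at sigma, type at sigma * scale_sep) pairs."""
    t1 = feature_types(img, sigma, n_orient)
    t2 = feature_types(img, sigma * scale_sep, n_orient)
    codes = t1 * n_orient + t2                    # concatenated column code
    hist = np.bincount(codes.ravel(), minlength=n_orient * n_orient)
    return hist / hist.sum()                      # normalised descriptor

# Usage: compare two images by histogram intersection of their descriptors.
img_a, img_b = np.random.rand(64, 64), np.random.rand(64, 64)
h_a, h_b = column_feature_histogram(img_a), column_feature_histogram(img_b)
similarity = np.minimum(h_a, h_b).sum()
```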

    Learning and Example Selection for Object and Pattern Detection

    This thesis presents a learning-based approach for detecting classes of objects and patterns with variable image appearance but highly predictable image boundaries. It consists of two parts. In part one, we introduce our object and pattern detection approach using a concrete human face detection example. The approach first builds a distribution-based model of the target pattern class in an appropriate feature space to describe the target's variable image appearance. It then learns from examples a similarity measure for matching new patterns against the distribution-based target model. The approach makes few assumptions about the target pattern class and should therefore be fairly general, as long as the target class has predictable image boundaries. Because our object and pattern detection approach is very much learning-based, how well a system eventually performs depends heavily on the quality of the training examples it receives. The second part of this thesis looks at how one can select high-quality examples for function approximation learning tasks. We propose an active learning formulation for function approximation and show, for three specific approximation function classes, that the active example selection strategy learns its target with fewer data samples than random sampling. We then simplify the original active learning formulation and show how it leads to a tractable example selection paradigm suitable for use in many object and pattern detection problems.
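    A rough, hedged sketch of the two-stage detection idea is given below: the target class is modelled with a mixture of Gaussians in a feature space, and a similarity measure over distances to the mixture components is learned from positive and negative examples. The feature vectors, component count and choice of classifier are assumptions for illustration, not the configuration used in the thesis.

```python
# Illustrative sketch: model the target class with a Gaussian mixture, then
# learn a similarity measure over distances to the mixture components that
# separates targets from non-targets. Features and classifier are assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier

def distance_features(gmm, X):
    """Per-component Mahalanobis-style distances used as a match signature."""
    feats = []
    for mean, prec in zip(gmm.means_, gmm.precisions_):
        d = X - mean
        feats.append(np.einsum('ij,jk,ik->i', d, prec, d))
    return np.sqrt(np.stack(feats, axis=1))

# X_target: vectors of the target pattern class (e.g. face windows);
# X_clutter: vectors of non-target windows. Both are stand-in data here.
X_target = np.random.randn(500, 32)
X_clutter = np.random.randn(500, 32) + 2.0

gmm = GaussianMixture(n_components=4, covariance_type='full').fit(X_target)
D = np.vstack([distance_features(gmm, X_target),
               distance_features(gmm, X_clutter)])
y = np.array([1] * len(X_target) + [0] * len(X_clutter))

# A new window is accepted as the target pattern if the learned similarity
# measure classifies its distance signature as positive.
similarity = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(D, y)
```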

    Ordinal Shape Coding and Correlation for Orientation-invariant 2D Shape Matching

    The human brain and visual system are highly robust and efficient at recognising objects. Although biologically inspired approaches within the field of Computer Vision are often considered state of the art, a complete understanding of how the brain and visual system work has not yet been unlocked. The benefits of such an understanding are twofold with respect to Computer Vision: firstly, a more robust object recognition system could be produced; secondly, a computer architecture as efficient as the brain and visual system would significantly reduce power requirements. It is therefore worthwhile to pursue and evaluate biologically inspired theories of object recognition. This engineering doctorate thesis provides an implementation and evaluation of a biologically inspired theory of object recognition called Ordinal Shape Coding and Correlation (OSCC). The theory is underpinned by relative coding and correlation within the human brain and visual system. A derivation of the theory is illustrated with respect to an implementation, alongside proposed extensions. As a result, a hierarchical sequence alignment method is proposed for the correlation of multi-dimensional ordinal shape descriptors in the context of orientation-invariant 2D shape descriptor matching. Evaluations of orientation-invariant 2D shape descriptor matching are presented, covering both synthetic data and the public MNIST handwritten digits dataset. The synthetic data evaluations show that the proposed OSCC method can be used as a discriminative orientation-invariant 2D shape descriptor. The close competitor Shape Context (SC) method outperforms the OSCC method when applied to the MNIST handwritten digits dataset; however, OSCC outperforms the SC method when appearance and bending energy costs are removed from SC so that pure shape descriptors are compared. Future work proposes integrating bending energy and appearance costs into the OSCC pipeline for further OCR evaluations.
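    A minimal sketch of ordinal shape coding, under simplifying assumptions, is given below: a closed contour is described by the rank order of its centroid-to-boundary distances, and two descriptors are compared over all circular alignments to obtain an orientation-invariant match score. The simple circular-shift correlation stands in for the hierarchical sequence alignment of OSCC; the sampling density and synthetic shape are illustrative only.

```python
# Hedged sketch of ordinal shape coding: rank-order (ordinal) descriptor of
# radial distances, matched over circular shifts for orientation invariance.
# This is a simplified stand-in for OSCC, not the thesis method itself.
import numpy as np

def ordinal_code(contour, n_samples=64):
    """Rank-order descriptor of centroid-to-boundary distances."""
    pts = contour[np.linspace(0, len(contour) - 1, n_samples).astype(int)]
    radii = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
    return np.argsort(np.argsort(radii))          # ordinal (rank) pattern

def match_score(code_a, code_b):
    """Best rank correlation between the codes over all circular alignments."""
    best = -np.inf
    for shift in range(len(code_a)):
        rolled = np.roll(code_b, shift)
        best = max(best, np.corrcoef(code_a, rolled)[0, 1])
    return best

# Usage: a rotated copy of the same contour should score near 1.0.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
shape = np.stack([np.cos(theta) * (1 + 0.3 * np.cos(3 * theta)),
                  np.sin(theta) * (1 + 0.3 * np.cos(3 * theta))], axis=1)
rotated = shape @ np.array([[0.6, -0.8], [0.8, 0.6]])   # a pure rotation
print(match_score(ordinal_code(shape), ordinal_code(rotated)))
```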

    Pattern detection and recognition using over-complete and sparse representations

    Recent research in harmonic analysis and mammalian vision systems has revealed that over-complete and sparse representations play an important role in visual information processing. Applying such representations to pattern recognition and detection problems has become an interesting field of study. The main contribution of this thesis is to propose two feature extraction strategies for making use of these representations: a global strategy and a local strategy. In the global strategy, over-complete and sparse transformations are applied to the input pattern as a whole and features are extracted in the transformed domain. This strategy has been applied to the problems of rotation-invariant texture classification and script identification using the Ridgelet transform, with experimental results showing better performance than the Gabor multi-channel filtering method and wavelet-based methods. The local strategy is divided into two stages. The first stage analyses the local over-complete and sparse structure: the input 2-D patterns are divided into patches and the local over-complete and sparse structure is learned from these patches using sparse approximation techniques. The second stage concerns the application of this structure. For an object detection problem, we propose a sparsity testing technique, in which a local over-complete and sparse structure is built to give sparse representations to text patterns and non-sparse representations to other patterns. Object detection is achieved by identifying patterns that can be sparsely represented by the learned structure. This technique has been applied to detect text in scene images with a recall rate of 75.23% (about 6% improvement compared with other works) and a precision rate of 67.64% (about 12% improvement). For applications such as character or shape recognition, the learned over-complete and sparse structure is combined with a Convolutional Neural Network (CNN). A second text detection method based on this combination is proposed to further improve the accuracy of text detection in scene images (about 11% higher than our first method based on sparsity testing). Finally, this method has been applied to handwritten Farsi numeral recognition, obtaining a 99.22% recognition rate on the CENPARMI Database and a 99.5% recognition rate on the HODA Database. By comparison, an SVM with gradient features achieves recognition rates of 98.98% and 99.22% on these databases, respectively.
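    The sparsity testing step can be sketched as follows, with the patch size, dictionary size, sparsity level and error threshold all being illustrative assumptions: an over-complete dictionary is learned from text patches, and a new patch is accepted as text if a sparse code using only a few atoms already reconstructs it with low error.

```python
# Hedged sketch of "sparsity testing": learn an over-complete dictionary from
# text patches, then accept a new patch as text if a very sparse code
# reconstructs it well. Sizes and threshold are assumptions for illustration.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, SparseCoder

patch_dim = 64                                     # e.g. flattened 8x8 patches
text_patches = np.random.randn(2000, patch_dim)    # stand-in training data

dico = MiniBatchDictionaryLearning(n_components=256,   # over-complete: 256 > 64
                                   transform_algorithm='omp',
                                   transform_n_nonzero_coefs=5)
D = dico.fit(text_patches).components_

coder = SparseCoder(dictionary=D, transform_algorithm='omp',
                    transform_n_nonzero_coefs=5)

def is_text_patch(patch, threshold=0.5):
    """Sparsity test: low reconstruction error with few atoms => text."""
    code = coder.transform(patch.reshape(1, -1))
    recon = (code @ D).ravel()
    rel_error = np.linalg.norm(patch - recon) / np.linalg.norm(patch)
    return rel_error < threshold
```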

    Vision-based retargeting for endoscopic navigation

    Endoscopy is a standard procedure for visualising the human gastrointestinal tract. With advances in biophotonics, imaging techniques such as narrow band imaging, confocal laser endomicroscopy, and optical coherence tomography can be combined with normal endoscopy to assist the early diagnosis of diseases such as cancer. In the past decade, optical biopsy has emerged as an effective tool for tissue analysis, allowing in vivo and in situ assessment of pathological sites with real-time feature-enhanced microscopic images. However, the non-invasive nature of optical biopsy leads to an intra-examination retargeting problem, which is associated with the difficulty of re-localising a biopsied site consistently throughout the whole examination. Retargeting of a pathological site is even more challenging across examinations, due to tissue deformation and changing tissue morphologies and appearances. The purpose of this thesis is to address both the intra- and inter-examination retargeting problems associated with optical biopsy. We propose a novel vision-based framework for intra-examination retargeting, based on combining visual tracking and detection with online learning of the appearance of the biopsied site. Furthermore, a novel cascaded detection approach based on random forests and structured support vector machines is developed to achieve efficient retargeting. For reliable inter-examination retargeting, the solution provided in this thesis is to solve an image retrieval problem: an online scene association approach is proposed to summarise an endoscopic video collected in the first examination into distinctive scenes, and a hashing-based approach is then used to learn intrinsic representations of these scenes, so that retargeting can be achieved in subsequent examinations by retrieving the relevant images using the learnt representations. For performance evaluation of the proposed frameworks, extensive phantom, ex vivo and in vivo experiments have been conducted, with results demonstrating the robustness and potential clinical value of the methods proposed.
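    The inter-examination retrieval step can be sketched with a simple stand-in for the learned hashing: scenes summarised from the first examination are mapped to compact binary codes, and a frame from a later examination is retargeted by retrieving the scene with the smallest Hamming distance. The random-projection hashing, feature dimensions and code length below are assumptions for illustration only, not the method learned in the thesis.

```python
# Minimal sketch of hashing-based scene retrieval for retargeting. Scenes are
# binarised with a fixed random projection (LSH-style stand-in for learned
# hashing) and a query frame retrieves the nearest scene in Hamming distance.
import numpy as np

rng = np.random.default_rng(0)

def hash_codes(features, projection):
    """Binarise feature vectors with a fixed random projection."""
    return (features @ projection > 0).astype(np.uint8)

# scene_features: one descriptor per distinctive scene from examination one;
# query_feature: descriptor of a frame from a subsequent examination.
n_scenes, feat_dim, n_bits = 50, 128, 32
scene_features = rng.standard_normal((n_scenes, feat_dim))
projection = rng.standard_normal((feat_dim, n_bits))
scene_codes = hash_codes(scene_features, projection)

query_feature = scene_features[7] + 0.1 * rng.standard_normal(feat_dim)
query_code = hash_codes(query_feature[None, :], projection)[0]

hamming = (scene_codes != query_code).sum(axis=1)
retargeted_scene = int(np.argmin(hamming))   # expected: scene 7
```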

    Pattern Recognition

    A wealth of advanced pattern recognition algorithms is emerging from the interdisciplinary ground between effective visual feature technologies and the human brain's cognition process. Effective visual features are made possible by rapid developments in appropriate sensor equipment, novel filter designs, and viable information processing architectures, while an understanding of the human brain's cognition process broadens the ways in which computers can perform pattern recognition tasks. The present book collects representative research from around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters covered in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition.

    3D compositional hierarchies for object categorization

    Deep learning methods have become the default tool for image classification. However, applying deep learning to surface shape classification is hindered by the limitations of existing methods, in particular by a lack of invariance to geometric transformations of the input data. This thesis proposes two novel frameworks for learning a multi-layer representation of surface shape features, namely the view-based and the surface-based compositional hierarchical frameworks. The proposed representation is a hierarchical vocabulary of shape features, termed parts. Parts of the first layer are pre-defined, while parts of the subsequent layers, describing spatial relations of subparts, are learned. The view-based framework describes spatial relations between subparts using a camera-based reference frame. The key stage of the learning algorithm is part selection, which forms the vocabulary through multi-objective optimization considering different importance measures of parts. Our experiments show that this framework enables efficient category recognition on a large-scale dataset. The surface-based framework exploits part-based intrinsic reference frames, which are computed for lower-layer parts and inherited by parts of the subsequent layers. During learning, spatial relations between subparts are described in these reference frames; during inference, a part is detected in the input data when its subparts are detected at certain positions and orientations in each other's reference frames. Since rigid body transformations do not change the positions and orientations of parts in intrinsic reference frames, this approach enables efficient recognition from unseen poses. Experiments show that this framework exhibits high discriminative power and greater robustness to rigid body transformations than advanced CNN-based methods.
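    The role of intrinsic reference frames can be sketched in 2D under simplifying assumptions: a subpart's position is expressed in the reference frame (origin and orientation) of its parent part, so the description is unchanged when the whole configuration undergoes a rigid body transformation. The 2D setting and the frame definition below are illustrative simplifications of the 3D framework in the thesis.

```python
# Hedged 2D sketch: a subpart described in its parent part's intrinsic frame
# keeps the same coordinates after any rigid body motion of the whole object.
import numpy as np

def to_intrinsic(point, frame_origin, frame_angle):
    """Coordinates of `point` in a frame at `frame_origin` with given angle."""
    c, s = np.cos(frame_angle), np.sin(frame_angle)
    R = np.array([[c, -s], [s, c]])
    return R.T @ (point - frame_origin)          # inverse rigid transform

def rigid(points, angle, translation):
    """Apply a rigid body transformation to row-vector points."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return points @ R.T + translation

# Parent part frame and a subpart, before and after a rigid body motion.
origin, orientation = np.array([1.0, 2.0]), 0.3
subpart = np.array([2.5, 2.2])

moved = rigid(np.stack([origin, subpart]), angle=1.1,
              translation=np.array([4.0, -1.0]))
moved_origin, moved_subpart = moved

before = to_intrinsic(subpart, origin, orientation)
after = to_intrinsic(moved_subpart, moved_origin, orientation + 1.1)
assert np.allclose(before, after)    # the relation is invariant to the motion
```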