15 research outputs found

    NON-LINEAR AND SPARSE REPRESENTATIONS FOR MULTI-MODAL RECOGNITION

    In the first part of this dissertation, we address the problem of representing 2D and 3D shapes. In particular, we introduce a novel implicit shape representation based on Support Vector Machine (SVM) theory. Each shape is represented by an analytic decision function obtained by training an SVM with a Radial Basis Function (RBF) kernel, so that the interior shape points are given higher values. This gives the support vector shape (SVS) several advantages. First, the representation uses a sparse subset of feature points determined by the support vectors, which significantly improves its discriminative power in the presence of noise, fragmentation and other artifacts that often come with the data. Second, the use of the RBF kernel provides scale, rotation, and translation invariant features, and allows a shape to be represented accurately regardless of its complexity. Finally, the decision function can be used to select reliable feature points. These features are described using gradients computed from highly consistent decision functions instead of conventional edges. Our experiments on 2D and 3D shapes demonstrate promising results. The availability of inexpensive 3D sensors like Kinect necessitates the design of new representations for this type of data. We present a 3D feature descriptor that represents local topologies within a set of folded concentric rings by distances from local points to a projection plane. This feature, called Concentric Ring Signature (CORS), possesses similar computational advantages to point signatures yet provides more accurate matches. CORS produces compact and discriminative descriptors, which makes it more robust to noise and occlusions. It is also well known to computer vision researchers that there is no universal representation that is optimal for all types of data or tasks. Sparsity has proved to be a good criterion for working with natural images. 
This motivates us to develop efficient sparse and non-linear learning techniques for automatically extracting useful information from visual data. Specifically, we present dictionary learning methods for sparse and redundant representations in a high-dimensional feature space. Using the kernel method, we describe how the well-known dictionary learning approaches such as the method of optimal directions and KSVD can be made non-linear. We analyse their kernel constructions and demonstrate their effectiveness through several experiments on classification problems. It is shown that non-linear dictionary learning approaches can provide significantly better discrimination compared to their linear counterparts and kernel PCA, especially when the data is corrupted by different types of degradations. Visual descriptors are often high dimensional. This results in high computational complexity for sparse learning algorithms. Motivated by this observation, we introduce a novel framework, called sparse embedding (SE), for simultaneous dimensionality reduction and dictionary learning. We formulate an optimization problem for learning a transformation from the original signal domain to a lower-dimensional one in a way that preserves the sparse structure of data. We propose an efficient optimization algorithm and present its non-linear extension based on the kernel methods. One of the key features of our method is that it is computationally efficient as the learning is done in the lower-dimensional space and it discards the irrelevant part of the signal that derails the dictionary learning process. Various experiments show that our method is able to capture the meaningful structure of data and can perform significantly better than many competitive algorithms on signal recovery and object classification tasks. 
In many practical applications, we are confronted with the situation where the data used to train our models differ from the data presented during testing. In the final part of this dissertation, we present a novel framework for domain adaptation using a sparse and hierarchical network (DASH-N), which makes use of the old data to improve the performance of a system operating on a new domain. Our network jointly learns a hierarchy of features together with transformations that rectify the mismatch between different domains. The building block of DASH-N is the latent sparse representation. It employs a dimensionality reduction step that prevents the data dimension from increasing too fast as one traverses deeper into the hierarchy. Experimental results show that our method consistently outperforms the current state-of-the-art by a significant margin. Moreover, we found that a multi-layer DASH-N has an edge over the single-layer DASH-N.

    2-D shapes description by using features based on the differential turning angle scalogram

    A 2-D shape description using the turning angle is presented. This descriptor is based on a scalogram obtained from a progressive filtering of a planar closed contour. At a given scale, the differential turning angle function is calculated, from which three essential types of points are derived: the minima of the differential turning angle (α-points), the maxima of the differential turning angle (β-points) and the zero-crossings of the turning angle (γ-points). For a continuum of scale values in the filtering process, a map (called the d-TASS map) is generated. As shown experimentally in a previous study, this map is invariant under rotation, translation and scale change. Moreover, it is resistant to shearing and noise. The contribution of the present study is, firstly, to prove theoretically that d-TASS is invariant to rotation and scale change and, secondly, to propose a new descriptor extracted from the blocks within the scalogram. When applied to shape retrieval from commonly used image databases such as the MPEG-7 Core Experiments Shape-1 dataset, the Multiview Curve Dataset and the marine animals of the SQUID dataset, the new analysis approach and the proposed descriptor yield very encouraging efficiency and effectiveness.

    A modified shape context method for shape based object retrieval

    Multi-Object Shape Retrieval Using Curvature Trees

    This work presents a geometry-based image retrieval approach for multi-object images. We commence with developing an effective shape matching method for closed boundaries. Then, a structured representation, called curvature tree (CT), is introduced to extend the shape matching approach to handle images containing multiple objects with possible holes. We also propose an algorithm, based on Gestalt principles, to detect and extract high-level boundaries (or envelopes), which may evolve as a result of the spatial arrangement of a group of image objects. At first, a shape retrieval method using triangle-area representation (TAR) is presented for non-rigid shapes with closed boundaries. This representation is effective in capturing both local and global characteristics of a shape, invariant to translation, rotation, scaling and shear, and robust against noise and moderate amounts of occlusion. For matching, two algorithms are introduced. The first algorithm matches concavity maxima points extracted from TAR image obtained by thresholding the TAR. In the second matching algorithm, dynamic space warping (DSW) is employed to search efficiently for the optimal (least cost) correspondence between the points of two shapes. Experimental results using the MPEG-7 CE-1 database of 1400 shapes show the superiority of our method over other recent methods. Then, a geometry-based image retrieval system is developed for multi-object images. We model both shape and topology of image objects including holes using a structured representation called curvature tree (CT). To facilitate shape-based matching, the TAR of each object and hole is stored at the corresponding node in the CT. The similarity between two CTs is measured based on the maximum similarity subtree isomorphism (MSSI) where a one-to-one correspondence is established between the nodes of the two trees. Our matching scheme agrees with many recent findings in psychology about the human perception of multi-object images. 
Two algorithms are introduced to solve the MSSI problem: an approximate one and an exact one. Both algorithms have polynomial-time computational complexity and use the DSW as the similarity measure between the attributed nodes. Experiments on a database of 13500 medical images and a database of 1580 logo images have shown the effectiveness of the proposed method. The purpose of the last part is to allow for high-level shape retrieval in multi-object images by detecting and extracting the envelope of high-level object groupings in the image. Motivated by studies in Gestalt theory, a new algorithm for envelope extraction is proposed that works in two stages. The first stage detects the envelope (if it exists) and groups its objects using hierarchical clustering. In the second stage, each grouping is merged using morphological operations and then further refined using concavity tree reconstruction to eliminate odd concavities in the extracted envelope. An experiment on a set of 110 logo images demonstrates the feasibility of our approach.
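
The triangle-area representation (TAR) mentioned in this abstract can be illustrated with a small sketch. It assumes the commonly cited definition, in which the TAR value at a boundary point is the signed area of the triangle formed with the points `ts` samples before and after it. The function name, the counter-clockwise ordering, and the lack of any normalisation are assumptions made for illustration and may differ from the thesis.

```python
def triangle_area_representation(contour, ts):
    """Signed triangle areas along a closed contour.

    contour: list of (x, y) boundary points, assumed ordered counter-clockwise.
    ts: triangle side length in samples (assumed 1 <= ts < len(contour) / 2).
    Returns one signed area per boundary point.
    """
    n = len(contour)
    areas = []
    for i in range(n):
        x1, y1 = contour[(i - ts) % n]  # point ts samples behind (wraps around)
        x2, y2 = contour[i]             # current point
        x3, y3 = contour[(i + ts) % n]  # point ts samples ahead (wraps around)
        # Half the determinant of [[x1, y1, 1], [x2, y2, 1], [x3, y3, 1]].
        area = 0.5 * ((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))
        areas.append(area)
    return areas


# Unit-step square traversed counter-clockwise: corners alternate with edge midpoints.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
square_tar = triangle_area_representation(square, 1)

# L-shaped contour with one concave corner at (1, 1).
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
l_tar = triangle_area_representation(l_shape, 1)
```

Under this sign convention, convex corners give positive values, concave corners give negative values, and points on straight segments give zero, which is what lets a thresholded TAR expose concavities.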

    Shape recognition through multi-level fusion of features and classifiers

    Shape recognition is a fundamental problem and a special type of image classification, where each shape is considered as a class. Current approaches to shape recognition mainly focus on designing low-level shape descriptors, and classify them using some machine learning approaches. In order to achieve effective learning of shape features, it is essential to ensure that a comprehensive set of high quality features can be extracted from the original shape data. Thus we have been motivated to develop methods of fusion of features and classifiers for advancing the classification performance. In this paper, we propose a multi-level framework for fusion of features and classifiers in the setting of granular computing. The proposed framework involves creation of diversity among classifiers, through adopting feature selection and fusion to create diverse feature sets and to train diverse classifiers using different learning algorithms. The experimental results show that the proposed multi-level framework can effectively create diversity among classifiers leading to considerable advances in the classification performance.

    DISCRIMINATIVE LEARNING AND RECOGNITION USING DICTIONARIES

    In recent years, the theory of sparse representation has emerged as a powerful tool for efficient processing of data in non-traditional ways. This is mainly due to the fact that most signals and images of interest tend to be sparse or compressible in some dictionary. In other words, they can be well approximated by a linear combination of a few elements (also known as atoms) of a dictionary. This dictionary can either be an analytic dictionary composed of wavelets or Fourier basis or it can be directly trained from data. It has been observed that dictionaries learned directly from data provide better representation and hence can improve the performance of many practical applications such as restoration and classification. In this dissertation, we study dictionary learning and recognition under supervised, unsupervised, and semi-supervised settings. In the supervised case, we propose an approach to recognize humans in unconstrained videos, where the main challenge is exploiting the identity information in multiple frames and the accompanying dynamic signature. These identity cues include face, body, and motion. Our approach is based on video-dictionaries for face and body. We design video-dictionaries to implicitly encode temporal, pose, and illumination information. Next, we propose a novel multivariate sparse representation method that jointly represents all the video data by a sparse linear combination of training data. To increase the ability of our algorithm to learn nonlinearities, we apply kernel methods to learn the dictionaries. Next, we address the problem of matching faces across changes in pose in unconstrained videos. Our approach consists of two methods based on 3D rotation and sparse representation that compensate for changes in pose. We demonstrate the superior performance of our approach over several state-of-the-art algorithms through extensive experiments on unconstrained video datasets. 
In the unsupervised case, we present an approach that simultaneously clusters images and learns dictionaries from the clusters. The method learns dictionaries in the Radon transform domain. The main feature of the proposed approach is that it provides in-plane rotation and scale invariant clustering, which is useful in many applications such as Content Based Image Retrieval (CBIR). We demonstrate through experiments that the proposed rotation and scale invariant clustering provides not only good retrieval performance but also substantial improvements and robustness compared to traditional Gabor-based and several state-of-the-art shape-based methods. We then extend the dictionary learning problem to a generalized semi-supervised formulation, where each training sample is provided with a set of possible labels and only one label among them is the true one. Such applications can be found in image and video collections where one often has only partially labeled data. For instance, given an image with multiple faces and a caption specifying the names, we can be sure that each of the faces belongs to one of the names specified, while the exact identity of each face is not known. Labeling involves a significant amount of human effort and is expensive. This has motivated researchers to develop algorithms that learn from partially labeled training data. In this work, we develop dictionary learning algorithms that utilize such partially labeled data. The proposed method aims to solve the problem of ambiguously labeled multiclass classification using an iterative algorithm. The dictionaries are updated using either soft (EM-based) or hard decision rules. Extensive evaluations on existing datasets demonstrate that the proposed method performs significantly better than state-of-the-art approaches for learning from ambiguously labeled data. 
As sparsity plays a major role in our research, we further present a sparse representation-based approach to find the salient views of 3D objects. The salient views are categorized into two groups. The first are boundary representative views that have several visible sides and object surfaces that may be attractive to humans. The second are side representative views that best represent side views of the approximating convex shape. The side representative views are class-specific views and possess the most representative power compared to other within-class views. Using the concept of characteristic view class, we first present a sparse representation-based approach for estimating the boundary representative views. With the estimated boundaries, we determine the side representative views based on a minimum reconstruction error criterion. Furthermore, to evaluate our method, we introduce the notion of geometric dictionaries built from salient views for applications in 3D object recognition, retrieval and sparse-to-full reconstruction. By a series of experiments on four publicly available 3D object datasets, we demonstrate the effectiveness of our approach over state-of-the-art algorithms and baseline methods

    A neuro-genetic hybrid approach to automatic identification of plant leaves

    Plants are essential for the existence of most living things on this planet. Plants are used for providing food, shelter, and medicine. The ability to identify plants is very important for several applications, including conservation of endangered plant species, rehabilitation of lands after mining activities and differentiating crop plants from weeds. In recent times, many researchers have attempted to develop automated plant species recognition systems. However, current computer-based plant recognition systems have limitations: some plants are naturally complex, so it is difficult to extract and represent their features. Further, natural differences of features within the same plant and similarities between plants of different species cause problems in classification. This thesis developed a novel hybrid intelligent system, based on a neuro-genetic model, for automatic recognition of plants from leaf images. The approach combines several image descriptors with Cellular Neural Networks (CNN), a Genetic Algorithm (GA), and Probabilistic Neural Networks (PNN) to address classification challenges in computer-based plant species identification. A GA-based feature selection module was developed to select the best of these leaf features. Particle Swarm Optimization (PSO) and Principal Component Analysis (PCA) were also used alongside it for comparison and to provide rigorous feature selection and analysis. Statistical analysis using ANOVA and correlation techniques confirmed the effectiveness of the GA-based and PSO-based techniques as there were no redundant features, since the subsets of features selected by both techniques correlated well. In past work, the number of principal components (PCs) was selected by the conventional methods associated with PCA. In this study, however, GA was used to select a minimum number of PCs from the original PC space. 
This reduced the computational cost with respect to time and increased the accuracy of the classifier used. The algebraic nature of the GA's fitness function ensures good performance of the GA. Furthermore, GA was also used to optimize the parameters of a CNN (the CNN used for image segmentation), which was then uniquely combined with the PNN to improve and stabilize the performance of the classification system. The CNN (being an ordinary differential equation (ODE)) was solved using the 4th-order Runge-Kutta algorithm in order to minimize the discretisation errors associated with edge detection. This study involved the extraction of 112 features from the images of plant species found in the (publicly available) Flavia dataset using the MATLAB programming environment. These features include Zernike Moments (20 ZMs), Fourier Descriptors (21 FDs), Legendre Moments (20 LMs), Hu 7 Moments (7 Hu7Ms), Texture Properties (22 TPs), Geometrical Properties (10 GPs), and Colour Features (12 CFs). With the use of GA, only 14 features were finally selected for optimal accuracy. The PNN was genetically optimized to ensure optimal accuracy, since it is not best practice to fix the tuning parameters of the PNN arbitrarily. Two separate GAs were implemented to optimize the PNN, that is, the GA provided by the MATLAB Optimization Toolbox (GA1) and a separately implemented GA (GA2). The best chromosome (PNN spread) for GA1 was 0.035 with an associated classification accuracy of 91.3740%, while a spread value of 0.06 was obtained from GA2, giving rise to an improved classification accuracy of 92.62%. The PNN-based classifier used in this study was benchmarked against other classifiers such as Multi-Layer Perceptron (MLP), K Nearest Neighbour (kNN), Naive Bayes Classifier (NBC), Radial Basis Function (RBF), and ensemble classifiers (AdaBoost). The best candidate among these classifiers was the genetically optimized PNN. Some computational theoretic properties of the PNN are also presented.
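
To make the role of the PNN spread concrete, the following is a minimal sketch of a probabilistic neural network as a Gaussian Parzen-window classifier. It is not the thesis implementation: the function name `pnn_classify`, the toy data, and the convention that `spread` is the Gaussian standard deviation are all assumptions, and toolbox implementations may scale the spread differently.

```python
import math


def pnn_classify(x, train_points, train_labels, spread):
    """Return (predicted_label, per_class_scores) for the feature vector x.

    Each training point contributes a Gaussian kernel centred on itself;
    the score of a class is the mean kernel response of its training points.
    `spread` is treated here as the Gaussian standard deviation (an assumption).
    """
    sums = {}
    counts = {}
    for point, label in zip(train_points, train_labels):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, point))
        response = math.exp(-sq_dist / (2.0 * spread ** 2))
        sums[label] = sums.get(label, 0.0) + response
        counts[label] = counts.get(label, 0) + 1
    scores = {label: sums[label] / counts[label] for label in sums}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical two-class toy data, not taken from the Flavia dataset.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["a", "a", "b", "b"]
label_near_a, _ = pnn_classify((0.2, 0.3), points, labels, spread=1.0)
label_near_b, _ = pnn_classify((5.0, 5.5), points, labels, spread=1.0)
```

A very small spread makes each kernel respond only to near-identical samples, while a very large one blurs the classes together, which is why the spread is worth tuning (here by a GA) rather than fixing arbitrarily.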

    Plant Identification from Leaves using Pattern Recognition Techniques

    Medicinal plants have been used throughout human history. Ayurveda, one of the oldest systems of medicine and one that is recognized even in the modern medical community, uses plants for the preparation of medicines. Thousands of plant species are used in the preparation of medicines. The difficulty lies in identifying the species: only an individual with deep knowledge of plants can differentiate between them, which makes leaf identification very difficult. A reference guide to plant identification may ease the problem. This is where nature needs engineering. In this work, a system is developed that helps identify plants based on their leaves. The system takes a leaf image as input and outputs the name of the species and other relevant details stored in the database. The system is designed using image identification based on pattern recognition, combining shape and texture identification. The images were segmented using graph cuts. The descriptor used for shape identification was Shape Context, and textures were described using Local Binary Patterns. Classification was done using a feed-forward Multi-Layer Perceptron (MLP) neural network with the backpropagation training algorithm. The system was tested on certain classes of leaves, and its performance is compared with that of an existing system.
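
As an illustration of the texture descriptor named in this abstract, here is a minimal sketch of the basic 3x3 Local Binary Pattern. The abstract does not say which LBP variant was used, so the neighbour ordering, the `>=` comparison, and the function names are assumptions.

```python
def lbp_code(image, r, c):
    """Basic 8-neighbour LBP code (0..255) of the interior pixel at (r, c)."""
    center = image[r][c]
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Hypothetical 3x3 grey-level patch with centre value 5.
patch = [[1, 2, 3],
         [8, 5, 4],
         [7, 6, 5]]
patch_code = lbp_code(patch, 1, 1)
```

The histogram of codes over a leaf region, rather than the individual codes, is what would typically be passed to a classifier such as the MLP.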

    Model-driven and Data-driven Approaches for some Object Recognition Problems

    Recognizing objects from images and videos has been a long-standing problem in computer vision. The recent surge in the prevalence of visual cameras has given rise to two main challenges: (i) it is important to understand different sources of object variations in more unconstrained scenarios, and (ii) rather than describing an object in isolation, efficient learning methods for modeling object-scene `contextual' relations are required to resolve visual ambiguities. This dissertation addresses some aspects of these challenges, and consists of two parts. The first part of the work focuses on obtaining object descriptors that are largely preserved across certain sources of variations, by utilizing models for image formation and local image features. Given a single instance of an object, we investigate the following three problems. (i) Representing a 2D projection of a 3D non-planar shape invariant to articulations, when there are no self-occlusions. We propose an articulation invariant distance that is preserved across piece-wise affine transformations of a non-rigid object's `parts', under a weak perspective imaging model, and then obtain a shape context-like descriptor to perform recognition; (ii) Understanding the space of `arbitrary' blurred images of an object, by representing an unknown blur kernel of a known maximum size using a complete set of orthonormal basis functions spanning that space, and showing that subspaces resulting from convolving a clean object and its blurred versions with these basis functions are equal under some assumptions. We then view the invariant subspaces as points on a Grassmann manifold, and use statistical tools that account for the underlying non-Euclidean nature of the space of these invariants to perform recognition across blur; (iii) Analyzing the robustness of local feature descriptors to different illumination conditions. 
We perform an empirical study of these descriptors for the problem of face recognition under lighting change, and show that the direction of the image gradient largely preserves object properties across varying lighting conditions. The second part of the dissertation utilizes information conveyed by large quantities of data to learn contextual information shared by an object (or an entity) with its surroundings. (i) We first consider a supervised two-class problem of detecting lane markings from road video sequences, where we learn relevant feature-level contextual information through a machine learning algorithm based on boosting. We then focus on unsupervised object classification scenarios where (ii) we perform clustering using maximum margin principles, by deriving some basic properties of the affinity of `a pair of points' belonging to the same cluster using the information conveyed by `all' points in the system, and (iii) then consider correspondence-free adaptation of statistical classifiers across domain shifting transformations, by generating meaningful `intermediate domains' that incrementally convey potential information about the domain change.
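
The observation about gradient direction and lighting can be made concrete with a deliberately simplified sketch. It assumes the lighting change is a global gain and offset applied to the pixel values (`a * I + b` with `a > 0`), which is far more restrictive than the conditions studied in the dissertation; the function name and the toy image are hypothetical.

```python
import math


def gradient_direction(image, r, c):
    """Direction (radians) of the central-difference gradient at pixel (r, c)."""
    gx = (image[r][c + 1] - image[r][c - 1]) / 2.0  # horizontal derivative
    gy = (image[r + 1][c] - image[r - 1][c]) / 2.0  # vertical derivative
    return math.atan2(gy, gx)


# Hypothetical 3x3 intensity patch and a "relit" copy with gain 2 and offset 15.
patch = [[10, 20, 30],
         [20, 40, 60],
         [30, 60, 100]]
relit = [[2 * v + 15 for v in row] for row in patch]

theta_original = gradient_direction(patch, 1, 1)
theta_relit = gradient_direction(relit, 1, 1)
```

Under this model the offset cancels in the differences and the positive gain scales both components equally, so the gradient magnitude changes while its direction does not.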