
    International Focus on Image Description

    In recent years, the task of automatically generating image descriptions has attracted considerable attention in the field of artificial intelligence. Benefiting from the development of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), many approaches based on the CNN-RNN framework have been proposed for this task and have achieved remarkable progress. However, two problems remain, largely because most existing methods use only an image-level representation. One is object missing: important objects may be omitted when generating the image description. The other is misprediction: an object may be recognized as the wrong category. In this paper, to address these two problems, we propose a new method called global-local attention (GLA) for generating image descriptions. The proposed GLA model uses an attention mechanism to integrate object-level features with the image-level feature, so that the model can selectively attend to objects and context information concurrently. As a result, our GLA method generates more relevant image description sentences and achieves state-of-the-art performance on the well-known Microsoft COCO caption dataset under several popular evaluation metrics: CIDEr, METEOR, ROUGE-L, and BLEU-1,2,3,4.
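
The fusion step this abstract describes can be sketched roughly as follows: object-level features and the global image-level feature are pooled through a softmax attention weighting conditioned on the decoder's hidden state. All names, shapes, and the random projection below are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def global_local_attention(object_feats, image_feat, hidden, W):
    """Fuse object-level and image-level features into one context vector.

    object_feats : (n_objects, d) features of detected objects
    image_feat   : (d,) global CNN feature of the whole image
    hidden       : (d,) current RNN hidden state
    W            : (d, d) projection matrix (random here, for illustration)
    """
    # Stack the global feature alongside the object features so the model
    # can attend to context and individual objects at the same time.
    candidates = np.vstack([object_feats, image_feat])
    scores = candidates @ (W @ hidden)   # one relevance score per candidate
    weights = softmax(scores)            # attention distribution, sums to 1
    return weights @ candidates          # weighted sum = context vector
```

The returned context vector would then condition the RNN's next word prediction; in the actual model the attention weights are learned end-to-end rather than computed from a fixed projection.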

    Learning high-level feature by deep belief networks for 3-D model retrieval and recognition

    3-D shape analysis has attracted extensive research efforts in recent years, where the major challenge lies in designing an effective high-level 3-D shape feature. In this paper, we propose a multi-level 3-D shape feature extraction framework using deep learning. The low-level 3-D shape descriptors are first encoded into a geometric bag-of-words, from which middle-level patterns are discovered to explore geometric relationships among words. After that, high-level shape features are learned via deep belief networks, which are more discriminative for the tasks of shape classification and retrieval. Experiments on 3-D shape recognition and retrieval demonstrate the superior performance of the proposed method in comparison with state-of-the-art methods.
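
The first stage of the pipeline above, encoding low-level descriptors into a geometric bag-of-words, can be sketched as follows. The codebook and descriptors are synthetic, and hard nearest-word assignment is an assumption for illustration; the paper's middle-level pattern discovery and deep belief network stages are omitted.

```python
import numpy as np

def bag_of_words(descriptors, codebook):
    """Hard-assign each low-level descriptor to its nearest codeword and
    return a normalized histogram over the geometric vocabulary."""
    # Pairwise squared distances between descriptors and codewords.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)                 # index of closest word
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                    # frequency histogram
```

In the full framework, histograms like this one would be the input from which middle-level patterns are mined before the deep belief network learns the high-level feature.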

    Machine Intelligence for Advanced Medical Data Analysis: Manifold Learning Approach

    In the current work, linear and non-linear manifold learning techniques, specifically Principal Component Analysis (PCA) and Laplacian Eigenmaps, are studied in detail. Their applications in medical image and shape analysis are investigated. In the first contribution, a manifold learning-based multi-modal image registration technique is developed, which results in a unified intensity system through intensity transformation between the reference and sensed images. The transformation eliminates intensity variations in multi-modal medical scans and hence facilitates employing well-studied mono-modal registration techniques. The method can be used for registering multi-modal images with full and partial data. Next, a manifold learning-based scale-invariant global shape descriptor is introduced. The proposed descriptor benefits from the capability of Laplacian Eigenmaps in dealing with high-dimensional data by introducing an exponential weighting scheme. It eliminates the limitations tied to the well-known cotangent weighting scheme, namely dependency on triangular mesh representation and high intra-class quality of 3D models. In the end, a novel descriptive model for diagnostic classification of pulmonary nodules is presented. The descriptive model benefits from structural differences between benign and malignant nodules for automatic and accurate prediction of a candidate nodule. It extracts concise and discriminative features automatically from the 3D surface structure of a nodule using the spectral features studied in the previous work combined with a point cloud-based deep learning network. Extensive experiments have been conducted and have shown that the proposed algorithms based on manifold learning outperform several state-of-the-art methods. Advanced computational techniques combining manifold learning and deep networks can play a vital role in effective healthcare delivery by providing a framework for several fundamental tasks in image and shape processing, namely registration, classification, and detection of features of interest.
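
As a rough illustration of the Laplacian Eigenmap machinery underlying the descriptor above, the sketch below embeds a point set using a graph Laplacian with exponential edge weights. The fully connected graph and the bandwidth `t` are assumptions made for the example, not the exact weighting scheme of the thesis.

```python
import numpy as np

def laplacian_eigenmap(points, dim=2, t=1.0):
    """Embed `points` into `dim` dimensions via the eigenvectors of the
    graph Laplacian of a fully connected graph with exponential weights."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / t)                  # exponential (heat-kernel) weights
    np.fill_diagonal(W, 0.0)             # no self-loops
    D = np.diag(W.sum(axis=1))           # degree matrix
    L = D - W                            # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    # Skip the trivial constant eigenvector (eigenvalue ~ 0) and keep
    # the next `dim` eigenvectors as the embedding coordinates.
    return vecs[:, 1:dim + 1]
```

The exponential weighting makes the construction independent of any triangular mesh, which is one motivation the abstract gives for moving away from cotangent weights.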

    3D Shape Descriptor-Based Facial Landmark Detection: A Machine Learning Approach

    Facial landmark detection on 3D human faces has numerous applications in the literature, such as establishing point-to-point correspondence between 3D face models, which is itself a key step for a wide range of applications like 3D face detection and authentication, matching, reconstruction, and retrieval, to name a few. Two groups of approaches, namely knowledge-driven and data-driven approaches, have been employed for facial landmarking in the literature. Knowledge-driven techniques are the traditional approaches that have been widely used to locate landmarks on human faces. In these approaches, a user with sufficient knowledge and experience usually defines the features to be extracted as the landmarks. Data-driven techniques, on the other hand, take advantage of machine learning algorithms to detect prominent features on 3D face models. Besides its key advantages, each category of these techniques has limitations that prevent it from generating the most reliable results. In this work, we propose to combine the strengths of the two approaches to detect facial landmarks in a more efficient and precise way. The suggested approach consists of two phases. First, some salient features of the faces are extracted using expert systems. These points are then used as the initial control points in the well-known Thin Plate Spline (TPS) technique to deform the input face towards a reference face model. Second, by exploring and utilizing multiple machine learning algorithms, another group of landmarks is extracted. The data-driven landmark detection step is performed in a supervised manner on an information-rich set of training data, in which a set of local descriptors is computed and used to train the algorithm. We then use the detected landmarks to establish point-to-point correspondence between the 3D human faces, mainly using an improved version of the Iterative Closest Point (ICP) algorithm. Furthermore, we propose to use the detected landmarks for 3D face matching applications.
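
A compact sketch of one point-to-point ICP iteration, the building block of the correspondence step described above; the landmark-based initialization and the thesis's improvements to ICP are omitted, and the Kabsch/SVD solver for the rigid transform is a standard choice assumed here for illustration.

```python
import numpy as np

def icp_step(src, dst):
    """One ICP iteration: match each source point to its nearest
    destination point, then solve the best rigid transform (Kabsch)
    and return the transformed source points."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    matched = dst[d2.argmin(axis=1)]           # nearest-neighbor pairing
    mu_s, mu_m = src.mean(0), matched.mean(0)  # centroids
    H = (src - mu_s).T @ (matched - mu_m)      # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                   # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_m - R @ mu_s
    return src @ R.T + t
```

In practice this step is repeated until the alignment error stops decreasing; a good landmark-based initialization, as in the approach above, keeps the nearest-neighbor matching from locking onto a wrong local minimum.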

    Feature Encoding of Spectral Descriptors for 3D Shape Recognition

    Feature descriptors have become a ubiquitous tool in shape analysis. Features can be extracted and subsequently used to design discriminative signatures for solving a variety of 3D shape analysis problems. In particular, shape classification and retrieval are intriguing and challenging problems that lie at the crossroads of computer vision, geometry processing, machine learning and medical imaging. In this thesis, we propose spectral graph wavelet approaches for the classification and retrieval of deformable 3D shapes. First, we review the recent shape descriptors based on the spectral decomposition of the Laplace-Beltrami operator, which provides a rich set of eigenbases that are invariant to intrinsic isometries. We then provide a detailed overview of spectral graph wavelets. In an effort to capture both local and global characteristics of a 3D shape, we propose a three-step feature description framework. Local descriptors are first extracted via the spectral graph wavelet transform having the Mexican hat wavelet as a generating kernel. Then, mid-level features are obtained by embedding local descriptors into the visual vocabulary space using the soft-assignment coding step of the bag-of-features model. A global descriptor is subsequently constructed by aggregating mid-level features weighted by a geodesic exponential kernel, resulting in a matrix representation that describes the frequency of appearance of nearby codewords in the vocabulary. In order to analyze the performance of the proposed algorithms on 3D shape classification, support vector machines and deep belief networks are applied to mid-level features. To assess the performance of the proposed approach for nonrigid 3D shape retrieval, we compare the global descriptor of a query to the global descriptors of the rest of shapes in the dataset using a dissimilarity measure and find the closest shape. 
Experimental results on three standard 3D shape benchmarks demonstrate the effectiveness of the proposed classification and retrieval approaches in comparison with state-of-the-art methods.
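
The soft-assignment coding step of the bag-of-features model described above can be sketched as follows: each local descriptor is assigned to all codewords with weights that decay exponentially in squared distance. The smoothing parameter `beta` and the synthetic codebook are assumptions for illustration, not the thesis's tuned values.

```python
import numpy as np

def soft_assign(descriptors, codebook, beta=1.0):
    """Return the (n, k) matrix of soft-assignment codes: row i gives
    descriptor i's membership weights over the k visual words."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    u = np.exp(-beta * d2)                       # closer word -> larger weight
    return u / u.sum(axis=1, keepdims=True)      # each row sums to 1
```

Pooling these per-descriptor codes over a shape (in the thesis, weighted by a geodesic exponential kernel) yields the mid-level features fed to the SVM and deep belief network classifiers.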