165 research outputs found

    Deep Networks Based Energy Models for Object Recognition from Multimodality Images

    Get PDF
    Object recognition has been extensively investigated in computer vision area, since it is a fundamental and essential technique in many important applications, such as robotics, auto-driving, automated manufacturing, and security surveillance. According to the selection criteria, object recognition mechanisms can be broadly categorized into object proposal and classification, eye fixation prediction and saliency object detection. Object proposal tends to capture all potential objects from natural images, and then classify them into predefined groups for image description and interpretation. For a given natural image, human perception is normally attracted to the most visually important regions/objects. Therefore, eye fixation prediction attempts to localize some interesting points or small regions according to human visual system (HVS). Based on these interesting points and small regions, saliency object detection algorithms propagate the important extracted information to achieve a refined segmentation of the whole salient objects. In addition to natural images, object recognition also plays a critical role in clinical practice. The informative insights of anatomy and function of human body obtained from multimodality biomedical images such as magnetic resonance imaging (MRI), transrectal ultrasound (TRUS), computed tomography (CT) and positron emission tomography (PET) facilitate the precision medicine. Automated object recognition from biomedical images empowers the non-invasive diagnosis and treatments via automated tissue segmentation, tumor detection and cancer staging. The conventional recognition methods normally utilize handcrafted features (such as oriented gradients, curvature, Haar features, Haralick texture features, Laws energy features, etc.) depending on the image modalities and object characteristics. It is challenging to have a general model for object recognition. Superior to handcrafted features, deep neural networks (DNN) can extract self-adaptive features corresponding with specific task, hence can be employed for general object recognition models. These DNN-features are adjusted semantically and cognitively by over tens of millions parameters corresponding to the mechanism of human brain, therefore leads to more accurate and robust results. Motivated by it, in this thesis, we proposed DNN-based energy models to recognize object on multimodality images. For the aim of object recognition, the major contributions of this thesis can be summarized below: 1. We firstly proposed a new comprehensive autoencoder model to recognize the position and shape of prostate from magnetic resonance images. Different from the most autoencoder-based methods, we focused on positive samples to train the model in which the extracted features all come from prostate. After that, an image energy minimization scheme was applied to further improve the recognition accuracy. The proposed model was compared with three classic classifiers (i.e. support vector machine with radial basis function kernel, random forest, and naive Bayes), and demonstrated significant superiority for prostate recognition on magnetic resonance images. We further extended the proposed autoencoder model for saliency object detection on natural images, and the experimental validation proved the accurate and robust saliency object detection results of our model. 2. A general multi-contexts combined deep neural networks (MCDN) model was then proposed for object recognition from natural images and biomedical images. Under one uniform framework, our model was performed in multi-scale manner. Our model was applied for saliency object detection from natural images as well as prostate recognition from magnetic resonance images. Our experimental validation demonstrated that the proposed model was competitive to current state-of-the-art methods. 3. We designed a novel saliency image energy to finely segment salient objects on basis of our MCDN model. The region priors were taken into account in the energy function to avoid trivial errors. Our method outperformed state-of-the-art algorithms on five benchmarking datasets. In the experiments, we also demonstrated that our proposed saliency image energy can boost the results of other conventional saliency detection methods

    Face Centered Image Analysis Using Saliency and Deep Learning Based Techniques

    Get PDF
    Image analysis starts with the purpose of configuring vision machines that can perceive like human to intelligently infer general principles and sense the surrounding situations from imagery. This dissertation studies the face centered image analysis as the core problem in high level computer vision research and addresses the problem by tackling three challenging subjects: Are there anything interesting in the image? If there is, what is/are that/they? If there is a person presenting, who is he/she? What kind of expression he/she is performing? Can we know his/her age? Answering these problems results in the saliency-based object detection, deep learning structured objects categorization and recognition, human facial landmark detection and multitask biometrics. To implement object detection, a three-level saliency detection based on the self-similarity technique (SMAP) is firstly proposed in the work. The first level of SMAP accommodates statistical methods to generate proto-background patches, followed by the second level that implements local contrast computation based on image self-similarity characteristics. At last, the spatial color distribution constraint is considered to realize the saliency detection. The outcome of the algorithm is a full resolution image with highlighted saliency objects and well-defined edges. In object recognition, the Adaptive Deconvolution Network (ADN) is implemented to categorize the objects extracted from saliency detection. To improve the system performance, L1/2 norm regularized ADN has been proposed and tested in different applications. The results demonstrate the efficiency and significance of the new structure. To fully understand the facial biometrics related activity contained in the image, the low rank matrix decomposition is introduced to help locate the landmark points on the face images. The natural extension of this work is beneficial in human facial expression recognition and facial feature parsing research. To facilitate the understanding of the detected facial image, the automatic facial image analysis becomes essential. We present a novel deeply learnt tree-structured face representation to uniformly model the human face with different semantic meanings. We show that the proposed feature yields unified representation in multi-task facial biometrics and the multi-task learning framework is applicable to many other computer vision tasks

    Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-saliency Detection

    Full text link
    Co-saliency detection aims to discover the common and salient foregrounds from a group of relevant images. For this task, we present a novel adaptive graph convolutional network with attention graph clustering (GCAGC). Three major contributions have been made, and are experimentally shown to have substantial practical merits. First, we propose a graph convolutional network design to extract information cues to characterize the intra- and interimage correspondence. Second, we develop an attention graph clustering algorithm to discriminate the common objects from all the salient foreground objects in an unsupervised fashion. Third, we present a unified framework with encoder-decoder structure to jointly train and optimize the graph convolutional network, attention graph cluster, and co-saliency detection decoder in an end-to-end manner. We evaluate our proposed GCAGC method on three cosaliency detection benchmark datasets (iCoseg, Cosal2015 and COCO-SEG). Our GCAGC method obtains significant improvements over the state-of-the-arts on most of them.Comment: CVPR202

    Leveraging colour-based pseudo-labels to supervise saliency detection in hyperspectral image datasets

    Get PDF
    Saliency detection mimics the natural visual attention mechanism that identifies an imagery region to be salient when it attracts visual attention more than the background. This image analysis task covers many important applications in several fields such as military science, ocean research, resources exploration, disaster and land-use monitoring tasks. Despite hundreds of models have been proposed for saliency detection in colour images, there is still a large room for improving saliency detection performances in hyperspectral imaging analysis. In the present study, an ensemble learning methodology for saliency detection in hyperspectral imagery datasets is presented. It enhances saliency assignments yielded through a robust colour-based technique with new saliency information extracted by taking advantage of the abundance of spectral information on multiple hyperspectral images. The experiments performed with the proposed methodology provide encouraging results, also compared to several competitors

    Remote Sensing Image Scene Classification: Benchmark and State of the Art

    Full text link
    Remote sensing image scene classification plays an important role in a wide range of applications and hence has been receiving remarkable attention. During the past years, significant efforts have been made to develop various datasets or present a variety of approaches for scene classification from remote sensing images. However, a systematic review of the literature concerning datasets and methods for scene classification is still lacking. In addition, almost all existing datasets have a number of limitations, including the small scale of scene classes and the image numbers, the lack of image variations and diversity, and the saturation of accuracy. These limitations severely limit the development of new approaches especially deep learning-based methods. This paper first provides a comprehensive review of the recent progress. Then, we propose a large-scale dataset, termed "NWPU-RESISC45", which is a publicly available benchmark for REmote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images, covering 45 scene classes with 700 images in each class. The proposed NWPU-RESISC45 (i) is large-scale on the scene classes and the total image number, (ii) holds big variations in translation, spatial resolution, viewpoint, object pose, illumination, background, and occlusion, and (iii) has high within-class diversity and between-class similarity. The creation of this dataset will enable the community to develop and evaluate various data-driven algorithms. Finally, several representative methods are evaluated using the proposed dataset and the results are reported as a useful baseline for future research.Comment: This manuscript is the accepted version for Proceedings of the IEE
    • …
    corecore