3,919 research outputs found

    Visual Saliency Based on Multiscale Deep Features

    Get PDF
    Visual saliency is a fundamental problem in both cognitive and computational sciences, including computer vision. In this CVPR 2015 paper, we discover that a high-quality visual saliency model can be trained with multiscale features extracted using a popular deep learning architecture, convolutional neural networks (CNNs), which have had many successes in visual recognition tasks. For learning such saliency models, we introduce a neural network architecture, which has fully connected layers on top of CNNs responsible for extracting features at three different scales. We then propose a refinement method to enhance the spatial coherence of our saliency results. Finally, aggregating multiple saliency maps computed for different levels of image segmentation can further boost the performance, yielding saliency maps better than those generated from a single segmentation. To promote further research and evaluation of visual saliency models, we also construct a new large database of 4447 challenging images and their pixelwise saliency annotation. Experimental results demonstrate that our proposed method is capable of achieving state-of-the-art performance on all public benchmarks, improving the F-Measure by 5.0% and 13.2% respectively on the MSRA-B dataset and our new dataset (HKU-IS), and lowering the mean absolute error by 5.7% and 35.1% respectively on these two datasets.Comment: To appear in CVPR 201

    Deep Networks Based Energy Models for Object Recognition from Multimodality Images

    Get PDF
    Object recognition has been extensively investigated in computer vision area, since it is a fundamental and essential technique in many important applications, such as robotics, auto-driving, automated manufacturing, and security surveillance. According to the selection criteria, object recognition mechanisms can be broadly categorized into object proposal and classification, eye fixation prediction and saliency object detection. Object proposal tends to capture all potential objects from natural images, and then classify them into predefined groups for image description and interpretation. For a given natural image, human perception is normally attracted to the most visually important regions/objects. Therefore, eye fixation prediction attempts to localize some interesting points or small regions according to human visual system (HVS). Based on these interesting points and small regions, saliency object detection algorithms propagate the important extracted information to achieve a refined segmentation of the whole salient objects. In addition to natural images, object recognition also plays a critical role in clinical practice. The informative insights of anatomy and function of human body obtained from multimodality biomedical images such as magnetic resonance imaging (MRI), transrectal ultrasound (TRUS), computed tomography (CT) and positron emission tomography (PET) facilitate the precision medicine. Automated object recognition from biomedical images empowers the non-invasive diagnosis and treatments via automated tissue segmentation, tumor detection and cancer staging. The conventional recognition methods normally utilize handcrafted features (such as oriented gradients, curvature, Haar features, Haralick texture features, Laws energy features, etc.) depending on the image modalities and object characteristics. It is challenging to have a general model for object recognition. Superior to handcrafted features, deep neural networks (DNN) can extract self-adaptive features corresponding with specific task, hence can be employed for general object recognition models. These DNN-features are adjusted semantically and cognitively by over tens of millions parameters corresponding to the mechanism of human brain, therefore leads to more accurate and robust results. Motivated by it, in this thesis, we proposed DNN-based energy models to recognize object on multimodality images. For the aim of object recognition, the major contributions of this thesis can be summarized below: 1. We firstly proposed a new comprehensive autoencoder model to recognize the position and shape of prostate from magnetic resonance images. Different from the most autoencoder-based methods, we focused on positive samples to train the model in which the extracted features all come from prostate. After that, an image energy minimization scheme was applied to further improve the recognition accuracy. The proposed model was compared with three classic classifiers (i.e. support vector machine with radial basis function kernel, random forest, and naive Bayes), and demonstrated significant superiority for prostate recognition on magnetic resonance images. We further extended the proposed autoencoder model for saliency object detection on natural images, and the experimental validation proved the accurate and robust saliency object detection results of our model. 2. A general multi-contexts combined deep neural networks (MCDN) model was then proposed for object recognition from natural images and biomedical images. Under one uniform framework, our model was performed in multi-scale manner. Our model was applied for saliency object detection from natural images as well as prostate recognition from magnetic resonance images. Our experimental validation demonstrated that the proposed model was competitive to current state-of-the-art methods. 3. We designed a novel saliency image energy to finely segment salient objects on basis of our MCDN model. The region priors were taken into account in the energy function to avoid trivial errors. Our method outperformed state-of-the-art algorithms on five benchmarking datasets. In the experiments, we also demonstrated that our proposed saliency image energy can boost the results of other conventional saliency detection methods

    3D scanning of cultural heritage with consumer depth cameras

    Get PDF
    Three dimensional reconstruction of cultural heritage objects is an expensive and time-consuming process. Recent consumer real-time depth acquisition devices, like Microsoft Kinect, allow very fast and simple acquisition of 3D views. However 3D scanning with such devices is a challenging task due to the limited accuracy and reliability of the acquired data. This paper introduces a 3D reconstruction pipeline suited to use consumer depth cameras as hand-held scanners for cultural heritage objects. Several new contributions have been made to achieve this result. They include an ad-hoc filtering scheme that exploits the model of the error on the acquired data and a novel algorithm for the extraction of salient points exploiting both depth and color data. Then the salient points are used within a modified version of the ICP algorithm that exploits both geometry and color distances to precisely align the views even when geometry information is not sufficient to constrain the registration. The proposed method, although applicable to generic scenes, has been tuned to the acquisition of sculptures and in this connection its performance is rather interesting as the experimental results indicate

    Cancer diagnosis using deep learning: A bibliographic review

    Get PDF
    In this paper, we first describe the basics of the field of cancer diagnosis, which includes steps of cancer diagnosis followed by the typical classification methods used by doctors, providing a historical idea of cancer classification techniques to the readers. These methods include Asymmetry, Border, Color and Diameter (ABCD) method, seven-point detection method, Menzies method, and pattern analysis. They are used regularly by doctors for cancer diagnosis, although they are not considered very efficient for obtaining better performance. Moreover, considering all types of audience, the basic evaluation criteria are also discussed. The criteria include the receiver operating characteristic curve (ROC curve), Area under the ROC curve (AUC), F1 score, accuracy, specificity, sensitivity, precision, dice-coefficient, average accuracy, and Jaccard index. Previously used methods are considered inefficient, asking for better and smarter methods for cancer diagnosis. Artificial intelligence and cancer diagnosis are gaining attention as a way to define better diagnostic tools. In particular, deep neural networks can be successfully used for intelligent image analysis. The basic framework of how this machine learning works on medical imaging is provided in this study, i.e., pre-processing, image segmentation and post-processing. The second part of this manuscript describes the different deep learning techniques, such as convolutional neural networks (CNNs), generative adversarial models (GANs), deep autoencoders (DANs), restricted Boltzmann’s machine (RBM), stacked autoencoders (SAE), convolutional autoencoders (CAE), recurrent neural networks (RNNs), long short-term memory (LTSM), multi-scale convolutional neural network (M-CNN), multi-instance learning convolutional neural network (MIL-CNN). For each technique, we provide Python codes, to allow interested readers to experiment with the cited algorithms on their own diagnostic problems. The third part of this manuscript compiles the successfully applied deep learning models for different types of cancers. Considering the length of the manuscript, we restrict ourselves to the discussion of breast cancer, lung cancer, brain cancer, and skin cancer. The purpose of this bibliographic review is to provide researchers opting to work in implementing deep learning and artificial neural networks for cancer diagnosis a knowledge from scratch of the state-of-the-art achievements

    Topological exploration of artificial neuronal network dynamics

    Full text link
    One of the paramount challenges in neuroscience is to understand the dynamics of individual neurons and how they give rise to network dynamics when interconnected. Historically, researchers have resorted to graph theory, statistics, and statistical mechanics to describe the spatiotemporal structure of such network dynamics. Our novel approach employs tools from algebraic topology to characterize the global properties of network structure and dynamics. We propose a method based on persistent homology to automatically classify network dynamics using topological features of spaces built from various spike-train distances. We investigate the efficacy of our method by simulating activity in three small artificial neural networks with different sets of parameters, giving rise to dynamics that can be classified into four regimes. We then compute three measures of spike train similarity and use persistent homology to extract topological features that are fundamentally different from those used in traditional methods. Our results show that a machine learning classifier trained on these features can accurately predict the regime of the network it was trained on and also generalize to other networks that were not presented during training. Moreover, we demonstrate that using features extracted from multiple spike-train distances systematically improves the performance of our method
    • …
    corecore