20,483 research outputs found

    Better Image Segmentation by Exploiting Dense Semantic Predictions

    It is well accepted that image segmentation can benefit from utilizing multilevel cues. The paper focuses on utilizing FCNN-based dense semantic predictions in bottom-up image segmentation, arguing that semantic cues should be taken into account from the very beginning. In this way we can avoid, as far as possible, merging regions of similar appearance but distinct semantic categories. The semantic inefficiency problem is handled. We also propose a straightforward way to use contour cues to suppress the noise in multilevel cues, thereby improving segmentation robustness. The evaluation on BSDS500 shows that we obtain competitive region and boundary performance. Furthermore, since all individual regions can be assigned appropriate semantic labels during the computation, we are capable of extracting adjusted semantic segmentations. The experiment on Pascal VOC 2012 shows an improvement over the original semantic segmentations, which are derived directly from the dense predictions.

    Kernel Cuts: MRF meets Kernel & Spectral Clustering

    We propose a new segmentation model combining common regularization energies, e.g. Markov Random Field (MRF) potentials, and standard pairwise clustering criteria like Normalized Cut (NC), average association (AA), etc. These clustering and regularization models are widely used in machine learning and computer vision, but they were not combined before due to significant differences in the corresponding optimization, e.g. spectral relaxation and combinatorial max-flow techniques. On the one hand, we show that many common applications using MRF segmentation energies can benefit from a high-order NC term, e.g. enforcing balanced clustering of arbitrary high-dimensional image features combining color, texture, location, depth, motion, etc. On the other hand, standard clustering applications can benefit from an inclusion of common pairwise or higher-order MRF constraints, e.g. edge alignment, bin-consistency, label cost, etc. To address joint energies like NC+MRF, we propose efficient Kernel Cut algorithms based on bound optimization. While focusing on graph cut and move-making techniques, our new unary (linear) kernel and spectral bound formulations for common pairwise clustering criteria allow them to be integrated with any regularization functionals with existing discrete or continuous solvers. Comment: The main ideas of this work are published in our conference papers: "Normalized cut meets MRF" [70] (ECCV 2016) and "Secrets of Grabcut and kernel K-means" [41] (ICCV 2015
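
    To make the joint NC+MRF energy in the abstract above concrete, the following is a minimal sketch that only evaluates such an energy for a given labeling. It assumes the classic Shi-Malik form of Normalized Cut and a simple Potts pairwise term; the function names, the `edges` list and the weight `lam` are hypothetical, and the sketch does not implement the Kernel Cut bound optimization itself.

```python
import numpy as np

def normalized_cut(affinity, labels):
    """Shi-Malik normalized cut: sum over clusters of cut(S, V minus S) / assoc(S, V)."""
    affinity = np.asarray(affinity, dtype=float)
    labels = np.asarray(labels)
    degree = affinity.sum(axis=1)
    total = 0.0
    for k in np.unique(labels):
        inside = labels == k
        cut = affinity[inside][:, ~inside].sum()  # weight leaving the cluster
        assoc = degree[inside].sum()              # weight touching the cluster
        total += cut / assoc
    return total

def potts_energy(edges, labels):
    """Potts MRF term: number of neighbouring pairs that receive different labels."""
    labels = np.asarray(labels)
    return sum(1 for i, j in edges if labels[i] != labels[j])

def joint_energy(affinity, edges, labels, lam=1.0):
    """Illustrative NC + lam * Potts energy (assumed form, evaluation only)."""
    return normalized_cut(affinity, labels) + lam * potts_energy(edges, labels)

# Two tightly connected pairs of nodes joined by one weak link.
A = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
grid_edges = [(0, 1), (1, 2), (2, 3)]  # hypothetical neighbourhood system
good = joint_energy(A, grid_edges, [0, 0, 1, 1])
bad = joint_energy(A, grid_edges, [0, 1, 0, 1])
```

    In this toy graph the labeling that keeps each strongly connected pair together has the lower joint energy, which is the behaviour both terms are meant to encourage.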

    Unsupervised Video Object Segmentation for Deep Reinforcement Learning

    We present a new technique for deep reinforcement learning that automatically detects moving objects and uses the relevant information for action selection. The detection of moving objects is done in an unsupervised way by exploiting structure from motion. Instead of directly learning a policy from raw images, the agent first learns to detect and segment moving objects by exploiting flow information in video sequences. The learned representation is then used to focus the policy of the agent on the moving objects. Over time, the agent identifies which objects are critical for decision making and gradually builds a policy based on relevant moving objects. This approach, which we call Motion-Oriented REinforcement Learning (MOREL), is demonstrated on a suite of Atari games where the ability to detect moving objects reduces the amount of interaction needed with the environment to obtain a good policy. Furthermore, the resulting policy is more interpretable than policies that directly map images to actions or values with a black box neural network. We can gain insight into the policy by inspecting the segmentation and motion of each object detected by the agent. This allows practitioners to confirm whether a policy is making decisions based on sensible information

    FoxNet: A Multi-face Alignment Method

    Multi-face alignment aims to identify the geometric structures of multiple faces in an image, and its performance is essential for many practical tasks, such as face recognition, face tracking, and face animation. In this work, we present a fast bottom-up multi-face alignment approach, which can simultaneously localize multi-person facial landmarks with high precision. In more detail, our bottom-up architecture maps the landmarks to a high-dimensional space in which the landmarks of all faces are represented. By clustering the features belonging to the same face, our approach can align the multi-person facial landmarks synchronously. Extensive experiments show that our method can achieve high performance in the multi-face landmark alignment task while our model is extremely fast. Moreover, we propose a new multi-face dataset to compare the speed and precision of bottom-up face alignment methods with top-down methods. Our dataset is publicly available at https://github.com/AISAResearch/FoxNet Comment: Accepted by the 26th IEEE International Conference on Image Processing (ICIP2019

    BlessMark: A Blind Diagnostically-Lossless Watermarking Framework for Medical Applications Based on Deep Neural Networks

    Nowadays, with the development of public network usage, medical information is transmitted between hospitals. A watermarking system can help protect the confidentiality of medical information distributed over the internet. In medical images, regions-of-interest (ROI) contain diagnostic information. The watermark should be embedded only into non-regions-of-interest (NROI) to keep diagnostic information without distortion. Recently, ROI based watermarking has attracted the attention of the medical research community. The ROI map can be used as an embedding key to improve confidentiality protection. However, in most existing works, the ROI map that is used for the embedding process must be sent as side-information along with the watermarked image. This side information is a disadvantage and makes the extraction process non-blind. Also, most existing algorithms do not recover the NROI of the original cover image after the extraction of the watermark. In this paper, we propose a framework for blind diagnostically-lossless watermarking, which iteratively embeds only into NROI. The significance of the proposed framework is in satisfying the confidentiality of the patient information through a blind watermarking system, while it preserves diagnostic/medical information of the image throughout the watermarking process. A deep neural network is used to recognize the ROI map in the embedding, extraction, and recovery processes. In the extraction process, the same ROI map of the embedding process is recognized without requiring any additional information. Hence, the watermark is blindly extracted from the NROI. Comment: Drs. Soroushmehr and Najarian declared that they had not contributions to the paper. I removed their name

    SuperPCA: A Superpixelwise PCA Approach for Unsupervised Feature Extraction of Hyperspectral Imagery

    As an unsupervised dimensionality reduction method, principal component analysis (PCA) has been widely considered as an efficient and effective preprocessing step for hyperspectral image (HSI) processing and analysis tasks. It takes each band as a whole and globally extracts the most representative bands. However, different homogeneous regions correspond to different objects, whose spectral features are diverse. It is obviously inappropriate to carry out dimensionality reduction through a unified projection for an entire HSI. In this paper, a simple but very effective superpixelwise PCA approach, called SuperPCA, is proposed to learn the intrinsic low-dimensional features of HSIs. In contrast to classical PCA models, SuperPCA has four main properties. (1) Unlike the traditional PCA method based on a whole image, SuperPCA takes into account the diversity in different homogeneous regions, that is, different regions should have different projections. (2) Most of the conventional feature extraction models cannot directly use the spatial information of HSIs, while SuperPCA is able to incorporate the spatial context information into the unsupervised dimensionality reduction by superpixel segmentation. (3) Since the regions obtained by superpixel segmentation have homogeneity, SuperPCA can extract potential low-dimensional features even under noise. (4) Although SuperPCA is an unsupervised method, it can achieve competitive performance when compared with supervised approaches. The resulting features are discriminative, compact, and noise resistant, leading to improved HSI classification performance. Experiments on three public datasets demonstrate that the SuperPCA model significantly outperforms the conventional PCA based dimensionality reduction baselines for HSI classification. The Matlab source code is available at https://github.com/junjun-jiang/SuperPCA Comment: 13 pages, 10 figures, Accepted by IEEE TGR
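
    The idea that different regions should have different projections can be sketched as follows. This is a minimal illustration, not the authors' Matlab implementation: it assumes a superpixel label map is already available from some segmentation step, and the function name `superpixelwise_pca` and its arguments are hypothetical.

```python
import numpy as np

def superpixelwise_pca(cube, segments, n_components):
    """Fit a separate PCA inside every superpixel and project its pixels.

    cube:     (height, width, bands) hyperspectral image
    segments: (height, width) integer superpixel labels (assumed given)
    """
    cube = np.asarray(cube, dtype=float)
    height, width, bands = cube.shape
    pixels = cube.reshape(-1, bands)
    flat_segments = np.asarray(segments).reshape(-1)
    features = np.zeros((pixels.shape[0], n_components))
    for label in np.unique(flat_segments):
        mask = flat_segments == label
        region = pixels[mask]
        centered = region - region.mean(axis=0)   # centre spectra per region
        # Rows of vt are the principal directions of this region only.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        k = min(n_components, vt.shape[0])        # small regions yield fewer axes
        features[mask, :k] = centered @ vt[:k].T
    return features.reshape(height, width, n_components)

# Toy example: a random 6x6 cube with 10 bands, split into four square regions.
rng = np.random.default_rng(0)
toy_cube = rng.normal(size=(6, 6, 10))
toy_segments = np.zeros((6, 6), dtype=int)
toy_segments[:3, 3:] = 1
toy_segments[3:, :3] = 2
toy_segments[3:, 3:] = 3
toy_features = superpixelwise_pca(toy_cube, toy_segments, n_components=3)
```

    Each region gets its own mean and its own projection matrix, so the output features are comparable within a superpixel but not aligned across superpixels; any downstream classifier has to account for that.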

    Frame Selected Approach for Hiding Data within MPEG Video Using Bit Plane Complexity Segmentation

    Bit Plane Complexity Segmentation (BPCS) digital picture steganography is a technique to hide data inside an image file. BPCS achieves high embedding rates with low distortion based on the theory that noise-like regions in an image's bit-planes can be replaced with noise-like secret data without significant loss in image quality. In this framework we propose a collaborative frame-selection approach for hiding data within MPEG video using Bit Plane Complexity Segmentation. The approach achieves highly secure data hiding by using selected frames from an MPEG video, and we further assess the robustness of the approach; in this review the authors also answer the question of why selected-frame steganography was used. In addition to addressing the security issues, we use digital video as the cover for the hidden data. Video is chosen as the cover because of the large number of individual frame images per second, which overcomes the limit on the quantity of data that can be hidden. The experimental results show that data can be successfully hidden within selected frames and extracted from the frame sequence, without affecting the quality of the video.
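
    The notion of a "noise-like" bit-plane region can be illustrated with the border-complexity measure commonly used in BPCS: the number of 0/1 transitions between adjacent pixels of a binary block, divided by the maximum possible number. The sketch below is only an illustration under that assumption; the helper names and the threshold value are hypothetical, and frame selection and MPEG handling are not covered.

```python
import numpy as np

def bit_plane(image, bit):
    """Extract one bit-plane (0 = least significant) from an 8-bit image."""
    return (np.asarray(image, dtype=np.uint8) >> bit) & 1

def border_complexity(block):
    """Fraction of adjacent pixel pairs in a binary block that differ."""
    block = np.asarray(block, dtype=bool)
    rows, cols = block.shape
    horizontal = np.count_nonzero(block[:, 1:] != block[:, :-1])
    vertical = np.count_nonzero(block[1:, :] != block[:-1, :])
    max_changes = rows * (cols - 1) + cols * (rows - 1)  # 112 for an 8x8 block
    return (horizontal + vertical) / max_changes

def is_noise_like(block, threshold=0.3):
    """Assumed rule: blocks at or above the threshold may carry secret data."""
    return border_complexity(block) >= threshold

flat_block = np.zeros((8, 8), dtype=np.uint8)    # no transitions at all
checkerboard = np.indices((8, 8)).sum(axis=0) % 2  # every neighbour differs
```

    A flat block has complexity 0 and would be left untouched, while a checkerboard reaches the maximum of 1; real image blocks fall in between, and the threshold trades embedding capacity against visible distortion.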

    Improving Generalization via Scalable Neighborhood Component Analysis

    Current major approaches to visual recognition follow an end-to-end formulation that classifies an input image into one of the pre-determined set of semantic categories. Parametric softmax classifiers are a common choice for such a closed world with fixed categories, especially when big labeled data is available during training. However, this becomes problematic for open-set scenarios where new categories are encountered with very few examples for learning a generalizable parametric classifier. We adopt a non-parametric approach for visual recognition by optimizing feature embeddings instead of parametric classifiers. We use a deep neural network to learn the visual feature that preserves the neighborhood structure in the semantic space, based on the Neighborhood Component Analysis (NCA) criterion. Because NCA is limited by computational bottlenecks, we devise a mechanism to use augmented memory to scale NCA for large datasets and very deep networks. Our experiments deliver not only remarkable performance on ImageNet classification for such a simple non-parametric method, but most importantly a more generalizable feature representation for sub-category discovery and few-shot recognition. Comment: To appear in ECCV 201
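
    As a rough illustration of the NCA criterion mentioned above, the sketch below computes the classic NCA objective on a small batch: each point softly selects a neighbour with probability decreasing in squared Euclidean distance, and the loss is the negative log-probability that the selected neighbour shares its label. The distance-based similarity and the function name are assumptions for illustration; the paper's augmented memory and deep network are not modelled here.

```python
import numpy as np

def nca_loss(embeddings, labels):
    """Negative mean log-probability of picking a same-class neighbour."""
    x = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    logits = -sq_dist
    np.fill_diagonal(logits, -np.inf)            # a point never selects itself
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    probs = weights / weights.sum(axis=1, keepdims=True)
    same = y[:, None] == y[None, :]
    np.fill_diagonal(same, False)
    p_correct = (probs * same).sum(axis=1)
    return -np.mean(np.log(p_correct + 1e-12))   # epsilon avoids log(0)

# Two well-separated pairs of points.
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
loss_consistent = nca_loss(points, [0, 0, 1, 1])  # labels follow the geometry
loss_mixed = nca_loss(points, [0, 1, 0, 1])       # labels cut across it
```

    The pairwise distance matrix grows quadratically with the number of points, which is one way to see the computational bottleneck that motivates a memory-based approximation for large datasets.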

    Image Co-segmentation via Multi-scale Local Shape Transfer

    Image co-segmentation is a challenging task in computer vision that aims to segment all pixels of the objects from a predefined semantic category. In real-world cases, however, common foreground objects often vary greatly in appearance, making their global shapes highly inconsistent across images and difficult to segment. To address this problem, this paper proposes a novel co-segmentation approach that transfers patch-level local object shapes, which appear more consistent across different images. In our framework, a multi-scale patch neighbourhood system is first generated using proposal flow on arbitrary image pairs, and is further refined by Locally Linear Embedding. Based on the patch relationships, we propose an efficient algorithm to jointly segment the objects in each image while transferring their local shapes across different images. Extensive experiments demonstrate that the proposed method can robustly and effectively segment common objects from an image set. On the iCoseg, MSRC and Coseg-Rep datasets, the proposed approach performs comparably to or better than the state of the art, while on the more challenging Fashionista benchmark dataset, our method achieves significant improvements. Comment: An extension of our previous stud

    Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Network

    We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text. Moreover, we propose an efficient synthetic document generation process that we use to generate pretraining data for our network. Once the network is trained on a large set of synthetic documents, we fine-tune the network on unlabeled real documents using a semi-supervised approach. We systematically study the optimum network architecture and show that both our multimodal approach and the synthetic data pretraining significantly boost the performance. Comment: CVPR 2017 Spotligh