Better Image Segmentation by Exploiting Dense Semantic Predictions
It is well accepted that image segmentation can benefit from utilizing
multilevel cues. This paper focuses on utilizing FCNN-based dense semantic
predictions in bottom-up image segmentation, arguing that semantic cues
should be taken into account from the very beginning. In this way we can
avoid, as far as possible, merging regions of similar appearance but distinct
semantic categories. We also handle the semantic inefficiency problem and
propose a straightforward way to use contour cues to suppress noise in the
multilevel cues, thus improving segmentation robustness. The evaluation on
BSDS500 shows that we obtain competitive region and boundary performance.
Furthermore, since all individual regions can be assigned appropriate
semantic labels during the computation, we are able to extract adjusted
semantic segmentations. The experiment on Pascal VOC 2012 shows our
improvement over the original semantic segmentations derived directly from
the dense predictions.
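The abstract's core idea, refusing to merge look-alike regions whose semantic labels disagree, can be illustrated with a minimal sketch. The predicate, feature vectors, and threshold below are hypothetical illustrations, not the paper's actual formulation:

```python
import numpy as np

def should_merge(feat_a, feat_b, sem_a, sem_b, tau=0.5):
    """Merge two adjacent regions only when their appearance features are
    close AND their dominant semantic labels agree, so regions of similar
    colour but different categories stay separate."""
    appearance_ok = np.linalg.norm(feat_a - feat_b) < tau
    semantics_ok = sem_a == sem_b
    return appearance_ok and semantics_ok

# Two grass-green regions: one labelled "grass", one labelled "car".
a = np.array([0.2, 0.8, 0.3])
b = np.array([0.25, 0.75, 0.3])
print(should_merge(a, b, "grass", "grass"))  # True
print(should_merge(a, b, "grass", "car"))    # False
```

A purely appearance-based merger would fuse both pairs; the semantic check is what blocks the second merge from the very beginning, as the abstract argues.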
Kernel Cuts: MRF meets Kernel & Spectral Clustering
We propose a new segmentation model combining common regularization energies,
e.g. Markov Random Field (MRF) potentials, and standard pairwise clustering
criteria like Normalized Cut (NC), average association (AA), etc. These
clustering and regularization models are widely used in machine learning and
computer vision, but they were not combined before due to significant
differences in the corresponding optimization, e.g. spectral relaxation and
combinatorial max-flow techniques. On the one hand, we show that many common
applications using MRF segmentation energies can benefit from a high-order NC
term, e.g. enforcing balanced clustering of arbitrary high-dimensional image
features combining color, texture, location, depth, motion, etc. On the other
hand, standard clustering applications can benefit from an inclusion of common
pairwise or higher-order MRF constraints, e.g. edge alignment, bin-consistency,
label cost, etc. To address joint energies like NC+MRF, we propose efficient
Kernel Cut algorithms based on bound optimization. While focusing on graph cut
and move-making techniques, our new unary (linear) kernel and spectral bound
formulations for common pairwise clustering criteria allow them to be
integrated with any regularization functionals using existing discrete or
continuous solvers.
Comment: The main ideas of this work are published in our conference papers:
"Normalized cut meets MRF" [70] (ECCV 2016) and "Secrets of Grabcut and
kernel K-means" [41] (ICCV 2015).
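The Normalized Cut criterion that the joint NC+MRF energy builds on can be evaluated directly on a small affinity matrix. A toy two-way sketch with numpy (this evaluates the NC value only; it is not the paper's bound-optimization Kernel Cut solver):

```python
import numpy as np

def normalized_cut(W, labels):
    """Two-way Normalized Cut value for an affinity matrix W and a binary
    label vector: cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V)."""
    A = labels == 0
    B = labels == 1
    cut = W[np.ix_(A, B)].sum()        # total weight crossing the partition
    assoc_A = W[A, :].sum()            # total weight incident to cluster A
    assoc_B = W[B, :].sum()            # total weight incident to cluster B
    return cut / assoc_A + cut / assoc_B

# Two tight clusters joined by one weak edge: the NC value is small.
W = np.array([[0, 1, .1, 0],
              [1, 0,  0, 0],
              [.1, 0, 0, 1],
              [0, 0, 1, 0]], float)
labels = np.array([0, 0, 1, 1])
print(normalized_cut(W, labels))
```

A high-order NC term like this, added to a standard MRF energy, is what rewards balanced clusters of high-dimensional features in the applications the abstract lists.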
Unsupervised Video Object Segmentation for Deep Reinforcement Learning
We present a new technique for deep reinforcement learning that automatically
detects moving objects and uses the relevant information for action selection.
The detection of moving objects is done in an unsupervised way by exploiting
structure from motion. Instead of directly learning a policy from raw images,
the agent first learns to detect and segment moving objects by exploiting flow
information in video sequences. The learned representation is then used to
focus the policy of the agent on the moving objects. Over time, the agent
identifies which objects are critical for decision making and gradually builds
a policy based on relevant moving objects. This approach, which we call
Motion-Oriented REinforcement Learning (MOREL), is demonstrated on a suite of
Atari games where the ability to detect moving objects reduces the amount of
interaction needed with the environment to obtain a good policy. Furthermore,
the resulting policy is more interpretable than policies that directly map
images to actions or values with a black box neural network. We can gain
insight into the policy by inspecting the segmentation and motion of each
object detected by the agent. This allows practitioners to confirm whether a
policy is making decisions based on sensible information.
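The unsupervised motion cue the abstract describes is learned by a network in MOREL; a crude stand-in is to threshold optical-flow magnitude. The function below is an assumption-laden illustration of the motion-mask idea, not the paper's learned segmentation:

```python
import numpy as np

def moving_object_mask(flow, thresh=1.0):
    """Mark pixels whose optical-flow magnitude exceeds a threshold as
    'moving'. A hand-crafted stand-in for the flow-based segmentation
    network the agent learns in MOREL."""
    return np.linalg.norm(flow, axis=-1) > thresh

flow = np.zeros((4, 4, 2))
flow[1:3, 1:3] = [2.0, 0.0]        # one small object drifting right
mask = moving_object_mask(flow)
print(mask.sum())                  # 4 moving pixels
```

In the paper the mask is produced by a learned model and then used to focus the policy; inspecting such masks is what makes the resulting policy interpretable.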
FoxNet: A Multi-face Alignment Method
Multi-face alignment aims to identify the geometric structure of multiple
faces in an image, and its performance is essential for many practical tasks,
such as face recognition, face tracking, and face animation. In this work, we
present a fast bottom-up multi-face alignment approach that can
simultaneously localize the facial landmarks of multiple people with high
precision. In more detail, our bottom-up architecture maps the landmarks into
a high-dimensional space in which the landmarks of all faces are represented.
By clustering the features belonging to the same face, our approach can align
the facial landmarks of multiple people synchronously. Extensive experiments
show that our method achieves high performance on the multi-face landmark
alignment task while being extremely fast. Moreover, we propose a new
multi-face dataset to compare the speed and precision of bottom-up face
alignment methods with top-down methods. Our dataset is publicly available at
https://github.com/AISAResearch/FoxNet
Comment: Accepted by the 26th IEEE International Conference on Image
Processing (ICIP 2019).
BlessMark: A Blind Diagnostically-Lossless Watermarking Framework for Medical Applications Based on Deep Neural Networks
Nowadays, with the growth of public network usage, medical information
is transmitted between hospitals. Watermarking systems can help preserve
the confidentiality of medical information distributed over the internet. In
medical images, regions-of-interest (ROI) contain diagnostic information. The
watermark should be embedded only into non-regions-of-interest (NROI) to keep
diagnostic information without distortion. Recently, ROI based watermarking has
attracted the attention of the medical research community. The ROI map can be
used as an embedding key for improving confidentiality protection purposes.
However, in most existing works, the ROI map that is used for the embedding
process must be sent as side-information along with the watermarked image. This
side information is a disadvantage and makes the extraction process non-blind.
Also, most existing algorithms do not recover NROI of the original cover image
after the extraction of the watermark. In this paper, we propose a framework
for blind diagnostically-lossless watermarking, which iteratively embeds only
into NROI. The significance of the proposed framework is in satisfying the
confidentiality of the patient information through a blind watermarking system,
while it preserves diagnostic/medical information of the image throughout the
watermarking process. A deep neural network is used to recognize the ROI map in
the embedding, extraction, and recovery processes. In the extraction process,
the same ROI map of the embedding process is recognized without requiring any
additional information. Hence, the watermark is blindly extracted from the
NROI.
Comment: Drs. Soroushmehr and Najarian declared that they had no
contributions to the paper. I removed their names.
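The key property, embedding only into NROI so the ROI map recognized at extraction time doubles as the key, can be sketched with toy LSB substitution. This is a hypothetical simplification: the real framework embeds iteratively and recognizes the ROI map with a deep neural network, whereas here the mask is given and the embedding is plain least-significant-bit writing:

```python
import numpy as np

def embed_nroi_lsb(image, roi_mask, bits):
    """Write watermark bits into the least-significant bit of NROI pixels
    only; ROI pixels (diagnostic regions) are left untouched."""
    out = image.copy()
    flat = out.ravel()
    nroi = np.flatnonzero(~roi_mask.ravel())   # embeddable positions
    for pos, bit in zip(nroi, bits):
        flat[pos] = (flat[pos] & 0xFE) | bit   # clear LSB, set watermark bit
    return out

def extract_nroi_lsb(image, roi_mask, n):
    """Blind extraction: re-derive the NROI positions and read LSBs."""
    nroi = np.flatnonzero(~roi_mask.ravel())
    return [int(image.ravel()[p]) & 1 for p in nroi[:n]]

img = np.arange(16, dtype=np.uint8).reshape(4, 4)
roi = np.zeros((4, 4), bool)
roi[1:3, 1:3] = True                           # pretend diagnostic region
msg = [1, 0, 1, 1]
marked = embed_nroi_lsb(img, roi, msg)
print(extract_nroi_lsb(marked, roi, 4))        # [1, 0, 1, 1]
```

Because the extractor recomputes the same mask rather than receiving it as side information, no extra data travels with the watermarked image, which is exactly the blindness property the abstract emphasizes.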
SuperPCA: A Superpixelwise PCA Approach for Unsupervised Feature Extraction of Hyperspectral Imagery
As an unsupervised dimensionality reduction method, principal component
analysis (PCA) has been widely considered as an efficient and effective
preprocessing step for hyperspectral image (HSI) processing and analysis tasks.
It takes each band as a whole and globally extracts the most representative
bands. However, different homogeneous regions correspond to different objects,
whose spectral features are diverse. It is obviously inappropriate to carry out
dimensionality reduction through a unified projection for an entire HSI. In
this paper, a simple but very effective superpixelwise PCA approach, called
SuperPCA, is proposed to learn the intrinsic low-dimensional features of HSIs.
In contrast to classical PCA models, SuperPCA has four main properties. (1)
Unlike the traditional PCA method based on a whole image, SuperPCA takes into
account the diversity in different homogeneous regions, that is, different
regions should have different projections. (2) Most of the conventional feature
extraction models cannot directly use the spatial information of HSIs, while
SuperPCA is able to incorporate the spatial context information into the
unsupervised dimensionality reduction by superpixel segmentation. (3) Since the
regions obtained by superpixel segmentation have homogeneity, SuperPCA can
extract potential low-dimensional features even under noise. (4) Although
SuperPCA is an unsupervised method, it can achieve competitive performance when
compared with supervised approaches. The resulting features are discriminative,
compact, and noise resistant, leading to improved HSI classification
performance. Experiments on three public datasets demonstrate that the SuperPCA
model significantly outperforms the conventional PCA based dimensionality
reduction baselines for HSI classification. The Matlab source code is available
at https://github.com/junjun-jiang/SuperPCA
Comment: 13 pages, 10 figures, Accepted by IEEE TGRS.
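The core mechanism, running PCA separately inside each superpixel instead of one global projection, fits in a few lines of numpy. A minimal sketch assuming the superpixel segmentation is already given (the released Matlab code is the reference implementation):

```python
import numpy as np

def superpca(cube, segments, k):
    """Project the pixels of each homogeneous region onto that region's own
    top-k principal components, rather than using one projection for the
    whole hyperspectral cube."""
    h, w, b = cube.shape
    X = cube.reshape(-1, b)
    seg = segments.ravel()
    out = np.zeros((h * w, k))
    for s in np.unique(seg):
        idx = seg == s
        Xs = X[idx] - X[idx].mean(0)            # centre this region's spectra
        _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
        out[idx] = Xs @ Vt[:k].T                # region-specific projection
    return out.reshape(h, w, k)

rng = np.random.default_rng(0)
cube = rng.normal(size=(8, 8, 20))              # toy 20-band image
segments = (np.arange(64) // 32).reshape(8, 8)  # two fake superpixels
feats = superpca(cube, segments, 3)
print(feats.shape)                              # (8, 8, 3)
```

Each region thus gets a projection adapted to its own spectral statistics, which is property (1) in the abstract; the segmentation itself injects the spatial context of property (2).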
Frame Selected Approach for Hiding Data within MPEG Video Using Bit Plane Complexity Segmentation
Bit Plane Complexity Segmentation (BPCS) digital picture steganography is a
technique to hide data inside an image file. BPCS achieves high embedding rates
with low distortion based on the theory that noise-like regions in an image's
bit-planes can be replaced with noise-like secret data without significant loss
in image quality. In this framework we propose a collaborative approach for
selecting the frames in which to hide data within MPEG video using Bit Plane
Complexity Segmentation. This approach achieves highly secure data hiding
using selected frames from MPEG video, and we further justify the strength of
the approach; in this review we answer the question of why selected-frame
steganography is used. In addition to the security benefits, we use digital
video as the cover for the hidden data. The reason for choosing a video cover
in this approach is the large number of individual frames per second, which
in turn overcomes the problem of data-hiding capacity. The experimental
results show that data can be successfully hidden in the selected frames and
extracted from the frame sequence, without affecting the quality of the
video.
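BPCS decides where to embed by scoring how noise-like each bit-plane block is. The standard complexity measure is the fraction of adjacent bit transitions out of the maximum possible; a minimal sketch:

```python
import numpy as np

def bpcs_complexity(block):
    """Border complexity of a binary bit-plane block: the fraction of
    adjacent horizontal and vertical pixel pairs whose bits differ.
    Noise-like blocks score near 0.5; flat blocks score near 0."""
    h = np.abs(np.diff(block, axis=1)).sum()    # horizontal transitions
    v = np.abs(np.diff(block, axis=0)).sum()    # vertical transitions
    rows, cols = block.shape
    max_changes = rows * (cols - 1) + cols * (rows - 1)
    return (h + v) / max_changes

flat_block = np.zeros((8, 8), int)
checker = np.indices((8, 8)).sum(0) % 2         # maximally complex pattern
print(bpcs_complexity(flat_block))              # 0.0
print(bpcs_complexity(checker))                 # 1.0
```

Blocks whose complexity exceeds a threshold (commonly around 0.3) are judged noise-like and replaced with secret data, which is why BPCS achieves high embedding rates with little visible distortion.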
Improving Generalization via Scalable Neighborhood Component Analysis
Current major approaches to visual recognition follow an end-to-end
formulation that classifies an input image into one of the pre-determined set
of semantic categories. Parametric softmax classifiers are a common choice for
such a closed world with fixed categories, especially when big labeled data is
available during training. However, this becomes problematic for open-set
scenarios where new categories are encountered with very few examples for
learning a generalizable parametric classifier. We adopt a non-parametric
approach for visual recognition by optimizing feature embeddings instead of
parametric classifiers. We use a deep neural network to learn the visual
feature that preserves the neighborhood structure in the semantic space, based
on the Neighborhood Component Analysis (NCA) criterion. Limited by its
computational bottlenecks, we devise a mechanism to use augmented memory to
scale NCA for large datasets and very deep networks. Our experiments deliver
not only remarkable performance on ImageNet classification for such a simple
non-parametric method, but most importantly a more generalizable feature
representation for sub-category discovery and few-shot recognition.
Comment: To appear in ECCV 2018.
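The NCA criterion the embedding is trained with can be stated compactly: each point is classified by a softmax over negative squared distances to every other point, and the objective is the expected leave-one-out accuracy. A small numpy sketch of that objective (the paper's contribution, the augmented-memory mechanism for scaling it, is not shown here):

```python
import numpy as np

def nca_loo_accuracy(embeddings, labels):
    """Leave-one-out NCA objective: each point's probability of being
    classified correctly by its stochastic nearest neighbours, with softmax
    weights over squared embedding distances (self excluded)."""
    d2 = np.square(embeddings[:, None] - embeddings[None]).sum(-1)
    np.fill_diagonal(d2, np.inf)               # exclude self-matches
    P = np.exp(-d2)
    P /= P.sum(1, keepdims=True)               # neighbour-selection softmax
    same = labels[:, None] == labels[None]
    return (P * same).sum(1).mean()

emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
print(nca_loo_accuracy(emb, y))                # near 1.0: classes separated
```

Maximizing this quantity pulls same-class points together in the embedding, which is what preserves neighbourhood structure without any parametric classifier.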
Image Co-segmentation via Multi-scale Local Shape Transfer
Image co-segmentation is a challenging task in computer vision that aims to
segment all pixels of the objects from a predefined semantic category. In
real-world cases, however, common foreground objects often vary greatly in
appearance, making their global shapes highly inconsistent across images and
difficult to be segmented. To address this problem, this paper proposes a novel
co-segmentation approach that transfers patch-level local object shapes which
appear more consistent across different images. In our framework, a multi-scale
patch neighbourhood system is first generated using proposal flow on arbitrary
image-pair, which is further refined by Locally Linear Embedding. Based on the
patch relationships, we propose an efficient algorithm to jointly segment the
objects in each image while transferring their local shapes across different
images. Extensive experiments demonstrate that the proposed method can robustly
and effectively segment common objects from an image set. On the iCoseg,
MSRC, and Coseg-Rep datasets, the proposed approach performs comparably to or
better than the state of the art, while on the more challenging Fashionista
benchmark our method achieves significant improvements.
Comment: An extension of our previous study.
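The Locally Linear Embedding step used to refine the patch neighbourhood system rests on one computation: the weights that best reconstruct a patch feature from its neighbours, constrained to sum to one. A self-contained sketch of that weight solve (the neighbour set and regularizer are illustrative choices, not the paper's exact settings):

```python
import numpy as np

def lle_weights(x, neighbours, reg=1e-3):
    """Locally Linear Embedding reconstruction weights: the coefficients
    that best rebuild feature x from its neighbours, summing to 1."""
    Z = neighbours - x                          # shift neighbours to origin
    G = Z @ Z.T                                 # local Gram matrix
    G += reg * np.trace(G) * np.eye(len(G))     # regularise for stability
    w = np.linalg.solve(G, np.ones(len(G)))     # closed-form solution
    return w / w.sum()                          # enforce sum-to-one

x = np.array([1.0, 1.0])
nbrs = np.array([[0.0, 1.0], [2.0, 1.0], [1.0, 0.0]])
w = lle_weights(x, nbrs)
print(w, w.sum())                               # weights sum to 1
```

In the co-segmentation framework, patches with large reconstruction weights are treated as reliable neighbours, so local shapes are transferred along exactly these relationships.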
Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Network
We present an end-to-end, multimodal, fully convolutional network for
extracting semantic structures from document images. We consider document
semantic structure extraction as a pixel-wise segmentation task, and propose a
unified model that classifies pixels based not only on their visual appearance,
as in the traditional page segmentation task, but also on the content of
underlying text. Moreover, we propose an efficient synthetic document
generation process that we use to generate pretraining data for our network.
Once the network is trained on a large set of synthetic documents, we fine-tune
the network on unlabeled real documents using a semi-supervised approach. We
systematically study the optimum network architecture and show that both our
multimodal approach and the synthetic data pretraining significantly boost the
performance.
Comment: CVPR 2017 Spotlight.