186,424 research outputs found
GENHOP: An Image Generation Method Based on Successive Subspace Learning
Unlike deep-learning-based (DL-based) image generation methods, this work
proposes a new generative model built upon the successive subspace learning
principle, named GenHop (an acronym of Generative PixelHop).
GenHop consists of three modules: 1) high-to-low dimension reduction, 2) seed
image generation, and 3) low-to-high dimension expansion. In the first module,
it builds a sequence of high-to-low dimensional subspaces through a sequence of
whitening processes, each of which contains samples of joint-spatial-spectral
representation. In the second module, it generates samples in the lowest
dimensional subspace. In the third module, it finds a proper high-dimensional
sample for a seed image by adding details back via locally linear embedding
(LLE) and a sequence of coloring processes. Experiments show that GenHop can
generate visually pleasant images whose FID scores are comparable to, or even
better than, those of DL-based generative models on the MNIST, Fashion-MNIST
and CelebA datasets.
Comment: 10 pages, 5 figures, accepted by ISCAS 202
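The high-to-low dimension reduction in GenHop's first module is built on whitening. As a minimal illustration (not the paper's actual PixelHop/Saab transform), PCA whitening projects samples onto the top-k principal components and rescales each to unit variance:

```python
import numpy as np

def pca_whiten(X, k, eps=1e-8):
    """Project samples onto the top-k principal components and whiten them
    so each retained dimension has unit sample variance (toy sketch)."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = Xc.T @ Xc / (len(X) - 1)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]   # pick the top-k components
    W = eigvecs[:, order] / np.sqrt(eigvals[order] + eps)
    return Xc @ W                           # whitened low-dimensional samples

# Toy usage: 100 samples of a 16-dim "patch" reduced to 4 whitened dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))
Z = pca_whiten(X, k=4)
print(Z.shape)  # (100, 4)
```

Applying a sequence of such whitening steps yields the cascade of progressively lower-dimensional subspaces the abstract describes; the inverse (coloring) transform is used on the way back up.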
Magic cube puzzle approach for image encryption
In principle, an image encryption algorithm produces an encrypted image composed of arbitrary patterns that provide no clues about the plain image or its cipher key. Ideally, the encrypted image is entirely independent of its plain image. Many functions can be used to achieve this goal. Based on the functions used, image encryption techniques are categorized as: (1) block-based; (2) chaotic-based; (3) transformation-based; (4) conventional-based; and (5) miscellaneous. This study proposes a magic cube puzzle approach to encrypting an 8-bit grayscale image. The approach transforms a plain image into a magic cube puzzle of a particular size, which consists of a set of blocks. The magic cube puzzle algorithm diffuses the pixels of the plain image as in a Rubik's Cube game, by rotating each block in a particular direction called the transposition orientation. The blocks' transposition orientations serve as the key seed, while the cipher key is generated by a random permutation of the key seed with a certain key length. Several performance metrics were used to assess these goals, and the results were compared against several standard encryption methods. The study showed that the proposed method outperformed the other methods on all metrics except entropy. In further studies, the method will be modified to raise its entropy value very close to 8 and to extend it to true-color images. In essence, the magic cube puzzle approach offers a large space for pixel diffusion, which can grow larger as a data series is transformed into several magic cubes, each transposed with a different technique. The proposed approach is expected to add to the wealth of knowledge in the field of data encryption.
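The Rubik's-Cube-style diffusion can be sketched with cyclic row and column shifts driven by a key. This is an illustrative stand-in, not the paper's cipher: the key-derived shifts play the role of the transposition orientations, and reversing them recovers the plain image:

```python
import numpy as np

def cube_scramble(img, key, decrypt=False):
    """Toy 'magic cube' diffusion: cyclically shift each row, then each
    column, by key-derived offsets. Reversible with the same key."""
    out = img.copy()
    rng = np.random.default_rng(key)           # key seeds the shift schedule
    row_shifts = rng.integers(0, img.shape[1], size=img.shape[0])
    col_shifts = rng.integers(0, img.shape[0], size=img.shape[1])
    if not decrypt:
        for r, s in enumerate(row_shifts):
            out[r] = np.roll(out[r], s)        # rotate each row
        for c, s in enumerate(col_shifts):
            out[:, c] = np.roll(out[:, c], s)  # rotate each column
    else:                                      # undo in reverse order
        for c, s in enumerate(col_shifts):
            out[:, c] = np.roll(out[:, c], -s)
        for r, s in enumerate(row_shifts):
            out[r] = np.roll(out[r], -s)
    return out

img = np.arange(64, dtype=np.uint8).reshape(8, 8)  # toy 8-bit grayscale image
enc = cube_scramble(img, key=42)
dec = cube_scramble(enc, key=42, decrypt=True)
assert np.array_equal(dec, img)  # round trip recovers the plain image
```

Pure transposition like this permutes pixels without changing their values, which is one intuition for why the entropy metric lags behind methods that also substitute pixel values.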
A goal-driven unsupervised image segmentation method combining graph-based processing and Markov random fields
Image segmentation is the process of partitioning a digital image into a set of homogeneous regions (according to some homogeneity criterion) to facilitate a subsequent higher-level analysis. In this context,
the present paper proposes an unsupervised and graph-based method of image segmentation, which is
driven by an application goal, namely, the generation of image segments associated with a user-defined
and application-specific goal. A graph, together with a random grid of source elements, is defined on
top of the input image. From each source satisfying a goal-driven predicate, called seed, a propagation
algorithm assigns a cost to each pixel on the basis of similarity and topological connectivity, measuring
the degree of association with the reference seed. Then, the set of most significant regions is automatically extracted and used to estimate a statistical model for each region. Finally, the segmentation problem is expressed in a Bayesian framework in terms of probabilistic Markov random field (MRF) graphical
modeling. An ad hoc energy function is defined based on parametric models, a seed-specific spatial feature, a background-specific potential, and local-contextual information. This energy function is minimized
through graph cuts and, more specifically, the alpha-beta swap algorithm, yielding the final goal-driven
segmentation based on the maximum a posteriori (MAP) decision rule. The proposed method does not
require deep a priori knowledge (e.g., labelled datasets), as it only requires the choice of a goal-driven
predicate and a suited parametric model for the data. In the experimental validation with both magnetic
resonance (MR) and synthetic aperture radar (SAR) images, the method demonstrates robustness, versatility, and applicability to different domains, thus allowing for further analyses guided by the generated product.
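The seed-propagation step, which assigns each pixel a cost measuring its association with a reference seed, can be sketched as a Dijkstra-style geodesic computation on the 4-connected pixel grid, with intensity differences as edge weights (a simplification of the similarity-and-connectivity cost in the paper):

```python
import heapq
import numpy as np

def seed_propagation_cost(img, seed):
    """Assign each pixel the minimum cumulative intensity-difference cost
    of a 4-connected path from the seed (Dijkstra-style sketch)."""
    h, w = img.shape
    cost = np.full((h, w), np.inf)
    cost[seed] = 0.0
    heap = [(0.0, seed)]
    while heap:
        c, (r, col) = heapq.heappop(heap)
        if c > cost[r, col]:               # stale heap entry, skip
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, ncol = r + dr, col + dc
            if 0 <= nr < h and 0 <= ncol < w:
                new_cost = c + abs(float(img[nr, ncol]) - float(img[r, col]))
                if new_cost < cost[nr, ncol]:
                    cost[nr, ncol] = new_cost
                    heapq.heappush(heap, (new_cost, (nr, ncol)))
    return cost

# Two flat regions: pixels in the seed's region get zero cost,
# pixels across the intensity edge pay the jump.
img = np.zeros((4, 6)); img[:, 3:] = 100
cost = seed_propagation_cost(img, (0, 0))
print(cost[0, 2], cost[0, 5])  # 0.0 100.0
```

The low-cost basin around each seed corresponds to the significant regions from which the per-region statistical models are then estimated before the MRF/graph-cut stage.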
Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization
Computing category-agnostic bounding box proposals is a core component of many
computer vision tasks and has thus lately attracted a lot of attention. In this
work we propose a new approach to tackle this
problem that is based on an active strategy for generating box proposals that
starts from a set of seed boxes, which are uniformly distributed on the image,
and then progressively moves its attention to the promising image areas where
it is more likely to discover well localized bounding box proposals. We call
our approach AttractioNet; a core component of it is a CNN-based
category-agnostic object location refinement module that is capable of yielding
accurate and robust bounding box predictions regardless of the object category.
We extensively evaluate our AttractioNet approach on several image datasets
(i.e. COCO, PASCAL, ImageNet detection and NYU-Depth V2 datasets), reporting
state-of-the-art results on all of them that surpass the previous work in the
field by a significant margin, and also providing strong empirical evidence that
our approach is capable of generalizing to unseen categories. Furthermore, we
evaluate our AttractioNet proposals in the context of the object detection task
using a VGG16-Net based detector and the achieved detection performance on COCO
manages to significantly surpass all other VGG16-Net based detectors while even
being competitive with a heavily tuned ResNet-101 based detector. Code as well
as box proposals computed for several datasets are available at:
https://github.com/gidariss/AttractioNet.
Comment: Technical report.
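The attend-refine-repeat loop can be sketched as follows, with a toy refinement function standing in for the learned CNN module (in the real system the refinement is predicted from image features, and the target is unknown):

```python
def refine(box, target, step=0.5):
    """Toy stand-in for the CNN refinement module: move each box coordinate
    a fraction of the way toward the target box (the real module is learned)."""
    return tuple(b + step * (t - b) for b, t in zip(box, target))

def attend_refine_repeat(seed_boxes, target, iters=8):
    """Active proposal generation: start from seed boxes and repeatedly
    refine each one, progressively attending to the promising region."""
    boxes = list(seed_boxes)
    for _ in range(iters):
        boxes = [refine(b, target) for b in boxes]
    return boxes

# Uniformly placed seed boxes converge toward a hypothetical object box.
seeds = [(0, 0, 20, 20), (50, 50, 90, 90), (10, 60, 40, 100)]
obj = (30.0, 30.0, 70.0, 70.0)
final = attend_refine_repeat(seeds, obj)
max_err = max(abs(c - t) for box in final for c, t in zip(box, obj))
print(max_err < 1.0)  # True
```

The key design point the sketch captures is iteration: rather than scoring a fixed grid of boxes once, each proposal is moved several times, so attention concentrates on well-localized regions.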
Text to 3D Scene Generation with Rich Lexical Grounding
The ability to map descriptions of scenes to 3D geometric representations has
many applications in areas such as art, education, and robotics. However, prior
work on the text to 3D scene generation task has used manually specified object
categories and language that identifies them. We introduce a dataset of 3D
scenes annotated with natural language descriptions and learn from this data
how to ground textual descriptions to physical objects. Our method successfully
grounds a variety of lexical terms to concrete referents, and we show
quantitatively that our method improves 3D scene generation over previous work
using purely rule-based methods. We evaluate the fidelity and plausibility of
3D scenes generated with our grounding approach through human judgments. To
ease evaluation on this task, we also introduce an automated metric that
strongly correlates with human judgments.
Comment: 10 pages, 7 figures, 3 tables. To appear in ACL-IJCNLP 201
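The core grounding step, mapping a lexical term to a concrete 3D object referent, can be sketched as a nearest-neighbor lookup over learned embeddings. The embeddings and category names below are made up for illustration:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def ground(term_vec, object_vecs):
    """Return the 3D object category whose embedding best matches the term."""
    return max(object_vecs, key=lambda name: cosine(term_vec, object_vecs[name]))

# Hypothetical learned embeddings for two object categories.
object_vecs = {
    "chair": np.array([1.0, 0.1, 0.0]),
    "lamp":  np.array([0.0, 1.0, 0.2]),
}
# A query term whose (made-up) embedding lies close to "chair".
term = np.array([0.9, 0.2, 0.0])
print(ground(term, object_vecs))  # chair
```

In the actual task the embeddings are learned from the annotated scene descriptions, so that varied lexical terms resolve to the right physical objects rather than relying on manually specified category names.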