
    GENHOP: An Image Generation Method Based on Successive Subspace Learning

    Unlike deep-learning-based (DL-based) image generation methods, this work proposes a new image generative model built on the successive subspace learning principle, named GenHop (an acronym for Generative PixelHop). GenHop consists of three modules: 1) high-to-low dimension reduction, 2) seed image generation, and 3) low-to-high dimension expansion. The first module builds a sequence of high-to-low-dimensional subspaces through a sequence of whitening processes, each of which contains samples of a joint spatial-spectral representation. The second module generates samples in the lowest-dimensional subspace. The third module finds a proper high-dimensional sample for a seed image by adding details back via locally linear embedding (LLE) and a sequence of coloring processes. Experiments show that GenHop can generate visually pleasant images whose FID scores are comparable to or even better than those of DL-based generative models on the MNIST, Fashion-MNIST and CelebA datasets.
    Comment: 10 pages, 5 figures, accepted by ISCAS 202
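The reduce-sample-expand flow described in the abstract can be sketched with PCA whitening standing in for the paper's subspace construction. This is only a minimal illustration on synthetic data, assuming two whitening stages (64 -> 16 -> 4); the LLE detail-addition step is omitted, so this is not GenHop itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "images": 200 samples of 64-dim data standing in for flattened patches.
X = rng.normal(size=(200, 64))

def fit_whitener(X, k):
    """PCA whitening: project onto top-k components, scale to unit variance."""
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    comps = Vt[:k]                       # principal directions
    scale = S[:k] / np.sqrt(len(X) - 1)  # per-component standard deviation
    return mean, comps, scale

def whiten(X, mean, comps, scale):
    return (X - mean) @ comps.T / scale

def color(Z, mean, comps, scale):
    # Inverse of whitening: the "coloring" step of the expansion module.
    return Z * scale @ comps + mean

# Module 1: a sequence of high-to-low whitening stages.
w1 = fit_whitener(X, 16)
Z1 = whiten(X, *w1)
w2 = fit_whitener(Z1, 4)
Z2 = whiten(Z1, *w2)

# Module 2: generate seed samples in the lowest-dimensional subspace,
# where the whitened distribution is approximately unit Gaussian.
seeds = rng.normal(size=(5, 4))

# Module 3: expand low-to-high by successive coloring steps.
gen = color(color(seeds, *w2), *w1)
```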

    Magic cube puzzle approach for image encryption

    In principle, an image encryption algorithm produces an encrypted image composed of arbitrary patterns that provide no clues about the plain image or its cipher key. Ideally, the encrypted image is entirely independent of its plain image. Many functions can be used to achieve this goal; based on the functions used, image encryption techniques are categorized as: (1) block-based; (2) chaotic-based; (3) transformation-based; (4) conventional-based; and (5) miscellaneous. This study proposes a magic cube puzzle approach to encrypt an 8-bit grayscale image. The approach transforms a plain image into a magic cube puzzle of a particular size, which consists of a set of blocks. The magic cube puzzle algorithm diffuses the pixels of the plain image as in a Rubik’s Cube game, by rotating each block in a particular direction called the transposition orientation. The block’s transposition orientation is used as the key seed, while the cipher key is generated by randomly permuting the key seed to a certain key length. Several performance metrics were used to assess the goals, and the results were compared to several standard encryption methods. This study showed that the proposed method outperformed the other methods on all metrics except entropy. In further studies, the method will be modified to raise its entropy value very close to 8 and to apply it to true-color images. In essence, the magic cube puzzle approach offers a large space for pixel diffusion, which can grow further as a series of data is transformed into several magic cubes, each transposed with a different technique. This proposed approach is expected to add to the wealth of knowledge in the field of data encryption
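A minimal sketch of the cube-rotation idea, assuming a toy key format of (axis, slice index, quarter turns): pixels are reshaped into a cube and individual slabs are rotated, which is invertible by replaying the key in reverse. The paper's actual key-seed generation and coupled Rubik's-style face moves are not reproduced here.

```python
import numpy as np

def rotate_slice(cube, axis, idx, turns):
    """Rotate one slab of the cube by 90-degree turns, like a Rubik's move."""
    sl = [slice(None)] * 3
    sl[axis] = idx
    cube[tuple(sl)] = np.rot90(cube[tuple(sl)], k=turns)

def encrypt(img, key, n):
    cube = img.reshape(n, n, n).copy()
    for axis, idx, turns in key:
        rotate_slice(cube, axis, idx, turns)
    return cube.reshape(img.shape)

def decrypt(enc, key, n):
    cube = enc.reshape(n, n, n).copy()
    for axis, idx, turns in reversed(key):     # undo moves in reverse order
        rotate_slice(cube, axis, idx, -turns)  # with opposite rotation
    return cube.reshape(enc.shape)

# Toy 8x8 grayscale "image": 64 pixels fill a 4x4x4 magic cube.
img = np.arange(64, dtype=np.uint8).reshape(8, 8)
key = [(0, 1, 1), (2, 3, 2), (1, 0, 3)]  # hypothetical (axis, index, turns) key
enc = encrypt(img, key, 4)
```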

    A goal-driven unsupervised image segmentation method combining graph-based processing and Markov random fields

    Image segmentation is the process of partitioning a digital image into a set of homogeneous regions (according to some homogeneity criterion) to facilitate a subsequent higher-level analysis. In this context, the present paper proposes an unsupervised and graph-based method of image segmentation, which is driven by an application goal, namely, the generation of image segments associated with a user-defined and application-specific goal. A graph, together with a random grid of source elements, is defined on top of the input image. From each source satisfying a goal-driven predicate, called seed, a propagation algorithm assigns a cost to each pixel on the basis of similarity and topological connectivity, measuring the degree of association with the reference seed. Then, the set of most significant regions is automatically extracted and used to estimate a statistical model for each region. Finally, the segmentation problem is expressed in a Bayesian framework in terms of probabilistic Markov random field (MRF) graphical modeling. An ad hoc energy function is defined based on parametric models, a seed-specific spatial feature, a background-specific potential, and local-contextual information. This energy function is minimized through graph cuts and, more specifically, the alpha-beta swap algorithm, yielding the final goal-driven segmentation based on the maximum a posteriori (MAP) decision rule. The proposed method does not require deep a priori knowledge (e.g., labelled datasets), as it only requires the choice of a goal-driven predicate and a suitable parametric model for the data. In the experimental validation with both magnetic resonance (MR) and synthetic aperture radar (SAR) images, the method demonstrates robustness, versatility, and applicability to different domains, thus allowing for further analyses guided by the generated product
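The seed-propagation step can be illustrated with a Dijkstra-style cost propagation over the 4-connected pixel grid, using intensity dissimilarity as the edge cost; this is a simplified stand-in for the paper's similarity/connectivity measure, and the MRF/graph-cut stage and the actual goal-driven predicate are omitted.

```python
import heapq
import numpy as np

def propagate(img, seed):
    """Assign each pixel the minimum accumulated intensity-dissimilarity cost
    of any 4-connected path back to the seed (Dijkstra on the pixel grid)."""
    h, w = img.shape
    cost = np.full((h, w), np.inf)
    cost[seed] = 0.0
    heap = [(0.0, seed)]
    while heap:
        c, (y, x) = heapq.heappop(heap)
        if c > cost[y, x]:
            continue  # stale heap entry
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                nc = c + abs(float(img[ny, nx]) - float(img[y, x]))
                if nc < cost[ny, nx]:
                    cost[ny, nx] = nc
                    heapq.heappush(heap, (nc, (ny, nx)))
    return cost

# Toy image: a bright square on a dark background, seed inside the square.
img = np.zeros((8, 8))
img[2:6, 2:6] = 10.0
cost = propagate(img, (3, 3))
region = cost < 5.0  # pixels strongly associated with the seed
```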

    Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization

    The problem of computing category-agnostic bounding box proposals is a core component of many computer vision tasks and has thus lately attracted a lot of attention. In this work we propose a new approach to this problem based on an active strategy for generating box proposals: it starts from a set of seed boxes uniformly distributed over the image and then progressively moves its attention to the promising image areas where it is more likely to discover well-localized bounding box proposals. We call our approach AttractioNet; its core component is a CNN-based, category-agnostic object location refinement module capable of yielding accurate and robust bounding box predictions regardless of the object category. We extensively evaluate AttractioNet on several image datasets (i.e. COCO, PASCAL, ImageNet detection and NYU-Depth V2), reporting on all of them state-of-the-art results that surpass previous work in the field by a significant margin, and providing strong empirical evidence that our approach is capable of generalizing to unseen categories. Furthermore, we evaluate our AttractioNet proposals in the context of the object detection task using a VGG16-Net based detector; the achieved detection performance on COCO significantly surpasses all other VGG16-Net based detectors while even being competitive with a heavily tuned ResNet-101 based detector. Code as well as box proposals computed for several datasets are available at: https://github.com/gidariss/AttractioNet
    Comment: Technical report.
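The attend-refine-repeat loop can be sketched as follows. The `refine_step` function is a hypothetical stand-in that simply nudges each box a fraction of the way toward a known target, whereas AttractioNet uses a learned CNN in-out localization module; only the loop structure is illustrated.

```python
import numpy as np

def make_seed_boxes(img_w, img_h, step):
    """Seed boxes uniformly distributed over the image, as (x1, y1, x2, y2)."""
    boxes = []
    for y in range(0, img_h - step + 1, step):
        for x in range(0, img_w - step + 1, step):
            boxes.append([x, y, x + step, y + step])
    return np.array(boxes, dtype=float)

def refine_step(boxes, target):
    """Stand-in for the learned refinement module: move halfway to the target."""
    return boxes + 0.5 * (target - boxes)

def attend_refine_repeat(boxes, target, iters=6):
    """Repeatedly attend to the current boxes and refine their locations."""
    for _ in range(iters):
        boxes = refine_step(boxes, target)
    return boxes

seeds = make_seed_boxes(100, 100, 25)          # 4x4 grid of 25-px seed boxes
target = np.array([30.0, 40.0, 70.0, 80.0])    # hypothetical object box
final = attend_refine_repeat(seeds, target)
```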

    Text to 3D Scene Generation with Rich Lexical Grounding

    The ability to map descriptions of scenes to 3D geometric representations has many applications in areas such as art, education, and robotics. However, prior work on the text to 3D scene generation task has used manually specified object categories and language that identifies them. We introduce a dataset of 3D scenes annotated with natural language descriptions and learn from this data how to ground textual descriptions to physical objects. Our method successfully grounds a variety of lexical terms to concrete referents, and we show quantitatively that our method improves 3D scene generation over previous work using purely rule-based methods. We evaluate the fidelity and plausibility of 3D scenes generated with our grounding approach through human judgments. To ease evaluation on this task, we also introduce an automated metric that strongly correlates with human judgments.
    Comment: 10 pages, 7 figures, 3 tables. To appear in ACL-IJCNLP 201
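The core grounding idea, mapping a lexical term to its most likely physical object category from annotated data, can be illustrated with simple co-occurrence counts over hypothetical (word, object) annotation pairs; the paper's learned grounding model is of course richer than this counting baseline.

```python
from collections import Counter, defaultdict

# Hypothetical training pairs that would come from 3D scenes annotated
# with natural-language descriptions: (description word, object category).
annotations = [
    ("couch", "sofa"), ("couch", "sofa"), ("couch", "armchair"),
    ("lamp", "table_lamp"), ("lamp", "floor_lamp"), ("lamp", "table_lamp"),
]

counts = defaultdict(Counter)
for word, obj in annotations:
    counts[word][obj] += 1

def ground(word):
    """Map a lexical term to its most frequently co-occurring object category."""
    if word not in counts:
        return None  # unseen term: no concrete referent available
    return counts[word].most_common(1)[0][0]
```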